http://www.lucidimagination.com/search/document/3ae15062f35420cf/lda_for_multi_label_classification_was_mahout_book

David gave me a very nice paper which talks about tag-document correlation. If you
start with named labels, it does end up being a naive Bayes classifier.


On Mon, Jan 11, 2010 at 2:23 AM, Grant Ingersoll <[email protected]> wrote:

> A couple of things strike me about LDA, and I wanted to hear others'
> thoughts:
>
> 1. In the LDA implementation (and this seems to be reinforced by my reading
> on topic models in general), the topics themselves don't have "names".  I
> can see why this is difficult (in some ways, you're summarizing a summary),
> but am curious whether anyone has done any work on such a thing, as without
> them it still requires a fair amount of work by the human to infer what the
> topics are.  I suppose you could just pick the top few terms, but it seems
> like a common phrase or something would go further.  Also, I believe someone
> in the past
> mentioned some more recent work by Blei and Lafferty (Blei and Lafferty.
> Visualizing Topics with Multi-Word Expressions. stat (2009) vol. 1050 pp. 6)
> to alleviate that.
>
> 2. We get the words in the topic, but how do we know which documents have
> those topics?  I think, based on reading the paper, that the answer is "You
> don't get to know", but I'm not sure.
>
If I am correct, you do get to know. Based on the words in a document, you can
infer which of those unlabelled topics are present in it, each with an affinity
score. You can then sort the scores, or run some form of significance test, to
keep only the topics that matter.
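
To make that concrete, here is a rough sketch in plain Python. It is not the Mahout code, only an illustration under some assumptions: the topic-word probabilities in TOPIC_WORD are made up, the names topic_affinities and significant_topics are hypothetical, and the scoring is a simple EM "fold-in" that holds the topics fixed and estimates topic proportions for one document. A real LDA implementation would use its own inference procedure.

```python
# Made-up topic-word probabilities, standing in for what an LDA run might learn.
TOPIC_WORD = {
    "topic_0": {"ball": 0.4, "team": 0.4, "vote": 0.1, "law": 0.1},
    "topic_1": {"ball": 0.1, "team": 0.1, "vote": 0.4, "law": 0.4},
}


def topic_affinities(words, topic_word, iterations=50, floor=1e-9):
    """Estimate topic proportions for one document, holding the topics fixed."""
    topics = list(topic_word)
    # Start from a uniform guess over the unlabelled topics.
    theta = {t: 1.0 / len(topics) for t in topics}
    if not words:
        return theta
    for _ in range(iterations):
        counts = {t: 0.0 for t in topics}
        for w in words:
            # Responsibility of each topic for this word; unseen words get a tiny floor.
            weights = {t: theta[t] * topic_word[t].get(w, floor) for t in topics}
            total = sum(weights.values())
            for t in topics:
                counts[t] += weights[t] / total
        # Re-estimate the proportions from the soft word assignments.
        theta = {t: counts[t] / len(words) for t in topics}
    return theta


def significant_topics(theta, threshold=0.2):
    """Keep topics at or above the threshold, highest affinity first."""
    kept = [(t, score) for t, score in theta.items() if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


doc = ["ball", "team", "team", "ball"]
scores = topic_affinities(doc, TOPIC_WORD)
print(significant_topics(scores))
```

The 0.2 threshold is arbitrary; it stands in for whatever significance filtering you prefer.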


>
> -Grant
