A couple of things strike me about LDA, and I wanted to hear others' thoughts:
1. In the LDA implementation (and my reading on topic models in general seems to reinforce this), the topics themselves don't have "names". I can see why this is difficult (in some ways, you're summarizing a summary), but I'm curious whether anyone has done any work on this, since without names it still takes a fair amount of human effort to infer what the topics are. I suppose you could just pick the top few terms, but it seems like a common phrase or something similar would go further. Also, I believe someone in the past mentioned some more recent work by Blei and Lafferty (Blei and Lafferty. Visualizing Topics with Multi-Word Expressions. stat (2009) vol. 1050 pp. 6) that aims to alleviate that.
2. We get the words in the topic, but how do we know which documents have those topics? I think, based on reading the paper, that the answer is "You don't get to know", but I'm not sure.

-Grant
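
P.S. To make the "just pick the top few terms" idea from point 1 concrete, here is a minimal sketch in plain Python. It assumes the topic output is available as one list of per-word weights per topic plus a matching vocabulary list. The names label_topics, topic_word and vocab are hypothetical, and the numbers are made-up toy data, not output from any real run or library.

```python
def label_topics(topic_word, vocab, k=3):
    """Return a crude label per topic: its k highest-weighted terms, joined by ', '."""
    labels = []
    for weights in topic_word:
        # Indices of the k largest weights, highest first.
        top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
        labels.append(", ".join(vocab[i] for i in top))
    return labels


# Hypothetical toy data: 2 topics over a 5-word vocabulary.
vocab = ["gene", "dna", "court", "law", "cell"]
topic_word = [
    [0.40, 0.30, 0.01, 0.02, 0.27],
    [0.02, 0.03, 0.50, 0.40, 0.05],
]

labels = label_topics(topic_word, vocab)
print(labels)
```

This only strings together single high-weight terms, so it is the baseline I'd want a phrase-based label to improve on, not a substitute for one.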
