This may help in the margins, but it is surprising how well simpler methods work.
tf-idf is, btw, an approximation of the LLR score. There are some interesting edge conditions where the approximation breaks, notably when there are several occurrences in the text of interest.

On Fri, Sep 25, 2009 at 5:37 AM, Isabel Drost <[email protected]> wrote:

> So I think, POS tags and TFIDF should be features determining whether
> a phrase should be considered as key phrase or not - maybe even key
> indicators to generate a key phrase candidate set. But there may be many
> more features. Lastly it might be easier to come up with a
> training set of good and bad phrases (plus their feature vectors) and
> let a classifier do the selection compared to manually hand coding the
> rules and feature weights for phrase selection.
>

--
Ted Dunning, CTO
DeepDyve
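
For readers who have not seen the LLR score mentioned in the reply, here is a minimal sketch of the log-likelihood ratio (G^2) statistic for a 2x2 table of counts. It assumes the usual setup of comparing a term's frequency in the text of interest against the rest of the corpus. The function names and the example counts are made up for illustration and are not taken from the thread or from any particular library.

```python
import math


def x_log_x(x):
    # x * ln(x), with the conventional limit 0 * ln(0) = 0.
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    # Unnormalized Shannon entropy of a list of counts:
    # N * ln(N) - sum(k * ln(k)), where N is the total count.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    Assumed layout (hypothetical, for illustration):
      k11: occurrences of the term in the text of interest
      k12: occurrences of the term in the rest of the corpus
      k21: all other tokens in the text of interest
      k22: all other tokens in the rest of the corpus
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


if __name__ == "__main__":
    # Made-up counts: the term appears 5 times in a 1000-token text
    # and 20 times in the remaining 99000 tokens of the corpus.
    print(llr(5, 20, 995, 98980))
```

A score near zero means the term is about as frequent in the text as in the rest of the corpus; larger scores mean the difference in frequency is less likely to be chance.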
