On Fri, 25 Sep 2009 17:43:38 +0530 Shashikant Kore wrote:
> If that's not the case, how are the phrases identified in the first
> place? Is it based on shingles? I am curious to know ways to get all
> the meaningful phrases from the text other than POS tagging.

I guess the point is that analysis should not stop after POS tagging and TFIDF computation. There may be use cases where users prefer phrases that contain named entities, that have a particular length, that contain terms seen at an increasing rate in the recent past, or that were found in the title of the document rather than the body.

So I think POS tags and TFIDF should be features that determine whether a phrase is considered a key phrase, maybe even key indicators for generating a key phrase candidate set. But there may be many more features.

Lastly, it might be easier to come up with a training set of good and bad phrases (plus their feature vectors) and let a classifier do the selection than to hand-code the rules and feature weights for phrase selection.

Isabel
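
P.S. To make the classifier idea above a bit more concrete, here is a minimal sketch in Python. It assumes each candidate phrase has already been turned into a small feature vector. The feature names, the toy training values, and the helper functions train and score are all made up for illustration, and the hand-rolled logistic regression is just one possible classifier, not a recommendation for a particular one.

```python
import math

# Hypothetical feature order for a candidate phrase (an assumption, not a fixed scheme):
# [tfidf score in 0..1, POS pattern looks like a noun phrase, contains a named entity, seen in title]
FEATURES = ["tfidf", "pos_is_noun_phrase", "has_named_entity", "in_title"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Estimated probability that feature vector x belongs to a good key phrase."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(examples, labels, epochs=5000, lr=0.5):
    """Fit logistic regression weights with plain batch gradient descent."""
    n = len(examples[0])
    m = len(examples)
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in zip(examples, labels):
            err = score(weights, bias, x) - y  # prediction error for this example
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Toy, hand-made training set: 1 = good phrase, 0 = bad phrase.
train_x = [
    [0.9, 1, 1, 1],
    [0.8, 1, 0, 1],
    [0.7, 1, 1, 0],
    [0.2, 0, 0, 0],
    [0.3, 0, 0, 1],
    [0.1, 1, 0, 0],
]
train_y = [1, 1, 1, 0, 0, 0]

weights, bias = train(train_x, train_y)

# Score two unseen (again hypothetical) candidates.
strong_candidate = [0.85, 1, 1, 1]
weak_candidate = [0.15, 0, 0, 0]
print(dict(zip(FEATURES, weights)))
print(score(weights, bias, strong_candidate), score(weights, bias, weak_candidate))
```

The learned weights take the place of hand-tuned rules: adding a new signal (say, phrase length or recent term frequency growth) means adding one more column to the feature vectors and retraining.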
