Re: newbie intro

Isabel Drost Fri, 25 Sep 2009 05:35:18 -0700

On Fri, 25 Sep 2009 17:43:38 +0530
Shashikant Kore <[email protected]> wrote:


> If that's not the case, how are the phrases identified in the first
> place? Is it based on shingles?  I am curious to know  ways to get all
> the meaningful phrases from the text other than POS tagging.

I guess the point is that analysis should not stop after POS tagging
and TFIDF computation. There may be use cases where users prefer phrases
that contain named entities, that have a particular length, that
contain terms seen at an increasing rate in the recent past, that were
found in the title of the document vs. body...

So I think POS tags and TFIDF should be features that determine whether
a phrase is considered a key phrase or not - maybe even key
indicators for generating a key phrase candidate set. But there may be many
more features. Lastly, it might be easier to come up with a
training set of good and bad phrases (plus their feature vectors) and
let a classifier do the selection than to hand-code the
rules and feature weights for phrase selection.
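As a rough sketch of that last idea, the snippet below learns the feature weights from labeled phrases instead of hand coding them. It uses a plain perceptron purely because it fits in a few lines; any classifier would do. The training data is made up for illustration, and the feature order (length, in title, contains entity, TFIDF score) and the names `train_perceptron` and `is_key_phrase` are assumptions, not anything taken from an existing library.

```python
# Hypothetical training set: (feature vector, label), label +1 = good phrase,
# -1 = bad phrase. Features: [length, in_title, has_entity, tfidf_score].
TRAINING = [
    ([2, 1, 1, 0.8], 1),
    ([3, 1, 0, 0.9], 1),
    ([2, 0, 1, 0.7], 1),
    ([1, 0, 0, 0.1], -1),
    ([5, 0, 0, 0.2], -1),
    ([1, 0, 0, 0.3], -1),
]


def train_perceptron(examples, epochs=1000):
    """Learn weights and a bias with the classic perceptron update rule."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            if label * score <= 0:  # misclassified (or on the boundary)
                weights = [w + label * x for w, x in zip(weights, features)]
                bias += label
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


def is_key_phrase(weights, bias, features):
    """Return True if the learned linear rule accepts the phrase."""
    return sum(w * x for w, x in zip(weights, features)) + bias > 0


weights, bias = train_perceptron(TRAINING)
```

The learned `weights` play the role of the hand-tuned feature weights: adding a new feature only means adding a column to the training vectors, not rewriting the selection rules.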

Isabel
