On Fri, Sep 25, 2009 at 1:30 PM, jakobitsch juergen <[email protected]> wrote:
> first thanks for taking the time!
>
> that is correct - i'm not trying to do POS (part of speech tagging).
>
> actually i know that it must work also with mahout : kea uses
> a trained classifier (from weka) and tfXidf (= term frequency - inverse
> document frequency) to identify keyphrase candidates. it is then
> possible to check these candidates against a controlled vocabulary
> (i.e. skos thesaurus).
>
I might be off from what you are looking for, but after identifying the
(noun/verb) phrases in the text with a POS tagger, you could run the
TF-IDF analysis on them. If that's not the case, how are the phrases
identified in the first place? Is it based on shingles? I am curious to
know ways to get all the meaningful phrases from the text other than POS
tagging.

--shashi

>
> anyway thanks!
>
> ;) i'm smelling an opportunity to get famous!
>
> wkr www.turnguard.com
>
> ----- Original Message ----
> From: Isabel Drost <[email protected]>
> To: [email protected]
> Sent: Friday, September 25, 2009 9:37:36 AM
> Subject: Re: newbie intro
>
> On Fri, 25 Sep 2009 10:04:10 +0530
> Shashikant Kore <[email protected]> wrote:
>
>> On Wed, Sep 23, 2009 at 8:18 PM, Ted Dunning <[email protected]>
>> wrote:
>> > One of the clustering algorithms has a patch that should have some
>> > at-least-ok key phrase extraction. Shashi was digging into that.
>
>> Extracting phrases (noun/verb) could be done with OpenNLP, Gate,
>> LingPipe, and many other similar tools.
>
> I think the phrases extracted by OpenNLP are different from what kea
> does in that kea sort of tries to find phrases that best represent the
> topic of the text. Some sort of automatic tagging of texts with topics.
>
> Isabel
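
On the shingle question in the reply above: a rough sketch of what shingle-based candidate generation followed by TF-IDF ranking could look like, with no POS tagger involved. This is only an illustration of the idea, not how kea or Mahout actually does it. The function names (`shingles`, `tfidf_phrases`), the whitespace tokenizer, and the 2-to-3 word shingle sizes are all assumptions made for the example.

```python
import math
from collections import Counter


def shingles(text, n_min=2, n_max=3):
    """Return all word n-grams (shingles) of length n_min..n_max.

    Tokenization is a plain lowercase whitespace split (an assumption;
    a real pipeline would use a proper analyzer and stop-word filter).
    """
    words = text.lower().split()
    out = []
    for n in range(n_min, n_max + 1):
        for i in range(len(words) - n + 1):
            out.append(" ".join(words[i:i + n]))
    return out


def tfidf_phrases(docs, n_min=2, n_max=3):
    """Rank the shingles of each document by TF-IDF.

    TF is the shingle count divided by the number of shingles in the
    document; IDF is log(N / df). Returns one list per document of
    (phrase, score) pairs, highest score first, ties broken by phrase.
    """
    per_doc = [Counter(shingles(d, n_min, n_max)) for d in docs]

    # Document frequency: in how many documents does each shingle occur?
    df = Counter()
    for counts in per_doc:
        df.update(counts.keys())

    n_docs = len(docs)
    ranked = []
    for counts in per_doc:
        total = sum(counts.values()) or 1
        scores = {
            phrase: (count / total) * math.log(n_docs / df[phrase])
            for phrase, count in counts.items()
        }
        ranked.append(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])))
    return ranked


if __name__ == "__main__":
    corpus = [
        "key phrase extraction works",
        "key phrase ranking works",
    ]
    for doc, phrases in zip(corpus, tfidf_phrases(corpus, 2, 2)):
        print(doc, "->", phrases[:2])
```

Shingles that occur in every document get an IDF of zero and fall to the bottom of the ranking. The top candidates could then be checked against a controlled vocabulary, as described in the quoted message.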
