first, thanks for taking the time!
that is correct - i'm not trying to do POS (part-of-speech) tagging. actually i know that it must also work with mahout: kea uses a trained classifier (from weka) and tf-idf (term frequency - inverse document frequency) to identify keyphrase candidates. it is then possible to check these candidates against a controlled vocabulary (i.e. a skos thesaurus).

anyway thanks! ;) i'm smelling an opportunity to get famous!

wkr www.turnguard.com

----- Original Message ----
From: Isabel Drost <[email protected]>
To: [email protected]
Sent: Friday, September 25, 2009 9:37:36 AM
Subject: Re: newbie intro

On Fri, 25 Sep 2009 10:04:10 +0530 Shashikant Kore <[email protected]> wrote:

> On Wed, Sep 23, 2009 at 8:18 PM, Ted Dunning <[email protected]> wrote:
> > One of the clustering algorithms has a patch that should have some
> > at-least-ok key phrase extraction. Shashi was digging into that.
>
> Extracting phrases (noun/verb) could be done with OpenNLP, Gate,
> LingPipe, and many other similar tools.

I think the phrases extracted by OpenNLP are different from what kea does in that kea sort of tries to find phrases that best represent the topic of the text. Some sort of automatic tagging of texts with topics.

Isabel
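
for anyone following the thread, here is a rough sketch of the tf-idf scoring and controlled-vocabulary check described in the reply. it is not kea's code and leaves out the trained classifier entirely; the function names, the whitespace tokenizer, the n-gram candidate generation and the smoothed idf formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, max_len=3):
    """Yield every run of 1..max_len consecutive tokens as a candidate phrase."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def tfidf_candidates(doc, corpus, max_len=3):
    """Score candidate phrases of `doc` by tf * idf against `corpus`.

    Hypothetical helper: naive whitespace tokenization, no stemming,
    no stopword handling, no classifier.
    """
    doc_phrases = Counter(ngrams(doc.lower().split(), max_len))
    total = sum(doc_phrases.values())
    corpus_sets = [set(ngrams(d.lower().split(), max_len)) for d in corpus]
    n_docs = len(corpus_sets)

    scores = {}
    for phrase, count in doc_phrases.items():
        # document frequency: number of corpus documents containing the phrase
        df = sum(1 for phrases in corpus_sets if phrase in phrases)
        tf = count / total
        # smoothed idf (an assumption), avoids division by zero when df == 0
        idf = math.log((1 + n_docs) / (1 + df))
        scores[phrase] = tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def filter_by_vocabulary(scored, vocabulary):
    """Keep only candidates that appear in a controlled vocabulary,
    here a plain set of labels standing in for a skos thesaurus."""
    return [(phrase, score) for phrase, score in scored if phrase in vocabulary]
```

a call would look like `filter_by_vocabulary(tfidf_candidates(text, background_docs), labels)`, where `labels` is the set of preferred labels taken from the thesaurus. a real setup would feed the tf-idf value into the classifier as one feature among others instead of ranking on it directly.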
