I have some sample code for doing relevance feedback across multiple documents at http://www.cnlp.org/apachecon2005

It could be modified to provide more of the MoreLikeThis functionality (i.e., determining important terms via tf/idf); for now it just takes the top X terms.
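
To illustrate the difference between taking the top X terms and weighting by tf/idf, here is a rough sketch in plain Python. It is not the code at the link above and does not use the Lucene API: the names tokenize and significant_terms, the regex tokenizer, and the smoothed idf formula are all assumptions chosen for illustration. It sums term frequencies over the document set of interest and scales each by an idf computed over the whole collection.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for a real analyzer: lowercase alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def significant_terms(subset, corpus, top_n=5):
    """Rank the terms of `subset` by summed tf * idf, with idf taken from `corpus`."""
    n_docs = len(corpus)

    # Document frequency: number of corpus documents containing each term.
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))

    # Term frequency summed over the documents in the subset.
    tf = Counter()
    for doc in subset:
        tf.update(tokenize(doc))

    scores = {}
    for term, freq in tf.items():
        # Smoothed idf (an assumption, not MoreLikeThis's exact formula);
        # the +1 terms keep the value finite and at least 1.
        idf = math.log((n_docs + 1) / (df.get(term, 0) + 1)) + 1.0
        scores[term] = freq * idf

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


corpus = [
    "the lucene index",
    "the lucene query",
    "the pasta recipe",
    "the tomato sauce",
]
subset = corpus[:2]
ranked = significant_terms(subset, corpus, top_n=3)
```

In this toy collection "the" and "lucene" occur equally often in the subset, so a plain top-X count cannot separate them, while the idf factor pushes "the" down because it appears in every document. With a real index the document frequencies and term vectors would come from the index rather than from re-tokenizing the text.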

-Grant

On Jul 25, 2007, at 3:04 PM, Jens Grivolla wrote:

Hello,

I'm looking to extract significant terms characterizing a set of documents (which in turn relate to a topic).

This basically comes down to functionality similar to determining the terms with the greatest offer weight (as used for blind relevance feedback), or maximizing tf.idf (as is done in MoreLikeThis).

Is there anything like this already implemented, or do I need to iterate through all documents in the set "manually", re-tokenize each one (or maybe use TermVectors), and then calculate the weight for each term?

Thanks,
   Jens

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




