I have some sample code at http://www.cnlp.org/apachecon2005 for doing
relevance feedback across multiple documents.
It could be modified to provide more of the MoreLikeThis
functionality (i.e. determining important terms via tf/idf); for now,
it just takes the top X terms.
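
To make the tf/idf idea concrete, here is a minimal sketch in plain Python
(it is not the sample code at the URL above, and it does not use the Lucene
API). It assumes you already have the tokens of each document in the set and
a table of collection-wide document frequencies; the names significant_terms,
doc_set, doc_freq and num_docs are made up for illustration, and
log(N / df) is just one common idf variant.

```python
import math
from collections import Counter


def significant_terms(doc_set, doc_freq, num_docs, top_k=10):
    """Rank the terms of a document set by summed tf * idf.

    doc_set  : list of token lists (the documents of interest)
    doc_freq : dict mapping term -> number of documents in the whole
               collection that contain the term (hypothetical input)
    num_docs : total number of documents in the collection
    """
    # Term frequency summed over every document in the set.
    tf = Counter()
    for tokens in doc_set:
        tf.update(tokens)

    scores = {}
    for term, freq in tf.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # no collection statistics for this term, skip it
        scores[term] = freq * math.log(num_docs / df)

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    docs = [["lucene", "index", "the"], ["lucene", "the", "the"]]
    dfs = {"lucene": 10, "index": 50, "the": 1000}
    for term, score in significant_terms(docs, dfs, num_docs=1000):
        print(term, round(score, 3))
```

A term that occurs in every document of the collection gets an idf of zero
here, so it drops to the bottom of the list however often it appears in the
set.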
-Grant
On Jul 25, 2007, at 3:04 PM, Jens Grivolla wrote:
Hello,
I'm looking to extract significant terms characterizing a set of
documents (which in turn relate to a topic).
This basically comes down to functionality similar to determining
the terms with the greatest offer weight (as used for blind
relevance feedback), or maximizing tf.idf (as is done in
MoreLikeThis).
Is there anything like this already implemented, or do I need to
iterate through all documents in the set "manually", re-tokenize
each one (or maybe use TermVectors), and then calculate the weight
for each term?
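
For reference, the "manual" offer-weight calculation described above could be
sketched roughly as follows in plain Python. This assumes the usual
Robertson/Sparck Jones formulation, OW = r * RW with
RW = log(((r+0.5)(N-n-R+r+0.5)) / ((n-r+0.5)(R-r+0.5))), where r is the
number of documents in the set containing the term, R the size of the set,
n the term's document frequency in the collection, and N the collection
size. The function names and the doc_freq table are hypothetical; in
practice the counts would come from the index or from term vectors.

```python
import math
from collections import Counter


def offer_weight(r, n, R, N):
    """Offer weight of one term: r times its relevance weight."""
    rw = math.log(((r + 0.5) * (N - n - R + r + 0.5))
                  / ((n - r + 0.5) * (R - r + 0.5)))
    return r * rw


def top_terms_by_offer_weight(doc_set, doc_freq, num_docs, top_k=10):
    """Rank terms of doc_set (a list of token lists) by offer weight.

    doc_freq maps term -> collection-wide document frequency and is
    assumed to include the documents of doc_set themselves.
    """
    R = len(doc_set)
    # r: in how many documents of the set each term occurs at least once.
    r_counts = Counter()
    for tokens in doc_set:
        r_counts.update(set(tokens))

    scores = {
        term: offer_weight(r, doc_freq[term], R, num_docs)
        for term, r in r_counts.items()
        if term in doc_freq
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

Called with the tokenized documents of the set and the collection
statistics, this returns (term, weight) pairs, highest weight first.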
Thanks,
Jens
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]