Hello,
I'm looking to extract significant terms characterizing a set of
documents (which in turn relate to a topic).
This basically comes down to functionality similar to determining the
terms with the greatest offer weight (as used for blind relevance
feedback), or maximizing tf.idf (as is done in MoreLikeThis).
Is there anything like this already implemented, or do I need to iterate
through all documents in the set "manually", re-tokenize each one (or
maybe use TermVectors), and then calculate the weight for each term?
Thanks,
Jens
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]