Thank you, but I still have no clue how to do that with Weka
after taking a look at its API. Let me reformulate my problem:

I have a collection of term vectors (each vector is the list of tokens
extracted from a file), and I no longer have the original files. I would
like to calculate the TF as well as the TF-IDF of each term and sort the
terms by these values, respectively. As Grant Ingersoll suggested, I could
index those term vectors again using Lucene and then use its API to measure
TF and TF-IDF. However, I suspect there is a simpler way, or an API that
fits this case directly.
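
To make the question concrete, here is a rough sketch of what I have in
mind, computed directly from the in-memory token vectors using the usual
TF-IDF definitions (plain Java, no library; the class and variable names
are just made up, and the sample documents are placeholders):

    import java.util.*;

    public class TfIdfSketch {
        public static void main(String[] args) {
            // Each inner list is the token vector of one (lost) file.
            List<List<String>> docs = List.of(
                List.of("lucene", "index", "term", "term"),
                List.of("weka", "term", "vector"));

            // Document frequency: in how many documents each term appears.
            Map<String, Integer> df = new HashMap<>();
            for (List<String> doc : docs)
                for (String term : new HashSet<>(doc))
                    df.merge(term, 1, Integer::sum);

            int n = docs.size();
            for (List<String> doc : docs) {
                // Raw term counts within this document.
                Map<String, Integer> counts = new HashMap<>();
                for (String term : doc)
                    counts.merge(term, 1, Integer::sum);

                // TF = count / docLength, TF-IDF = TF * log(N / df);
                // then print the terms sorted by descending score.
                Map<String, Double> tfidf = new HashMap<>();
                for (Map.Entry<String, Integer> e : counts.entrySet()) {
                    double tf = e.getValue() / (double) doc.size();
                    tfidf.put(e.getKey(),
                              tf * Math.log((double) n / df.get(e.getKey())));
                }
                tfidf.entrySet().stream()
                     .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                     .forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));
            }
        }
    }

That is essentially all I need, so indexing everything again with Lucene
feels like overkill.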

Thanks once again everyone.

Best regards,

Sengly


On 3/28/07, karl wettin <[EMAIL PROTECTED]> wrote:


On 28 Mar 2007, at 10:36, Sengly Heng wrote:

> Does any of you know of a Java API that directly handles this
> problem, or do I have to implement it from scratch?

You can also try
weka.filters.unsupervised.attribute.StringToWordVector; it has many
neat features you might be interested in. And if applicable to what
you are trying to do, the feature selection algorithms of the same
project (Weka) do a great job of reducing the data set.

http://www.cs.waikato.ac.nz/ml/weka/

It is GPL.
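
Roughly, something along these lines should give you TF-IDF-weighted
word vectors (only a sketch: the "docs.arff" file name is made up, it
assumes one string attribute holding each document's tokens joined by
spaces, and the exact option setters may differ between Weka versions):

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    public class WekaTfIdfSketch {
        public static void main(String[] args) throws Exception {
            // "docs.arff" is hypothetical: one string attribute per document.
            Instances raw = DataSource.read("docs.arff");

            StringToWordVector filter = new StringToWordVector();
            filter.setOutputWordCounts(true); // term counts, not just 0/1 presence
            filter.setTFTransform(true);      // log-based term-frequency weighting
            filter.setIDFTransform(true);     // scale by log(numDocs / docFreq)
            filter.setInputFormat(raw);

            Instances vectors = Filter.useFilter(raw, filter);
            System.out.println(vectors.numAttributes() + " word attributes produced");
        }
    }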

--
karl



