Thank you, but I still have no clue how to do that with Weka after taking a look at its API. Let me reformulate my problem:
I have a collection of term vectors (each vector represents the list of tokens extracted from a file) and I do not have the original files. I would like to calculate the TF as well as the TF-IDF of each term and sort the terms by these values, respectively. As Grant Ingersoll suggested, I could index those term vectors again using Lucene and then use its API to measure TF and TF-IDF. However, I suspect there is a simpler way, or an API that fits this case directly. Thanks once again, everyone.

Best regards,
Sengly

On 3/28/07, karl wettin <[EMAIL PROTECTED]> wrote:
On 28 Mar 2007, at 10.36, Sengly Heng wrote:

> Does anyone of you know any Java API that directly handles this
> problem, or do I have to implement it from scratch?

You can also try weka.filters.unsupervised.attribute.StringToWordVector; it has many neat features you might be interested in. And, if applicable to what you are attempting, the feature selection algorithms of the same project (Weka) do a great job of reducing the data set.

http://www.cs.waikato.ac.nz/ml/weka/

It is GPL.

--
karl
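For what it's worth, if no library dependency is wanted, TF and TF-IDF can be computed straight from the term vectors in plain Java. The sketch below is only an illustration under my own naming (the `TfIdf` class and its methods are hypothetical, not part of Lucene or Weka); it uses raw count divided by vector length for TF, and idf = log(N / df), then sorts one vector's terms by descending score, which is the sorting step asked about above:

```java
import java.util.*;

public class TfIdf {
    // Term frequency for one term vector: raw count / vector length.
    static Map<String, Double> tf(List<String> terms) {
        Map<String, Double> counts = new HashMap<>();
        for (String t : terms) counts.merge(t, 1.0, Double::sum);
        double n = terms.size();
        counts.replaceAll((t, c) -> c / n);
        return counts;
    }

    // TF-IDF for every term of every vector; idf = log(N / df),
    // where df is the number of vectors containing the term.
    static List<Map<String, Double>> tfIdf(List<List<String>> vectors) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> v : vectors)
            for (String t : new HashSet<>(v)) df.merge(t, 1, Integer::sum);
        double n = vectors.size();
        List<Map<String, Double>> result = new ArrayList<>();
        for (List<String> v : vectors) {
            Map<String, Double> scores = tf(v);
            scores.replaceAll((t, w) -> w * Math.log(n / df.get(t)));
            result.add(scores);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> vectors = Arrays.asList(
            Arrays.asList("apache", "lucene", "index"),
            Arrays.asList("apache", "weka", "filter"));
        // Print the first vector's terms sorted by descending TF-IDF.
        tfIdf(vectors).get(0).entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));
    }
}
```

Other TF and IDF weighting variants (e.g. log-scaled TF, smoothed IDF) drop in at the two `replaceAll` calls without changing the rest.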