I was looking into running the Reuters tests with Mahout, just to see how things go. Being efficient (lazy?), I wanted to avoid doing all the "pre" work if I could, and I found that Weka has an ARFF-formatted set of the files for the ModApte tests (http://sourceforge.net/projects/weka/files/). Looking at the format and at http://www.cs.waikato.ac.nz/~ml/weka/arff.html, it seems to correspond fairly well to our vectors (which makes sense): the attributes at the top are our Vector labels and, of course, the data section corresponds to the data.

So, I thought I would see if I could crank out a simple converter. Does anyone else have such a need?
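
To make the mapping concrete, here is a rough sketch of the idea in plain Python. It is not Mahout code and does not use any Mahout API. The function name parse_arff and the toy sample are made up for illustration, and it assumes numeric-only attributes with no quoted names and no missing ("?") values. It handles both dense rows and ARFF's sparse "{index value, ...}" rows.

```python
def parse_arff(text):
    """Return (labels, vectors) from a small, numeric-only ARFF string.

    Hypothetical helper for illustration only; it ignores quoted
    attribute names, nominal/string types, and missing values.
    """
    labels = []
    vectors = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # "@attribute <name> <type>": the name becomes the label
                labels.append(line.split()[1])
            elif lower.startswith("@data"):
                in_data = True
            continue  # header lines such as @relation are skipped
        if line.startswith("{"):
            # Sparse row: "{index value, index value, ...}", 0-based indices
            vec = [0.0] * len(labels)
            body = line.strip("{}").strip()
            if body:
                for pair in body.split(","):
                    idx, val = pair.split()
                    vec[int(idx)] = float(val)
        else:
            # Dense row: one comma-separated value per attribute
            vec = [float(v) for v in line.split(",")]
        vectors.append(vec)
    return labels, vectors


# Toy input (assumed for illustration, not the Reuters data)
sample = """% toy example
@relation toy
@attribute wheat numeric
@attribute corn numeric
@attribute trade numeric
@data
1,0,2
{1 3, 2 1}
"""

labels, vectors = parse_arff(sample)
```

Here labels plays the role of the Vector labels and each entry of vectors is one row of the data section. A real converter would also need to deal with the string and nominal attributes that a text collection is likely to contain.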
