Oops. The ARFF Driver writes only vectors, not the tab-separated format the Bayes Classifier reads. I will try to add that as a flag.
@Grant: For batch classification, yes, we can go with vectors. But I don't see how we can classify documents on the fly if the dictionary can't fit in memory. Maybe randomizers can help. We will have to wait for that.

@Ted: Waiting to pounce upon the randomizers :)

Robin

On Tue, Feb 9, 2010 at 9:08 PM, Grant Ingersoll <[email protected]> wrote:

>
> On Feb 8, 2010, at 7:54 AM, Martin Häger wrote:
>
> > Hi,
> >
> > We're experimenting a bit with Weka and Mahout. Our input data is a
> > relation in ARFF format (see attached data.training.arff), and we'd
> > like to classify it using Mahout. However, it seems (to us, at first)
> > that the Mahout classifier.bayes.interfaces.Algorithm interface is
> > centered around documents of text, and not general attribute data.
> > Thus, running the classifier causes our ARFF data to be interpreted as
> > a document of words, with not very useful results (see attached
> > mahout.log).
>
> I think we still need to get our Bayes stuff to run off of Vectors instead
> of text, then it should be easy to go from ARFF to Vector format and then
> run all of the Mahout tools.
>
> -Grant
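
A note on the dictionary problem raised above: one way a document could be vectorized on the fly without holding a dictionary in memory is feature hashing, where each token is hashed straight to a slot in a fixed-size vector. Whether this is what the "randomizers" will look like is an assumption on my part, and the function name and vector size below are made up for illustration. This is a rough Python sketch of the idea, not Mahout code:

```python
import hashlib


def hashed_vector(tokens, num_features=16):
    """Map tokens to a fixed-size count vector without a dictionary.

    Hypothetical helper for illustration only. Each token is hashed to
    a slot index, so memory use depends on num_features, not on the
    vocabulary size. Distinct tokens may collide in the same slot.
    """
    vec = [0.0] * num_features
    for token in tokens:
        # md5 is used only as a stable hash here (Python's built-in
        # hash() is salted per process, so it would not be repeatable).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % num_features
        vec[index] += 1.0
    return vec


if __name__ == "__main__":
    doc = "we are experimenting with weka and mahout".split()
    print(hashed_vector(doc))
```

The trade-off is that collisions blur some features together, so the vector size would need to be chosen large enough for the vocabulary at hand.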
