Martin,

I saw only one attachment here. The other may have been stripped by the mailing list, which prefers not to have attachments.
I have filed an issue for this at
https://issues.apache.org/jira/browse/MAHOUT-286

Can you attach your data files there so that we can work on getting a better
resolution for you?

On Mon, Feb 8, 2010 at 5:35 AM, Martin Häger <[email protected]> wrote:

> Hi Robin,
>
> The attached data.arff contains the test data, data.training.arff
> contains the training data. We're running the svn trunk (r906954) of
> Mahout. The attached script run.sh shows how we run it.
> Should it be possible to run Mahout's NaiveBayes classifier on this
> data in this way or is it limited to text documents only?
>
> Side note: We're expecting Weka to report 100% incorrect
> classification since all test data belongs to the class "unknown",
> whereas the training data is either "valid" or "invalid" (in fact, the
> test data is the entire "invalid" set, so Weka manages to classify
> everything correctly). We're not yet sure what class to put on the
> test data, as we of course can't know anything about it (hence the
> "unknown").
>
> 2010/2/8 Robin Anil <[email protected]>:
> > Can you send the train and test data to me. Are you using 0.2 release or
> > the trunk?
> >
> > Seems model wasnt built as there was an error Exception in thread "main"
> > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > file:/tmp/hadoop/model/trainer-termDocCount
> > Input path does not exist: file:/tmp/hadoop/model/trainer-wordFreq
> > Input path does not exist: file:/tmp/hadoop/model/trainer-featureCount
> >
> > So there is no point running the classifier
> >
> > Weka also seems not to be doing good either.
> >
> > On Mon, Feb 8, 2010 at 6:24 PM, Martin Häger <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> We're experimenting a bit with Weka and Mahout. Our input data is a
> >> relation in ARFF format (see attached data.training.arff), and we'd
> >> like to classify it using Mahout. However, it seems (to us, at first)
> >> that the Mahout classifier.bayes.interfaces.Algorithm interface is
> >> centered around documents of text, and not general attribute data.
> >> Thus, running the classifier causes our ARFF data to be interpreted as
> >> a document of words, with not very useful results (see attached
> >> mahout.log).
> >>
> >> With Weka, we're able to get the results we want (see attached weka.log).
> >>
> >> Any suggestions for how to get this working?
> >>
> >> Thanks!

--
Ted Dunning, CTO
DeepDyve
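
For anyone following the thread: since the Bayes classifier treats its input as documents of words, one possible workaround is to rewrite each ARFF row as a one-line "document" whose words are attribute=value tokens. The Python sketch below illustrates that idea only. It assumes the trainer accepts one document per line with the label first, then a tab, then space-separated tokens; that layout should be checked against the Mahout version in use. The sample relation, the attribute names, and the function name arff_to_labeled_lines are hypothetical and are not taken from the attached files. Numeric attributes are passed through as raw tokens here, which would probably need binning to be useful.

```python
def arff_to_labeled_lines(arff_text, class_attribute="class"):
    """Turn each dense ARFF data row into 'label<TAB>name=value name=value ...'.

    Sketch only: handles unquoted attribute names and comma-separated dense
    rows; quoted values and sparse ARFF rows are not supported.
    """
    attributes = []
    in_data = False
    lines = []
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # "@attribute <name> <type>": keep only the name
                attributes.append(line.split(None, 2)[1])
            elif lower.startswith("@data"):
                in_data = True
            continue
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(attributes):
            raise ValueError("row does not match header: " + line)
        row = dict(zip(attributes, values))
        label = row.pop(class_attribute)
        tokens = [name + "=" + value for name, value in row.items()]
        lines.append(label + "\t" + " ".join(tokens))
    return lines


# Hypothetical sample relation, for illustration only.
SAMPLE = """@relation demo
@attribute length numeric
@attribute protocol {http,ftp}
@attribute class {valid,invalid}
@data
12,http,valid
7,ftp,invalid
"""

for doc in arff_to_labeled_lines(SAMPLE):
    print(doc)
```

Encoding each value together with its attribute name keeps "protocol=http" distinct from the same string appearing under another attribute, which a plain bag of words would otherwise merge.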
