I went ahead and attached everything I sent to Robin to MAHOUT-286.
2010/2/9 Robin Anil <[email protected]>:
> I have the data. I will upload shortly
>
>
> On Wed, Feb 10, 2010 at 12:10 AM, Ted Dunning <[email protected]> wrote:
>
>> Martin,
>>
>> I saw only one attachment here. The other may have been stripped by the
>> mailing list which prefers not to have attachments.
>>
>> I have filed an issue for this at
>> https://issues.apache.org/jira/browse/MAHOUT-286
>>
>> Can you attach your data files there so that we can work on getting a
>> better resolution for you?
>>
>> On Mon, Feb 8, 2010 at 5:35 AM, Martin Häger <[email protected]> wrote:
>>
>> > Hi Robin,
>> >
>> > The attached data.arff contains the test data, data.training.arff
>> > contains the training data. We're running the svn trunk (r906954) of
>> > Mahout. The attached script run.sh shows how we run it.
>> > Should it be possible to run Mahout's NaiveBayes classifier on this
>> > data in this way or is it limited to text documents only?
>> >
>> > Side note: We're expecting Weka to report 100% incorrect
>> > classification since all test data belongs to the class "unknown",
>> > whereas the training data is either "valid" or "invalid" (in fact, the
>> > test data is the entire "invalid" set, so Weka manages to classify
>> > everything correctly). We're not yet sure what class to put on the
>> > test data, as we of course can't know anything about it (hence the
>> > "unknown").
>> >
>> > 2010/2/8 Robin Anil <[email protected]>:
>> > > Can you send the train and test data to me. Are you using 0.2 release
>> > > or the trunk?
>> > >
>> > > Seems model wasnt built as there was an error Exception in thread "main"
>> > > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
>> > > file:/tmp/hadoop/model/trainer-termDocCount
>> > > Input path does not exist: file:/tmp/hadoop/model/trainer-wordFreq
>> > > Input path does not exist: file:/tmp/hadoop/model/trainer-featureCount
>> > >
>> > > So there is no point running the classifier
>> > >
>> > > Weka also seems not to be doing good either.
>> > >
>> > >
>> > >
>> > > On Mon, Feb 8, 2010 at 6:24 PM, Martin Häger <[email protected]> wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> We're experimenting a bit with Weka and Mahout. Our input data is a
>> > >> relation in ARFF format (see attached data.training.arff), and we'd
>> > >> like to classify it using Mahout. However, it seems (to us, at first)
>> > >> that the Mahout classifier.bayes.interfaces.Algorithm interface is
>> > >> centered around documents of text, and not general attribute data.
>> > >> Thus, running the classifier causes our ARFF data to be interpreted as
>> > >> a document of words, with not very useful results (see attached
>> > >> mahout.log).
>> > >>
>> > >> With Weka, we're able to get the results we want (see attached weka.log).
>> > >>
>> > >> Any suggestions for how to get this working?
>> > >>
>> > >> Thanks!
>> > >>
>> > >
>> >
>>
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
>
