Hi Robin,

The attached data.arff contains the test data, and data.training.arff
contains the training data. We're running the svn trunk (r906954) of
Mahout; the attached script run.sh shows how we run it.
Should it be possible to run Mahout's NaiveBayes classifier on this
data in this way, or is it limited to text documents only?
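
To make the question concrete, here is a rough sketch of one way
attribute data could be presented to a text-oriented classifier: each
ARFF instance becomes one line of "attribute=value" tokens preceded by
its class label. This is only an illustration under assumptions, not
Mahout's documented input format. The function name
arff_to_token_lines and the "label<TAB>tokens" layout are hypothetical,
the class is assumed to be the last attribute, and the parser only
handles dense ARFF without quoted names or values.

```python
def arff_to_token_lines(arff_text):
    """Convert dense ARFF text into 'label<TAB>attr=value ...' lines.

    Assumptions (not taken from the original thread): the class is the
    last attribute, and no attribute name or value is quoted or
    contains commas or spaces.
    """
    names = []
    in_data = False
    out = []
    for raw in arff_text.splitlines():
        line = raw.strip()
        # Skip blank lines and ARFF comments.
        if not line or line.startswith("%"):
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # The second whitespace-separated field is the name.
                names.append(line.split()[1])
            elif lower.startswith("@data"):
                in_data = True
            continue
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(names):
            raise ValueError("unexpected number of values: " + line)
        label = values[-1]
        # Each non-class attribute becomes one word-like token.
        tokens = [n + "=" + v for n, v in zip(names[:-1], values[:-1])]
        out.append(label + "\t" + " ".join(tokens))
    return out


if __name__ == "__main__":
    sample = """@relation demo
@attribute color {red,blue}
@attribute size numeric
@attribute class {valid,invalid}
@data
red,3,valid
blue,7,invalid
"""
    for token_line in arff_to_token_lines(sample):
        print(token_line)
```

Numeric attributes would probably need to be binned into ranges first,
since a word-count model treats "size=3" and "size=4" as unrelated
tokens.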

Side note: Weka is expected to report 100% incorrectly classified
instances, since all test data is labeled "unknown", whereas the
training data is labeled either "valid" or "invalid". In fact, the
test data is the entire "invalid" set, so Weka actually manages to
classify everything correctly. We're not yet sure what class label to
put on the test data, as we of course can't know its true class in
advance (hence the "unknown").

2010/2/8 Robin Anil <[email protected]>:
> Can you send the train and test data to me? Are you using the 0.2 release
> or the trunk?
>
> It seems the model wasn't built, as there was an error:
>
> Exception in thread "main"
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> file:/tmp/hadoop/model/trainer-termDocCount
> Input path does not exist: file:/tmp/hadoop/model/trainer-wordFreq
> Input path does not exist: file:/tmp/hadoop/model/trainer-featureCount
>
> So there is no point in running the classifier.
>
> Weka doesn't seem to be doing well either.
>
>
>
> On Mon, Feb 8, 2010 at 6:24 PM, Martin Häger <[email protected]> wrote:
>
>> Hi,
>>
>> We're experimenting a bit with Weka and Mahout. Our input data is a
>> relation in ARFF format (see attached data.training.arff), and we'd
>> like to classify it using Mahout. However, it seems to us, at first
>> glance, that the Mahout classifier.bayes.interfaces.Algorithm interface
>> is centered around text documents rather than general attribute data.
>> Thus, running the classifier causes our ARFF data to be interpreted as
>> a document of words, which does not give very useful results (see
>> attached mahout.log).
>>
>> With Weka, we're able to get the results we want (see attached weka.log).
>>
>> Any suggestions for how to get this working?
>>
>> Thanks!
>>
>

Attachment: run.sh
Description: Bourne shell script
