Re: Classifying general Attribute-Relation data using Mahout

Ted Dunning Tue, 09 Feb 2010 10:41:05 -0800

Martin,

I saw only one attachment here. The other may have been stripped by the
mailing list, which prefers not to have attachments.


I have filed an issue for this at
https://issues.apache.org/jira/browse/MAHOUT-286

Can you attach your data files there so that we can work on getting a better
resolution for you?

On Mon, Feb 8, 2010 at 5:35 AM, Martin Häger <[email protected]> wrote:

> Hi Robin,
>
> The attached data.arff contains the test data, data.training.arff
> contains the training data. We're running the svn trunk (r906954) of
> Mahout. The attached script run.sh shows how we run it.
> Should it be possible to run Mahout's NaiveBayes classifier on this
> data in this way or is it limited to text documents only?
>
> Side note: We're expecting Weka to report 100% incorrect
> classification since all test data belongs to the class "unknown",
> whereas the training data is either "valid" or "invalid" (in fact, the
> test data is the entire "invalid" set, so Weka manages to classify
> everything correctly). We're not yet sure what class to put on the
> test data, as we of course can't know anything about it (hence the
> "unknown").
>
> 2010/2/8 Robin Anil <[email protected]>:
> > Can you send the train and test data to me? Are you using the 0.2 release
> > or the trunk?
> >
> > It seems the model wasn't built, as there was an error:
> >
> > Exception in thread "main"
> > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > file:/tmp/hadoop/model/trainer-termDocCount
> > Input path does not exist: file:/tmp/hadoop/model/trainer-wordFreq
> > Input path does not exist: file:/tmp/hadoop/model/trainer-featureCount
> >
> > So there is no point in running the classifier.
> >
> > Weka doesn't seem to be doing well either.
> >
> >
> >
> > On Mon, Feb 8, 2010 at 6:24 PM, Martin Häger <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> We're experimenting a bit with Weka and Mahout. Our input data is a
> >> relation in ARFF format (see attached data.training.arff), and we'd
> >> like to classify it using Mahout. However, it seems (to us, at first)
> >> that the Mahout classifier.bayes.interfaces.Algorithm interface is
> >> centered around documents of text, and not general attribute data.
> >> Thus, running the classifier causes our ARFF data to be interpreted as
> >> a document of words, with not very useful results (see attached
> >> mahout.log).
> >>
> >> With Weka, we're able to get the results we want (see attached
> >> weka.log).
> >>
> >> Any suggestions for how to get this working?
> >>
> >> Thanks!
> >>
> >
>



-- 
Ted Dunning, CTO
DeepDyve
