[ https://issues.apache.org/jira/browse/MAHOUT-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831912#action_12831912 ]
Robin Anil commented on MAHOUT-286:
-----------------------------------

I will have to move this to 0.4. Bayes classifier only supports binary features (word exists or not). It definitely needs to be able to support numeric features. That will happen only after converting the classifier to SparseVector format. I could give a patch which extracts only the binary features for this release.

> Need to be able to run classifiers from non-text input (such as ARFF data)
> --------------------------------------------------------------------------
>
>                 Key: MAHOUT-286
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-286
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Ted Dunning
>         Attachments: data.arff, data.training.arff, mahout.log, run.sh, weka.log
>
>
> Martin Haeger wrote this:
> {quote}
> We're experimenting a bit with Weka and Mahout. Our input data is a
> relation in ARFF format (see attached data.training.arff), and we'd
> like to classify it using Mahout. However, it seems (to us, at first)
> that the Mahout classifier.bayes.interfaces.Algorithm interface is
> centered around documents of text, and not general attribute data.
> Thus, running the classifier causes our ARFF data to be interpreted as
> a document of words, with not very useful results (see attached
> mahout.log).
> With Weka, we're able to get the results we want (see attached weka.log).
> Any suggestions for how to get this working?
> {quote}
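
For illustration only, here is a minimal sketch of what "extracting only the binary features" from ARFF rows could look like in plain Java. It is not the proposed patch and it uses no Mahout API: the class name ArffBinaryFeatures, the method toBinaryTokens, and the weather-style attribute names are all hypothetical. It assumes the caller already knows from the ARFF header which attributes are numeric, and that values contain no quoted commas. Each nominal value becomes an attr=value token that a word-presence classifier can treat as a word; numeric and missing ("?") values are dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: turns one ARFF data row into
// presence/absence tokens that a word-based classifier could consume.
class ArffBinaryFeatures {

    // names[i] is the attribute name; numeric[i] says whether the ARFF header
    // declared it as a numeric attribute; row is one line from the @data section.
    static List<String> toBinaryTokens(String[] names, boolean[] numeric, String row) {
        // Assumption: plain comma-separated values, no quoted commas.
        String[] values = row.split(",", -1);
        if (values.length != names.length || numeric.length != names.length) {
            throw new IllegalArgumentException("row does not match the attribute list");
        }
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            String value = values[i].trim();
            if (numeric[i] || value.isEmpty() || value.equals("?")) {
                continue; // numeric or missing: no binary representation in this sketch
            }
            tokens.add(names[i] + "=" + value);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Made-up attributes, not taken from the attached data.arff.
        String[] names = {"outlook", "temperature", "windy"};
        boolean[] numeric = {false, true, false};
        System.out.println(toBinaryTokens(names, numeric, "sunny,85,FALSE"));
    }
}
```

The tokens for one row could then be joined with spaces and fed in as a "document", which matches the text-centred input the classifier expects. The numeric column is simply lost in this sketch, which is the limitation described in the comment above.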