[ 
https://issues.apache.org/jira/browse/MAHOUT-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659764#comment-13659764
 ] 

Angel Martinez Gonzalez commented on MAHOUT-1179:
-------------------------------------------------

Hi again,
To move all classifiers toward the formats proposed above, I've started with 
Naive Bayes. Specifically, I've moved the evaluation code (summary statistics, 
confusion matrix) that ran at the end of TestNaiveBayesDriver into a separate 
ClassifierEvaluationJob. The benefit is that ClassifierEvaluationJob should 
eventually be able to take input from any classifier tester.
The current state of the work can be reviewed here: 
https://github.com/amartgon/mahout/commit/519ae529e9932d1e1d0803d0731a7396daaa603b
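
To illustrate the kind of separation described above, here is a minimal sketch of an evaluation step that only needs (actual label, predicted label) pairs, so it does not depend on which classifier produced them. The class name EvaluationSketch and its methods are hypothetical and are not taken from the linked commit or from Mahout; the sketch runs in memory rather than as a Hadoop job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, in-memory stand-in for a classifier-independent evaluation step.
 * It accumulates a confusion matrix and overall accuracy from label pairs.
 */
final class EvaluationSketch {
    private final List<String> labels;
    private final Map<String, Integer> index = new HashMap<>();
    private final int[][] counts; // counts[actual][predicted]
    private int total;
    private int correct;

    EvaluationSketch(List<String> labels) {
        this.labels = new ArrayList<>(labels);
        for (int i = 0; i < this.labels.size(); i++) {
            index.put(this.labels.get(i), i);
        }
        counts = new int[this.labels.size()][this.labels.size()];
    }

    /** Records one test result; both labels must be known in advance. */
    void add(String actual, String predicted) {
        Integer row = index.get(actual);
        Integer col = index.get(predicted);
        if (row == null || col == null) {
            throw new IllegalArgumentException(
                "Unknown label: " + actual + " / " + predicted);
        }
        counts[row][col]++;
        total++;
        if (row.equals(col)) {
            correct++;
        }
    }

    /** Number of instances with the given actual label that got the given prediction. */
    int count(String actual, String predicted) {
        return counts[index.get(actual)][index.get(predicted)];
    }

    /** Fraction of correctly classified instances, or 0.0 if nothing was added. */
    double accuracy() {
        return total == 0 ? 0.0 : (double) correct / total;
    }
}
```

Any tester that can emit such pairs could feed this kind of component, which is the point of pulling evaluation out of the Naive Bayes driver.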

Some changes to Naive Bayes are still pending, such as:
- Changing the document id format from Text to IntWritable.
- Moving the "label index" out of TrainNaiveBayesJob.
Should I create a JIRA issue and submit this part now, or continue until 
everything related to Naive Bayes is complete? I'd like some feedback before 
going further, to find out whether there is agreement on and interest in this 
approach before investing a lot of time in work that might not be used.

                
> GSOC 2013: Refactor and improve the classification APIs
> -------------------------------------------------------
>
>                 Key: MAHOUT-1179
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1179
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Dan Filimon
>              Labels: gsoc2013, mentor
>
> [via Andy Twigg]
> Improve and unify the Mahout classification API. Also related to the 
> refactoring of the clustering APIs MAHOUT-1177.
> The two APIs should be roughly the same, at least in
> terms of input/output so that pipelining etc. is easier. (cf.
> scikit-learn clustering/classifier/regression API)
> Currently Mahout supports:
> - logistic regression
> - Naive Bayes
> - Random Forests

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
