Please lay out a plan before coding. The key questions will be:

a) can you serialize a model efficiently?

b) can you deal with the random forest and SGD models?

c) what are the real changes to the API needed?




On Thu, May 16, 2013 at 10:51 AM, Angel Martinez Gonzalez (JIRA) <
j...@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/MAHOUT-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659764#comment-13659764]
>
> Angel Martinez Gonzalez commented on MAHOUT-1179:
> -------------------------------------------------
>
> Hi again,
> With the goal of modifying all classifiers to use the formats proposed
> above, I've started to work with Naive Bayes. In particular, I've moved the
> code related to evaluation (summary statistics, confusion matrix) that was
> executed at the end of TestNaiveBayesDriver to a separate
> ClassifierEvaluationJob. The benefit of this is that
> ClassifierEvaluationJob should be able in the future to take input from any
> classifier tester.
> The current state of the work may be reviewed here:
> https://github.com/amartgon/mahout/commit/519ae529e9932d1e1d0803d0731a7396daaa603b
>
> There are still modifications to be made on Naive Bayes, such as:
> - Modifying document id format from Text to IntWritable.
> - Moving the "label index" out of TrainNaiveBayesJob.
> Should I create a JIRA issue and submit this part? Or go on with the work
> at least till everything related to Naive Bayes is complete? I'd like to
> have some feedback before going on, to have an idea of whether there is
> agreement/interest in this before investing a lot of time into possibly
> useless work.
>
>
> > GSOC 2013: Refactor and improve the classification APIs
> > -------------------------------------------------------
> >
> >                 Key: MAHOUT-1179
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1179
> >             Project: Mahout
> >          Issue Type: New Feature
> >            Reporter: Dan Filimon
> >              Labels: gsoc2013, mentor
> >
> > [via Andy Twigg]
> > Improve and unify the Mahout classification API. Also related to the
> > refactoring of the clustering APIs (MAHOUT-1177).
> > The two APIs should be roughly the same, at least in
> > terms of input/output so that pipelining etc is easier. (cf
> > scikit-learn clustering/classifier/regression API)
> > Currently Mahout supports:
> > - logistic regression
> > - Naive Bayes
> > - Random Forests
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
