[ 
https://issues.apache.org/jira/browse/MAHOUT-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655029#comment-13655029
 ] 

Ted Dunning commented on MAHOUT-1179:
-------------------------------------

{quote}
Input format for trainers: SequenceFile<IntWritable, VectorWritable> (just as 
in clustering) plus an optional metadata text file (to mark numerical, 
categorical and ignored columns, just as the ones used in Random Forests).
{quote}

Just a peripheral note on this.

There should be an easy option to declare that all, or most, of the fields in
the input are numerical.  This is valuable for hashed input, where listing
every column individually in a metadata file would be tedious.
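
As a rough illustration of what such an option could look like, here is a
minimal sketch. Everything in it is hypothetical: ColumnSpec, its methods, and
the "I 3 N C" descriptor syntax (an optional repeat count followed by N, C or
I) are assumptions made for this example and are not existing Mahout APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical column-type description for trainer input; not a Mahout class. */
final class ColumnSpec {
  enum Type { NUMERICAL, CATEGORICAL, IGNORED }

  private final List<Type> types;

  private ColumnSpec(List<Type> types) {
    this.types = Collections.unmodifiableList(types);
  }

  /** Shortcut for the common case: all n columns are numerical (e.g. hashed vectors). */
  static ColumnSpec allNumerical(int n) {
    return new ColumnSpec(new ArrayList<>(Collections.nCopies(n, Type.NUMERICAL)));
  }

  /**
   * Parses an assumed descriptor syntax such as "I 3 N C": each type letter
   * (N, C, I) may be preceded by a repeat count. A trailing count with no
   * type letter after it is silently ignored.
   */
  static ColumnSpec parse(String descriptor) {
    List<Type> types = new ArrayList<>();
    int repeat = 1;
    for (String token : descriptor.trim().split("\\s+")) {
      if (token.matches("\\d+")) {
        repeat = Integer.parseInt(token);
        continue;
      }
      Type type;
      switch (token) {
        case "N": type = Type.NUMERICAL; break;
        case "C": type = Type.CATEGORICAL; break;
        case "I": type = Type.IGNORED; break;
        default: throw new IllegalArgumentException("Unknown column type: " + token);
      }
      types.addAll(Collections.nCopies(repeat, type));
      repeat = 1; // a count applies only to the type letter that follows it
    }
    return new ColumnSpec(types);
  }

  int size() { return types.size(); }

  Type typeOf(int column) { return types.get(column); }
}
```

With something like this, a trainer fed hashed input could be configured with
ColumnSpec.allNumerical(numFeatures) and no metadata file at all, while mixed
data could still use a short descriptor such as ColumnSpec.parse("I 3 N C").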


                
> GSOC 2013: Refactor and improve the classification APIs
> -------------------------------------------------------
>
>                 Key: MAHOUT-1179
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1179
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Dan Filimon
>              Labels: gsoc2013, mentor
>
> [via Andy Twigg]
> Improve and unify the Mahout classification API. Also related to the 
> refactoring of the clustering APIs MAHOUT-1177.
> The two APIs should be roughly the same, at least in terms of
> input/output, so that pipelining etc. is easier (cf. the
> scikit-learn clustering/classifier/regression API).
> Currently Mahout supports:
> - logistic regression
> - Naive Bayes
> - Random Forests

