[
https://issues.apache.org/jira/browse/MAHOUT-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655029#comment-13655029
]
Ted Dunning commented on MAHOUT-1179:
-------------------------------------
{quote}
Input format for trainers: SequenceFile<IntWritable, VectorWritable> (just as
in clustering) plus an optional metadata text file (to mark numerical,
categorical and ignored columns, just like the ones used in Random Forests).
{quote}
Just a peripheral note on this.
There should be an easy option to say that all, or most, of the fields in the
input are numerical. This is valuable for hashed input, where every position
in the vector already holds a numeric value.
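
As a rough illustration of that option, here is a minimal Java sketch. It is
not Mahout code: ColumnType, ColumnSpec, allNumerical and parse are
hypothetical names, and the one-token-per-column descriptor ("N", "C", "I") is
a simplified format assumed only for this example. The point is that a caller
with hashed vectors could skip the metadata file and ask for an all-numerical
spec in one call.

```java
import java.util.Arrays;

// Hypothetical sketch: none of these types exist in Mahout.
enum ColumnType { NUMERICAL, CATEGORICAL, IGNORED }

final class ColumnSpec {
    private final ColumnType[] types;

    private ColumnSpec(ColumnType[] types) {
        this.types = types;
    }

    // The "easy option": every column is numerical, no metadata file needed.
    static ColumnSpec allNumerical(int numColumns) {
        ColumnType[] t = new ColumnType[numColumns];
        Arrays.fill(t, ColumnType.NUMERICAL);
        return new ColumnSpec(t);
    }

    // Parses an assumed descriptor with one whitespace-separated token per
    // column: N = numerical, C = categorical, I = ignored.
    static ColumnSpec parse(String descriptor) {
        String[] tokens = descriptor.trim().split("\\s+");
        ColumnType[] t = new ColumnType[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            switch (tokens[i]) {
                case "N": t[i] = ColumnType.NUMERICAL; break;
                case "C": t[i] = ColumnType.CATEGORICAL; break;
                case "I": t[i] = ColumnType.IGNORED; break;
                default:
                    throw new IllegalArgumentException(
                        "Unknown column marker: " + tokens[i]);
            }
        }
        return new ColumnSpec(t);
    }

    ColumnType typeOf(int column) {
        return types[column];
    }

    int size() {
        return types.length;
    }
}
```

With this shape, a trainer fed hashed vectors of dimension 1000 would call
ColumnSpec.allNumerical(1000), while mixed data would still go through
ColumnSpec.parse on the contents of the metadata file.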
> GSOC 2013: Refactor and improve the classification APIs
> -------------------------------------------------------
>
> Key: MAHOUT-1179
> URL: https://issues.apache.org/jira/browse/MAHOUT-1179
> Project: Mahout
> Issue Type: New Feature
> Reporter: Dan Filimon
> Labels: gsoc2013, mentor
>
> [via Andy Twigg]
> Improve and unify the Mahout classification API. Also related to the
> refactoring of the clustering APIs MAHOUT-1177.
> The two APIs should be roughly the same, at least in
> terms of input/output so that pipelining etc is easier. (cf
> scikit-learn clustering/classifier/regression API)
> Currently Mahout supports:
> - logistic regression
> - Naive Bayes
> - Random Forests