I take it back... partially. In some cases there are a few special fields (categories and such) followed by a large block of encoded data. It would be nice (not necessary) to be able to say that fields 5-900 are numerical.
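To make the idea concrete, here is a rough sketch of how such a range could be expressed and parsed. Everything in it is an assumption for illustration: `FieldRangeSpec` and `parse` are hypothetical names, not existing Mahout API, and the 1-based inclusive `5-900` syntax is just one possible convention. Treating a missing spec as "all fields numerical" mirrors the default proposed in the JIRA comment quoted below.

```java
import java.util.BitSet;

// Hypothetical helper, NOT part of Mahout. Parses a 1-based, inclusive field
// spec such as "5-900" or "1,3,5-900" into the set of numerical field indices.
final class FieldRangeSpec {
    private FieldRangeSpec() {}

    static BitSet parse(String spec, int fieldCount) {
        BitSet numeric = new BitSet(fieldCount + 1);
        if (spec == null) {
            // Assumed default: no spec given, so every field is numerical.
            numeric.set(1, fieldCount + 1);
            return numeric;
        }
        for (String part : spec.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) {
                continue; // tolerate stray commas
            }
            int dash = token.indexOf('-');
            int first = Integer.parseInt(dash < 0 ? token : token.substring(0, dash).trim());
            int last = dash < 0 ? first : Integer.parseInt(token.substring(dash + 1).trim());
            if (first < 1 || last < first || last > fieldCount) {
                throw new IllegalArgumentException("Bad field range: " + token);
            }
            numeric.set(first, last + 1); // BitSet.set(from, to) excludes 'to'
        }
        return numeric;
    }
}
```

With something like this, `FieldRangeSpec.parse("5-900", 1000)` would mark fields 5 through 900 as numerical and leave the first four free for categories and other special handling.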

On Mon, May 13, 2013 at 8:00 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> That should be fine.
>
>
> On Mon, May 13, 2013 at 12:51 AM, Angel Martinez Gonzalez (JIRA) <
> j...@apache.org> wrote:
>
>>
>> [
>> https://issues.apache.org/jira/browse/MAHOUT-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655834#comment-13655834]
>>
>> Angel Martinez Gonzalez commented on MAHOUT-1179:
>> -------------------------------------------------
>>
>> Ted, thanks for your comment.
>> If the metadata file is optional, all fields will be treated as numerical
>> when it is not provided. Would that be enough or do you have something else
>> in mind?
>>
>> > GSOC 2013: Refactor and improve the classification APIs
>> > -------------------------------------------------------
>> >
>> > Key: MAHOUT-1179
>> > URL: https://issues.apache.org/jira/browse/MAHOUT-1179
>> > Project: Mahout
>> > Issue Type: New Feature
>> > Reporter: Dan Filimon
>> > Labels: gsoc2013, mentor
>> >
>> > [via Andy Twigg]
>> > Improve and unify the Mahout classification API. Also related to the
>> > refactoring of the clustering APIs MAHOUT-1177.
>> > The two APIs should be roughly the same, at least in
>> > terms of input/output so that pipelining etc is easier. (cf
>> > scikit-learn clustering/classifier/regression API)
>> > Currently Mahout support:
>> > - logistic regression
>> > - Naive Bayes
>> > - Random Forests
>>
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA
>> administrators
>> For more information on JIRA, see: http://www.atlassian.com/software/jira