[ https://issues.apache.org/jira/browse/SYSTEMML-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534023#comment-15534023 ]

Matthias Boehm commented on SYSTEMML-700:
-----------------------------------------

Thanks for pointing this out [~niketanpansare] - I would recommend removing this custom code, as it breaks our argument of running DML algorithms with consistent semantics through the various APIs.

> Inflexible category labels for Multinomial Logistic Regression
> --------------------------------------------------------------
>
>                 Key: SYSTEMML-700
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-700
>             Project: SystemML
>          Issue Type: Bug
>          Components: Algorithms
>            Reporter: Jeremy
>            Priority: Minor
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The Logistic Regression algorithm requires that category labels be labeled as
> 0 up to the number of classes-1. It should be able to handle any set of
> category labels provided by the user. B_out should have the appropriate size
> regardless of the values of the labels given, and the algorithm should also
> preserve the original labeling for the user.
> Added detail:
> The solution I'm currently using is to transform the labels from whatever
> values they are to 0, 1, 2,... beforehand, and then transform them back to
> their original labels after the algorithm runs.
> Currently the algorithm doesn't handle class values that don't start at 0 or
> 1, and doesn't handle non-contiguous integers, both of which can come up. For
> example, the result for class labels 4,5,6 will return 5 sets of coefficients
> (correct number should be 2), and class labels -1, 0, 1 returns just one set
> of coefficients (correct number should be 2).
> Handling frames with strings would be a really great user experience - that
> could look like R's coercion internally. Both glmnet and scikit-learn handle
> string label arguments, but both APIs are weakly typed as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
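
For illustration, the workaround described in the issue (recode arbitrary labels to 0, 1, 2, ... before training, then map predictions back afterwards) could be sketched roughly as below. This is plain Python, not SystemML or DML code; the helper names `encode_labels` and `decode_labels` are hypothetical and do not come from the issue, and sorting the distinct labels to fix the code order is an assumption.

```python
def encode_labels(labels):
    """Map arbitrary hashable, sortable labels to contiguous codes 0..K-1.

    Returns (codes, classes), where classes[i] is the original label
    that was assigned code i. Ordering by sorted() is an assumption.
    """
    classes = sorted(set(labels))
    index = {label: code for code, label in enumerate(classes)}
    codes = [index[label] for label in labels]
    return codes, classes


def decode_labels(codes, classes):
    """Map contiguous codes back to the original labels."""
    return [classes[code] for code in codes]


# Labels that neither start at 0 nor at 1, as in the issue's example.
labels = [4, 5, 6, 4]
codes, classes = encode_labels(labels)

# With K distinct classes, the issue expects K - 1 sets of coefficients.
expected_coefficient_sets = len(classes) - 1

# Round trip: the original labeling is preserved for the user.
restored = decode_labels(codes, classes)
```

Because the mapping only relies on hashing and sorting, the same sketch also covers string labels such as `["cat", "dog", "cat"]`, as long as all labels in one call are mutually comparable.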