[ 
https://issues.apache.org/jira/browse/SYSTEMML-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547196#comment-15547196
 ] 

Frederick Reiss commented on SYSTEMML-700:
------------------------------------------

I think the main point here is that our scikit-learn-like APIs must provide an 
experience that is as close to the corresponding scikit-learn APIs as possible. 
In the case of mlogreg and other classifiers, that means accepting the same types of 
class labels that scikit-learn accepts, including strings and non-contiguous 
ranges of integers. If there is disagreement on this point, please speak up!

I see two possible approaches to meeting the above requirement.
a) Have the Python parts of our scikit-learn-like classifier APIs recode 
strings and non-contiguous category labels prior to invoking their internal DML 
scripts.
b) Build dedicated DML scripts that perform this recoding internally, and wrap 
these scripts instead of the existing canned scripts in the algorithms folders.

I would be fine with either approach. I see Niketan's suggestion in the 
previous comment as a special case of option (a).
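
To make option (a) concrete, here is a rough sketch of what the Python-side 
recoding could look like. It is not existing SystemML code: the helper names 
recode_labels and decode_labels and the start parameter are hypothetical, and 
the base value the wrapped DML script expects (0 or 1) is an assumption to 
check against the script itself. It relies only on NumPy's np.unique.

```python
import numpy as np


def recode_labels(y, start=0):
    """Map arbitrary labels (strings, non-contiguous ints) to contiguous codes.

    Hypothetical helper: `start` is the first code to emit. Whether the
    wrapped DML script wants 0-based or 1-based codes is an assumption.
    Returns (classes, codes), where classes[i] is the original label that
    was assigned the code i + start.
    """
    labels = np.asarray(y).ravel()
    # np.unique sorts the distinct labels; return_inverse gives, for each
    # input element, the index of its label in that sorted array.
    classes, inverse = np.unique(labels, return_inverse=True)
    return classes, inverse + start


def decode_labels(classes, codes, start=0):
    """Map contiguous codes back to the user's original labels."""
    idx = np.asarray(codes, dtype=int).ravel() - start
    return classes[idx]


if __name__ == "__main__":
    # Labels 4, 5, 6 become codes 0, 1, 2 before the script is invoked.
    classes, codes = recode_labels([4, 5, 6, 4])
    print(classes, codes)
    # Predictions coming back as codes are mapped to the original labels.
    print(decode_labels(classes, codes))
```

A wrapper's fit() would call recode_labels before invoking the script and keep 
classes around, and predict() would pass the script's output through 
decode_labels, so the user only ever sees the original labels.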

> Inflexible category labels for Multinomial Logistic Regression
> --------------------------------------------------------------
>
>                 Key: SYSTEMML-700
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-700
>             Project: SystemML
>          Issue Type: Bug
>          Components: Algorithms
>            Reporter: Jeremy
>            Priority: Minor
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The Logistic Regression algorithm requires that category labels be labeled as 
> 0 up to the number of classes-1. It should be able to handle any set of 
> category labels provided by the user. B_out should have the appropriate size 
> regardless of the values of the labels given, and the algorithm should also 
> preserve the original labeling for the user.
> Added detail:
> The solution I'm currently using is to transform the labels from whatever 
> values they are to 0, 1, 2,... beforehand, and then transform them back to 
> their original labels after the algorithm runs.
> Currently the algorithm doesn't handle class values that don't start at 0 or 
> 1, and doesn't handle non-contiguous integers, both of which can come up. For 
> example, the result for class labels 4,5,6 will return 5 sets of coefficients 
> (correct number should be 2), and class labels -1, 0, 1 returns just one set 
> of coefficients (correct number should be 2).
> Handling frames with strings would be a really great user experience - that 
> could look like R's coercion internally. Both glmnet and scikit-learn handle 
> string label arguments, but both APIs are weakly typed as well.



