[jira] [Comment Edited] (SPARK-7159) Support multiclass logistic regression in spark.ml

Seth Hendrickson (JIRA) Fri, 20 May 2016 15:18:41 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294372#comment-15294372
 ]


Seth Hendrickson edited comment on SPARK-7159 at 5/20/16 10:17 PM:
-------------------------------------------------------------------

[~dbtsai] [~josephkb] I'd like to take this one if it's still open. I have an 
implementation that is functional except for some corner cases, and I can have 
a PR submitted before too long. 

One part of the design that needs to be discussed, as far as I can tell, is 
how to pass the coefficients/intercepts to the model without breaking the API. 
If we were not concerned about API compatibility, I'd say the best way would 
be to make the intercept a {{Vector}} and the coefficients either a flattened 
{{Vector}} or a {{Matrix}}. I can't think of a way that would be both easy to 
use and not break the API. With that in mind, another option may be to stick 
with the convention used in MLlib, where the intercept/coefficients follow the 
obvious convention for binary logistic regression, but in the multinomial case 
the intercept is always zero (meaningless) and the coefficients are a 
flattened {{Vector}} with the intercepts baked in. This is not a user-friendly 
solution IMO, but it would not break the API. Perhaps this has already been 
discussed? 
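
To make the trade-off concrete, here is a minimal sketch in plain Python (not 
Spark code) of what a user would have to do to recover per-class coefficients 
and intercepts from a flattened vector with the intercepts baked in. The 
layout is an assumption for illustration only: one block per non-pivot class, 
each block holding the feature coefficients followed by that class's 
intercept. The function name {{unpack_flattened}} and the sample values are 
hypothetical.

```python
from typing import List, Tuple


def unpack_flattened(
    flat: List[float], num_features: int
) -> Tuple[List[List[float]], List[float]]:
    """Split a flattened weight vector into coefficient rows and intercepts.

    Assumed layout (illustrative, not confirmed by the thread): one block per
    non-pivot class, each block being `num_features` coefficients followed by
    a single intercept.
    """
    block = num_features + 1
    if num_features < 1 or not flat or len(flat) % block != 0:
        raise ValueError("flat length must be a positive multiple of num_features + 1")

    coefficients: List[List[float]] = []
    intercepts: List[float] = []
    for start in range(0, len(flat), block):
        # First `num_features` entries of the block are the coefficients.
        coefficients.append(flat[start:start + num_features])
        # The last entry of the block is the baked-in intercept.
        intercepts.append(flat[start + num_features])
    return coefficients, intercepts


# Hypothetical 3-class, 2-feature model: 2 non-pivot blocks of length 3.
flat = [0.5, -1.0, 0.1, 2.0, 0.25, -0.3]
coefficients, intercepts = unpack_flattened(flat, num_features=2)
```

With a {{Matrix}} of coefficients and a {{Vector}} of intercepts, this 
unpacking step would not be needed on the user's side, which is the 
usability argument made above.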

Thanks for your input!


> Support multiclass logistic regression in spark.ml
> --------------------------------------------------
>
>                 Key: SPARK-7159
>                 URL: https://issues.apache.org/jira/browse/SPARK-7159
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Assignee: DB Tsai
>            Priority: Critical
>
> This should be implemented by checking the input DataFrame's label column for 
> feature metadata specifying the number of classes.


