[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434412#comment-15434412 ]
Yanbo Liang edited comment on SPARK-17163 at 8/24/16 7:54 AM:
--------------------------------------------------------------

I think it's hard to unify binary and multinomial logistic regression without making any breaking changes.
* As [~sethah] said, we need to find a way to unify the representation of {{coefficients}} and {{intercept}}. I think flattening the matrix into a vector is still a compromise; the best representation would be a matrix for {{coefficients}} and a vector for {{intercept}}, even for a binary classification problem. This would be more or less consistent with other ML models that also support multi-class classification, such as {{NaiveBayesModel}}. But it would introduce a big breaking change.
* MLOR and LOR return different results for binary classification when regularization is used.
* The current LOR code base provides both {{setThreshold}} and {{setThresholds}} for binary logistic regression, and the two interact. If we make MLOR and LOR share the old LOR code base, it will also introduce breaking changes for these APIs. FYI: SPARK-11834 and SPARK-11543.
* Model store/load compatibility.

I'd prefer to keep LOR and MLOR as separate APIs, but I don't hold this opinion strongly if you have a better proposal. Thanks!

> Decide on unified multinomial and binary logistic regression interfaces
> -----------------------------------------------------------------------
>
>                 Key: SPARK-17163
>                 URL: https://issues.apache.org/jira/browse/SPARK-17163
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, MLlib
>            Reporter: Seth Hendrickson
>
> Before the 2.1 release, we should finalize the API for logistic regression.
> After SPARK-7159, we have both LogisticRegression and
> MultinomialLogisticRegression models. This may be confusing to users and is
> a bit superfluous since MLOR can do basically all of what BLOR does. We
> should decide if it needs to be changed and implement those changes before 2.1.
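The point that MLOR and LOR disagree on binary problems under regularization can be made concrete with a minimal sketch in plain Python. This is not Spark code and every name in it is hypothetical. It assumes that MLOR fits one coefficient row and one intercept per class with no pivot class, that LOR fits a single vector for class 1 against class 0, and that the penalty is pure L2 on the coefficients with unpenalized intercepts. Under those assumptions the softmax depends only on the difference of the two rows, so a binary weight vector d can be written as pivoted rows (0, d) or centered rows (-d/2, d/2). Both give identical probabilities, but the centered form carries half the L2 penalty, and it is the form an L2-regularized MLOR fit settles on. The same regularization parameter therefore penalizes MLOR half as strongly as LOR, which is one plausible source of the differing results.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_penalty(rows):
    # Sum of squared coefficients over all class rows; intercepts are not penalized.
    return sum(c * c for row in rows for c in row)


def class1_probability(rows, intercepts, x):
    # Softmax probability of class 1 for a 2-row coefficient matrix.
    scores = [dot(w, x) + c for w, c in zip(rows, intercepts)]
    return softmax(scores)[1]


# Hypothetical binary LOR model: one coefficient vector d and one intercept b.
d = [1.5, -2.0, 0.5]
b = 0.3
x = [0.2, 0.4, -1.0]

# Pivoted 2-row form: class 0 is fixed at zero, class 1 carries (d, b).
pivot_rows, pivot_intercepts = [[0.0] * len(d), d], [0.0, b]

# Centered 2-row form: rows are -d/2 and d/2, so their difference is still d.
centered_rows = [[-c / 2 for c in d], [c / 2 for c in d]]
centered_intercepts = [-b / 2, b / 2]

p_lor = sigmoid(dot(d, x) + b)
p_pivot = class1_probability(pivot_rows, pivot_intercepts, x)
p_centered = class1_probability(centered_rows, centered_intercepts, x)

print(round(p_lor, 6), round(p_pivot, 6), round(p_centered, 6))
print(l2_penalty(pivot_rows), l2_penalty(centered_rows))  # 6.5 3.25
```

All three probabilities agree, while the penalty of the centered form is exactly half that of the pivoted form. The sketch covers only the pure L2 case.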
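The {{setThreshold}} / {{setThresholds}} interaction comes from two parameters describing the same binary decision rule. In Spark ML, a probabilistic classifier with {{thresholds}} predicts the class with the largest p / t, and {{LogisticRegression}} maps a binary {{threshold}} t to thresholds (1 - t, t), and a length-2 thresholds array back to 1 / (1 + t0 / t1). The sketch below mirrors only those mappings in plain Python; the function names are hypothetical and this is not the Spark implementation. It checks that both rules agree and shows that different thresholds arrays collapse to the same single threshold.

```python
def thresholds_from_threshold(t):
    # A binary threshold t on P(class 1) expressed as per-class thresholds.
    return [1.0 - t, t]


def threshold_from_thresholds(ts):
    # Equivalent binary threshold for a length-2 thresholds array.
    if len(ts) != 2:
        raise ValueError("only defined for binary thresholds")
    return 1.0 / (1.0 + ts[0] / ts[1])


def predict_with_threshold(p1, t):
    # Binary rule: predict class 1 when P(class 1) exceeds t.
    return 1 if p1 > t else 0


def predict_with_thresholds(probs, ts):
    # Multi-class rule: predict the class with the largest p / t.
    # Assumes every threshold is strictly positive.
    scaled = [p / t for p, t in zip(probs, ts)]
    return max(range(len(scaled)), key=lambda k: scaled[k])


# Both rules give the same prediction for a range of thresholds and probabilities.
for t in [0.3, 0.5, 0.7]:
    ts = thresholds_from_threshold(t)
    for p1 in [0.1, 0.45, 0.6, 0.9]:
        assert predict_with_threshold(p1, t) == predict_with_thresholds([1 - p1, p1], ts)

# [0.2, 0.6] does not sum to 1, yet it is equivalent to threshold 0.75,
# which maps back to [0.25, 0.75] rather than the original array.
print(round(threshold_from_thresholds([0.2, 0.6]), 6))  # 0.75
```

Because the mapping from thresholds to a single threshold is many-to-one, converting one parameter into the other and back does not always recover the original value, which is part of why keeping the two consistent in a shared LOR/MLOR code base is delicate.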