[ https://issues.apache.org/jira/browse/SPARK-17163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434412#comment-15434412 ]
Yanbo Liang edited comment on SPARK-17163 at 8/24/16 7:52 AM:
--------------------------------------------------------------

I think it's hard to unify binary and multinomial logistic regression without making any breaking changes.

* As [~sethah] said, we need to find a way to unify the representation of {{coefficients}} and {{intercept}}. I think flattening the matrix into a vector is still a compromise; the best representation would be a matrix for {{coefficients}} and a vector for {{intercept}}, even for a binary classification problem. This would be more or less consistent with other ML models such as {{NaiveBayesModel}}, which also supports multi-class classification. But it would introduce a big breaking change.
* MLOR and LOR return different results for binary classification when regularization is used.
* The current LOR code base provides both {{setThreshold}} and {{setThresholds}} for binary logistic regression, and the two interact with each other. If we make MLOR and LOR share the old LOR code base, that will also introduce breaking changes for these APIs.
* Model store/load compatibility.

I'd prefer to keep LOR and MLOR as separate APIs, but I don't hold that opinion strongly if you have a better proposal. Thanks!

> Decide on unified multinomial and binary logistic regression interfaces
> -----------------------------------------------------------------------
>
>                 Key: SPARK-17163
>                 URL: https://issues.apache.org/jira/browse/SPARK-17163
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, MLlib
>            Reporter: Seth Hendrickson
>
> Before the 2.1 release, we should finalize the API for logistic regression.
> After SPARK-7159, we have both LogisticRegression and
> MultinomialLogisticRegression models. This may be confusing to users and is
> a bit superfluous, since MLOR can do basically all of what BLOR does. We
> should decide if it needs to be changed and implement those changes before 2.1.
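
To illustrate the {{setThreshold}} / {{setThresholds}} interaction mentioned in the comment, here is a minimal plain-Python sketch (it does not use Spark). As far as I recall from the Spark ML documentation, a binary threshold p corresponds to the thresholds array [1 - p, p], and a two-element thresholds array maps back to 1 / (1 + thresholds(0) / thresholds(1)). The function names below are hypothetical and exist only for this illustration, and the sketch assumes 0 < p < 1.

```python
# Sketch only: a plain-Python model of how a single binary threshold and a
# two-element thresholds array can describe the same decision rule.
# All function names are hypothetical; none of them is a Spark API.


def thresholds_from_threshold(p):
    """Two-class thresholds array equivalent to a single threshold p."""
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch assumes 0 < p < 1")
    return [1.0 - p, p]


def threshold_from_thresholds(ts):
    """Single threshold equivalent to a two-element thresholds array."""
    if len(ts) != 2:
        raise ValueError("only defined for binary classification")
    t0, t1 = ts
    if t0 <= 0.0 or t1 <= 0.0:
        raise ValueError("this sketch assumes strictly positive thresholds")
    return 1.0 / (1.0 + t0 / t1)


def predict_with_threshold(prob1, p):
    """Predict class 1 when P(class 1) exceeds the threshold p."""
    return 1 if prob1 > p else 0


def predict_with_thresholds(prob1, ts):
    """Predict the class with the largest probability / threshold ratio."""
    scaled0 = (1.0 - prob1) / ts[0]
    scaled1 = prob1 / ts[1]
    return 1 if scaled1 > scaled0 else 0
```

Under these assumptions the two rules agree for any probability that is not exactly at the threshold, which is why setting one parameter has to stay consistent with the other; sharing that code path with MLOR would need to preserve the same coupling.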