[ https://issues.apache.org/jira/browse/SPARK-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856612#comment-15856612 ]
Joseph K. Bradley edited comment on SPARK-17139 at 2/7/17 7:25 PM:
-------------------------------------------------------------------

I'll offer a few thoughts first:
* A "ClassificationSummary" could be the same as a "MulticlassClassificationSummary" because binary is a special type of multiclass.
* Following the structure of abstractions for Prediction is reasonable.
* Separating binary and multiclass is reasonable; the separation is more significant for evaluation than for the Prediction abstractions.
* Abstract classes have been a pain in the case of Prediction abstractions, so I'd prefer we use traits.

The 2 alternatives I see are:
1. BinaryClassificationSummary inherits from ClassificationSummary. No separate MulticlassClassificationSummary.
2. BinaryClassificationSummary and MulticlassClassificationSummary inherit from ClassificationSummary.

Both alternatives are semantically reasonable. However, since ClassificationSummary = MulticlassClassificationSummary in terms of functionality, and since the Prediction abstractions combine binary and multiclass, I prefer option 1.

If we go with option 1, then we need 4 concrete classes:
* LogisticRegressionSummary
* LogisticRegressionTrainingSummary
* BinaryLogisticRegressionSummary
* BinaryLogisticRegressionTrainingSummary

We would definitely want binary summaries to inherit from their multiclass counterparts, and training summaries to inherit from their general counterparts:
* LogisticRegressionSummary
* LogisticRegressionTrainingSummary: LogisticRegressionSummary
* BinaryLogisticRegressionSummary: LogisticRegressionSummary
* BinaryLogisticRegressionTrainingSummary: LogisticRegressionTrainingSummary, BinaryLogisticRegressionSummary

Of course, this is a problem: BinaryLogisticRegressionTrainingSummary would have to inherit from two classes, which Scala does not allow. But we could solve it by having all of these be traits, with concrete classes inheriting.
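A minimal sketch of the trait-based layout described above, not the actual Spark code. The four trait names and the two {{Impl}} class names come from this comment; the enclosing object {{SummarySketch}}, the members ({{numClasses}}, {{objectiveHistory}}, {{areaUnderROC}}), the helper {{summaryFor}}, and the placeholder values are assumptions made only for illustration.

```scala
// Hypothetical sketch: members and values are illustrative, not Spark's API.
object SummarySketch {

  // General (multiclass) summary: the root of the hierarchy.
  trait LogisticRegressionSummary {
    def numClasses: Int
  }

  // Training summaries extend their general counterparts.
  trait LogisticRegressionTrainingSummary extends LogisticRegressionSummary {
    def objectiveHistory: Array[Double]
  }

  // Binary summaries extend their multiclass counterparts.
  trait BinaryLogisticRegressionSummary extends LogisticRegressionSummary {
    def areaUnderROC: Double
  }

  // The diamond: allowed because both parents are traits, not classes.
  trait BinaryLogisticRegressionTrainingSummary
    extends LogisticRegressionTrainingSummary
    with BinaryLogisticRegressionSummary

  // Concrete classes. The constructors are private to the enclosing object,
  // standing in for "private constructors, not extendable outside of Spark".
  final class LogisticRegressionTrainingSummaryImpl private[SummarySketch] (
      val numClasses: Int,
      val objectiveHistory: Array[Double])
    extends LogisticRegressionTrainingSummary

  final class BinaryLogisticRegressionTrainingSummaryImpl private[SummarySketch] (
      val objectiveHistory: Array[Double],
      val areaUnderROC: Double)
    extends BinaryLogisticRegressionTrainingSummary {
    val numClasses: Int = 2
  }

  // Hypothetical stand-in for LogisticRegressionModel.summary: the static
  // return type is the general training trait, the runtime type depends on
  // the number of classes. The AUC of 0.5 is a placeholder, not a result.
  def summaryFor(
      numClasses: Int,
      history: Array[Double]): LogisticRegressionTrainingSummary =
    if (numClasses == 2) {
      new BinaryLogisticRegressionTrainingSummaryImpl(history, areaUnderROC = 0.5)
    } else {
      new LogisticRegressionTrainingSummaryImpl(numClasses, history)
    }

  def main(args: Array[String]): Unit = {
    // Callers that need binary-only metrics match on the binary trait.
    summaryFor(2, Array(0.7, 0.4)) match {
      case b: BinaryLogisticRegressionSummary =>
        println(s"binary summary, areaUnderROC = ${b.areaUnderROC}")
      case m =>
        println(s"multiclass summary with ${m.numClasses} classes")
    }
  }
}
```

In this sketch a caller always receives the general training trait and has to pattern match (or cast) to reach the binary-only members, which is the main usability cost of returning the more general type.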
I.e., {{LogisticRegressionModel.summary}} could return {{trait LogisticRegressionTrainingSummary}}, which could be of concrete type {{LogisticRegressionTrainingSummaryImpl}} (multiclass) or {{BinaryLogisticRegressionTrainingSummaryImpl}} (binary). I suspect MiMa will complain about this, but IIRC it's safe since all of these summaries have private constructors and can't be extended outside of Spark.

Btw, we could introduce a set of abstractions matching the Prediction ones, but that should probably happen under a separate JIRA.

What do you think?

> Add model summary for MultinomialLogisticRegression
> ---------------------------------------------------
>
>                 Key: SPARK-17139
>                 URL: https://issues.apache.org/jira/browse/SPARK-17139
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Seth Hendrickson
>
> Add model summary to multinomial logistic regression using same interface as
> in other ML models.