[
https://issues.apache.org/jira/browse/SPARK-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856612#comment-15856612
]
Joseph K. Bradley edited comment on SPARK-17139 at 2/7/17 7:25 PM:
---
I'll offer a few thoughts first:
* A "ClassificationSummary" could be the same as a
"MulticlassClassificationSummary" because binary is a special type of
multiclass.
* Following the structure of abstractions for Prediction is reasonable.
* Separating binary and multiclass is reasonable; the separation is more
significant for evaluation than for the Prediction abstractions.
* Abstract classes have been a pain in the case of Prediction abstractions, so
I'd prefer we use traits.
The 2 alternatives I see are:
1. BinaryClassificationSummary inherits from ClassificationSummary. No
separate MulticlassClassificationSummary.
2. BinaryClassificationSummary and MulticlassClassificationSummary inherit from
ClassificationSummary.
Both alternatives are semantically reasonable. However, since
ClassificationSummary = MulticlassClassificationSummary in terms of
functionality, and since the Prediction abstractions combine binary and
multiclass, I prefer option 1.
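To make option 1 concrete, here is a minimal sketch of how the two abstractions might look as traits. The member names ({{labels}}, {{accuracy}}, {{numClasses}}, {{areaUnderROC}}) and the enclosing object are assumptions for illustration, not a proposed API.

```scala
object ClassificationSummarySketch {
  // Option 1: the general summary already covers the multiclass case,
  // so there is no separate MulticlassClassificationSummary.
  // All member names here are hypothetical.
  trait ClassificationSummary {
    def labels: Seq[Double]
    def accuracy: Double
    def numClasses: Int = labels.size
  }

  // Binary is a special case of multiclass that adds binary-only metrics.
  trait BinaryClassificationSummary extends ClassificationSummary {
    def areaUnderROC: Double
  }
}
```

Under option 2, a {{MulticlassClassificationSummary}} trait would sit beside {{BinaryClassificationSummary}} but, per the reasoning above, would add nothing over {{ClassificationSummary}}.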
If we go with option 1, then we need 4 concrete classes:
* LogisticRegressionSummary
* LogisticRegressionTrainingSummary
* BinaryLogisticRegressionSummary
* BinaryLogisticRegressionTrainingSummary
We would definitely want binary summaries to inherit from their multiclass
counterparts, and training summaries to inherit from their general
counterparts:
* LogisticRegressionSummary
* LogisticRegressionTrainingSummary: LogisticRegressionSummary
* BinaryLogisticRegressionSummary: LogisticRegressionSummary
* BinaryLogisticRegressionTrainingSummary: LogisticRegressionTrainingSummary,
BinaryLogisticRegressionSummary
Of course, this is a problem: BinaryLogisticRegressionTrainingSummary would
have two parent classes, which Scala does not allow. But we could solve it by
having all of these be traits, with concrete classes inheriting. I.e.,
{{LogisticRegressionModel.summary}} could return the trait
{{LogisticRegressionTrainingSummary}}, which could be of concrete type
{{LogisticRegressionTrainingSummaryImpl}} (multiclass) or
{{BinaryLogisticRegressionTrainingSummaryImpl}} (binary).
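A rough sketch of that trait layout follows, to show the diamond compiles once everything above the concrete classes is a trait. The trait and Impl names come from the list above. Everything else is an assumption for illustration: the members, the {{sealed}} modifier (standing in for private constructors), the enclosing object, and the {{trainingSummary}} factory standing in for {{LogisticRegressionModel.summary}}.

```scala
object SummarySketch {
  // General (multiclass) summary. Member names are hypothetical.
  sealed trait LogisticRegressionSummary {
    def numClasses: Int
    def accuracy: Double
  }

  // Training summaries add information collected while fitting.
  sealed trait LogisticRegressionTrainingSummary extends LogisticRegressionSummary {
    def objectiveHistory: Array[Double]
  }

  // Binary summaries add metrics that only exist for two classes.
  sealed trait BinaryLogisticRegressionSummary extends LogisticRegressionSummary {
    def areaUnderROC: Double
  }

  // The "diamond": legal because both parents are traits.
  sealed trait BinaryLogisticRegressionTrainingSummary
    extends LogisticRegressionTrainingSummary
    with BinaryLogisticRegressionSummary

  // Concrete classes are private, so users only ever see the traits.
  private final class LogisticRegressionTrainingSummaryImpl(
      val numClasses: Int,
      val accuracy: Double,
      val objectiveHistory: Array[Double])
    extends LogisticRegressionTrainingSummary

  private final class BinaryLogisticRegressionTrainingSummaryImpl(
      val accuracy: Double,
      val areaUnderROC: Double,
      val objectiveHistory: Array[Double])
    extends BinaryLogisticRegressionTrainingSummary {
    val numClasses: Int = 2
  }

  // Hypothetical stand-in for LogisticRegressionModel.summary: the static
  // return type is the trait, the runtime type depends on the class count.
  // binaryAreaUnderROC is only used when numClasses == 2.
  def trainingSummary(
      numClasses: Int,
      accuracy: Double,
      objectiveHistory: Array[Double],
      binaryAreaUnderROC: Double): LogisticRegressionTrainingSummary = {
    if (numClasses == 2) {
      new BinaryLogisticRegressionTrainingSummaryImpl(
        accuracy, binaryAreaUnderROC, objectiveHistory)
    } else {
      new LogisticRegressionTrainingSummaryImpl(numClasses, accuracy, objectiveHistory)
    }
  }

  // Callers that need binary-only metrics can match on the binary trait.
  def describe(s: LogisticRegressionSummary): String = s match {
    case b: BinaryLogisticRegressionSummary => s"binary, areaUnderROC=${b.areaUnderROC}"
    case other => s"multiclass with ${other.numClasses} classes"
  }
}
```

With this layout a caller gets a {{LogisticRegressionTrainingSummary}} in both cases and only needs a pattern match (or cast) when asking for binary-specific metrics.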
I suspect MiMa will complain about this, but IIRC it's safe since all of these
summaries have private constructors and can't be extended outside of Spark.
Btw, we could introduce a set of abstractions matching the Prediction ones, but
that should probably happen under a separate JIRA.
What do you think?
> Add model summary for MultinomialLogisticRegression
> ---
>
> Key: SPARK-17139
> URL: https://issues.apache.org/jira/browse/SPARK-17139
>