[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20332

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r164654897

    --- Diff: docs/ml-classification-regression.md ---
    @@ -111,10 +110,9 @@ Continuing the earlier example:
     [`LogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/LogisticRegressionTrainingSummary.html)
     provides a summary for a
     [`LogisticRegressionModel`](api/java/org/apache/spark/ml/classification/LogisticRegressionModel.html).
    -Currently, only binary classification is supported and the
    -summary must be explicitly cast to
    -[`BinaryLogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/BinaryLogisticRegressionTrainingSummary.html).
    -Support for multiclass model summaries will be added in the future.
    +In the case of binary classification, certain additional metrics are
    --- End diff --

    I'm ambivalent - I think it is fairly clear from the phrasing "additional metrics are available...", and from the API doc link provided.

---
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r164531329

    --- Diff: docs/ml-classification-regression.md ---
    @@ -111,10 +110,9 @@ Continuing the earlier example:
     [`LogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/LogisticRegressionTrainingSummary.html)
     provides a summary for a
     [`LogisticRegressionModel`](api/java/org/apache/spark/ml/classification/LogisticRegressionModel.html).
    -Currently, only binary classification is supported and the
    -summary must be explicitly cast to
    -[`BinaryLogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/BinaryLogisticRegressionTrainingSummary.html).
    -Support for multiclass model summaries will be added in the future.
    +In the case of binary classification, certain additional metrics are
    --- End diff --

    Oh no. Just add a sentence to make it more clear, like:
    "In the case of binary classification, `BinaryLogisticRegressionTrainingSummary` inherits all metrics in `LogisticRegressionSummary`, and certain additional metrics are added ..."
    Just a minor suggestion -:)

---
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r164479596

    --- Diff: docs/ml-classification-regression.md ---
    @@ -125,7 +123,8 @@ Continuing the earlier example:
     [`LogisticRegressionTrainingSummary`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionSummary)
     provides a summary for a
     [`LogisticRegressionModel`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionModel).
    -Currently, only binary classification is supported. Support for multiclass model summaries will be added in the future.
    +In the case of binary classification, certain additional metrics are
    --- End diff --

    Ah right! Missed that.

---
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r164476639

    --- Diff: docs/ml-classification-regression.md ---
    @@ -125,7 +123,8 @@ Continuing the earlier example:
     [`LogisticRegressionTrainingSummary`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionSummary)
     provides a summary for a
     [`LogisticRegressionModel`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionModel).
    -Currently, only binary classification is supported. Support for multiclass model summaries will be added in the future.
    +In the case of binary classification, certain additional metrics are
    --- End diff --

    There isn't a `binarySummary` method for Python.

---
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r164387272

    --- Diff: docs/ml-classification-regression.md ---
    @@ -125,7 +123,8 @@ Continuing the earlier example:
     [`LogisticRegressionTrainingSummary`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionSummary)
     provides a summary for a
     [`LogisticRegressionModel`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionModel).
    -Currently, only binary classification is supported. Support for multiclass model summaries will be added in the future.
    +In the case of binary classification, certain additional metrics are
    --- End diff --

    The "The binary summary can be accessed via the ..." sentence is missing in this one.

---
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r164384660

    --- Diff: docs/ml-classification-regression.md ---
    @@ -111,10 +110,9 @@ Continuing the earlier example:
     [`LogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/LogisticRegressionTrainingSummary.html)
     provides a summary for a
     [`LogisticRegressionModel`](api/java/org/apache/spark/ml/classification/LogisticRegressionModel.html).
    -Currently, only binary classification is supported and the
    -summary must be explicitly cast to
    -[`BinaryLogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/BinaryLogisticRegressionTrainingSummary.html).
    -Support for multiclass model summaries will be added in the future.
    +In the case of binary classification, certain additional metrics are
    --- End diff --

    What do you mean exactly? Do you propose to list the metrics in the user guide?

---
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r164237753

    --- Diff: docs/ml-classification-regression.md ---
    @@ -111,10 +110,9 @@ Continuing the earlier example:
     [`LogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/LogisticRegressionTrainingSummary.html)
     provides a summary for a
     [`LogisticRegressionModel`](api/java/org/apache/spark/ml/classification/LogisticRegressionModel.html).
    -Currently, only binary classification is supported and the
    -summary must be explicitly cast to
    -[`BinaryLogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/BinaryLogisticRegressionTrainingSummary.html).
    -Support for multiclass model summaries will be added in the future.
    +In the case of binary classification, certain additional metrics are
    --- End diff --

    `BinaryLogisticRegressionTrainingSummary` now inherits from `LogisticRegressionSummary`, so it inherits all of the metrics in `LogisticRegressionSummary`. We'd better note that in the doc. :)

---
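The inheritance point above can be pictured with a small plain-Python sketch. This is not Spark code: the classes `Summary` and `BinarySummary`, the `positive_recall` metric, and the sample data are hypothetical stand-ins, chosen only to show a binary summary that keeps every metric of the general summary and adds some of its own.

```python
# Toy illustration only: hypothetical classes, NOT the Spark ML API.


class Summary:
    """General summary built from (prediction, label) pairs."""

    def __init__(self, pairs):
        self.pairs = list(pairs)

    @property
    def accuracy(self):
        # Fraction of rows whose prediction matches the label.
        correct = sum(1 for pred, label in self.pairs if pred == label)
        return correct / len(self.pairs)


class BinarySummary(Summary):
    """Binary-only summary: inherits every metric of Summary and adds more."""

    @property
    def positive_recall(self):
        # Recall of the positive class (label 1): TP / (TP + FN).
        preds_for_positives = [pred for pred, label in self.pairs if label == 1]
        hits = sum(1 for pred in preds_for_positives if pred == 1)
        return hits / len(preds_for_positives)


binary = BinarySummary([(1, 1), (0, 1), (0, 0), (1, 0)])
print(binary.accuracy)         # inherited from the general summary
print(binary.positive_recall)  # defined only on the binary subclass
```

In this sketch a caller holding a `BinarySummary` can use it anywhere a `Summary` is expected, which is the property the suggested doc sentence is meant to convey.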
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r164151869

    --- Diff: docs/ml-classification-regression.md ---
    @@ -97,10 +97,6 @@ only available on the driver.
     [`LogisticRegressionTrainingSummary`](api/scala/index.html#org.apache.spark.ml.classification.LogisticRegressionTrainingSummary)
     provides a summary for a
     [`LogisticRegressionModel`](api/scala/index.html#org.apache.spark.ml.classification.LogisticRegressionModel).
    -Currently, only binary classification is supported and the
    --- End diff --

    Done.

---
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r164151796

    --- Diff: docs/ml-classification-regression.md ---
    @@ -125,7 +117,6 @@ Continuing the earlier example:
     [`LogisticRegressionTrainingSummary`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionSummary)
     provides a summary for a
     [`LogisticRegressionModel`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionModel).
    --- End diff --

    Done.

---
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r164151687

    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/MulticlassLogisticRegressionWithElasticNetExample.scala ---
    @@ -49,6 +49,48 @@ object MulticlassLogisticRegressionWithElasticNetExample {
         // Print the coefficients and intercept for multinomial logistic regression
         println(s"Coefficients: \n${lrModel.coefficientMatrix}")
         println(s"Intercepts: \n${lrModel.interceptVector}")
    +
    +    val trainingSummary = lrModel.summary
    +
    +    val objectiveHistory = trainingSummary.objectiveHistory
    --- End diff --

    Done

---
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r164151731

    --- Diff: examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py ---
    @@ -43,6 +43,43 @@
         # Print the coefficients and intercept for multinomial logistic regression
         print("Coefficients: \n" + str(lrModel.coefficientMatrix))
         print("Intercept: " + str(lrModel.interceptVector))
    +
    +    trainingSummary = lrModel.summary
    +
    +    # Obtain the objective per iteration
    +    objectiveHistory = trainingSummary.objectiveHistory
    +    print("objectiveHistory:")
    +    for objective in objectiveHistory:
    +        print(objective)
    +
    +    print("False positive rate by label:")
    --- End diff --

    Done

---
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r162873193

    --- Diff: examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py ---
    @@ -43,6 +43,43 @@
         # Print the coefficients and intercept for multinomial logistic regression
         print("Coefficients: \n" + str(lrModel.coefficientMatrix))
         print("Intercept: " + str(lrModel.interceptVector))
    +
    +    trainingSummary = lrModel.summary
    +
    +    # Obtain the objective per iteration
    +    objectiveHistory = trainingSummary.objectiveHistory
    +    print("objectiveHistory:")
    +    for objective in objectiveHistory:
    +        print(objective)
    +
    +    print("False positive rate by label:")
    --- End diff --

    Do we want a comment here that is consistent with the Java version above?
    `// for multiclass, we can inspect metrics on a per-label basis`

---
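For readers unfamiliar with "metrics on a per-label basis", here is a minimal plain-Python sketch of what a per-label false positive rate measures. It assumes the usual one-vs-rest definition, FP / (FP + TN) with each label treated as the positive class in turn. The function name `false_positive_rate_by_label` and the sample data are hypothetical; this is not the Spark implementation.

```python
# Toy illustration only: a hand-rolled per-label metric, NOT the Spark ML API.


def false_positive_rate_by_label(pairs):
    """Return {label: FPR}, treating each label as the positive class in turn."""
    pairs = list(pairs)
    labels = sorted({label for _, label in pairs} | {pred for pred, _ in pairs})
    rates = {}
    for target in labels:
        # Rows whose true label is not the target are the negatives for this label.
        negatives = [pred for pred, label in pairs if label != target]
        # A false positive is a negative row that was predicted as the target.
        false_positives = sum(1 for pred in negatives if pred == target)
        rates[target] = false_positives / len(negatives) if negatives else 0.0
    return rates


# (prediction, label) pairs for a three-class problem.
pairs = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (0, 2)]
rates = false_positive_rate_by_label(pairs)
for label, rate in rates.items():
    print("label %d: %s" % (label, rate))
```

Each label gets its own rate, which is why the example prints one line per label rather than a single number.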
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r162873036

    --- Diff: docs/ml-classification-regression.md ---
    @@ -97,10 +97,6 @@ only available on the driver.
     [`LogisticRegressionTrainingSummary`](api/scala/index.html#org.apache.spark.ml.classification.LogisticRegressionTrainingSummary)
     provides a summary for a
     [`LogisticRegressionModel`](api/scala/index.html#org.apache.spark.ml.classification.LogisticRegressionModel).
    -Currently, only binary classification is supported and the
    --- End diff --

    Should we add a note reflecting the difference between the summary and binary summary? Perhaps indicating the usage of `binarySummary` or `asBinary` method? I know it's done in the example but perhaps a short line about that.

---
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r162872261

    --- Diff: docs/ml-classification-regression.md ---
    @@ -125,7 +117,6 @@ Continuing the earlier example:
     [`LogisticRegressionTrainingSummary`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionSummary)
     provides a summary for a
     [`LogisticRegressionModel`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionModel).
    --- End diff --

    Shall we just add a short line to the `Example` section of MLoR:
    "The following example shows how to train a multiclass logistic regression model with elastic net regularization, as well as extract the multiclass training summary."
    or something like that.

---
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20332#discussion_r162873388

    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/MulticlassLogisticRegressionWithElasticNetExample.scala ---
    @@ -49,6 +49,48 @@ object MulticlassLogisticRegressionWithElasticNetExample {
         // Print the coefficients and intercept for multinomial logistic regression
         println(s"Coefficients: \n${lrModel.coefficientMatrix}")
         println(s"Intercepts: \n${lrModel.interceptVector}")
    +
    +    val trainingSummary = lrModel.summary
    +
    +    val objectiveHistory = trainingSummary.objectiveHistory
    --- End diff --

    Ditto here: add the comment so this is consistent with the Java / Python versions.

---