[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20332


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-29 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r164654897
  
--- Diff: docs/ml-classification-regression.md ---
@@ -111,10 +110,9 @@ Continuing the earlier example:
 [`LogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/LogisticRegressionTrainingSummary.html)
 provides a summary for a
 [`LogisticRegressionModel`](api/java/org/apache/spark/ml/classification/LogisticRegressionModel.html).
-Currently, only binary classification is supported and the
-summary must be explicitly cast to
-[`BinaryLogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/BinaryLogisticRegressionTrainingSummary.html).
-Support for multiclass model summaries will be added in the future.
+In the case of binary classification, certain additional metrics are
--- End diff --

I'm ambivalent. I think it is already fairly clear from the phrasing
"additional metrics are available...", and from the API doc link provided.





[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-29 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r164531329
  
--- Diff: docs/ml-classification-regression.md ---
@@ -111,10 +110,9 @@ Continuing the earlier example:
 [`LogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/LogisticRegressionTrainingSummary.html)
 provides a summary for a
 [`LogisticRegressionModel`](api/java/org/apache/spark/ml/classification/LogisticRegressionModel.html).
-Currently, only binary classification is supported and the
-summary must be explicitly cast to
-[`BinaryLogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/BinaryLogisticRegressionTrainingSummary.html).
-Support for multiclass model summaries will be added in the future.
+In the case of binary classification, certain additional metrics are
--- End diff --

Oh, no. Just add a sentence to make it clearer, such as:
"In the case of binary classification,
`BinaryLogisticRegressionTrainingSummary` inherits all metrics in
`LogisticRegressionSummary`, and certain additional metrics are added ..."
Just a minor suggestion :-)
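
To illustrate the inheritance relationship described in this suggestion, here is a minimal sketch in plain Python. The two classes are toy stand-ins written for this note, not Spark's implementation, and the metric names `accuracy` and `positive_recall` are chosen only for illustration.

```python
class LogisticRegressionSummary:
    """Toy stand-in: metrics that make sense for any number of classes."""

    def __init__(self, predictions_and_labels):
        # Each item is a (prediction, label) pair.
        self.pairs = list(predictions_and_labels)

    @property
    def accuracy(self):
        correct = sum(1 for pred, label in self.pairs if pred == label)
        return correct / len(self.pairs)


class BinaryLogisticRegressionSummary(LogisticRegressionSummary):
    """Toy stand-in: inherits the general metrics and adds a binary-only one."""

    @property
    def positive_recall(self):
        # Recall of the positive class (label 1); only meaningful for binary problems.
        positives = [pred for pred, label in self.pairs if label == 1]
        return sum(1 for pred in positives if pred == 1) / len(positives)


summary = BinaryLogisticRegressionSummary([(1, 1), (0, 1), (0, 0), (1, 0)])
print(summary.accuracy)         # inherited from the base class
print(summary.positive_recall)  # added by the binary subclass
```

The point of the sketch is only that a reader holding the binary summary can still call every metric of the general summary, which is what the proposed doc sentence spells out.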





[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-29 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r164479596
  
--- Diff: docs/ml-classification-regression.md ---
@@ -125,7 +123,8 @@ Continuing the earlier example:
 [`LogisticRegressionTrainingSummary`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionSummary)
 provides a summary for a
 [`LogisticRegressionModel`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionModel).
-Currently, only binary classification is supported. Support for multiclass model summaries will be added in the future.
+In the case of binary classification, certain additional metrics are
--- End diff --

Ah right! Missed that.





[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-29 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r164476639
  
--- Diff: docs/ml-classification-regression.md ---
@@ -125,7 +123,8 @@ Continuing the earlier example:
 [`LogisticRegressionTrainingSummary`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionSummary)
 provides a summary for a
 [`LogisticRegressionModel`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionModel).
-Currently, only binary classification is supported. Support for multiclass model summaries will be added in the future.
+In the case of binary classification, certain additional metrics are
--- End diff --

There isn't a `binarySummary` method in the Python API.





[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-29 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r164387272
  
--- Diff: docs/ml-classification-regression.md ---
@@ -125,7 +123,8 @@ Continuing the earlier example:
 [`LogisticRegressionTrainingSummary`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionSummary)
 provides a summary for a
 [`LogisticRegressionModel`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionModel).
-Currently, only binary classification is supported. Support for multiclass model summaries will be added in the future.
+In the case of binary classification, certain additional metrics are
--- End diff --

The "The binary summary can be accessed via the ..." sentence is missing in
this one.





[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-29 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r164384660
  
--- Diff: docs/ml-classification-regression.md ---
@@ -111,10 +110,9 @@ Continuing the earlier example:
 [`LogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/LogisticRegressionTrainingSummary.html)
 provides a summary for a
 [`LogisticRegressionModel`](api/java/org/apache/spark/ml/classification/LogisticRegressionModel.html).
-Currently, only binary classification is supported and the
-summary must be explicitly cast to
-[`BinaryLogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/BinaryLogisticRegressionTrainingSummary.html).
-Support for multiclass model summaries will be added in the future.
+In the case of binary classification, certain additional metrics are
--- End diff --

What do you mean exactly? Do you propose to list the metrics in the user 
guide?





[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r164237753
  
--- Diff: docs/ml-classification-regression.md ---
@@ -111,10 +110,9 @@ Continuing the earlier example:
 [`LogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/LogisticRegressionTrainingSummary.html)
 provides a summary for a
 [`LogisticRegressionModel`](api/java/org/apache/spark/ml/classification/LogisticRegressionModel.html).
-Currently, only binary classification is supported and the
-summary must be explicitly cast to
-[`BinaryLogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/BinaryLogisticRegressionTrainingSummary.html).
-Support for multiclass model summaries will be added in the future.
+In the case of binary classification, certain additional metrics are
--- End diff --

`BinaryLogisticRegressionTrainingSummary` now inherits from
`LogisticRegressionSummary`, so it also provides all of the metrics in
`LogisticRegressionSummary`. We had better note that in the doc. :)





[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-26 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r164151869
  
--- Diff: docs/ml-classification-regression.md ---
@@ -97,10 +97,6 @@ only available on the driver.
 [`LogisticRegressionTrainingSummary`](api/scala/index.html#org.apache.spark.ml.classification.LogisticRegressionTrainingSummary)
 provides a summary for a
 [`LogisticRegressionModel`](api/scala/index.html#org.apache.spark.ml.classification.LogisticRegressionModel).
-Currently, only binary classification is supported and the
--- End diff --

Done.





[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-26 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r164151796
  
--- Diff: docs/ml-classification-regression.md ---
@@ -125,7 +117,6 @@ Continuing the earlier example:
 [`LogisticRegressionTrainingSummary`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionSummary)
 provides a summary for a
 [`LogisticRegressionModel`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionModel).
--- End diff --

Done.





[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-26 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r164151687
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/MulticlassLogisticRegressionWithElasticNetExample.scala ---
@@ -49,6 +49,48 @@ object MulticlassLogisticRegressionWithElasticNetExample {
 // Print the coefficients and intercept for multinomial logistic regression
 println(s"Coefficients: \n${lrModel.coefficientMatrix}")
 println(s"Intercepts: \n${lrModel.interceptVector}")
+
+val trainingSummary = lrModel.summary
+
+val objectiveHistory = trainingSummary.objectiveHistory
--- End diff --

Done





[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-26 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r164151731
  
--- Diff: examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py ---
@@ -43,6 +43,43 @@
 # Print the coefficients and intercept for multinomial logistic regression
 print("Coefficients: \n" + str(lrModel.coefficientMatrix))
 print("Intercept: " + str(lrModel.interceptVector))
+
+trainingSummary = lrModel.summary
+
+# Obtain the objective per iteration
+objectiveHistory = trainingSummary.objectiveHistory
+print("objectiveHistory:")
+for objective in objectiveHistory:
+    print(objective)
+
+print("False positive rate by label:")
--- End diff --

Done





[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r162873193
  
--- Diff: examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py ---
@@ -43,6 +43,43 @@
 # Print the coefficients and intercept for multinomial logistic regression
 print("Coefficients: \n" + str(lrModel.coefficientMatrix))
 print("Intercept: " + str(lrModel.interceptVector))
+
+trainingSummary = lrModel.summary
+
+# Obtain the objective per iteration
+objectiveHistory = trainingSummary.objectiveHistory
+print("objectiveHistory:")
+for objective in objectiveHistory:
+    print(objective)
+
+print("False positive rate by label:")
--- End diff --

Do we want a comment here that is consistent with the Java version above?
`// for multiclass, we can inspect metrics on a per-label basis`
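
For readers unfamiliar with what "metrics on a per-label basis" means for the false positive rate printed in the diff, the following is a small sketch in plain Python. It assumes the usual one-vs-rest definition, FPR = FP / (FP + TN), and the helper name `false_positive_rate_by_label` is hypothetical; it is not part of the Spark API.

```python
def false_positive_rate_by_label(pairs):
    """Return {label: FPR} using a one-vs-rest view of each label.

    `pairs` is an iterable of (prediction, label) tuples.
    """
    pairs = list(pairs)
    labels = sorted({label for _, label in pairs} | {pred for pred, _ in pairs})
    rates = {}
    for target in labels:
        # False positives: predicted `target` although the true label differs.
        false_pos = sum(1 for pred, label in pairs
                        if pred == target and label != target)
        # All rows whose true label is not `target` (FP + TN).
        negatives = sum(1 for _, label in pairs if label != target)
        rates[target] = false_pos / negatives if negatives else 0.0
    return rates


pairs = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (0, 2)]
fpr = false_positive_rate_by_label(pairs)
for label, rate in fpr.items():
    print("label %d: %s" % (label, rate))
```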





[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r162873036
  
--- Diff: docs/ml-classification-regression.md ---
@@ -97,10 +97,6 @@ only available on the driver.
 [`LogisticRegressionTrainingSummary`](api/scala/index.html#org.apache.spark.ml.classification.LogisticRegressionTrainingSummary)
 provides a summary for a
 [`LogisticRegressionModel`](api/scala/index.html#org.apache.spark.ml.classification.LogisticRegressionModel).
-Currently, only binary classification is supported and the
--- End diff --

Should we add a note explaining the difference between the summary and the
binary summary, perhaps indicating the usage of the `binarySummary` or
`asBinary` method?

I know it's done in the example, but a short line about it in the guide
would help.
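
As a rough illustration of the distinction being asked about, here is a sketch in plain Python of a general summary accessor next to a binary-only accessor. All class and attribute names (`ToyModel`, `binary_summary`, and so on) are hypothetical stand-ins, and the choice to raise an error for a multiclass model is an assumption made for this sketch rather than a statement about Spark's behaviour.

```python
class ToySummary:
    """Toy general summary, valid for any number of classes."""

    def __init__(self, num_classes):
        self.num_classes = num_classes


class ToyBinarySummary(ToySummary):
    """Toy binary summary; would carry the extra binary-only metrics."""

    def __init__(self):
        super().__init__(num_classes=2)


class ToyModel:
    def __init__(self, num_classes):
        if num_classes == 2:
            self._summary = ToyBinarySummary()
        else:
            self._summary = ToySummary(num_classes)

    @property
    def summary(self):
        # Always available, for binary and multiclass models alike.
        return self._summary

    @property
    def binary_summary(self):
        # Assumed behaviour: refuse to hand out a binary summary for a multiclass model.
        if not isinstance(self._summary, ToyBinarySummary):
            raise RuntimeError(
                "not a binary model (num_classes=%d); use summary instead"
                % self._summary.num_classes)
        return self._summary


binary_model = ToyModel(num_classes=2)
multiclass_model = ToyModel(num_classes=3)
print(type(binary_model.binary_summary).__name__)
print(type(multiclass_model.summary).__name__)
```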





[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r162872261
  
--- Diff: docs/ml-classification-regression.md ---
@@ -125,7 +117,6 @@ Continuing the earlier example:
 [`LogisticRegressionTrainingSummary`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionSummary)
 provides a summary for a
 [`LogisticRegressionModel`](api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegressionModel).
--- End diff --

Shall we just add a short line to the `Example` section of MLoR:

"The following example shows how to train a multiclass logistic regression 
model with elastic net regularization, as well as extract the multiclass 
training summary." 

or something like that.
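
For context on the "elastic net regularization" mentioned in the proposed sentence, here is a minimal sketch of the penalty term in plain Python. It assumes the common formulation lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2); the function name is hypothetical, and the parameter names `reg_param` and `elastic_net_param` are only meant to echo the `regParam` and `elasticNetParam` settings used in the examples.

```python
def elastic_net_penalty(weights, reg_param, elastic_net_param):
    """Elastic net penalty: a mix of L1 and squared L2 norms of the weights.

    elastic_net_param = 0.0 gives a pure L2 (ridge) penalty,
    elastic_net_param = 1.0 gives a pure L1 (lasso) penalty.
    """
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return reg_param * (elastic_net_param * l1
                        + (1.0 - elastic_net_param) / 2.0 * l2_squared)


weights = [1.0, -2.0, 0.0]
pure_l2 = elastic_net_penalty(weights, reg_param=0.5, elastic_net_param=0.0)
pure_l1 = elastic_net_penalty(weights, reg_param=0.5, elastic_net_param=1.0)
print(pure_l2, pure_l1)
```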





[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-22 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20332#discussion_r162873388
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/MulticlassLogisticRegressionWithElasticNetExample.scala ---
@@ -49,6 +49,48 @@ object MulticlassLogisticRegressionWithElasticNetExample {
 // Print the coefficients and intercept for multinomial logistic regression
 println(s"Coefficients: \n${lrModel.coefficientMatrix}")
 println(s"Intercepts: \n${lrModel.interceptVector}")
+
+val trainingSummary = lrModel.summary
+
+val objectiveHistory = trainingSummary.objectiveHistory
--- End diff --

Ditto here: add the comment so that it is consistent with the Java / Python
versions.

