Repository: spark
Updated Branches:
  refs/heads/master 6a05eb24d -> 9df54f532


[SPARK-17239][ML][DOC] Update user guide for multiclass logistic regression

## What changes were proposed in this pull request?
Updates user guide to reflect that LogisticRegression now supports multiclass. 
Also adds new examples to show multiclass training.

## How was this patch tested?
Ran locally using spark-submit, run-example, and copy/paste from user guide 
into shells. Generated docs and verified correct output.

Author: sethah <seth.hendrickso...@gmail.com>

Closes #15349 from sethah/SPARK-17239.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9df54f53
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9df54f53
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9df54f53

Branch: refs/heads/master
Commit: 9df54f5325c2942bb77008ff1810e2fb5f6d848b
Parents: 6a05eb2
Author: sethah <seth.hendrickso...@gmail.com>
Authored: Wed Oct 5 18:28:21 2016 +0000
Committer: DB Tsai <dbt...@dbtsai.com>
Committed: Wed Oct 5 18:28:21 2016 +0000

----------------------------------------------------------------------
 docs/ml-classification-regression.md            | 65 +++++++++++++++++---
 ...LogisticRegressionWithElasticNetExample.java | 14 +++++
 ...LogisticRegressionWithElasticNetExample.java | 55 +++++++++++++++++
 .../ml/logistic_regression_with_elastic_net.py  | 10 +++
 ...lass_logistic_regression_with_elastic_net.py | 48 +++++++++++++++
 ...ogisticRegressionWithElasticNetExample.scala | 13 ++++
 ...ogisticRegressionWithElasticNetExample.scala | 57 +++++++++++++++++
 7 files changed, 255 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/9df54f53/docs/ml-classification-regression.md
----------------------------------------------------------------------
diff --git a/docs/ml-classification-regression.md 
b/docs/ml-classification-regression.md
index 7c2437e..bb2e404 100644
--- a/docs/ml-classification-regression.md
+++ b/docs/ml-classification-regression.md
@@ -34,17 +34,22 @@ discussing specific classes of algorithms, such as linear 
methods, trees, and en
 
 ## Logistic regression
 
-Logistic regression is a popular method to predict a binary response. It is a 
special case of [Generalized Linear 
models](https://en.wikipedia.org/wiki/Generalized_linear_model) that predicts 
the probability of the outcome.
-For more background and more details about the implementation, refer to the 
documentation of the [logistic regression in 
`spark.mllib`](mllib-linear-methods.html#logistic-regression). 
+Logistic regression is a popular method to predict a categorical response. It 
is a special case of [Generalized Linear 
models](https://en.wikipedia.org/wiki/Generalized_linear_model) that predicts 
the probability of the outcomes.
+In `spark.ml` logistic regression can be used to predict a binary outcome by 
using binomial logistic regression, or it can be used to predict a multiclass 
outcome by using multinomial logistic regression. Use the `family`
+parameter to select between these two algorithms, or leave it unset and Spark 
will infer the correct variant.
 
-  > The current implementation of logistic regression in `spark.ml` only 
supports binary classes. Support for multiclass regression will be added in the 
future.
+  > Multinomial logistic regression can be used for binary classification by 
setting the `family` param to "multinomial". It will produce two sets of 
coefficients and two intercepts.
 
   > When fitting LogisticRegressionModel without intercept on dataset with 
constant nonzero column, Spark MLlib outputs zero coefficients for constant 
nonzero columns. This behavior is the same as R glmnet but different from 
LIBSVM.
 
+### Binomial logistic regression
+
+For more background and more details about the implementation of binomial 
logistic regression, refer to the documentation of [logistic regression in 
`spark.mllib`](mllib-linear-methods.html#logistic-regression). 
+
 **Example**
 
-The following example shows how to train a logistic regression model
-with elastic net regularization. `elasticNetParam` corresponds to
+The following example shows how to train binomial and multinomial logistic 
regression 
+models for binary classification with elastic net regularization. 
`elasticNetParam` corresponds to
 $\alpha$ and `regParam` corresponds to $\lambda$.
 
 <div class="codetabs">
@@ -92,8 +97,8 @@ provides a summary for a
 
[`LogisticRegressionModel`](api/java/org/apache/spark/ml/classification/LogisticRegressionModel.html).
 Currently, only binary classification is supported and the
 summary must be explicitly cast to
-[`BinaryLogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/BinaryLogisticRegressionTrainingSummary.html).
-This will likely change when multiclass classification is supported.
+[`BinaryLogisticRegressionTrainingSummary`](api/java/org/apache/spark/ml/classification/BinaryLogisticRegressionTrainingSummary.html).
 
+Support for multiclass model summaries will be added in the future.
 
 Continuing the earlier example:
 
@@ -107,6 +112,52 @@ Logistic regression model summary is not yet supported in 
Python.
 
 </div>
 
+### Multinomial logistic regression
+
+Multiclass classification is supported via multinomial logistic (softmax) 
regression. In multinomial logistic regression,
+the algorithm produces $K$ sets of coefficients, or a matrix of dimension $K 
\times J$ where $K$ is the number of outcome
+classes and $J$ is the number of features. If the algorithm is fit with an 
intercept term then a length $K$ vector of
+intercepts is available.
+
+  > Multinomial coefficients are available as `coefficientMatrix` and 
intercepts are available as `interceptVector`.
+ 
+  > `coefficients` and `intercept` methods on a logistic regression model 
trained with multinomial family are not supported. Use `coefficientMatrix` and 
`interceptVector` instead.
+
+The conditional probabilities of the outcome classes $k \in \{0, 1, ..., K-1\}$ 
are modeled using the softmax function.
+
+`\[
+   P(Y=k|\mathbf{X}, \boldsymbol{\beta}_k, \beta_{0k}) =  
\frac{e^{\boldsymbol{\beta}_k \cdot \mathbf{X}  + 
\beta_{0k}}}{\sum_{k'=0}^{K-1} e^{\boldsymbol{\beta}_{k'} \cdot \mathbf{X}  + 
\beta_{0k'}}}
+\]`
+
+We minimize the weighted negative log-likelihood, using a multinomial response 
model, with elastic-net penalty to control for overfitting.
+
+`\[
+\min_{\beta, \beta_0} -\left[\sum_{i=1}^L w_i \cdot \log P(Y = 
y_i|\mathbf{x}_i)\right] + \lambda \left[\frac{1}{2}\left(1 - 
\alpha\right)||\boldsymbol{\beta}||_2^2 + \alpha ||\boldsymbol{\beta}||_1\right]
+\]`
+
+For a detailed derivation please see 
[here](https://en.wikipedia.org/wiki/Multinomial_logistic_regression#As_a_log-linear_model).
+
+**Example**
+
+The following example shows how to train a multiclass logistic regression 
+model with elastic net regularization.
+
+<div class="codetabs">
+
+<div data-lang="scala" markdown="1">
+{% include_example 
scala/org/apache/spark/examples/ml/MulticlassLogisticRegressionWithElasticNetExample.scala
 %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example 
java/org/apache/spark/examples/ml/JavaMulticlassLogisticRegressionWithElasticNetExample.java
 %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% include_example 
python/ml/multiclass_logistic_regression_with_elastic_net.py %}
+</div>
+
+</div>
+
 
 ## Decision tree classifier
 

http://git-wip-us.apache.org/repos/asf/spark/blob/9df54f53/examples/src/main/java/org/apache/spark/examples/ml/JavaLogisticRegressionWithElasticNetExample.java
----------------------------------------------------------------------
diff --git 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaLogisticRegressionWithElasticNetExample.java
 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaLogisticRegressionWithElasticNetExample.java
index 6101c79..b8fb597 100644
--- 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaLogisticRegressionWithElasticNetExample.java
+++ 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaLogisticRegressionWithElasticNetExample.java
@@ -48,6 +48,20 @@ public class JavaLogisticRegressionWithElasticNetExample {
     // Print the coefficients and intercept for logistic regression
     System.out.println("Coefficients: "
       + lrModel.coefficients() + " Intercept: " + lrModel.intercept());
+
+    // We can also use the multinomial family for binary classification
+    LogisticRegression mlr = new LogisticRegression()
+            .setMaxIter(10)
+            .setRegParam(0.3)
+            .setElasticNetParam(0.8)
+            .setFamily("multinomial");
+
+    // Fit the model
+    LogisticRegressionModel mlrModel = mlr.fit(training);
+
+    // Print the coefficients and intercepts for logistic regression with 
multinomial family
+    System.out.println("Multinomial coefficients: "
+            + mlrModel.coefficientMatrix() + "\nMultinomial intercepts: " + 
mlrModel.interceptVector());
     // $example off$
 
     spark.stop();

http://git-wip-us.apache.org/repos/asf/spark/blob/9df54f53/examples/src/main/java/org/apache/spark/examples/ml/JavaMulticlassLogisticRegressionWithElasticNetExample.java
----------------------------------------------------------------------
diff --git 
a/examples/src/main/java/org/apache/spark/examples/ml/JavaMulticlassLogisticRegressionWithElasticNetExample.java
 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaMulticlassLogisticRegressionWithElasticNetExample.java
new file mode 100644
index 0000000..da410cb
--- /dev/null
+++ 
b/examples/src/main/java/org/apache/spark/examples/ml/JavaMulticlassLogisticRegressionWithElasticNetExample.java
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+// $example on$
+import org.apache.spark.ml.classification.LogisticRegression;
+import org.apache.spark.ml.classification.LogisticRegressionModel;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SparkSession;
+// $example off$
+
+public class JavaMulticlassLogisticRegressionWithElasticNetExample {
+    public static void main(String[] args) {
+        SparkSession spark = SparkSession
+                .builder()
+                
.appName("JavaMulticlassLogisticRegressionWithElasticNetExample")
+                .getOrCreate();
+
+        // $example on$
+        // Load training data
+        Dataset<Row> training = spark.read().format("libsvm")
+                .load("data/mllib/sample_multiclass_classification_data.txt");
+
+        LogisticRegression lr = new LogisticRegression()
+                .setMaxIter(10)
+                .setRegParam(0.3)
+                .setElasticNetParam(0.8);
+
+        // Fit the model
+        LogisticRegressionModel lrModel = lr.fit(training);
+
+        // Print the coefficients and intercept for multinomial logistic 
regression
+        System.out.println("Coefficients: \n"
+                + lrModel.coefficientMatrix() + " \nIntercept: " + 
lrModel.interceptVector());
+        // $example off$
+
+        spark.stop();
+    }
+}

http://git-wip-us.apache.org/repos/asf/spark/blob/9df54f53/examples/src/main/python/ml/logistic_regression_with_elastic_net.py
----------------------------------------------------------------------
diff --git 
a/examples/src/main/python/ml/logistic_regression_with_elastic_net.py 
b/examples/src/main/python/ml/logistic_regression_with_elastic_net.py
index 33d0689..d095fbd 100644
--- a/examples/src/main/python/ml/logistic_regression_with_elastic_net.py
+++ b/examples/src/main/python/ml/logistic_regression_with_elastic_net.py
@@ -40,6 +40,16 @@ if __name__ == "__main__":
     # Print the coefficients and intercept for logistic regression
     print("Coefficients: " + str(lrModel.coefficients))
     print("Intercept: " + str(lrModel.intercept))
+
+    # We can also use the multinomial family for binary classification
+    mlr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8, 
family="multinomial")
+
+    # Fit the model
+    mlrModel = mlr.fit(training)
+
+    # Print the coefficients and intercepts for logistic regression with 
multinomial family
+    print("Multinomial coefficients: " + str(mlrModel.coefficientMatrix))
+    print("Multinomial intercepts: " + str(mlrModel.interceptVector))
     # $example off$
 
     spark.stop()
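
The user guide notes that the multinomial family yields two sets of coefficients and two intercepts on a binary problem. The sketch below, in plain Python rather than PySpark, shows why the two parameterizations describe the same probability model: a two-class softmax equals a sigmoid applied to the difference of the two class margins. All numeric values are hypothetical, and the sketch says nothing about whether the coefficients Spark actually fits under the two families coincide once regularization is applied.

```python
import math


def softmax(margins):
    """Return softmax probabilities for a list of per-class margins."""
    m = max(margins)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in margins]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical two-class multinomial parameters, one row per class.
coef_matrix = [[0.4, -1.2], [-0.1, 0.7]]
intercept_vector = [0.3, -0.5]
x = [1.5, 2.0]

# Multinomial view: probability of class 1 from the softmax.
margins = [dot(row, x) + b for row, b in zip(coef_matrix, intercept_vector)]
p_multinomial = softmax(margins)[1]

# Binomial view: one coefficient vector, class 1 minus class 0.
coef_diff = [b1 - b0 for b0, b1 in zip(coef_matrix[0], coef_matrix[1])]
intercept_diff = intercept_vector[1] - intercept_vector[0]
p_binomial = sigmoid(dot(coef_diff, x) + intercept_diff)

print(p_multinomial, p_binomial)
```

The two printed values agree up to floating-point rounding, which is why the extra set of coefficients in the multinomial form adds no modeling power for two classes.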

http://git-wip-us.apache.org/repos/asf/spark/blob/9df54f53/examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py
----------------------------------------------------------------------
diff --git 
a/examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py
 
b/examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py
new file mode 100644
index 0000000..bb9cd82
--- /dev/null
+++ 
b/examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py
@@ -0,0 +1,48 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import print_function
+
+# $example on$
+from pyspark.ml.classification import LogisticRegression
+# $example off$
+from pyspark.sql import SparkSession
+
+if __name__ == "__main__":
+    spark = SparkSession \
+        .builder \
+        .appName("MulticlassLogisticRegressionWithElasticNet") \
+        .getOrCreate()
+
+    # $example on$
+    # Load training data
+    training = spark \
+        .read \
+        .format("libsvm") \
+        .load("data/mllib/sample_multiclass_classification_data.txt")
+
+    lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
+
+    # Fit the model
+    lrModel = lr.fit(training)
+
+    # Print the coefficients and intercept for multinomial logistic regression
+    print("Coefficients: \n" + str(lrModel.coefficientMatrix))
+    print("Intercept: " + str(lrModel.interceptVector))
+    # $example off$
+
+    spark.stop()
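
To make the shapes printed by the example concrete, here is a rough plain-Python sketch of how a K x J coefficient matrix and a length-K intercept vector turn one feature vector into class probabilities and a predicted label. Plain lists stand in for Spark's matrix and vector types, the `predict` helper and all numbers are hypothetical, and this is not Spark's own prediction code.

```python
import math


def predict(coef_matrix, intercept_vector, features):
    """Return (label, probabilities) for one feature vector."""
    # One margin per class: row k of the matrix dotted with the features,
    # plus the k-th intercept.
    margins = [sum(b * v for b, v in zip(row, features)) + b0
               for row, b0 in zip(coef_matrix, intercept_vector)]
    m = max(margins)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in margins]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted label is the index of the largest probability.
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Hypothetical model with K = 3 classes and J = 2 features.
coef_matrix = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
intercept_vector = [0.0, 0.0, 0.0]
label, probs = predict(coef_matrix, intercept_vector, [0.2, 2.0])
print(label, probs)
```

Here the margins are 0.2, 2.0 and -2.2, so the second row wins and the predicted label is 1.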

http://git-wip-us.apache.org/repos/asf/spark/blob/9df54f53/examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionWithElasticNetExample.scala
----------------------------------------------------------------------
diff --git 
a/examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionWithElasticNetExample.scala
 
b/examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionWithElasticNetExample.scala
index 616263b..1847104 100644
--- 
a/examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionWithElasticNetExample.scala
+++ 
b/examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionWithElasticNetExample.scala
@@ -45,6 +45,19 @@ object LogisticRegressionWithElasticNetExample {
 
     // Print the coefficients and intercept for logistic regression
     println(s"Coefficients: ${lrModel.coefficients} Intercept: 
${lrModel.intercept}")
+
+    // We can also use the multinomial family for binary classification
+    val mlr = new LogisticRegression()
+      .setMaxIter(10)
+      .setRegParam(0.3)
+      .setElasticNetParam(0.8)
+      .setFamily("multinomial")
+
+    val mlrModel = mlr.fit(training)
+
+    // Print the coefficients and intercepts for logistic regression with 
multinomial family
+    println(s"Multinomial coefficients: ${mlrModel.coefficientMatrix}")
+    println(s"Multinomial intercepts: ${mlrModel.interceptVector}")
     // $example off$
 
     spark.stop()

http://git-wip-us.apache.org/repos/asf/spark/blob/9df54f53/examples/src/main/scala/org/apache/spark/examples/ml/MulticlassLogisticRegressionWithElasticNetExample.scala
----------------------------------------------------------------------
diff --git 
a/examples/src/main/scala/org/apache/spark/examples/ml/MulticlassLogisticRegressionWithElasticNetExample.scala
 
b/examples/src/main/scala/org/apache/spark/examples/ml/MulticlassLogisticRegressionWithElasticNetExample.scala
new file mode 100644
index 0000000..42f0ace
--- /dev/null
+++ 
b/examples/src/main/scala/org/apache/spark/examples/ml/MulticlassLogisticRegressionWithElasticNetExample.scala
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.ml
+
+// $example on$
+import org.apache.spark.ml.classification.LogisticRegression
+// $example off$
+import org.apache.spark.sql.SparkSession
+
+object MulticlassLogisticRegressionWithElasticNetExample {
+
+  def main(args: Array[String]): Unit = {
+    val spark = SparkSession
+      .builder
+      .appName("MulticlassLogisticRegressionWithElasticNetExample")
+      .getOrCreate()
+
+    // $example on$
+    // Load training data
+    val training = spark
+      .read
+      .format("libsvm")
+      .load("data/mllib/sample_multiclass_classification_data.txt")
+
+    val lr = new LogisticRegression()
+      .setMaxIter(10)
+      .setRegParam(0.3)
+      .setElasticNetParam(0.8)
+
+    // Fit the model
+    val lrModel = lr.fit(training)
+
+    // Print the coefficients and intercept for multinomial logistic regression
+    println(s"Coefficients: \n${lrModel.coefficientMatrix}")
+    println(s"Intercepts: ${lrModel.interceptVector}")
+    // $example off$
+
+    spark.stop()
+  }
+}
+// scalastyle:on println

