Github user feynmanliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8197#discussion_r37255388
  
    --- Diff: docs/ml-guide.md ---
    @@ -801,6 +801,153 @@ jsc.stop();
     
     </div>
     
    +## Example: Summaries for LogisticRegression
    +
    +Once
    +[`LogisticRegression`](api/scala/index.html#org.apache.spark.ml.classification.LogisticRegression)
    +has been fit to data, it is useful to extract statistics such as the loss per
    +iteration, which gives an intuition about convergence and overfitting, and
    +metrics that show how well the model performs on the training and test data.
    +
    
    +[`LogisticRegressionTrainingSummary`](api/scala/index.html#org.apache.spark.ml.classification.LogisticRegressionTrainingSummary)
    +provides an interface to access this information, e.g. the `objectiveHistory`
    +and metrics for evaluating performance on the training data, with very little
    +extra code required from the user.
    +
    +This example illustrates the use of `LogisticRegressionTrainingSummary`
    +on some toy data.
    +
    +<div class="codetabs">
    +<div data-lang="scala">
    +{% highlight scala %}
    +import org.apache.spark.{SparkConf, SparkContext}
    +import org.apache.spark.ml.classification.{BinaryLogisticRegressionSummary, LogisticRegression}
    +import org.apache.spark.mllib.regression.LabeledPoint
    +import org.apache.spark.mllib.linalg.Vectors
    +import org.apache.spark.sql.{Row, SQLContext}
    +
    +val conf = new SparkConf().setAppName("LogisticRegressionSummary")
    +val sc = new SparkContext(conf)
    +val sqlContext = new SQLContext(sc)
    +import sqlContext.implicits._
    +
    +// Create some toy data for demonstration.
    +// Note that an RDD of LabeledPoints can be converted to a DataFrame directly.
    +val data = sc.parallelize(Array(
    +  LabeledPoint(0.0, Vectors.dense(0.2, 4.5, 1.6)),
    +  LabeledPoint(1.0, Vectors.dense(3.1, 6.8, 3.6)),
    +  LabeledPoint(0.0, Vectors.dense(2.4, 0.9, 1.9)),
    +  LabeledPoint(1.0, Vectors.dense(9.1, 3.1, 3.6)),
    +  LabeledPoint(0.0, Vectors.dense(2.5, 1.9, 9.1)))
    +)
    +val logRegDataFrame = data.toDF()
    +
    +// Fit a logistic regression model to the toy data.
    +// Since LogisticRegression is an estimator, fit() returns an instance of
    +// LogisticRegressionModel, which is a transformer.
    +val logReg = new LogisticRegression()
    +  .setMaxIter(5)
    +  .setRegParam(0.01)
    +val logRegModel = logReg.fit(logRegDataFrame)
    +
    +// Extract the summary directly from the returned LogisticRegressionModel instance.
    +val trainingSummary = logRegModel.summary
    +
    +// Obtain the loss per iteration. The objective should generally decrease
    +// with each iteration and eventually level off as the model converges.
    +val objectiveHistory = trainingSummary.objectiveHistory
    +objectiveHistory.foreach(println)
    +
    +// Obtain the metrics useful to judge performance on test data.
    +val binarySummary = trainingSummary.asInstanceOf[BinaryLogisticRegressionSummary]
    --- End diff --
    
    Ditto for java example
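For reference, a continuation of the Scala snippet in the diff might look like the sketch below. This is only an illustration, not part of the proposed doc change; it assumes the `BinaryLogisticRegressionSummary` API from the `spark.ml` package (`roc`, `areaUnderROC`, `fMeasureByThreshold`) and the `binarySummary`, `logRegModel`, and `sqlContext.implicits._` bindings defined earlier in the snippet:

```scala
// Sketch: using the binary summary metrics obtained above (assumes the
// spark.ml BinaryLogisticRegressionSummary API and the earlier bindings).

// The ROC curve as a DataFrame of (FPR, TPR) points, and the area under it.
val roc = binarySummary.roc
roc.show()
println(s"Area under ROC: ${binarySummary.areaUnderROC}")

// Pick the decision threshold that maximizes F-measure on the training data.
// fMeasureByThreshold is a DataFrame with columns "threshold" and "F-Measure".
val fMeasure = binarySummary.fMeasureByThreshold
val maxFMeasure = fMeasure.agg("F-Measure" -> "max").head().getDouble(0)
val bestThreshold = fMeasure
  .where($"F-Measure" === maxFMeasure)
  .select("threshold").head().getDouble(0)
logRegModel.setThreshold(bestThreshold)
```

A Java version would follow the same shape, casting `logRegModel.summary()` to `BinaryLogisticRegressionSummary` explicitly.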

