Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15435#discussion_r133824486
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -1324,90 +1350,136 @@ private[ml] class MultiClassSummarizer extends Serializable {
     }
     
     /**
    - * Abstraction for multinomial Logistic Regression Training results.
    - * Currently, the training summary ignores the training weights except
    - * for the objective trace.
    - */
    -sealed trait LogisticRegressionTrainingSummary extends LogisticRegressionSummary {
    -
    -  /** objective function (scaled loss + regularization) at each iteration. */
    -  def objectiveHistory: Array[Double]
    -
    -  /** Number of training iterations until termination */
    -  def totalIterations: Int = objectiveHistory.length
    -
    -}
    -
    -/**
    - * Abstraction for Logistic Regression Results for a given model.
    + * Abstraction for logistic regression results for a given model.
      */
     sealed trait LogisticRegressionSummary extends Serializable {
     
       /**
        * Dataframe output by the model's `transform` method.
        */
    +  @Since("2.3.0")
       def predictions: DataFrame
     
       /** Field in "predictions" which gives the probability of each class as a vector. */
    +  @Since("2.3.0")
       def probabilityCol: String
     
    +  /** Field in "predictions" which gives the prediction of each instance. */
    +  @Since("2.3.0")
    +  def predictionCol: String
    +
       /** Field in "predictions" which gives the true label of each instance (if available). */
    +  @Since("2.3.0")
       def labelCol: String
     
       /** Field in "predictions" which gives the features of each instance as a vector. */
    +  @Since("2.3.0")
       def featuresCol: String
     
    +  @transient private val multiclassMetrics = {
    --- End diff --
    
    MulticlassMetrics provides a ```labels``` field which returns the list of labels. In most cases, these will be the values ```{0.0, 1.0, ..., numClasses - 1}```. However, if the training set is missing a label, then all of the arrays indexed by label (e.g., from ```truePositiveRateByLabel```) will have length numClasses - 1 instead of the expected numClasses. In the future, it would be nice to fix this by making those arrays always have length numClasses. For now, how about we expose the ```labels``` field along with an explanation of this caveat?
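
    To make the caveat concrete, here is a minimal sketch in plain Scala (no Spark dependency; the ```LabelsSketch``` object and its data are hypothetical, stand-ins for what ```MulticlassMetrics``` computes) showing how per-label arrays end up shorter than numClasses when a label never appears in the data:

```scala
// Hypothetical sketch: numClasses = 3, but label 2.0 never occurs in the data.
object LabelsSketch {
  // (prediction, label) pairs, analogous to what MulticlassMetrics consumes.
  val predictionAndLabels: Seq[(Double, Double)] =
    Seq((0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0))

  // `labels` is derived from the distinct label values actually observed,
  // so a class absent from the data is simply missing from this array.
  val labels: Array[Double] =
    predictionAndLabels.map(_._2).distinct.sorted.toArray

  // Per-label metrics (e.g. true positive rate by label) are computed per
  // entry of `labels`, so their length is labels.length, not numClasses.
  val tpRateByLabel: Array[Double] = labels.map { l =>
    val withLabel = predictionAndLabels.filter(_._2 == l)
    withLabel.count { case (p, _) => p == l }.toDouble / withLabel.size
  }

  def main(args: Array[String]): Unit = {
    val numClasses = 3
    // labels.length is 2 here, one short of numClasses.
    println(s"labels.length = ${labels.length}, numClasses = $numClasses")
  }
}
```

    A caller iterating over ```truePositiveRateByLabel``` therefore has to consult ```labels``` to know which class each entry refers to, which is why documenting the field matters.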

