[GitHub] spark pull request #20837: [SPARK-23686][ML][WIP] Better instrumentation

WeichenXu123 Thu, 15 Mar 2018 21:39:38 -0700

Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20837#discussion_r174996170
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -517,6 +517,9 @@ class LogisticRegression @Since("1.2.0") (
             (new MultivariateOnlineSummarizer, new MultiClassSummarizer)
           )(seqOp, combOp, $(aggregationDepth))
         }
    +    instr.logNamedValue(Instrumentation.loggerTags.numExamples, summarizer.count)
    +    instr.logNamedValue("lowestLabelWeight", labelSummarizer.histogram.min.toString)
    +    instr.logNamedValue("highestLabelWeight", labelSummarizer.histogram.min.toString)
    --- End diff --
    
    Why not log the whole histogram (each label -> its weightSum)?
    Logging only the min/max weightSum seems of little use, and users would not
    even know which labels those values belong to.
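
A minimal sketch of what logging the whole histogram could look like. It assumes `labelSummarizer.histogram` is an `Array[Double]` in which index `i` holds the summed weight for label `i`; the `HistogramLogging` object, the `formatHistogram` helper, and the `"labelHistogram"` key are hypothetical names used only for illustration, not part of this PR.

```scala
// Hypothetical helper (not part of Spark): render a label histogram as
// "label -> weightSum" pairs so that a single log entry shows every label.
// Assumption: histogram(i) is the summed instance weight for label i.
object HistogramLogging {
  def formatHistogram(histogram: Array[Double]): String =
    histogram.zipWithIndex
      .map { case (weightSum, label) => s"$label -> $weightSum" }
      .mkString("{", ", ", "}")
}
```

Assuming the `logNamedValue(name, value: String)` overload used in the diff above, the call site might then read `instr.logNamedValue("labelHistogram", HistogramLogging.formatHistogram(labelSummarizer.histogram))`. With many classes the string can get long, so truncating it may be worth considering.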


