Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17084#discussion_r123924688
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala ---
    @@ -36,12 +36,18 @@ import org.apache.spark.sql.types.DoubleType
     @Since("1.2.0")
     @Experimental
     class BinaryClassificationEvaluator @Since("1.4.0") (@Since("1.4.0") override val uid: String)
    -  extends Evaluator with HasRawPredictionCol with HasLabelCol with DefaultParamsWritable {
    +  extends Evaluator with HasRawPredictionCol with HasLabelCol
    +    with HasWeightCol with DefaultParamsWritable {
     
       @Since("1.2.0")
       def this() = this(Identifiable.randomUID("binEval"))
     
       /**
    +   * Default number of bins to use for binary classification evaluation.
    +   */
    +  val defaultNumberOfBins = 1000
    --- End diff ---
    
    It seemed like a good default value to use - for graphing an ROC curve,
    1000 bins is not too large for most plots, but it's not so small that the
    curve would look jagged. The user can always specify a value to override
    the default. However, it's usually not a good idea to sort over the entire
    set of label/score values: the dataset will probably be very large, the
    sort will be very slow, and the visualization won't look any different, so
    by default we should encourage the user to down-sample into a bounded
    number of bins.
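
    For context, the existing mllib BinaryClassificationMetrics already
    supports this kind of down-sampling via its numBins constructor parameter.
    Here is a minimal sketch of how the default could be wired through; the
    helper name and the shape of scoreAndLabels are just for illustration, not
    the exact code in this PR:

        import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
        import org.apache.spark.rdd.RDD

        // scoreAndLabels: one (rawScore, label) pair per row of the input data.
        def areaUnderROC(scoreAndLabels: RDD[(Double, Double)]): Double = {
          // A positive numBins makes BinaryClassificationMetrics down-sample
          // the curve to roughly that many points instead of one point per
          // distinct score, so the output stays plot-sized for huge datasets.
          val metrics = new BinaryClassificationMetrics(scoreAndLabels, numBins = 1000)
          val auc = metrics.areaUnderROC()
          metrics.unpersist()
          auc
        }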

