GitHub user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17084#discussion_r123924688

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala ---
    @@ -36,12 +36,18 @@ import org.apache.spark.sql.types.DoubleType
     @Since("1.2.0")
     @Experimental
     class BinaryClassificationEvaluator @Since("1.4.0") (@Since("1.4.0") override val uid: String)
    -  extends Evaluator with HasRawPredictionCol with HasLabelCol with DefaultParamsWritable {
    +  extends Evaluator with HasRawPredictionCol with HasLabelCol
    +    with HasWeightCol with DefaultParamsWritable {
    
       @Since("1.2.0")
       def this() = this(Identifiable.randomUID("binEval"))
    
       /**
    +   * Default number of bins to use for binary classification evaluation.
    +   */
    +  val defaultNumberOfBins = 1000
    --- End diff --
    
    It seemed like a good default value: for graphing a ROC curve, 1000 bins is not too large for most plots, yet not so small that the curve looks jagged. The user can always specify a value to override the default. However, it's usually not a good idea to sort over the entire set of label/score values: the dataset will probably be very large, the operation will be very slow, and the down-sampling makes no visible difference when the data is plotted. So the default should encourage the user to down-sample to a bounded number of bins.
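For context, the down-sampling idea the default controls can be sketched in plain Python (an illustration only, not the actual MLlib implementation; `roc_points` and its signature are hypothetical): sort once by descending score, then emit one ROC point per chunk of roughly `len(data) / num_bins` records instead of one point per record.

```python
def roc_points(score_label_pairs, num_bins=1000):
    """Compute ROC curve points after grouping sorted scores into bins.

    score_label_pairs: iterable of (score, label) with label in {0, 1}.
    Returns a list of (false_positive_rate, true_positive_rate) points.
    """
    # An ROC sweep requires the records sorted by descending score.
    pairs = sorted(score_label_pairs, key=lambda p: -p[0])
    # Group consecutive records into ~num_bins chunks so the curve has a
    # bounded number of points regardless of dataset size.
    chunk = max(1, len(pairs) // num_bins)
    total_pos = sum(label for _, label in pairs)
    total_neg = len(pairs) - total_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i in range(0, len(pairs), chunk):
        for _, label in pairs[i:i + chunk]:
            if label == 1:
                tp += 1
            else:
                fp += 1
        points.append((fp / total_neg, tp / total_pos))
    return points
```

With `num_bins = 1000` the output has at most ~1001 points, which is plenty for any plot, while the per-record work is a single sort plus one linear pass.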