[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-68470611

Merged into master. Thanks!

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/3702
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-68286086

@srowen Sorry for the delay! I'm really starting to wonder about this JIRA, though. The collect() should return one BinaryLabelCounter per partition. I'd assume people would have enough memory to store at least a few million BinaryLabelCounter instances on the driver. Does that mean they have more than a few million partitions? Sorry I didn't think about this earlier, and perhaps I'm just confusing myself now; let me know what you think. Is there an issue to solve here?

Previously, I'd have said: with the update, this LGTM.

Also, I did think of one use case which may change things: we've been talking about people using these methods to make plots. Do you think people ever use them to choose thresholds? If so, then people might want much finer-grained ROC curves than we've been thinking, and it might be worthwhile to do a fancier implementation which avoids binning.

At any rate, apologies for so much back-and-forth.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-68295668

@jkbradley You'll end up with one `BinaryLabelCounter` per partition _per distinct key_ though. That's where the problem may occur.

I think this would definitely be used to pick thresholds, and for that, you don't need to download the curve, just find the optimal point. There, I'd say you simply don't bin at all, since binning means the curve is approximate and down-sampled, and there's probably not much value in approximation.

I'm not terribly wedded to the change. It helps in the niche use case where one does want to down-sample. It adds some complexity though, and complexity adds up. It's also possible to down-sample the final curve, later. That could just be a utility function somewhere instead of being injected into here. Would that be better? I think you lose a bit of information that way, but it's an approximation to begin with.
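The alternative srowen floats here, down-sampling the already-collected curve in a standalone utility, could be sketched as follows. This is a minimal sketch, not code from the PR; the object name `CurveUtils` and the function `downSampleCurve` are hypothetical, and the grouping rule mirrors the one described in the discussion (bins of `size / numBins` consecutive points).

```scala
object CurveUtils {
  // Hypothetical utility (not part of the PR): down-sample an
  // already-collected curve to roughly `numBins` points by grouping
  // consecutive points and keeping the last point of each group.
  // As noted in the thread, this loses a bit of information compared
  // to binning before collect(), but the curve is approximate anyway.
  def downSampleCurve(
      curve: Seq[(Double, Double)],
      numBins: Int): Seq[(Double, Double)] = {
    require(numBins >= 0, "numBins must be non-negative")
    val grouping = if (numBins == 0) 0 else curve.size / numBins
    if (grouping < 2) {
      // Curve is already small relative to numBins; binning wouldn't help
      curve
    } else {
      curve.grouped(grouping).map(_.last).toSeq
    }
  }
}
```

With 100 points and `numBins = 10`, the grouping factor is 10 and the result keeps every 10th point; with `numBins = 0` the curve is returned unchanged.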
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-68299469

@srowen Right; I was indeed just confused. I think it's fine. LGTM

CC: @mengxr
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3702#discussion_r22232433

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala ---
@@ -28,9 +28,28 @@ import org.apache.spark.rdd.{RDD, UnionRDD}
  * Evaluator for binary classification.
  *
  * @param scoreAndLabels an RDD of (score, label) pairs.
+ * @param numBins if greater than 0, then the curves (ROC curve, PR curve) computed internally
+ *     will be down-sampled to this many bins. This is useful because the curve contains a
--- End diff --

Indent comment? Also, looking at this, I like having the explanation, but it might be nice to format as:
- Quick description
- Default value
- Caveat about approximation
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-67993599

@srowen I added one last comment; sorry, I should have thought of it earlier. This took longer than I expected, but I think it's a good solution. Thanks for the PR! Except for the last comment, I think it's ready.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-68004357

@jkbradley No problem, take a look at what I did to the scaladoc.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-68004777

[Test build #24748 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24748/consoleFull) for PR 3702 at commit [`1d34d05`](https://github.com/apache/spark/commit/1d34d05c499ea7be6327da68c9bd2457ffe5aa59).
* This patch merges cleanly.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-68010995

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24748/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-68010993

[Test build #24748 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24748/consoleFull) for PR 3702 at commit [`1d34d05`](https://github.com/apache/spark/commit/1d34d05c499ea7be6327da68c9bd2457ffe5aa59).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class BinaryClassificationMetrics(`
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3702#discussion_r22183573

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala ---
@@ -28,9 +28,23 @@ import org.apache.spark.rdd.{RDD, UnionRDD}
  * Evaluator for binary classification.
  *
  * @param scoreAndLabels an RDD of (score, label) pairs.
+ * @param numBins if greater than 0, then the curves (ROC curve, PR curve) computed internally
+ *     will be down-sampled to this many bins. This is useful because the curve contains a
+ *     point for each distinct score in the input, and this could be as large as the input itself --
+ *     millions of points or more, when thousands may be entirely sufficient to summarize the curve.
+ *     After down-sampling, the curves will instead be made of approximately `numBins` points.
+ *     Points are made from bins of equal numbers of consecutive points. The size of each bin
+ *     is `floor(scoreAndLabels.count() / numBins)`, which means the resulting number of bins
+ *     may not exactly equal numBins. The last bin in each partition may be smaller as a result,
+ *     meaning there may be an extra sample at partition boundaries.
+ *     If `numBins` is 0, no down-sampling will occur.
  */
 @Experimental
-class BinaryClassificationMetrics(scoreAndLabels: RDD[(Double, Double)]) extends Logging {
+class BinaryClassificationMetrics(
+    val scoreAndLabels: RDD[(Double, Double)],
+    val numBins: Int = 0) extends Logging {
--- End diff --

I'm not sure if this class has ever been used with Java, but doesn't this break binary compatibility with Java (because of the default parameter)? Should you add a separate one-argument constructor taking only scoreAndLabels?
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3702#discussion_r22183579

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala ---
@@ -124,4 +124,36 @@ class BinaryClassificationMetricsSuite extends FunSuite with MLlibTestSparkConte
     validateMetrics(metrics, thresholds, rocCurve, prCurve, f1, f2, precisions, recalls)
   }
+
+  test("binary evaluation metrics with downsampling") {
+    val scoreAndLabels = Seq(
+      (0.1, 0.0), (0.2, 0.0), (0.3, 1.0), (0.4, 0.0), (0.5, 0.0),
+      (0.6, 1.0), (0.7, 1.0), (0.8, 0.0), (0.9, 1.0))
+
+    val scoreAndLabelsRDD = sc.parallelize(scoreAndLabels, 1)
+
+    val original = new BinaryClassificationMetrics(scoreAndLabelsRDD)
+    val originalROC = original.roc().collect().sorted.toList
+    // Add 2 for (0,0) and (1,1) appended at either end
+    assert(2 + scoreAndLabels.size == originalROC.size)
+    assert(
+      List(
+        (0.0,0.0),(0.0,0.25),(0.2,0.25),(0.2,0.5),(0.2,0.75),(0.4,0.75),
--- End diff --

Scala style (spaces) (not sure how strict this is when it's a list of values like this in a test)
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3702#discussion_r22183575

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala ---
@@ -103,7 +117,37 @@ class BinaryClassificationMetrics(scoreAndLabels: RDD[(Double, Double)]) extends
       mergeValue = (c: BinaryLabelCounter, label: Double) => c += label,
       mergeCombiners = (c1: BinaryLabelCounter, c2: BinaryLabelCounter) => c1 += c2
     ).sortByKey(ascending = false)
-    val agg = counts.values.mapPartitions { iter =>
+
+    val binnedCounts =
+      // Only down-sample if bins is > 0
+      if (numBins == 0) {
+        // Use original directly
+        counts
+      } else {
+        val countsSize = counts.count()
+        // Group the iterator into chunks of about countsSize / numBins points,
+        // so that the resulting number of bins is about numBins
+        val grouping = countsSize / numBins
+        if (grouping < 2) {
+          // numBins was more than half of the size; no real point in down-sampling to bins
+          logInfo(s"Curve is too small ($countsSize) for $numBins bins to be useful")
+          counts
+        } else if (grouping >= Int.MaxValue) {
+          logWarning(s"Curve is too large ($countsSize) for $numBins bins; ignoring")
--- End diff --

I think this should set grouping to Int.MaxValue (and print a warning) since it is these really big datasets which cause problems. The default behavior should avoid failure.
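The binning step under review can be illustrated in isolation. This is a self-contained sketch, not the PR's code: `LabelCounter` is a simplified stand-in for MLlib's internal `BinaryLabelCounter`, and `binCounters` shows only the grouping idea (merge each run of `grouping` consecutive per-score counters into one bin).

```scala
// Simplified stand-in for MLlib's BinaryLabelCounter: accumulates the
// number of positive and negative labels seen at a score threshold.
class LabelCounter(var numPositives: Long = 0L, var numNegatives: Long = 0L) {
  def add(label: Double): LabelCounter = {
    if (label > 0.5) numPositives += 1 else numNegatives += 1
    this
  }
  def merge(other: LabelCounter): LabelCounter = {
    numPositives += other.numPositives
    numNegatives += other.numNegatives
    this
  }
}

// Merge each run of `grouping` consecutive counters into a single binned
// counter. The last group may be smaller when the sizes don't divide
// evenly, matching the "extra sample" caveat in the scaladoc above.
def binCounters(counters: Iterator[LabelCounter], grouping: Int): Iterator[LabelCounter] =
  counters.grouped(grouping).map(_.reduce((a, b) => a.merge(b)))
```

For example, binning 10 single-count counters with `grouping = 3` yields 4 bins of sizes 3, 3, 3, and 1.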
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3702#discussion_r22183580

--- Diff: mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala ---
@@ -124,4 +124,36 @@ class BinaryClassificationMetricsSuite extends FunSuite with MLlibTestSparkConte
     validateMetrics(metrics, thresholds, rocCurve, prCurve, f1, f2, precisions, recalls)
   }
+
+  test("binary evaluation metrics with downsampling") {
+    val scoreAndLabels = Seq(
+      (0.1, 0.0), (0.2, 0.0), (0.3, 1.0), (0.4, 0.0), (0.5, 0.0),
+      (0.6, 1.0), (0.7, 1.0), (0.8, 0.0), (0.9, 1.0))
+
+    val scoreAndLabelsRDD = sc.parallelize(scoreAndLabels, 1)
+
+    val original = new BinaryClassificationMetrics(scoreAndLabelsRDD)
+    val originalROC = original.roc().collect().sorted.toList
+    // Add 2 for (0,0) and (1,1) appended at either end
+    assert(2 + scoreAndLabels.size == originalROC.size)
+    assert(
+      List(
+        (0.0,0.0),(0.0,0.25),(0.2,0.25),(0.2,0.5),(0.2,0.75),(0.4,0.75),
+        (0.6,0.75),(0.6,1.0),(0.8,1.0),(1.0,1.0),(1.0,1.0)
+      ) ==
+      originalROC)
+
+    val numBins = 4
+
+    val downsampled = new BinaryClassificationMetrics(scoreAndLabelsRDD, numBins)
+    val downsampledROC = downsampled.roc().collect().sorted.toList
+    assert(
+      // May have to add 1 if the sample factor didn't divide evenly
+      2 + (numBins + (if (scoreAndLabels.size % numBins == 0) 0 else 1)) ==
+      downsampledROC.size)
+    assert(
+      List((0.0,0.0),(0.2,0.25),(0.2,0.75),(0.6,0.75),(0.8,1.0),(1.0,1.0),(1.0,1.0)) ==
--- End diff --

ditto
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-67876179

@srowen The logic and tests look fine; I just added a couple of comments.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3702#discussion_r22193166

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala ---
@@ -28,9 +28,23 @@ import org.apache.spark.rdd.{RDD, UnionRDD}
  * Evaluator for binary classification.
  *
  * @param scoreAndLabels an RDD of (score, label) pairs.
+ * @param numBins if greater than 0, then the curves (ROC curve, PR curve) computed internally
+ *     will be down-sampled to this many bins. [...]
  */
 @Experimental
-class BinaryClassificationMetrics(scoreAndLabels: RDD[(Double, Double)]) extends Logging {
+class BinaryClassificationMetrics(
+    val scoreAndLabels: RDD[(Double, Double)],
+    val numBins: Int = 0) extends Logging {
--- End diff --

Ah probably. What about just adding a setter?
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/3702#discussion_r22194994

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala ---
@@ -28,9 +28,23 @@ import org.apache.spark.rdd.{RDD, UnionRDD}
  * Evaluator for binary classification.
  *
  * @param scoreAndLabels an RDD of (score, label) pairs.
+ * @param numBins if greater than 0, then the curves (ROC curve, PR curve) computed internally
+ *     will be down-sampled to this many bins. [...]
  */
 @Experimental
-class BinaryClassificationMetrics(scoreAndLabels: RDD[(Double, Double)]) extends Logging {
+class BinaryClassificationMetrics(
+    val scoreAndLabels: RDD[(Double, Double)],
+    val numBins: Int = 0) extends Logging {
--- End diff --

That sounds good to me too.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-67908060

@jkbradley Changed my mind and went with a second constructor. The reason is that it wouldn't actually make any sense to change the number of bins after the curve is computed; it won't recompute. Best to just fix it once at construction.
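The auxiliary-constructor approach settled on here can be sketched as follows. To keep the example self-contained, a plain `Seq` stands in for the `RDD[(Double, Double)]`, and the class name `Metrics` is hypothetical; the point is the Java-compatibility pattern, since Scala default parameter values are not visible to Java callers.

```scala
// Sketch of the compatibility fix: keep the old one-argument signature as
// an explicit auxiliary constructor rather than a default parameter value,
// so Java callers (which cannot see Scala defaults) keep working.
class Metrics(
    val scoreAndLabels: Seq[(Double, Double)],
    val numBins: Int) {
  // Preserves the original single-argument constructor; numBins = 0
  // means no down-sampling, matching the behavior before the change.
  def this(scoreAndLabels: Seq[(Double, Double)]) = this(scoreAndLabels, 0)
}
```

Callers that never cared about binning construct the class exactly as before, and the binning behavior is fixed at construction, which matches the observation that changing it later would not trigger a recompute.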
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-67908132

[Test build #24717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24717/consoleFull) for PR 3702 at commit [`692d825`](https://github.com/apache/spark/commit/692d825a3a6cc5d4b11395bd1d00943f18973348).
* This patch merges cleanly.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-67913401

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24717/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3702#issuecomment-67913395

[Test build #24717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24717/consoleFull) for PR 3702 at commit [`692d825`](https://github.com/apache/spark/commit/692d825a3a6cc5d4b11395bd1d00943f18973348).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class BinaryClassificationMetrics(`