[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-31 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3702#issuecomment-68470611
  
Merged into master. Thanks!!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3702





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-29 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3702#issuecomment-68286086
  
@srowen Sorry for the delay!  I'm really starting to wonder about this 
JIRA, though.  The collect() should return one BinaryLabelCounter per 
partition.  I'd assume people would have enough memory to store at least a few 
million BinaryLabelCounter instances on the driver.  Does that mean they have 
more than a few million partitions?

Sorry I didn't think about this earlier, and perhaps I'm just confusing 
myself now; let me know what you think.  Is there an issue to solve here?

Previously, I'd have said: With the update, this LGTM

Also, I did think of one use case which may change things: We've been 
talking about people using these methods to make plots.  Do you think people 
ever use them to choose thresholds?  If so, then people might want much 
finer-grained ROC curves than we've been thinking, and it might be worthwhile 
to do a fancy implementation which avoids binning.

At any rate, apologies for so much back-and-forth.





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-29 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3702#issuecomment-68295668
  
@jkbradley You'll end up with one `BinaryLabelCounter` per partition _per 
distinct key_ though. That's where the problem may occur.

I think this would definitely be used to pick thresholds, and for that, you 
don't need to download the curve, just find the optimal point. There, I'd say 
you simply don't bin at all, since it means the curve is approximate and 
down-sampled, and there's probably not much value in approximation.

I'm not terribly wedded to the change. It helps in the niche use case that 
one does want to down-sample. It adds some complexity though, and complexity 
adds up. 

It's also possible to down-sample the final curve, later. That could just 
be a utility function somewhere instead of injected into here. Would that be 
better? I think you lose a bit of information that way but it's an 
approximation to begin with.
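As a rough illustration of that utility-function idea, down-sampling an already-collected curve could look like the following Python sketch (the function name and the keep-the-endpoint choice are mine, not from the PR):

```python
def downsample_curve(points, num_bins):
    """Down-sample a computed curve (a sorted list of (x, y) points)
    to roughly num_bins points by keeping every grouping-th point."""
    if num_bins <= 0:
        return list(points)
    grouping = len(points) // num_bins
    if grouping < 2:
        # Curve is already small; binning would not help.
        return list(points)
    sampled = points[::grouping]
    if sampled[-1] != points[-1]:
        sampled.append(points[-1])  # always keep the final point of the curve
    return sampled

# A synthetic 1001-point "curve" down-sampled to about 10 bins.
curve = [(i / 1000.0, min(1.0, i / 800.0)) for i in range(1001)]
print(len(downsample_curve(curve, 10)))  # -> 11
```

As noted above, this loses a bit of information relative to binning before the curve is built, but the curve is an approximation to begin with.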





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-29 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3702#issuecomment-68299469
  
@srowen Right; I was indeed just confused.  I think it's fine.

LGTM

CC: @mengxr





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3702#discussion_r22232433
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
 ---
@@ -28,9 +28,28 @@ import org.apache.spark.rdd.{RDD, UnionRDD}
  * Evaluator for binary classification.
  *
  * @param scoreAndLabels an RDD of (score, label) pairs.
+ * @param numBins if greater than 0, then the curves (ROC curve, PR curve) 
computed internally
+ *  will be down-sampled to this many bins. This is useful because the 
curve contains a
--- End diff --

Indent comment?  Also, looking at this, I like having the explanation, but 
it might be nice to format it as:
1. Quick description
2. Default value
3. Caveat about approximation





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-23 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3702#issuecomment-67993599
  
@srowen  I added one last comment; sorry, I should have thought of it 
earlier.  This took longer than I expected, but I think it's a good 
solution.  Thanks for the PR!  Except for the last comment, I think it's ready.





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-23 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3702#issuecomment-68004357
  
@jkbradley No problem, take a look at what I did to the scaladoc.





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3702#issuecomment-68004777
  
  [Test build #24748 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24748/consoleFull) for PR 3702 at commit [`1d34d05`](https://github.com/apache/spark/commit/1d34d05c499ea7be6327da68c9bd2457ffe5aa59).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3702#issuecomment-68010995
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24748/
Test PASSed.





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3702#issuecomment-68010993
  
  [Test build #24748 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24748/consoleFull) for PR 3702 at commit [`1d34d05`](https://github.com/apache/spark/commit/1d34d05c499ea7be6327da68c9bd2457ffe5aa59).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class BinaryClassificationMetrics(`






[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3702#discussion_r22183573
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
 ---
@@ -28,9 +28,23 @@ import org.apache.spark.rdd.{RDD, UnionRDD}
  * Evaluator for binary classification.
  *
  * @param scoreAndLabels an RDD of (score, label) pairs.
+ * @param numBins if greater than 0, then the curves (ROC curve, PR curve) 
computed internally
+ *  will be down-sampled to this many bins. This is useful because the 
curve contains a
+ *  point for each distinct score in the input, and this could be as large 
as the input itself --
+ *  millions of points or more, when thousands may be entirely sufficient 
to summarize the curve.
+ *  After down-sampling, the curves will instead be made of approximately `numBins` points.
+ *  Points are made from bins of equal numbers of consecutive points. The 
size of each bin
+ *  is `floor(scoreAndLabels.count() / numBins)`, which means the 
resulting number of bins
+ *  may not exactly equal numBins. The last bin in each partition may be 
smaller as a result,
+ *  meaning there may be an extra sample at partition boundaries.
+ *  If `numBins` is 0, no down-sampling will occur.
  */
 @Experimental
-class BinaryClassificationMetrics(scoreAndLabels: RDD[(Double, Double)]) 
extends Logging {
+class BinaryClassificationMetrics(
+val scoreAndLabels: RDD[(Double, Double)],
+val numBins: Int = 0) extends Logging {
--- End diff --

I'm not sure if this class has ever been used with Java, but doesn't this 
break binary compatibility with Java (b/c of the default parameter)?  Should 
you add a separate 1-argument constructor just taking scoreAndLabels?





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3702#discussion_r22183579
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala
 ---
@@ -124,4 +124,36 @@ class BinaryClassificationMetricsSuite extends 
FunSuite with MLlibTestSparkConte
 
 validateMetrics(metrics, thresholds, rocCurve, prCurve, f1, f2, 
precisions, recalls)
   }
+
+  test("binary evaluation metrics with downsampling") {
+val scoreAndLabels = Seq(
+  (0.1, 0.0), (0.2, 0.0), (0.3, 1.0), (0.4, 0.0), (0.5, 0.0),
+  (0.6, 1.0), (0.7, 1.0), (0.8, 0.0), (0.9, 1.0))
+
+val scoreAndLabelsRDD = sc.parallelize(scoreAndLabels, 1)
+
+val original = new BinaryClassificationMetrics(scoreAndLabelsRDD)
+val originalROC = original.roc().collect().sorted.toList
+// Add 2 for (0,0) and (1,1) appended at either end
+assert(2 + scoreAndLabels.size == originalROC.size)
+assert(
+  List(
+(0.0,0.0),(0.0,0.25),(0.2,0.25),(0.2,0.5),(0.2,0.75),(0.4,0.75),
--- End diff --

Scala style (spaces).  (Not sure how strict this is when it's a list of 
values like this in a test.)





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3702#discussion_r22183575
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
 ---
@@ -103,7 +117,37 @@ class BinaryClassificationMetrics(scoreAndLabels: 
RDD[(Double, Double)]) extends
   mergeValue = (c: BinaryLabelCounter, label: Double) => c += label,
   mergeCombiners = (c1: BinaryLabelCounter, c2: BinaryLabelCounter) => c1 += c2
 ).sortByKey(ascending = false)
-val agg = counts.values.mapPartitions { iter =>
+
+val binnedCounts =
+  // Only down-sample if bins is > 0
+  if (numBins == 0) {
+// Use original directly
+counts
+  } else {
+val countsSize = counts.count()
+// Group the iterator into chunks of about countsSize / numBins 
points,
+// so that the resulting number of bins is about numBins
+val grouping = countsSize / numBins
+if (grouping < 2) {
+  // numBins was more than half of the size; no real point in 
down-sampling to bins
+  logInfo(s"Curve is too small ($countsSize) for $numBins bins to be useful")
+  counts
+} else if (grouping >= Int.MaxValue) {
+  logWarning(s"Curve is too large ($countsSize) for $numBins bins; ignoring")
--- End diff --

I think this should set grouping to Int.MaxValue (and print a warning), 
since it is exactly these very large datasets that cause problems.  The 
default behavior should avoid failure.
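The suggested behavior can be sketched in Python (names loosely mirror the Scala diff; `INT_MAX` stands in for Scala's `Int.MaxValue`, and the helper function itself is hypothetical):

```python
INT_MAX = 2**31 - 1  # analogue of Scala's Int.MaxValue

def bin_grouping(counts_size, num_bins):
    """Decide how many consecutive curve points to merge per bin.

    Returns 1 to mean "no down-sampling". Instead of ignoring numBins on
    huge inputs, clamps the grouping to INT_MAX so the default behavior
    still avoids driver-side failure, as the comment above suggests.
    """
    if num_bins == 0:
        return 1  # down-sampling disabled
    grouping = counts_size // num_bins
    if grouping < 2:
        return 1  # curve too small for binning to be useful
    return min(grouping, INT_MAX)  # clamp rather than ignore

print(bin_grouping(1_000_000, 1_000))        # -> 1000
print(bin_grouping(9, 100))                  # -> 1
print(bin_grouping(10**12, 100) == INT_MAX)  # -> True
```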





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3702#discussion_r22183580
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala
 ---
@@ -124,4 +124,36 @@ class BinaryClassificationMetricsSuite extends 
FunSuite with MLlibTestSparkConte
 
 validateMetrics(metrics, thresholds, rocCurve, prCurve, f1, f2, 
precisions, recalls)
   }
+
+  test("binary evaluation metrics with downsampling") {
+val scoreAndLabels = Seq(
+  (0.1, 0.0), (0.2, 0.0), (0.3, 1.0), (0.4, 0.0), (0.5, 0.0),
+  (0.6, 1.0), (0.7, 1.0), (0.8, 0.0), (0.9, 1.0))
+
+val scoreAndLabelsRDD = sc.parallelize(scoreAndLabels, 1)
+
+val original = new BinaryClassificationMetrics(scoreAndLabelsRDD)
+val originalROC = original.roc().collect().sorted.toList
+// Add 2 for (0,0) and (1,1) appended at either end
+assert(2 + scoreAndLabels.size == originalROC.size)
+assert(
+  List(
+(0.0,0.0),(0.0,0.25),(0.2,0.25),(0.2,0.5),(0.2,0.75),(0.4,0.75),
+(0.6,0.75),(0.6,1.0),(0.8,1.0),(1.0,1.0),(1.0,1.0)
+  ) ==
+  originalROC)
+
+val numBins = 4
+
+val downsampled = new BinaryClassificationMetrics(scoreAndLabelsRDD, 
numBins)
+val downsampledROC = downsampled.roc().collect().sorted.toList
+assert(
+  // May have to add 1 if the sample factor didn't divide evenly
+  2 + (numBins + (if (scoreAndLabels.size % numBins == 0) 0 else 1)) ==
+  downsampledROC.size)
+assert(
+  
List((0.0,0.0),(0.2,0.25),(0.2,0.75),(0.6,0.75),(0.8,1.0),(1.0,1.0),(1.0,1.0)) 
==
--- End diff --

ditto
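
As an aside, the size arithmetic asserted in the quoted test is easy to sanity-check with a quick Python sketch (the helper name is mine):

```python
def expected_downsampled_roc_size(n_points, num_bins):
    """Expected ROC length after down-sampling, per the test's formula:
    2 for the appended (0,0) and (1,1) endpoints, plus num_bins bins,
    plus 1 extra bin when the bin size doesn't divide the input evenly."""
    extra = 0 if n_points % num_bins == 0 else 1
    return 2 + num_bins + extra

# 9 score/label pairs, 4 bins: 9 % 4 != 0, so one extra bin.
print(expected_downsampled_roc_size(9, 4))  # -> 7
```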





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3702#issuecomment-67876179
  
@srowen The logic & tests look fine; I just added a couple of comments.





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3702#discussion_r22193166
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
 ---
@@ -28,9 +28,23 @@ import org.apache.spark.rdd.{RDD, UnionRDD}
  * Evaluator for binary classification.
  *
  * @param scoreAndLabels an RDD of (score, label) pairs.
+ * @param numBins if greater than 0, then the curves (ROC curve, PR curve) 
computed internally
+ *  will be down-sampled to this many bins. This is useful because the 
curve contains a
+ *  point for each distinct score in the input, and this could be as large 
as the input itself --
+ *  millions of points or more, when thousands may be entirely sufficient 
to summarize the curve.
+ *  After down-sampling, the curves will instead be made of approximately `numBins` points.
+ *  Points are made from bins of equal numbers of consecutive points. The 
size of each bin
+ *  is `floor(scoreAndLabels.count() / numBins)`, which means the 
resulting number of bins
+ *  may not exactly equal numBins. The last bin in each partition may be 
smaller as a result,
+ *  meaning there may be an extra sample at partition boundaries.
+ *  If `numBins` is 0, no down-sampling will occur.
  */
 @Experimental
-class BinaryClassificationMetrics(scoreAndLabels: RDD[(Double, Double)]) 
extends Logging {
+class BinaryClassificationMetrics(
+val scoreAndLabels: RDD[(Double, Double)],
+val numBins: Int = 0) extends Logging {
--- End diff --

Ah probably. What about just adding a setter? 





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3702#discussion_r22194994
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
 ---
@@ -28,9 +28,23 @@ import org.apache.spark.rdd.{RDD, UnionRDD}
  * Evaluator for binary classification.
  *
  * @param scoreAndLabels an RDD of (score, label) pairs.
+ * @param numBins if greater than 0, then the curves (ROC curve, PR curve) 
computed internally
+ *  will be down-sampled to this many bins. This is useful because the 
curve contains a
+ *  point for each distinct score in the input, and this could be as large 
as the input itself --
+ *  millions of points or more, when thousands may be entirely sufficient 
to summarize the curve.
+ *  After down-sampling, the curves will instead be made of approximately `numBins` points.
+ *  Points are made from bins of equal numbers of consecutive points. The 
size of each bin
+ *  is `floor(scoreAndLabels.count() / numBins)`, which means the 
resulting number of bins
+ *  may not exactly equal numBins. The last bin in each partition may be 
smaller as a result,
+ *  meaning there may be an extra sample at partition boundaries.
+ *  If `numBins` is 0, no down-sampling will occur.
  */
 @Experimental
-class BinaryClassificationMetrics(scoreAndLabels: RDD[(Double, Double)]) 
extends Logging {
+class BinaryClassificationMetrics(
+val scoreAndLabels: RDD[(Double, Double)],
+val numBins: Int = 0) extends Logging {
--- End diff --

That sounds good to me too.





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3702#issuecomment-67908060
  
@jkbradley Changed my mind and went with a 2nd constructor. The reason is 
that it wouldn't actually make any sense to change the num bins after the curve 
is computed; it won't recompute. Best to just fix it once at construction.





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3702#issuecomment-67908132
  
  [Test build #24717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24717/consoleFull) for PR 3702 at commit [`692d825`](https://github.com/apache/spark/commit/692d825a3a6cc5d4b11395bd1d00943f18973348).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3702#issuecomment-67913401
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24717/
Test PASSed.





[GitHub] spark pull request: SPARK-4547 [MLLIB] OOM when making bins in Bin...

2014-12-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3702#issuecomment-67913395
  
  [Test build #24717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24717/consoleFull) for PR 3702 at commit [`692d825`](https://github.com/apache/spark/commit/692d825a3a6cc5d4b11395bd1d00943f18973348).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class BinaryClassificationMetrics(`


