[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-30 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
OK, the weighting has been removed from the calculation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-29 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17556
  
The bucketing is trying to bucket into buckets of equal P(x). It's a 
condition on P(y | x). That said the right point isn't knowable from the 
training data, and splitting to balance P(x) on either side of the split within 
the bucket is perhaps the next-most principled thing to do.

To reach a conclusion though: if we have slightly more net preference for a 
simple average, we could merge that change for now and decide later to make it 
weighted.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-28 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
By the way, it's safe to use the mean value, as it matches the other libraries. 
If requested, I'd be glad to modify the PR.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-28 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
Take a (training) sample of a continuous series, say {x0, x1, x2, x3, ..., x100}. 
Spark currently selects quantiles as split points. 

Suppose 10-quantiles are used, x2 is the 1st quantile, and x10 is the 2nd 
quantile. The assumption is that P(x < x2) ~= P(x2 < x < x10). However, x2 is not 
exact. Since the data is continuous, there exists a true point z that 
satisfies P(x < z) == P(z < x < x10).

So it's reasonable that the averaged midpoint between x2 and x3 is more 
appropriate, in my opinion.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-28 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17556
  
Ah OK I should think about this more first. Say you have a continuous 
predictor x and binary output y. Say the optimal split is found to be between 
0.1 and 0.2, with 1 observation of 0.1 and 99 of 0.2. Right now the algorithm 
would pick a split value of 0.2; it certainly can't be > 0.2 or < 0.1 but it's 
highly unlikely that 0.1 or 0.2 are the actual optimal split value.

A weighted mean says the best split is at 0.199, really. It makes sense if 
you're attempting to make sure that P(0.1 <= x < 0.199) ~= P(0.199 <= x <= 0.2) 
-- about half the cases in this critical range fall above and below the split. 
But really the goal is to find x such that P(y=1 | x) is about 0.5. It's not 
the same thing but it's also not knowable from the training data.

But 0.15 isn't obviously better either. It would mean that, probably, 
almost all test values in this critical range are classified as positive, not 
about half.
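
To make the three candidate split values in this example concrete, here is a minimal sketch in Python. The helper names `simple_midpoint` and `weighted_midpoint` are hypothetical and are not taken from the Spark code; the weighted form is the count-weighted mean discussed in this thread.

```python
def simple_midpoint(lo, hi):
    """Unweighted mean of two adjacent distinct feature values."""
    return (lo + hi) / 2.0


def weighted_midpoint(lo, lo_count, hi, hi_count):
    """Mean of the two values, weighted by how often each was observed."""
    return (lo * lo_count + hi * hi_count) / (lo_count + hi_count)


# Example from the comment: 1 observation of 0.1 and 99 observations of 0.2.
simple = simple_midpoint(0.1, 0.2)
weighted = weighted_midpoint(0.1, 1, 0.2, 99)
print(f"simple={simple:.3f} weighted={weighted:.3f}")
```

With these counts the weighted value sits almost on top of 0.2, while the simple midpoint ignores how the observations are distributed.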





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-28 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17556
  
@sethah what's the issue there ... train/test ought to be from the same 
distribution, in theory. The empirical distribution of the test data will of 
course be a little different, but what is the issue with that w.r.t. this 
change? From a theoretical perspective, picking the midpoint seems more 
justified than picking an endpoint, and a weighted mean more so than a midpoint.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-27 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/17556
  
I don't mind the weighted midpoints. However, if for a continuous feature 
we find that many points have the exact same value, we are assuming we may find 
data points in the test set that are close to but not these same values. But 
since our train data was clustered at these particular values, perhaps it's not 
a good assumption. I could live with either method, but maybe a slight 
preference to match the other libraries.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17556
  
**[Test build #3677 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3677/testReport)**
 for PR 17556 at commit 
[`031c61a`](https://github.com/apache/spark/commit/031c61a60d0638dc75133c60c045be2c9204b64b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17556
  
**[Test build #3677 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3677/testReport)**
 for PR 17556 at commit 
[`031c61a`](https://github.com/apache/spark/commit/031c61a60d0638dc75133c60c045be2c9204b64b).





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-25 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
Fixed the failing case; please retest it.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17556
  
**[Test build #3673 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3673/testReport)**
 for PR 17556 at commit 
[`19eab3a`](https://github.com/apache/spark/commit/19eab3aea2cc15448eb7cac2f08f190fae1e0033).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17556
  
**[Test build #3673 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3673/testReport)**
 for PR 17556 at commit 
[`19eab3a`](https://github.com/apache/spark/commit/19eab3aea2cc15448eb7cac2f08f190fae1e0033).





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-23 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
I looked through the split criteria of sklearn and xgboost.

1. sklearn
It considers every continuous value and splits at the mean of adjacent values.

commit 5147fd09c6a063188efde444f47bd006fa5f95f0
sklearn/tree/_splitter.pyx: 484:
```python
current.threshold = (Xf[p - 1] + Xf[p]) / 2.0
```

2. xgboost: 
commit 49bdb5c97fccd81b1fdf032eab4599a065c6c4f6

+ If all continuous values are used as candidates, it uses the mean value.

   src/tree/updater_colmaker.cc: 555:
   ```c++
   e.best.Update(loss_chg, fid, (fvalue + e.last_fvalue) * 0.5f, d_step == -1);
   ```
+ If continuous features are quantized, it uses `cut`. I'm not familiar 
with C++ and updater_histmaker.cc is a little complicated, so I don't know 
exactly what `cut` is. However, I guess it is the same as Spark's current 
split criterion.

   src/tree/updater_histmaker.cc: 194:
   ```c++
   if (best->Update(static_cast<bst_float>(loss_chg), fid, hist.cut[i], false)) {
   ```

Anyway, the weighted mean is more reasonable than the mean or the cut value, 
in my opinion. The PR is a trivial enhancement to the tree module, and since the 
conclusion seems obvious it is not worth spending much time on. 

However, we would be more confident with more feedback from experts.
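
As a rough illustration of the sklearn rule quoted above, the sketch below lists the candidate thresholds for one feature as the midpoints between adjacent distinct sorted values. The function name `candidate_thresholds` is hypothetical, and this is only the threshold formula: the real splitter also evaluates an impurity criterion at each position, which is omitted here.

```python
def candidate_thresholds(values):
    """Return midpoints between adjacent distinct values of one feature."""
    xs = sorted(values)
    thresholds = []
    for prev, cur in zip(xs, xs[1:]):
        if cur > prev:  # equal neighbours cannot be separated by a split
            thresholds.append((prev + cur) / 2.0)
    return thresholds


print(candidate_thresholds([0.0, 0.0, 1.0, 1.0, 1.0, 1.0]))
print(candidate_thresholds([3.0, 1.0, 2.0]))
```

Note that the counts of each value play no role here, which is exactly the difference from the weighted variant.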






[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-23 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17556
  
That's good info. It's a tough call -- matching a known package is always 
nice. However I agree that a weighted split is a little more theoretically 
sound (don't have a reference on that though). I'd support this change, myself. 
It sounds like we won't find an exact match to the R GBM behavior except when 
each split has equal numbers of classes on either side.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-22 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
Hi, I have checked R GBM's code and found that
R's gbm uses the mean value $(x + y) / 2$, not the weighted mean 
$(c_x * x + c_y * y) / (c_x + c_y)$ described in [JIRA 
SPARK-16957](https://issues.apache.org/jira/browse/SPARK-16957), for the split 
point.

1. code snippet:
[gbm-developers/gbm](https://github.com/gbm-developers/gbm)
commit a1defa382a629f8b97bf9f552dcd821ee7ac9dac
src/node_search.cpp:145:
```c++
  else if(cCurrentVarClasses == 0)   // variable is continuous
  {
    // Evaluate the current split
    dCurrentSplitValue = 0.5*(dLastXValue + dX);
  }
```

2. test
To verify it, I created a toy dataset and ran a test in R. 
```R
> f = c(0.0, 0.0, 1.0, 1.0, 1.0, 1.0)
> l = c(0,   0,   1,   1,   1,   1)
> df = data.frame(l, f)
> sapply(df, class)
        l         f 
"numeric" "numeric" 
> mod <- gbm(l~f, data=df, n.trees=1, bag.fraction=1, n.minobsinnode=1, distribution = "bernoulli")
> pretty.gbm.tree(mod)
  SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight
0        0      5.00e-01        1         2           3           1.33      6
1       -1     -3.00e-03       -1        -1          -1           0.00      2
2       -1      1.50e-03       -1        -1          -1           0.00      4
3       -1  1.480297e-19       -1        -1          -1           0.00      6
    Prediction
0 1.480297e-19
1    -3.00e-03
2     1.50e-03
3 1.480297e-19
```
As expected,
the root's split point is 5.00e-01, namely the mean value `0.5 = (0 + 1) / 
2`, not the weighted mean `0.67 ~= (0 * 2 + 1 * 4) / 6`.

3. conclusion
I prefer using the weighted mean for the split point in the PR, rather than 
the mean value used in R's gbm package. What do you think? @sethah @srowen
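
For reference, the two formulas from this comment can be worked through on the toy dataset, where $x = 0$ occurs $c_x = 2$ times and $y = 1$ occurs $c_y = 4$ times. This is only the arithmetic for this example, written out under those counts:

$$
\frac{x + y}{2} = \frac{0 + 1}{2} = 0.5,
\qquad
\frac{c_x x + c_y y}{c_x + c_y} = \frac{2 \cdot 0 + 4 \cdot 1}{2 + 4} = \frac{2}{3} \approx 0.667
$$

The gbm output matches the first value, so the two rules give visibly different split points whenever the counts on either side are unequal.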





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-14 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
@sethah It may be hard to compare R's behavior with Spark's, since many 
factors are involved. I'd like to read R GBM's code and verify that the split 
criteria are designed consistently on both sides. Is that OK?





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-13 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/17556
  
Seems like a reasonable change. Just left some minor comments.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-13 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/17556
  
If we are attempting to match R GBM, it would be great to show, at least on 
the PR, that we get the same results.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-13 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
many thanks, @srowen 





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-13 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17556
  
It's looking good, and the R tests pass. I'll also ask @mengxr or maybe 
@dbtsai if they have any concerns about this change?





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17556
  
**[Test build #3662 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3662/testReport)**
 for PR 17556 at commit 
[`b74702a`](https://github.com/apache/spark/commit/b74702afa958fa3552e494cbe77590d9940bf1fb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17556
  
**[Test build #3662 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3662/testReport)**
 for PR 17556 at commit 
[`b74702a`](https://github.com/apache/spark/commit/b74702afa958fa3552e494cbe77590d9940bf1fb).





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-12 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
I have run all the MLlib unit tests in Python. However, I am not 
familiar with R, and I don't want to spend too much time setting up an R 
environment. 

Could CI retest the PR? Then we can check whether any unit tests are still 
broken. Thanks.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-11 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17556
  
http://spark.apache.org/docs/latest/building-spark.html





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-10 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
@srowen Hi, I forgot the unit tests in Python and R. Where can I find 
documentation on setting up a development environment? Thanks.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17556
  
**[Test build #3655 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3655/testReport)**
 for PR 17556 at commit 
[`9ca5750`](https://github.com/apache/spark/commit/9ca57505c8211954478a2d54ced48c2561cfb9f9).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17556
  
**[Test build #3655 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3655/testReport)**
 for PR 17556 at commit 
[`9ca5750`](https://github.com/apache/spark/commit/9ca57505c8211954478a2d54ced48c2561cfb9f9).





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17556
  
**[Test build #3654 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3654/testReport)**
 for PR 17556 at commit 
[`9ca5750`](https://github.com/apache/spark/commit/9ca57505c8211954478a2d54ced48c2561cfb9f9).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17556
  
**[Test build #3654 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3654/testReport)**
 for PR 17556 at commit 
[`9ca5750`](https://github.com/apache/spark/commit/9ca57505c8211954478a2d54ced48c2561cfb9f9).





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17556
  
Just a flaky test. Can't be related





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-10 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
```
Test Result (1 failure / +1)

org.apache.spark.storage.TopologyAwareBlockReplicationPolicyBehavior.Peers in 2 racks
```

Does anyone know what this is?





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-10 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
Is there something wrong with Spark CI? 





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17556
  
**[Test build #3652 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3652/testReport)**
 for PR 17556 at commit 
[`9ca5750`](https://github.com/apache/spark/commit/9ca57505c8211954478a2d54ced48c2561cfb9f9).





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-07 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17556
  
It seems OK to me but @sethah or @jkbradley might be good as a second set 
of eyes. It does slightly alter behavior, but, it does seem like something that 
should work better in general.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17556
  
Can one of the admins verify this patch?

