[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 OK, weight has been removed when calculating. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17556 The bucketing is trying to bucket into buckets of equal P(x). It's a condition on P(y | x). That said, the right point isn't knowable from the training data, and splitting to balance P(x) on either side of the split within the bucket is perhaps the next-most principled thing to do. To reach a conclusion though: if we have slightly more net preference for a simple average, we could merge that change for now and decide later to make it weighted.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 By the way, it's safe to use the mean value, as it matches the other libraries. If requested, I'd be happy to modify the PR.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 Consider a (training) sample of a continuous feature, say {x0, x1, x2, x3, ..., x100}. Spark currently selects quantiles as split points. Suppose 10-quantiles are used, x2 is the 1st quantile, and x10 is the 2nd quantile. We assume that P(x < x2) ~= P(x2 < x < x10). However, x2 is not perfect. Since the data is continuous, there exists some point z that is the real split point, satisfying P(x < z) == P(z < x < x10). So it's reasonable that the midpoint between x2 and x3 is more appropriate, in my opinion.
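The idea in the comment above can be sketched in a few lines of Python. This is only an illustration, not Spark's implementation: the function name `quantile_split_candidates` and its equal-frequency indexing are assumptions made for the example. For each bin boundary it returns both the sample that closes the bin (the "x2" of the comment) and the midpoint between that sample and the next larger one ("x2" and "x3").

```python
def quantile_split_candidates(values, num_bins):
    """For each equal-frequency bin boundary, return (endpoint, midpoint).

    endpoint: the sample value that closes the bin (hypothetical "x2").
    midpoint: the average of that sample and the next larger one.
    """
    xs = sorted(values)
    n = len(xs)
    candidates = []
    for k in range(1, num_bins):
        i = (k * n) // num_bins - 1  # index of the last sample in bin k
        if i < 0 or i + 1 >= n:
            continue  # too few samples to form this boundary
        candidates.append((xs[i], (xs[i] + xs[i + 1]) / 2.0))
    return candidates


samples = [float(v) for v in range(10)]
print(quantile_split_candidates(samples, 5))
```

With ten evenly spaced samples and five bins, each midpoint sits halfway between two observed values, so it is one plausible estimate of the unknown point z rather than a value that happened to occur in the training data.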
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17556 Ah OK I should think about this more first. Say you have a continuous predictor x and binary output y. Say the optimal split is found to be between 0.1 and 0.2, with 1 observation of 0.1 and 99 of 0.2. Right now the algorithm would pick a split value of 0.2; it certainly can't be > 0.2 or < 0.1 but it's highly unlikely that 0.1 or 0.2 are the actual optimal split value. A weighted mean says the best split is at 0.199, really. It makes sense if you're attempting to make sure that P(0.1 <= x < 0.199) ~= P(0.199 <= x <= 0.2) -- about half the cases in this critical range fall above and below the split. But really the goal is to find x such that P(y=1 | x) is about 0.5. It's not the same thing but it's also not knowable from the training data. But 0.15 isn't obviously better either. It would mean that, probably, almost all test values in this critical range are classified as positive, not about half.
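For reference, the two numbers in the comment above follow directly from the weighted-mean formula $(c_x x + c_y y) / (c_x + c_y)$ given in SPARK-16957, taking $c_x = 1$ observation at $x = 0.1$ and $c_y = 99$ observations at $y = 0.2$:

$$
\frac{c_x x + c_y y}{c_x + c_y} = \frac{1 \cdot 0.1 + 99 \cdot 0.2}{1 + 99} = \frac{19.9}{100} = 0.199,
\qquad
\frac{x + y}{2} = \frac{0.1 + 0.2}{2} = 0.15
$$

The weighted value is pulled toward the heavily populated endpoint, while the simple midpoint ignores the counts entirely.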
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17556 @sethah what's the issue there ... train/test ought to be from the same distribution, in theory. The empirical distribution of the test data will of course be a little different, but what is the issue with that w.r.t. this change? From a theoretical perspective, picking the midpoint seems more justified than picking an endpoint, and a weighted mean moreso than a midpoint.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17556 I don't mind the weighted midpoints. However, if for a continuous feature we find that many points have the exact same value, we are assuming we may find data points in the test set that are close to but not these same values. But since our train data was clustered at these particular values, perhaps it's not a good assumption. I could live with either method, but maybe a slight preference to match the other libraries.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17556 **[Test build #3677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3677/testReport)** for PR 17556 at commit [`031c61a`](https://github.com/apache/spark/commit/031c61a60d0638dc75133c60c045be2c9204b64b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17556 **[Test build #3677 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3677/testReport)** for PR 17556 at commit [`031c61a`](https://github.com/apache/spark/commit/031c61a60d0638dc75133c60c045be2c9204b64b).
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 Fixed the failed case, please retest it.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17556 **[Test build #3673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3673/testReport)** for PR 17556 at commit [`19eab3a`](https://github.com/apache/spark/commit/19eab3aea2cc15448eb7cac2f08f190fae1e0033).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17556 **[Test build #3673 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3673/testReport)** for PR 17556 at commit [`19eab3a`](https://github.com/apache/spark/commit/19eab3aea2cc15448eb7cac2f08f190fae1e0033).
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556

I scanned the split criteria of sklearn and xgboost.

1. sklearn counts all continuous values and splits at the mean value.

commit 5147fd09c6a063188efde444f47bd006fa5f95f0

sklearn/tree/_splitter.pyx: 484:
```python
current.threshold = (Xf[p - 1] + Xf[p]) / 2.0
```

2. xgboost:

commit 49bdb5c97fccd81b1fdf032eab4599a065c6c4f6

+ If all continuous values are used as candidates, it uses the mean value.

src/tree/updater_colmaker.cc: 555:
```c++
e.best.Update(loss_chg, fid, (fvalue + e.last_fvalue) * 0.5f, d_step == -1);
```

+ If continuous features are quantized, it uses `cut`. I'm not familiar with C++ and updater_histmaker.cc is a little complicated, so I don't know exactly what `cut` is. However, I guess it is the same as Spark's current split criterion.

src/tree/updater_histmaker.cc: 194:
```c++
if (best->Update(static_cast(loss_chg), fid, hist.cut[i], false)) {
```

Anyway, the weighted mean is more reasonable than the mean or the cut value, in my opinion. The PR is a trivial enhancement for the tree module, and it's not worth spending much time on, since the conclusion is obvious. However, we will be more confident if more feedback from experts is collected.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17556 That's good info. It's a tough call -- matching a known package is always nice. However I agree that a weighted split is a little more theoretically sound (don't have a reference on that though). I'd support this change, myself. It sounds like we won't find an exact match to the R GBM behavior except when each split has equal numbers of classes on either side.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556

Hi, I have checked R GBM's code and found that:

R's gbm uses the mean value $(x + y) / 2$, not the weighted mean $(c_x * x + c_y * y) / (c_x + c_y)$ described in [JIRA SPARK-16957](https://issues.apache.org/jira/browse/SPARK-16957), for the split point.

1. code snippet:

[gbm-developers/gbm](https://github.com/gbm-developers/gbm) commit a1defa382a629f8b97bf9f552dcd821ee7ac9dac

src/node_search.cpp:145:
```c++
else if(cCurrentVarClasses == 0) // variable is continuous
{
    // Evaluate the current split
    dCurrentSplitValue = 0.5*(dLastXValue + dX);
}
```

2. test

To verify it, I created a toy dataset and ran a test in R.

```R
> f = c(0.0, 0.0, 1.0, 1.0, 1.0, 1.0)
> l = c(0, 0, 1, 1, 1, 1)
> df = data.frame(l, f)
> sapply(df, class)
        l         f
"numeric" "numeric"
> mod <- gbm(l~f, data=df, n.trees=1, bag.fraction=1, n.minobsinnode=1, distribution = "bernoulli")
> pretty.gbm.tree(mod)
  SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight
0        0  5.00e-01            1         2           3           1.33      6
1       -1 -3.00e-03           -1        -1          -1           0.00      2
2       -1  1.50e-03           -1        -1          -1           0.00      4
3       -1  1.480297e-19       -1        -1          -1           0.00      6
    Prediction
0  1.480297e-19
1 -3.00e-03
2  1.50e-03
3  1.480297e-19
```

As expected, the root's split point is 5.00e-01, namely the mean value `0.5 = (0 + 1) / 2`, not the weighted mean `0.67 ≈ (0 * 2 + 1 * 4) / 6`.

3. conclusion

I prefer using the weighted mean for the split point in the PR, rather than the mean value used in R's gbm package. How about you? @sethah @srowen
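The arithmetic in that comparison can be checked with a small Python sketch. It is not taken from gbm or Spark: the helper name `split_points` is hypothetical, and the weighted variant simply applies the SPARK-16957 formula to the counts of each pair of adjacent distinct feature values.

```python
from collections import Counter


def split_points(feature_values):
    """Return (mean, weighted_mean) for each pair of adjacent distinct values.

    mean:          (x + y) / 2, the rule quoted from R's gbm above.
    weighted_mean: (c_x * x + c_y * y) / (c_x + c_y), where c_x and c_y are
                   the number of samples equal to x and to y.
    """
    counts = Counter(feature_values)
    distinct = sorted(counts)
    results = []
    for lo, hi in zip(distinct, distinct[1:]):
        c_lo, c_hi = counts[lo], counts[hi]
        mean = (lo + hi) / 2.0
        weighted = (c_lo * lo + c_hi * hi) / (c_lo + c_hi)
        results.append((mean, weighted))
    return results


# Toy feature column from the R session above: two 0.0s and four 1.0s.
f = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
mean, weighted = split_points(f)[0]
print(mean, round(weighted, 3))
```

On the toy data the plain mean is 0.5, matching the `SplitCodePred` reported by `pretty.gbm.tree`, while the weighted mean comes out at 4/6, roughly 0.667.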
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 @sethah Perhaps it's hard to compare R's behavior with Spark's, since many factors are involved. I'd like to read R GBM's code and verify that the two sides' split criteria are consistent in design. Is that OK?
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17556 Seems like a reasonable change. Just left some minor comments.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/17556 If we are attempting to match R GBM, it would be great to show, at least on the PR, that we get the same results.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 many thanks, @srowen
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17556 It's looking good, and the R tests pass. I'll also ask @mengxr or maybe @dbtsai if they have any concerns about this change?
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17556 **[Test build #3662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3662/testReport)** for PR 17556 at commit [`b74702a`](https://github.com/apache/spark/commit/b74702afa958fa3552e494cbe77590d9940bf1fb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17556 **[Test build #3662 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3662/testReport)** for PR 17556 at commit [`b74702a`](https://github.com/apache/spark/commit/b74702afa958fa3552e494cbe77590d9940bf1fb).
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 I have run all of the MLlib unit test cases in Python. However, I am not familiar with R, and I don't want to spend too much time setting up an R environment. Could CI retest the PR? We can check whether some unit tests are still broken. Thanks.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17556 http://spark.apache.org/docs/latest/building-spark.html
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 @srowen Hi, I forgot the unit tests in Python and R. Where can I find documentation about setting up a development environment? Thanks.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17556 **[Test build #3655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3655/testReport)** for PR 17556 at commit [`9ca5750`](https://github.com/apache/spark/commit/9ca57505c8211954478a2d54ced48c2561cfb9f9).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17556 **[Test build #3655 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3655/testReport)** for PR 17556 at commit [`9ca5750`](https://github.com/apache/spark/commit/9ca57505c8211954478a2d54ced48c2561cfb9f9).
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17556 **[Test build #3654 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3654/testReport)** for PR 17556 at commit [`9ca5750`](https://github.com/apache/spark/commit/9ca57505c8211954478a2d54ced48c2561cfb9f9).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17556 **[Test build #3654 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3654/testReport)** for PR 17556 at commit [`9ca5750`](https://github.com/apache/spark/commit/9ca57505c8211954478a2d54ced48c2561cfb9f9).
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17556 Just a flaky test. Can't be related
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556
```
Test Result (1 failure / +1)
org.apache.spark.storage.TopologyAwareBlockReplicationPolicyBehavior.Peers in 2 racks
```
Does anyone know what this is?
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 Is there something wrong with Spark CI?
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17556 **[Test build #3652 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3652/testReport)** for PR 17556 at commit [`9ca5750`](https://github.com/apache/spark/commit/9ca57505c8211954478a2d54ced48c2561cfb9f9).
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17556 It seems OK to me but @sethah or @jkbradley might be good as a second set of eyes. It does slightly alter behavior, but, it does seem like something that should work better in general.
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17556 Can one of the admins verify this patch?