[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22079
  
**[Test build #94704 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94704/testReport)**
 for PR 22079 at commit 
[`bab8e68`](https://github.com/apache/spark/commit/bab8e68bc292a3a71ee378a839fd540dcf0a72bd).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22079
  
**[Test build #94701 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94701/testReport)**
 for PR 22079 at commit 
[`81f57fe`](https://github.com/apache/spark/commit/81f57febfa5f81cd41ac7803eb9d8931df88fc5c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22079
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94665/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22079
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22079
  
**[Test build #94665 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94665/testReport)**
 for PR 22079 at commit 
[`8d2d558`](https://github.com/apache/spark/commit/8d2d5585b2c2832cd4d88b3851607ce15180cca5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22079
  
**[Test build #94665 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94665/testReport)**
 for PR 22079 at commit 
[`8d2d558`](https://github.com/apache/spark/commit/8d2d5585b2c2832cd4d88b3851607ce15180cca5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-12 Thread bersprockets
Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/22079
  
Hmmm... I somehow managed to break SparkR tests but fixing a comment. It 
seems to have auto-retried and broke the second time too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-12 Thread bersprockets
Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/22079
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22079
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94656/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22079
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22079
  
**[Test build #94656 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94656/testReport)**
 for PR 22079 at commit 
[`8d2d558`](https://github.com/apache/spark/commit/8d2d5585b2c2832cd4d88b3851607ce15180cca5).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-12 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/22079
  
Both seems fine to me, it's just a minor improvement. Normally we don't 
backport a improvement, but since it's a simple and small change I'm confident 
it is safe to also include the change in a backport PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22079
  
**[Test build #94656 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94656/testReport)**
 for PR 22079 at commit 
[`8d2d558`](https://github.com/apache/spark/commit/8d2d5585b2c2832cd4d88b3851607ce15180cca5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-12 Thread bersprockets
Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/22079
  
@jiangxb1987 

> We shall also include #20088 in this backport PR.

I did that shortly after commenting, which allowed the tests to pass. I 
squashed it into the first commit, so it wasn't obvious I did it.

Should I also include #20426 in this PR, or treat that separately?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-12 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/22079
  
We shall also include #20088 in this backport PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22079
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94633/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22079
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22079
  
**[Test build #94633 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94633/testReport)**
 for PR 22079 at commit 
[`495cba5`](https://github.com/apache/spark/commit/495cba55aee0223daef089fc8513962997468f77).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public final class RecordBinaryComparator extends RecordComparator `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22079
  
**[Test build #94633 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94633/testReport)**
 for PR 22079 at commit 
[`495cba5`](https://github.com/apache/spark/commit/495cba55aee0223daef089fc8513962997468f77).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-11 Thread bersprockets
Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/22079
  
The test "model load / save" in ChiSqSelectorSuite fails because of this 
line in 

[ChiSqSelector.scala](https://github.com/apache/spark/blob/branch-2.2/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala#L147)



spark.createDataFrame(dataArray).repartition(1).write.parquet(Loader.dataPath(path))


In 2.4, the line is:


spark.createDataFrame(sc.makeRDD(dataArray, 
1)).write.parquet(Loader.dataPath(path))


If you change 2.4 to also have that line, and also remove the follow-up PR 
(#20426) to avoid sorting when there is one partition, this test also fails on 
2.4 in the same way.

So I am not sure which way to go: Update ChiSqSelector.scala to be like 2.4 
(simply a one line change), or make the test accept this new order.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22079
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94619/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22079
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22079
  
**[Test build #94619 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94619/testReport)**
 for PR 22079 at commit 
[`efccc02`](https://github.com/apache/spark/commit/efccc028bce64bf4754ce81ee16533c19b4384b2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public final class RecordBinaryComparator extends RecordComparator `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22079
  
The original fix 
https://github.com/apache/spark/pull/22079/commits/efccc028bce64bf4754ce81ee16533c19b4384b2
 has been merged to Spark 2.3. After 5+ months, we have not received any 
correctness regression that is caused by this fix, although this fix definitely 
will introduce a performance regression. I think we should merge it. The 
general risk is not very high and it resolves a serious correctness bug. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22079
  
cc @jiangxb1987 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22079
  
**[Test build #94619 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94619/testReport)**
 for PR 22079 at commit 
[`efccc02`](https://github.com/apache/spark/commit/efccc028bce64bf4754ce81ee16533c19b4384b2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22079
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

2018-08-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22079
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org