[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-27 Thread bersprockets
Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/22188
  
@gatorsmile Thanks much!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22188
  
Normally, we do not backport such improvement PRs. However, the risk of 
this PR is pretty small. I think it is fine. Let me do this. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-27 Thread bersprockets
Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/22188
  
@gatorsmile 
>Why 2.2 only?

Only that I forgot that master is already on 2.4. We should do 2.3 as well, 
but I haven't tested it yet.

Do I need to do anything on my end to get it into 2.2, and once I test, 
into 2.3?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22188
  
@bersprockets The risk is pretty small I think. I am fine to backport it to 
the previous versions. Why 2.2 only?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-27 Thread bersprockets
Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/22188
  
@cloud-fan @gatorsmile Should we merge this also onto 2.2? It was a clean 
cherry-pick for me (from master to branch-2.2), and I ran the top and bottom 
tests (6000 columns, 1 million rows, 67 32M files, and 60 columns, 100 million 
rows, 67 32M files) from the PR description and got the same results.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22188
  
thanks, merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-22 Thread bersprockets
Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/22188
  
OK, I reran the tests for the lower column count cases, and the runs with 
the patch consistently show a tiny (1-3%) improvement compared to the master 
branch. So even the lower column count cases benefit a little.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-22 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22188
  
That does seem counter intuitive, but no idea what could explain that since 
the new code seems like a straight-forward better version.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-22 Thread bersprockets
Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/22188
  
Thanks @vanzin. In my benchmark tests, the tiny degradation (0.5%) in the 
lower column count cases is pretty consistent, which concerns me a little. I am 
going to re-run those tests in a different environment and see what happens.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-22 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22188
  
LGTM. Will leave here for a bit to see if anyone else comments...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22188
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95118/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22188
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22188
  
**[Test build #95118 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95118/testReport)**
 for PR 22188 at commit 
[`697de21`](https://github.com/apache/spark/commit/697de21501acbda3dbcd8ccc13a35ad3723a652e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22188
  
**[Test build #95118 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95118/testReport)**
 for PR 22188 at commit 
[`697de21`](https://github.com/apache/spark/commit/697de21501acbda3dbcd8ccc13a35ad3723a652e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22188
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...

2018-08-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22188
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org