[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20265
  
Thank you so much, @cloud-fan and @gatorsmile !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20265
  
thanks, merging to master/2.3!


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20265
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86257/
Test PASSed.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20265
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20265
  
**[Test build #86257 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86257/testReport)**
 for PR 20265 at commit 
[`eb7035d`](https://github.com/apache/spark/commit/eb7035defe225c53ac8e43d63d6e3e6a974f4b1c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20265
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86251/
Test PASSed.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20265
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20265
  
**[Test build #86251 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86251/testReport)**
 for PR 20265 at commit 
[`a556169`](https://github.com/apache/spark/commit/a5561697314527938a0cff085be33b215a746c4a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20265
  
There may be many questions about ORC (or Parquet) performance 
benchmarks. We can address them later, since we cannot enumerate every case, 
and users can run benchmarks for their own workloads. In fact, Apache Spark 
did not show this kind of benchmark when it turned on PPD for Parquet. If a 
benchmark for Parquet had existed, this PR would have been a piece of cake.

I think this PR is enough to show the benefit of ORC PPD and to justify 
enabling the config by default.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20265
  
**[Test build #86257 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86257/testReport)**
 for PR 20265 at commit 
[`eb7035d`](https://github.com/apache/spark/commit/eb7035defe225c53ac8e43d63d6e3e6a974f4b1c).


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20265
  
@gatorsmile, the number of rows has also changed. Why do you think so?


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20265
  
ORC performs even better when the number of columns is small. Maybe also 
add the test cases back to show this observation?


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20265
  
**[Test build #86251 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86251/testReport)**
 for PR 20265 at commit 
[`a556169`](https://github.com/apache/spark/commit/a5561697314527938a0cff085be33b215a746c4a).


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-16 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20265
  
LGTM except for one comment. Let's worry about row group/stripe size later; 
since both Parquet and ORC use default settings, I think it's still fair.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20265
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86200/
Test PASSed.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20265
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20265
  
**[Test build #86200 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86200/testReport)**
 for PR 20265 at commit 
[`87af693`](https://github.com/apache/spark/commit/87af693a82f9591a256c55a5eca65041f330a225).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20265
  
**[Test build #86200 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86200/testReport)**
 for PR 20265 at commit 
[`87af693`](https://github.com/apache/spark/commit/87af693a82f9591a256c55a5eca65041f330a225).


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20265
  
I updated the PR (except for the RowGroupSize/OrcStripeSize part).


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20265
  
I'll update the PR tomorrow.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20265
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86143/
Test PASSed.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20265
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20265
  
**[Test build #86143 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86143/testReport)**
 for PR 20265 at commit 
[`440f76b`](https://github.com/apache/spark/commit/440f76bdbf4d720a361e0afde3599027ff6e7be2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20265
  
I think we need to make sure the Parquet row group size and the ORC stripe 
size are the same, to make this benchmark fair.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20265
  
**[Test build #86143 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86143/testReport)**
 for PR 20265 at commit 
[`440f76b`](https://github.com/apache/spark/commit/440f76bdbf4d720a361e0afde3599027ff6e7be2).


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20265
  
Hi, @cloud-fan and @gatorsmile.
Your questions are valid for all PPD cases. Per your comments, I 
added the following expressions (positive and negative) for both ORC and Parquet.
```
+// Positive cases: Select one or no rows
+Seq("id = 0", "id == 0", "id <= 0", "id < 1", "id IS NULL").foreach { expr =>
+  filterPushDownBenchmark(1024 * 1024 * 1, 20, expr)
+}
+
+// Negative cases: Select all rows, which means the predicate is always true.
+Seq("id > -1", "id != -1", "id IS NOT NULL").foreach { expr =>
+  filterPushDownBenchmark(1024 * 1024 * 1, 20, expr)
+}
```
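
To illustrate why the positive and negative cases above behave differently, here is a minimal sketch of stripe-level skipping based on min/max statistics. It is a simplified model only, not Spark's or ORC's actual implementation: `Stripe`, `Predicate`, `mightMatch`, `rowsToScan` and the four-stripe layout are all hypothetical names and data chosen for the example.

```scala
// Simplified model of predicate push-down (PPD) using per-stripe min/max
// statistics. All names are hypothetical; this is not Spark or ORC code.
final case class Stripe(minId: Long, maxId: Long, rowCount: Long)

sealed trait Predicate
final case class Eq(v: Long) extends Predicate // id = v
final case class Lt(v: Long) extends Predicate // id < v
final case class Gt(v: Long) extends Predicate // id > v

object PushDownSketch {
  // True if the statistics alone cannot rule out a matching row in the stripe.
  def mightMatch(s: Stripe, p: Predicate): Boolean = p match {
    case Eq(v) => s.minId <= v && v <= s.maxId
    case Lt(v) => s.minId < v
    case Gt(v) => s.maxId > v
  }

  // Number of rows that still have to be read after skipping stripes.
  def rowsToScan(stripes: Seq[Stripe], p: Predicate): Long =
    stripes.filter(mightMatch(_, p)).map(_.rowCount).sum

  // Assumed layout: four stripes of 1000 consecutive ids each.
  val stripes: Seq[Stripe] =
    (0 until 4).map(i => Stripe(i * 1000L, i * 1000L + 999L, 1000L))

  def main(args: Array[String]): Unit = {
    println(rowsToScan(stripes, Eq(0)))  // positive case: one stripe read
    println(rowsToScan(stripes, Lt(1)))  // positive case: one stripe read
    println(rowsToScan(stripes, Gt(-1))) // negative case: every stripe read
  }
}
```

In this model the positive predicates let three of the four stripes be skipped, while the always-true predicate skips nothing, so the negative cases mainly measure the overhead of evaluating the pushed-down filter.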


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20265
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20265
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86116/
Test PASSed.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20265
  
**[Test build #86116 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86116/testReport)**
 for PR 20265 at commit 
[`dda5bdf`](https://github.com/apache/spark/commit/dda5bdf6865018613eeb98c6acbdd39ab2459c87).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20265
  
**[Test build #86116 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86116/testReport)**
 for PR 20265 at commit 
[`dda5bdf`](https://github.com/apache/spark/commit/dda5bdf6865018613eeb98c6acbdd39ab2459c87).


---




[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20265
  
cc @cloud-fan , @gatorsmile .


---
