[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2017-06-14 Thread chenghao-intel
Github user chenghao-intel commented on the issue:

https://github.com/apache/spark/pull/13585
  
Oh, yes, I am closing it; I will reopen it when we have another idea.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2017-06-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13585
  
@chenghao-intel Any update on this PR? Should we close this PR now and then 
revisit it later?





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-07-25 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/13585
  
@chenghao-intel Will you have time to update this PR?





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-16 Thread lianhuiwang
Github user lianhuiwang commented on the issue:

https://github.com/apache/spark/pull/13585
  
@liancheng I think you have access to revisit my branch. Thanks.





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-15 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13585
  
@lianhuiwang @chenghao-intel Thanks for working on this! As you already 
know, we are currently trying to get Spark 2.0 RC1 out ASAP, so please allow 
me to revisit both of your branches later. Sorry for the delay!





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-13 Thread lianhuiwang
Github user lianhuiwang commented on the issue:

https://github.com/apache/spark/pull/13585
  
@liancheng @chenghao-intel I think we can do it like MySQL's range 
optimizer: http://dev.mysql.com/doc/refman/5.7/en/range-optimization.html. I 
have implemented it in my 
branch: https://github.com/lianhuiwang/spark/tree/partition_pruning. I did not 
combine many ranges because the partition pruning predicates will be 
transformed into Hive's filter expression in getPartitionsByFilter(), and 
Hive's metastore database can then filter the partitions.
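
For readers unfamiliar with the term, "combining ranges" in a range optimizer means merging overlapping intervals on the same column into a smaller set of disjoint ones. The following is a minimal sketch of that idea only; it is not code from either branch, and the function name `merge_ranges` and the use of closed integer intervals are assumptions made for illustration.

```python
from typing import List, Tuple

# A closed interval [low, high] on a single partition column (illustrative type).
Range = Tuple[int, int]


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Merge overlapping closed intervals into a sorted list of disjoint ones."""
    merged: List[Range] = []
    for low, high in sorted(ranges):
        if merged and low <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            prev_low, prev_high = merged[-1]
            merged[-1] = (prev_low, max(prev_high, high))
        else:
            merged.append((low, high))
    return merged


# Ranges that could come from predicates such as
# (p BETWEEN 1 AND 5) OR (p BETWEEN 3 AND 8) OR (p BETWEEN 10 AND 12).
combined = merge_ranges([(3, 8), (1, 5), (10, 12)])
print(combined)
```

Skipping this step, as the comment describes, leaves the individual predicates as they are and relies on the metastore to evaluate them.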





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-13 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13585
  
One problem in the tests is that other optimization rules may optimize the 
filter predicates before the newly added rule, and hide bugs in the new rule. 
The one @clockfly pointed out is one example.





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13585
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13585
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60376/
Test PASSed.





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13585
  
**[Test build #60376 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60376/consoleFull)**
 for PR 13585 at commit 
[`79f7acb`](https://github.com/apache/spark/commit/79f7acbb660c2c398e21a36a7b92f316b7e5037f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-12 Thread chenghao-intel
Github user chenghao-intel commented on the issue:

https://github.com/apache/spark/pull/13585
  
Updated with a more meaningful function name and added more unit tests.





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-12 Thread chenghao-intel
Github user chenghao-intel commented on the issue:

https://github.com/apache/spark/pull/13585
  
cc @liancheng 





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13585
  
**[Test build #60376 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60376/consoleFull)**
 for PR 13585 at commit 
[`79f7acb`](https://github.com/apache/spark/commit/79f7acbb660c2c398e21a36a7b92f316b7e5037f).





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-11 Thread chenghao-intel
Github user chenghao-intel commented on the issue:

https://github.com/apache/spark/pull/13585
  
Thank you all for the review, but I am not going to solve the CNF problem 
here. The intention of this PR is to extract more partition pruning 
expressions, so that we have fewer partitions to scan during the table scan.

But I did find some bugs in this PR; I will add more unit tests soon.
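
To make the stated intention concrete, here is a minimal sketch of extracting a partition-only predicate from a filter built from `AND` and `OR`. It is an illustration under stated assumptions, not the code in this PR: the classes `Pred`, `And`, `Or` and the function `extract_partition_predicate` are hypothetical, and negation is deliberately left out. The extracted predicate is implied by the original filter, so scanning only the partitions that satisfy it does not lose rows.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Pred:
    """A leaf comparison such as `part = 1` (hypothetical toy type)."""
    column: str
    op: str
    value: int


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Pred, And, Or]


def extract_partition_predicate(expr: Expr, partition_cols: frozenset) -> Optional[Expr]:
    """Return a predicate over partition columns implied by `expr`.

    None means "no constraint", i.e. every partition must be scanned.
    """
    if isinstance(expr, Pred):
        return expr if expr.column in partition_cols else None
    left = extract_partition_predicate(expr.left, partition_cols)
    right = extract_partition_predicate(expr.right, partition_cols)
    if isinstance(expr, And):
        # Dropping one side of a conjunction only weakens it, which is safe.
        if left is None:
            return right
        if right is None:
            return left
        return And(left, right)
    # For a disjunction, both sides must constrain the partition columns.
    if left is None or right is None:
        return None
    return Or(left, right)


cols = frozenset({"part"})
# (part = 1 AND a > 3) OR (part = 2 AND b < 5)
both_sides = Or(And(Pred("part", "=", 1), Pred("a", ">", 3)),
                And(Pred("part", "=", 2), Pred("b", "<", 5)))
# (part = 1 AND a > 3) OR (a > 10)
one_side = Or(And(Pred("part", "=", 1), Pred("a", ">", 3)), Pred("a", ">", 10))

print(extract_partition_predicate(both_sides, cols))
print(extract_partition_predicate(one_side, cols))
```

The first filter yields `part = 1 OR part = 2`, while the second yields no constraint because one branch of the `OR` says nothing about the partition column.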





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-11 Thread yangw1234
Github user yangw1234 commented on the issue:

https://github.com/apache/spark/pull/13585
  
Hi @liancheng, CNF is truly a more systematic way to deal with this 
problem. 

I am not sure whether I am right, but I think that as long as we push the 
`not` operator down to the lowest level of the expression tree, the approach 
proposed by @chenghao-intel will work. Taking the above example, the expression 
`!(partition = 1 && a > 3)` will be transformed to 
`(!(partition = 1)) || (!(a > 3))`, and according to the second example given 
by @chenghao-intel in the doc, the expression should be dropped entirely, so 
`partition = 1` will not be pruned. (But this rule does not appear in the code 
yet; maybe he is still working on implementing it. I don't know for sure.) 

What do you think?
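
The push-down described above can be sketched with De Morgan's laws. This is a toy illustration, not code from the PR; the classes `Pred`, `Not`, `And`, `Or` and the function `push_not_down` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Pred:
    """An opaque leaf predicate, e.g. "partition = 1" (hypothetical toy type)."""
    text: str


@dataclass(frozen=True)
class Not:
    child: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Pred, Not, And, Or]


def push_not_down(expr: Expr, negate: bool = False) -> Expr:
    """Rewrite `expr` so that `Not` only ever wraps leaf predicates."""
    if isinstance(expr, Pred):
        return Not(expr) if negate else expr
    if isinstance(expr, Not):
        # A negation flips the pending sign; double negations cancel out.
        return push_not_down(expr.child, not negate)
    left = push_not_down(expr.left, negate)
    right = push_not_down(expr.right, negate)
    # De Morgan: under a pending negation, And becomes Or and vice versa.
    stays_and = isinstance(expr, And) != negate
    return And(left, right) if stays_and else Or(left, right)


# !(partition = 1 && a > 3)
original = Not(And(Pred("partition = 1"), Pred("a > 3")))
rewritten = push_not_down(original)
print(rewritten)
```

After the rewrite the top-level node is an `Or` with one branch, `!(a > 3)`, that does not reference the partition column, which is the case where the whole expression is dropped for pruning purposes.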





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-10 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13585
  
You probably meant "conjunction" (aka "logical and") instead of 
"disjunction" (aka "logical or") in the PR title and comments.

As @clockfly has pointed out, the current approach isn't correct. I think a 
better approach to extract as many partition column predicates as possible is 
through [CNF conversion][1], which pulls up all conjunctions to the top level, 
and then it's safe to do the optimization you intended to do in this PR.

There have been PR(s) that tried to add CNF conversion to Spark SQL. However, 
one problem is that CNF conversion can lead to exponential explosion with 
respect to expression size (i.e. the number of tree nodes in the expression 
tree). Thus we usually need to set an upper limit on the expression size and 
stop doing CNF conversion once the upper limit is exceeded.

[1]: https://en.wikipedia.org/wiki/Conjunctive_normal_form
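
A minimal sketch of bounded CNF conversion follows. It is not Spark SQL code: the classes `And` and `Or`, the exception `CnfTooLarge`, and the functions `to_clauses` and `try_cnf` are hypothetical, literals are plain strings, negation is assumed to have been pushed down to the literals already, and the limit is placed on the number of clauses as a simple stand-in for the expression-size limit mentioned above.

```python
from dataclasses import dataclass
from functools import reduce
from typing import FrozenSet, List, Optional, Union


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


# A plain string stands for a literal such as "p = 1" (toy representation).
Expr = Union[str, And, Or]
# A clause is a disjunction of literals; a CNF is a conjunction of clauses.
Clause = FrozenSet[str]


class CnfTooLarge(Exception):
    """Raised when the conversion would exceed the clause limit."""


def to_clauses(expr: Expr, max_clauses: int) -> List[Clause]:
    if isinstance(expr, str):
        return [frozenset([expr])]
    left = to_clauses(expr.left, max_clauses)
    right = to_clauses(expr.right, max_clauses)
    if isinstance(expr, And):
        clauses = left + right
    else:
        # Distributing Or over And multiplies the clause counts, so check first.
        if len(left) * len(right) > max_clauses:
            raise CnfTooLarge()
        clauses = [l | r for l in left for r in right]
    if len(clauses) > max_clauses:
        raise CnfTooLarge()
    return clauses


def try_cnf(expr: Expr, max_clauses: int) -> Optional[List[Clause]]:
    """Return the CNF clauses, or None if the limit is exceeded."""
    try:
        return to_clauses(expr, max_clauses)
    except CnfTooLarge:
        return None


# (p = 1 AND a > 3) OR (p = 2 AND b < 5)
small = Or(And("p = 1", "a > 3"), And("p = 2", "b < 5"))
print(try_cnf(small, max_clauses=10))

# An OR of ten two-literal conjunctions needs 2**10 clauses in CNF.
pairs = [And(f"x{i}", f"y{i}") for i in range(10)]
large = reduce(Or, pairs)
print(try_cnf(large, max_clauses=100))
```

For the small filter, one of the four resulting clauses is `p = 1 OR p = 2`, which mentions only the partition column and can therefore be used for pruning on its own. The large filter exceeds the limit, so the caller would keep the original expression unchanged.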





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13585
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13585
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60263/
Test FAILed.





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13585
  
**[Test build #60263 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60263/consoleFull)**
 for PR 13585 at commit 
[`08519f2`](https://github.com/apache/spark/commit/08519f2e7a3222cb791e6ce1b8af0c132ff16b29).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13585: [SPARK-15859][SQL] Optimize the partition pruning within...

2016-06-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13585
  
**[Test build #60263 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60263/consoleFull)**
 for PR 13585 at commit 
[`08519f2`](https://github.com/apache/spark/commit/08519f2e7a3222cb791e6ce1b8af0c132ff16b29).

