[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-08-07 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r73821493
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -64,10 +64,17 @@ object PhysicalOperation extends PredicateHelper {
         val substitutedFields = fields.map(substitute(aliases)).asInstanceOf[Seq[NamedExpression]]
         (Some(substitutedFields), filters, other, collectAliases(substitutedFields))
 
-      case Filter(condition, child) if condition.deterministic =>
+      case Filter(condition, child) =>
         val (fields, filters, other, aliases) = collectProjectsAndFilters(child)
-        val substitutedCondition = substitute(aliases)(condition)
-        (fields, filters ++ splitConjunctivePredicates(substitutedCondition), other, aliases)
+
+        // Deterministic parts of filter condition placed before non-deterministic predicates could
+        // be pushed down safely.
+        val (pushDown, rest) =
--- End diff --

@cloud-fan I can now insert a `Scanner` operator over `CatalogRelation` in 
`Optimizer`, but I noticed a relation may also be something like 
`l: LogicalRelation(relation: CatalogRelation, _, _)`. In that case we can't 
match on the class `LogicalRelation`, because it's in the `spark-sql` package 
while `Optimizer` is in `spark-catalyst`, so we are unable to determine whether 
a `Scanner` should be added. I don't think we want to add a `Scanner` over 
every `BaseRelation`.
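
For concreteness, here is a minimal sketch of the rule being described, assuming `Scanner` is the wrapper node discussed in this thread (it is not an existing Spark class) and using the 2.0-era `SimpleCatalogRelation`:

```scala
import org.apache.spark.sql.catalyst.catalog.SimpleCatalogRelation
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

object InsertScanner extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    // Fine: the catalog relation classes are visible from spark-catalyst.
    case r: SimpleCatalogRelation => Scanner(r.output, Nil, r)
    // Not expressible here: LogicalRelation lives in the spark-sql module,
    // which spark-catalyst cannot depend on, so the following cannot compile:
    //   case l @ LogicalRelation(_: CatalogRelation, _, _) => Scanner(l.output, Nil, l)
  }
}
```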



[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-08-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r73325120
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -64,10 +64,17 @@ object PhysicalOperation extends PredicateHelper {
         val substitutedFields = fields.map(substitute(aliases)).asInstanceOf[Seq[NamedExpression]]
         (Some(substitutedFields), filters, other, collectAliases(substitutedFields))
 
-      case Filter(condition, child) if condition.deterministic =>
+      case Filter(condition, child) =>
         val (fields, filters, other, aliases) = collectProjectsAndFilters(child)
-        val substitutedCondition = substitute(aliases)(condition)
-        (fields, filters ++ splitConjunctivePredicates(substitutedCondition), other, aliases)
+
+        // Deterministic parts of filter condition placed before non-deterministic predicates could
+        // be pushed down safely.
+        val (pushDown, rest) =
--- End diff --

yup, thanks!



[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-08-03 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r73318885
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -64,10 +64,17 @@ object PhysicalOperation extends PredicateHelper {
         val substitutedFields = fields.map(substitute(aliases)).asInstanceOf[Seq[NamedExpression]]
         (Some(substitutedFields), filters, other, collectAliases(substitutedFields))
 
-      case Filter(condition, child) if condition.deterministic =>
+      case Filter(condition, child) =>
         val (fields, filters, other, aliases) = collectProjectsAndFilters(child)
-        val substitutedCondition = substitute(aliases)(condition)
-        (fields, filters ++ splitConjunctivePredicates(substitutedCondition), other, aliases)
+
+        // Deterministic parts of filter condition placed before non-deterministic predicates could
+        // be pushed down safely.
+        val (pushDown, rest) =
--- End diff --

@cloud-fan Do you mean something like adding the following to basicLogicalOperators:

```scala
case class Scanner(
    projectionList: Seq[NamedExpression],
    filters: Seq[Expression],
    child: LogicalPlan) extends UnaryNode
```

and passing that to the planner instead of applying `PhysicalOperation`?

I'm willing to take on this work. Thanks!



[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-08-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r73305098
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -64,10 +64,17 @@ object PhysicalOperation extends PredicateHelper {
         val substitutedFields = fields.map(substitute(aliases)).asInstanceOf[Seq[NamedExpression]]
         (Some(substitutedFields), filters, other, collectAliases(substitutedFields))
 
-      case Filter(condition, child) if condition.deterministic =>
+      case Filter(condition, child) =>
         val (fields, filters, other, aliases) = collectProjectsAndFilters(child)
-        val substitutedCondition = substitute(aliases)(condition)
-        (fields, filters ++ splitConjunctivePredicates(substitutedCondition), other, aliases)
+
+        // Deterministic parts of filter condition placed before non-deterministic predicates could
+        // be pushed down safely.
+        val (pushDown, rest) =
--- End diff --

After an offline discussion with @liancheng, we think it would be better to 
have a wrapper node for scans (table scan or file scan), and this wrapper node 
can also hold the project list and filter conditions. Then in the optimizer we 
can improve the `ColumnPruning` and filter-pushdown rules to push down into this 
wrapper node. After that we won't need `PhysicalOperation` anymore and the 
planner can match on the wrapper node directly.
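
A rough sketch of what such pushdown rules could look like over the wrapper, assuming the `Scanner` node suggested above in this thread (the real `ColumnPruning` and predicate-pushdown rules would also need to handle aliases and attribute rewriting):

```scala
import org.apache.spark.sql.catalyst.expressions.PredicateHelper
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}
import org.apache.spark.sql.catalyst.rules.Rule

object PushIntoScanner extends Rule[LogicalPlan] with PredicateHelper {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    // Fold deterministic filter conditions into the wrapper node.
    case Filter(condition, s: Scanner) if condition.deterministic =>
      s.copy(filters = s.filters ++ splitConjunctivePredicates(condition))
    // Fold the project list into the wrapper node.
    case Project(projectList, s: Scanner) =>
      s.copy(projectionList = projectList)
  }
}
```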



[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-08-02 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r73176693
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -64,10 +64,17 @@ object PhysicalOperation extends PredicateHelper {
         val substitutedFields = fields.map(substitute(aliases)).asInstanceOf[Seq[NamedExpression]]
         (Some(substitutedFields), filters, other, collectAliases(substitutedFields))
 
-      case Filter(condition, child) if condition.deterministic =>
+      case Filter(condition, child) =>
         val (fields, filters, other, aliases) = collectProjectsAndFilters(child)
-        val substitutedCondition = substitute(aliases)(condition)
-        (fields, filters ++ splitConjunctivePredicates(substitutedCondition), other, aliases)
+
+        // Deterministic parts of filter condition placed before non-deterministic predicates could
+        // be pushed down safely.
+        val (pushDown, rest) =
--- End diff --

Thank you @cloud-fan for pointing that out; I realize my previous thoughts 
were wrong. I fully agree with @liancheng's improvement idea. I will update the 
related code, as well as add new test cases, tomorrow.



[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-08-02 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r73156772
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -64,10 +64,17 @@ object PhysicalOperation extends PredicateHelper {
         val substitutedFields = fields.map(substitute(aliases)).asInstanceOf[Seq[NamedExpression]]
         (Some(substitutedFields), filters, other, collectAliases(substitutedFields))
 
-      case Filter(condition, child) if condition.deterministic =>
+      case Filter(condition, child) =>
         val (fields, filters, other, aliases) = collectProjectsAndFilters(child)
-        val substitutedCondition = substitute(aliases)(condition)
-        (fields, filters ++ splitConjunctivePredicates(substitutedCondition), other, aliases)
+
+        // Deterministic parts of filter condition placed before non-deterministic predicates could
+        // be pushed down safely.
+        val (pushDown, rest) =
--- End diff --

I also think that silently dropping nondeterministic filters can be 
dangerous. Maybe we should just return all operators beneath the top-most 
nondeterministic filter as the bottom operator?

For example, say we have a plan tree like this:

```
Project a, b
 Filter a > 1
  Filter b < 3
   Filter RAND(42) > 0.5
    Filter c < 2
     TableScan t
```

We should return the following result:

```
(
  // Project list
  Seq(a, b),

  // Deterministic filters
  Seq(b < 3, a > 1),

  // The top-most non-deterministic Filter operator, with all operators beneath it
  Filter(
    RAND(42) > 0.5,
    Filter(c < 2,
      TableScan(t)))
)
```
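
A minimal sketch of that traversal (handling only `Filter` for brevity; the real `collectProjectsAndFilters` also handles `Project` and alias substitution):

```scala
import org.apache.spark.sql.catalyst.expressions.{And, Expression}
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}

// Collect conjuncts from consecutive deterministic Filters, walking top-down.
// The first non-deterministic Filter (or any other node) becomes the returned
// base plan, so none of its predicates are dropped or reordered.
def collectDeterministicFilters(plan: LogicalPlan): (Seq[Expression], LogicalPlan) = {
  def split(e: Expression): Seq[Expression] = e match {
    case And(l, r) => split(l) ++ split(r)
    case other => Seq(other)
  }
  plan match {
    case Filter(condition, child) if condition.deterministic =>
      val (filters, base) = collectDeterministicFilters(child)
      (filters ++ split(condition), base)
    case other => (Nil, other)
  }
}
```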




[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-08-02 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r73136264
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -64,10 +64,17 @@ object PhysicalOperation extends PredicateHelper {
         val substitutedFields = fields.map(substitute(aliases)).asInstanceOf[Seq[NamedExpression]]
         (Some(substitutedFields), filters, other, collectAliases(substitutedFields))
 
-      case Filter(condition, child) if condition.deterministic =>
+      case Filter(condition, child) =>
         val (fields, filters, other, aliases) = collectProjectsAndFilters(child)
-        val substitutedCondition = substitute(aliases)(condition)
-        (fields, filters ++ splitConjunctivePredicates(substitutedCondition), other, aliases)
+
+        // Deterministic parts of filter condition placed before non-deterministic predicates could
+        // be pushed down safely.
+        val (pushDown, rest) =
--- End diff --

Can you write a test for this? The logic in `DataSourceStrategy` shows that, 
when we get a scan node with projects and filters on top of it, we rebuild the 
project and filter (with the project lists and filter conditions merged) and 
wrap the scan node with them. So a filter condition that isn't returned by 
`collectProjectsAndFilters` won't get executed.
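
For context, the rebuild has roughly this shape (a simplified sketch, not the actual `DataSourceStrategy` code; `createScan` stands in for constructing the datasource scan):

```scala
import org.apache.spark.sql.catalyst.expressions.And
import org.apache.spark.sql.catalyst.planning.PhysicalOperation
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}

// Only the conditions returned by PhysicalOperation are rebuilt into the
// Filter above the scan; a predicate that collectProjectsAndFilters dropped
// is never evaluated anywhere in the resulting plan.
def rebuild(plan: LogicalPlan, createScan: LogicalPlan => LogicalPlan): LogicalPlan =
  plan match {
    case PhysicalOperation(projects, filters, relation) =>
      val scan = createScan(relation)
      val filtered = filters.reduceLeftOption(And(_, _)).map(Filter(_, scan)).getOrElse(scan)
      Project(projects, filtered)
    case other => other
  }
```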



[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-08-02 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r73126756
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -64,10 +64,17 @@ object PhysicalOperation extends PredicateHelper {
         val substitutedFields = fields.map(substitute(aliases)).asInstanceOf[Seq[NamedExpression]]
         (Some(substitutedFields), filters, other, collectAliases(substitutedFields))
 
-      case Filter(condition, child) if condition.deterministic =>
+      case Filter(condition, child) =>
         val (fields, filters, other, aliases) = collectProjectsAndFilters(child)
-        val substitutedCondition = substitute(aliases)(condition)
-        (fields, filters ++ splitConjunctivePredicates(substitutedCondition), other, aliases)
+
+        // Deterministic parts of filter condition placed before non-deterministic predicates could
+        // be pushed down safely.
+        val (pushDown, rest) =
--- End diff --

@cloud-fan Thanks for your comment! But I did search the codebase and found 
the returned `filters` are only used for predicate pushdown or partition 
pruning; in both cases it should be safe for us to drop the `rest` conditions. 
Thank you!



[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-08-02 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r73117855
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -64,10 +64,17 @@ object PhysicalOperation extends PredicateHelper {
         val substitutedFields = fields.map(substitute(aliases)).asInstanceOf[Seq[NamedExpression]]
         (Some(substitutedFields), filters, other, collectAliases(substitutedFields))
 
-      case Filter(condition, child) if condition.deterministic =>
+      case Filter(condition, child) =>
         val (fields, filters, other, aliases) = collectProjectsAndFilters(child)
-        val substitutedCondition = substitute(aliases)(condition)
-        (fields, filters ++ splitConjunctivePredicates(substitutedCondition), other, aliases)
+
+        // Deterministic parts of filter condition placed before non-deterministic predicates could
+        // be pushed down safely.
+        val (pushDown, rest) =
--- End diff --

After thinking about it more, I don't think it's safe to do so. 
`collectProjectsAndFilters` should return all deterministic projects and 
filters above a scan node, and the returned filter conditions are not only used 
for filter pushdown but are also treated as the complete set of filters above 
that scan node. So the `rest` conditions here won't get executed.

cc @liancheng to confirm this.



[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-06-27 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r68557086
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruningSuite.scala ---
@@ -141,6 +141,14 @@ class PruningSuite extends HiveComparisonTest with BeforeAndAfter {
       Seq("2008-04-08", "11"),
       Seq("2008-04-09", "11")))
 
+  createPruningTest("Partition pruning - with nondeterministic fields",
+    "SELECT value, hr FROM srcpart1 WHERE ds = '2008-04-08' AND rand(7) < 1",
--- End diff --

Yes, you are right. I thought the deterministic part could always be pushed 
down safely, but it can't; in fact, the order of the conjuncts must also be 
considered. For example, `rand() < 0.01 AND partition_col = 'some_value'` 
should not be pushed down, but `partition_col = 'some_value' AND rand() < 0.01` 
still could be.

Thank you for your kind reply!
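
The order-sensitive split that this implies can be expressed with `span`, which keeps only the deterministic conjuncts that appear before the first non-deterministic one (a sketch of the intent, not necessarily the exact patch):

```scala
import org.apache.spark.sql.catalyst.expressions.{And, Expression}

// Split a conjunctive condition while preserving order: `pushDown` is the
// prefix of deterministic predicates (safe to push down); `rest` is everything
// from the first non-deterministic predicate onward (must stay above the scan).
def splitForPushdown(condition: Expression): (Seq[Expression], Seq[Expression]) = {
  def split(e: Expression): Seq[Expression] = e match {
    case And(l, r) => split(l) ++ split(r)
    case other => Seq(other)
  }
  split(condition).span(_.deterministic)
}
```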



[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-06-27 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r68556327
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruningSuite.scala ---
@@ -141,6 +141,14 @@ class PruningSuite extends HiveComparisonTest with BeforeAndAfter {
       Seq("2008-04-08", "11"),
       Seq("2008-04-09", "11")))
 
+  createPruningTest("Partition pruning - with nondeterministic fields",
+    "SELECT value, hr FROM srcpart1 WHERE ds = '2008-04-08' AND rand(7) < 1",
--- End diff --

Good question. I think technically we can't push down any predicate that is 
placed after a non-deterministic predicate; otherwise the number of input rows 
it sees may change and lead to wrong query results.
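
The hazard is easy to reproduce in plain Scala (an illustration only, not Spark code): a seeded random predicate consumes one value per input row, so changing which rows reach it changes which rows survive.

```scala
import scala.util.Random

val rows = Seq(("2008-04-08", 1), ("2008-04-09", 2), ("2008-04-08", 3))

def survivors(pushDownFirst: Boolean): Seq[(String, Int)] = {
  val rand = new Random(42) // fixed seed, like rand(42) in SQL
  if (pushDownFirst) {
    // Deterministic predicate evaluated below the non-deterministic one:
    // only 2 rows consume values from the random stream.
    rows.filter(_._1 == "2008-04-08").filter(_ => rand.nextDouble() > 0.5)
  } else {
    // Original order: all 3 rows consume values, shifting the stream.
    rows.filter(_ => rand.nextDouble() > 0.5).filter(_._1 == "2008-04-08")
  }
}

// survivors(true) and survivors(false) can disagree, because the third row
// sees the 2nd random value in one plan and the 3rd in the other.
```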



[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-06-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r68538346
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruningSuite.scala ---
@@ -141,6 +141,14 @@ class PruningSuite extends HiveComparisonTest with BeforeAndAfter {
       Seq("2008-04-08", "11"),
       Seq("2008-04-09", "11")))
 
+  createPruningTest("Partition pruning - with nondeterministic fields",
+    "SELECT value, hr FROM srcpart1 WHERE ds = '2008-04-08' AND rand(7) < 1",
--- End diff --

cc @liancheng @yhuai 



[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-06-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r68538310
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruningSuite.scala ---
@@ -141,6 +141,14 @@ class PruningSuite extends HiveComparisonTest with BeforeAndAfter {
       Seq("2008-04-08", "11"),
       Seq("2008-04-09", "11")))
 
+  createPruningTest("Partition pruning - with nondeterministic fields",
+    "SELECT value, hr FROM srcpart1 WHERE ds = '2008-04-08' AND rand(7) < 1",
--- End diff --

I'm not sure it's safe to push it down. For non-deterministic expressions, 
the order (or number) of input rows matters. If we push down the deterministic 
part of the filter condition, the input rows reaching the remaining filter 
condition will change, which may produce a wrong answer.



[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-06-24 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r68483394
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -64,10 +64,12 @@ object PhysicalOperation extends PredicateHelper {
         val substitutedFields = fields.map(substitute(aliases)).asInstanceOf[Seq[NamedExpression]]
         (Some(substitutedFields), filters, other, collectAliases(substitutedFields))
 
-      case Filter(condition, child) if condition.deterministic =>
+      case Filter(condition, child) =>
+        val (deterministicFilters, nondeterministicFilters) = splitConjunctivePredicates(condition)
+          .partition(_.deterministic)
         val (fields, filters, other, aliases) = collectProjectsAndFilters(child)
-        val substitutedCondition = substitute(aliases)(condition)
-        (fields, filters ++ splitConjunctivePredicates(substitutedCondition), other, aliases)
+        val substitutedFilters = deterministicFilters.map(substitute(aliases)(_))
+        (fields, filters ++ deterministicFilters, other, aliases)
--- End diff --

@AlekseiS You're right! I have fixed this, thank you!



[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-06-24 Thread AlekseiS
Github user AlekseiS commented on a diff in the pull request:

https://github.com/apache/spark/pull/13893#discussion_r68438486
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -64,10 +64,12 @@ object PhysicalOperation extends PredicateHelper {
         val substitutedFields = fields.map(substitute(aliases)).asInstanceOf[Seq[NamedExpression]]
         (Some(substitutedFields), filters, other, collectAliases(substitutedFields))
 
-      case Filter(condition, child) if condition.deterministic =>
+      case Filter(condition, child) =>
+        val (deterministicFilters, nondeterministicFilters) = splitConjunctivePredicates(condition)
+          .partition(_.deterministic)
         val (fields, filters, other, aliases) = collectProjectsAndFilters(child)
-        val substitutedCondition = substitute(aliases)(condition)
-        (fields, filters ++ splitConjunctivePredicates(substitutedCondition), other, aliases)
+        val substitutedFilters = deterministicFilters.map(substitute(aliases)(_))
+        (fields, filters ++ deterministicFilters, other, aliases)
--- End diff --

Likely there's a bug: I think you want to use `substitutedFilters` instead 
of `deterministicFilters` here. You could also add a test that exercises the 
alias substitution.
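
In other words, the last line of the hunk presumably should read:

```scala
(fields, filters ++ substitutedFilters, other, aliases)
```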



[GitHub] spark pull request #13893: [SPARK-14172][SQL] Hive table partition predicate...

2016-06-24 Thread jiangxb1987
GitHub user jiangxb1987 opened a pull request:

https://github.com/apache/spark/pull/13893

[SPARK-14172][SQL] Hive table partition predicate not passed down correctly

## What changes were proposed in this pull request?

Currently the partition predicate is not pushed down correctly when the 
condition contains non-deterministic parts. This PR changes the logic in 
`collectProjectsAndFilters()` to add the deterministic parts into `filters`, so 
that the partition predicate can be pushed down correctly.

## How was this patch tested?

A new test in `PruningSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jiangxb1987/spark bugfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13893.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13893


commit ce6f0b8186dd18301e5087d16a5139057c42ab77
Author: 蒋星博 
Date:   2016-06-24T11:53:08Z

[SPARK-14172][SQL] Hive table partition predicate not passed down correctly



