[jira] [Work logged] (HIVE-24087) FK side join elimination in presence of PK-FK constraint

2020-09-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24087?focusedWorklogId=477269&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-477269
 ]

ASF GitHub Bot logged work on HIVE-24087:
-

Author: ASF GitHub Bot
Created on: 01/Sep/20 15:15
Start Date: 01/Sep/20 15:15
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 merged pull request #1440:
URL: https://github.com/apache/hive/pull/1440


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 477269)
Time Spent: 1h 10m  (was: 1h)

> FK side join elimination in presence of PK-FK constraint
> 
>
> Key: HIVE-24087
> URL: https://issues.apache.org/jira/browse/HIVE-24087
> Project: Hive
> Issue Type: Improvement
> Components: Query Planning
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> If there is a PK-FK join, the join could be eliminated by removing the FK side if
> the following conditions are met:
> * There is no row filtering on the FK side.
> * No columns from the FK side are required after the JOIN.
> * FK join columns are guaranteed to be unique (have a GROUP BY).
> * FK join columns are guaranteed to be NOT NULL (either an IS NOT NULL filter or
> a constraint).
> *Example*
> {code:sql}
> EXPLAIN 
> SELECT customer_removal_n0.*
> FROM customer_removal_n0
> JOIN
> (SELECT lo_custkey
> FROM lineorder_removal_n0
> WHERE lo_custkey IS NOT NULL
> GROUP BY lo_custkey) fkSide ON fkSide.lo_custkey = customer_removal_n0.c_custkey;
> {code}
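The four conditions in the description can be read as one conjunctive check. The following is a minimal, self-contained sketch of that reading, not Hive's implementation: the `FkSideInfo` class and its field names are hypothetical, and a real planner would derive these facts from plan metadata rather than receive them as booleans. Uniqueness matters because an inner join against non-unique FK rows would duplicate rows from the PK side.

```java
public class FkSideRemovalCheck {

    /** Hypothetical summary of the facts a planner has established about the FK side. */
    static final class FkSideInfo {
        final boolean filtersRows;          // condition 1 is violated when this is true
        final boolean columnsUsedAfterJoin; // condition 2 is violated when this is true
        final boolean joinColumnsUnique;    // condition 3, e.g. join columns are the GROUP BY keys
        final boolean joinColumnsNotNull;   // condition 4, IS NOT NULL filter or NOT NULL constraint

        FkSideInfo(boolean filtersRows, boolean columnsUsedAfterJoin,
                   boolean joinColumnsUnique, boolean joinColumnsNotNull) {
            this.filtersRows = filtersRows;
            this.columnsUsedAfterJoin = columnsUsedAfterJoin;
            this.joinColumnsUnique = joinColumnsUnique;
            this.joinColumnsNotNull = joinColumnsNotNull;
        }
    }

    /** Returns true only when all four conditions from the issue description hold. */
    static boolean canRemoveFkSide(FkSideInfo fk) {
        return !fk.filtersRows
            && !fk.columnsUsedAfterJoin
            && fk.joinColumnsUnique
            && fk.joinColumnsNotNull;
    }

    public static void main(String[] args) {
        // All four conditions satisfied: the FK side is a removal candidate.
        System.out.println(canRemoveFkSide(new FkSideInfo(false, false, true, true)));
        // Join columns not known to be unique: the FK side must stay.
        System.out.println(canRemoveFkSide(new FkSideInfo(false, false, false, true)));
    }
}
```

If any single condition fails, the check returns false and the join is left in place.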



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24087) FK side join elimination in presence of PK-FK constraint

2020-08-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24087?focusedWorklogId=475925&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-475925
 ]

ASF GitHub Bot logged work on HIVE-24087:
-

Author: ASF GitHub Bot
Created on: 28/Aug/20 17:49
Start Date: 28/Aug/20 17:49
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1440:
URL: https://github.com/apache/hive/pull/1440#discussion_r479451432



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java
##
@@ -75,6 +75,7 @@
 import org.apache.hadoop.hive.ql.optimizer.calcite.translator.TypeConverter;
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
+import org.apache.parquet.Preconditions;

Review comment:
   nit: Use Guava's Preconditions instead of Parquet's.







Issue Time Tracking
---

Worklog Id: (was: 475925)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (HIVE-24087) FK side join elimination in presence of PK-FK constraint

2020-08-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24087?focusedWorklogId=475907&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-475907
 ]

ASF GitHub Bot logged work on HIVE-24087:
-

Author: ASF GitHub Bot
Created on: 28/Aug/20 17:06
Start Date: 28/Aug/20 17:06
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on pull request #1440:
URL: https://github.com/apache/hive/pull/1440#issuecomment-682920806


   @jcamachor Thanks for the suggestions. I will add all three tests (and yes,
all of these cases are expected to trigger the rewrite).





Issue Time Tracking
---

Worklog Id: (was: 475907)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (HIVE-24087) FK side join elimination in presence of PK-FK constraint

2020-08-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24087?focusedWorklogId=475906&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-475906
 ]

ASF GitHub Bot logged work on HIVE-24087:
-

Author: ASF GitHub Bot
Created on: 28/Aug/20 17:05
Start Date: 28/Aug/20 17:05
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on a change in pull request #1440:
URL: https://github.com/apache/hive/pull/1440#discussion_r479430314



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinConstraintsRule.java
##
@@ -213,61 +218,137 @@ public void onMatch(RelOptRuleCall call) {
 
 // 2) Check whether this join can be rewritten or removed
 RewritablePKFKJoinInfo r = HiveRelOptUtil.isRewritablePKFKJoin(
-join, leftInput == fkInput, call.getMetadataQuery());
+join, fkInput,  nonFkInput, call.getMetadataQuery());
 
 // 3) If it is the only condition, we can trigger the rewriting
 if (r.rewritable) {
-  List<RexNode> nullableNodes = r.nullableNodes;
-  // If we reach here, we trigger the transform
-  if (mode == Mode.REMOVE) {
-if (rightInputPotentialFK) {
-  // First, if FK is the right input, we need to shift
-  nullableNodes = nullableNodes.stream()
-  .map(node -> RexUtil.shift(node, 0, 
-leftInput.getRowType().getFieldCount()))
-  .collect(Collectors.toList());
-  topProjExprs = topProjExprs.stream()
-  .map(node -> RexUtil.shift(node, 0, 
-leftInput.getRowType().getFieldCount()))
-  .collect(Collectors.toList());
-}
-// Fix nullability in references to the input node
-topProjExprs = HiveCalciteUtil.fixNullability(rexBuilder, topProjExprs, RelOptUtil.getFieldTypeList(fkInput.getRowType()));
-// Trigger transformation
-if (nullableNodes.isEmpty()) {
-  call.transformTo(call.builder()
-  .push(fkInput)
-  .project(topProjExprs)
-  .convert(project.getRowType(), false)
-  .build());
+  rewrite(mode, fkInput, nonFkInput, join, topProjExprs, call, project, r.nullableNodes);
+} else {
+  // check if FK side could be removed instead
+
+  // Possibly this could be enhanced to take other join type into consideration.
+  if (joinType != JoinRelType.INNER) {
+return;
+  }
+
+  //first swap fk and non-fk input and see if we can rewrite them
+  RewritablePKFKJoinInfo fkRemoval = HiveRelOptUtil.isRewritablePKFKJoin(
+  join, nonFkInput, fkInput, call.getMetadataQuery());
+
+  if (fkRemoval.rewritable) {
+// we have established that nonFkInput is FK, and fkInput is PK
+// and there is no row filtering on FK side
+
+// check that FK side join column is distinct (i.e. have a group by)
+ImmutableBitSet fkSideBitSet;
+if (nonFkInput == leftInput) {
+  fkSideBitSet = leftBits;
 } else {
-  RexNode newFilterCond;
-  if (nullableNodes.size() == 1) {
-newFilterCond = rexBuilder.makeCall(SqlStdOperatorTable.IS_NOT_NULL, nullableNodes.get(0));
-  } else {
-List<RexNode> isNotNullConds = new ArrayList<>();
-for (RexNode nullableNode : nullableNodes) {
-  isNotNullConds.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NOT_NULL, nullableNode));
+  fkSideBitSet = rightBits;
+}
+
+ImmutableBitSet.Builder fkJoinColBuilder = ImmutableBitSet.builder();
+for (RexNode conj : RelOptUtil.conjunctions(cond)) {
+  if (!conj.isA(SqlKind.EQUALS)) {
+continue;

Review comment:
   @kgyrtkirk If there is any other kind of predicate/condition,
`isRewritablePKFKJoin` will return false. But you are right that the code here
should return instead of continue. I will update the code. Thanks for pointing
it out.
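The difference between `continue` and `return` discussed here can be illustrated with a small stand-alone sketch. It does not use Calcite: the `Conjunct` class, the `fkJoinColumns` method, and the use of `java.util.BitSet` in place of `ImmutableBitSet` are assumptions made purely for illustration. The point is that a non-equality conjunct aborts the analysis instead of being silently skipped.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Optional;

public class JoinKeyCollector {

    /** Hypothetical stand-in for one conjunct of a join condition. */
    static final class Conjunct {
        final boolean isEquals; // true for an equality such as "fk.col = pk.col"
        final int fkColumn;     // FK-side column index; only meaningful when isEquals is true

        Conjunct(boolean isEquals, int fkColumn) {
            this.isEquals = isEquals;
            this.fkColumn = fkColumn;
        }
    }

    /**
     * Collects the FK-side join columns. Returns an empty Optional as soon as a
     * non-equality conjunct is seen, rather than skipping it and continuing.
     */
    static Optional<BitSet> fkJoinColumns(List<Conjunct> conjuncts) {
        BitSet columns = new BitSet();
        for (Conjunct conjunct : conjuncts) {
            if (!conjunct.isEquals) {
                return Optional.empty(); // bail out: the condition is not a pure equi-join
            }
            columns.set(conjunct.fkColumn);
        }
        return Optional.of(columns);
    }

    public static void main(String[] args) {
        List<Conjunct> equiJoin = Arrays.asList(new Conjunct(true, 0), new Conjunct(true, 2));
        List<Conjunct> mixed = Arrays.asList(new Conjunct(true, 0), new Conjunct(false, -1));
        System.out.println(fkJoinColumns(equiJoin).isPresent());
        System.out.println(fkJoinColumns(mixed).isPresent());
    }
}
```

With `continue`, the mixed condition would yield a column set built from the equality conjuncts alone, and a caller could not tell that part of the condition had been ignored.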







Issue Time Tracking
---

Worklog Id: (was: 475906)
Time Spent: 40m  (was: 0.5h)


[jira] [Work logged] (HIVE-24087) FK side join elimination in presence of PK-FK constraint

2020-08-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24087?focusedWorklogId=475870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-475870
 ]

ASF GitHub Bot logged work on HIVE-24087:
-

Author: ASF GitHub Bot
Created on: 28/Aug/20 15:39
Start Date: 28/Aug/20 15:39
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #1440:
URL: https://github.com/apache/hive/pull/1440#issuecomment-682729013


   Apart from the tests mentioned above, we should add a test where the 
aggregate contains a SUM to see whether that works correctly too (that's the 
pattern seen in the original query).





Issue Time Tracking
---

Worklog Id: (was: 475870)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (HIVE-24087) FK side join elimination in presence of PK-FK constraint

2020-08-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24087?focusedWorklogId=475841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-475841
 ]

ASF GitHub Bot logged work on HIVE-24087:
-

Author: ASF GitHub Bot
Created on: 28/Aug/20 14:32
Start Date: 28/Aug/20 14:32
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1440:
URL: https://github.com/apache/hive/pull/1440#discussion_r479342835



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinConstraintsRule.java
##
@@ -213,61 +218,137 @@ public void onMatch(RelOptRuleCall call) {
 
 // 2) Check whether this join can be rewritten or removed
 RewritablePKFKJoinInfo r = HiveRelOptUtil.isRewritablePKFKJoin(
-join, leftInput == fkInput, call.getMetadataQuery());
+join, fkInput,  nonFkInput, call.getMetadataQuery());
 
 // 3) If it is the only condition, we can trigger the rewriting
 if (r.rewritable) {
-  List<RexNode> nullableNodes = r.nullableNodes;
-  // If we reach here, we trigger the transform
-  if (mode == Mode.REMOVE) {
-if (rightInputPotentialFK) {
-  // First, if FK is the right input, we need to shift
-  nullableNodes = nullableNodes.stream()
-  .map(node -> RexUtil.shift(node, 0, 
-leftInput.getRowType().getFieldCount()))
-  .collect(Collectors.toList());
-  topProjExprs = topProjExprs.stream()
-  .map(node -> RexUtil.shift(node, 0, 
-leftInput.getRowType().getFieldCount()))
-  .collect(Collectors.toList());
-}
-// Fix nullability in references to the input node
-topProjExprs = HiveCalciteUtil.fixNullability(rexBuilder, topProjExprs, RelOptUtil.getFieldTypeList(fkInput.getRowType()));
-// Trigger transformation
-if (nullableNodes.isEmpty()) {
-  call.transformTo(call.builder()
-  .push(fkInput)
-  .project(topProjExprs)
-  .convert(project.getRowType(), false)
-  .build());
+  rewrite(mode, fkInput, nonFkInput, join, topProjExprs, call, project, r.nullableNodes);
+} else {
+  // check if FK side could be removed instead
+
+  // Possibly this could be enhanced to take other join type into consideration.
+  if (joinType != JoinRelType.INNER) {
+return;
+  }
+
+  //first swap fk and non-fk input and see if we can rewrite them
+  RewritablePKFKJoinInfo fkRemoval = HiveRelOptUtil.isRewritablePKFKJoin(
+  join, nonFkInput, fkInput, call.getMetadataQuery());
+
+  if (fkRemoval.rewritable) {
+// we have established that nonFkInput is FK, and fkInput is PK
+// and there is no row filtering on FK side
+
+// check that FK side join column is distinct (i.e. have a group by)
+ImmutableBitSet fkSideBitSet;
+if (nonFkInput == leftInput) {
+  fkSideBitSet = leftBits;
 } else {
-  RexNode newFilterCond;
-  if (nullableNodes.size() == 1) {
-newFilterCond = rexBuilder.makeCall(SqlStdOperatorTable.IS_NOT_NULL, nullableNodes.get(0));
-  } else {
-List<RexNode> isNotNullConds = new ArrayList<>();
-for (RexNode nullableNode : nullableNodes) {
-  isNotNullConds.add(rexBuilder.makeCall(SqlStdOperatorTable.IS_NOT_NULL, nullableNode));
+  fkSideBitSet = rightBits;
+}
+
+ImmutableBitSet.Builder fkJoinColBuilder = ImmutableBitSet.builder();
+for (RexNode conj : RelOptUtil.conjunctions(cond)) {
+  if (!conj.isA(SqlKind.EQUALS)) {
+continue;

Review comment:
   Why do we skip all other kinds that are not `EQUALS`?
   I think there should be a return here instead of a continue.







Issue Time Tracking
---

Worklog Id: (was: 475841)
Time Spent: 20m  (was: 10m)


[jira] [Work logged] (HIVE-24087) FK side join elimination in presence of PK-FK constraint

2020-08-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24087?focusedWorklogId=475528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-475528
 ]

ASF GitHub Bot logged work on HIVE-24087:
-

Author: ASF GitHub Bot
Created on: 27/Aug/20 20:20
Start Date: 27/Aug/20 20:20
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 opened a new pull request #1440:
URL: https://github.com/apache/hive/pull/1440


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





Issue Time Tracking
---

Worklog Id: (was: 475528)
Remaining Estimate: 0h
Time Spent: 10m
