Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

2015-04-03 Thread Mehant Baid

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/
---

(Updated April 3, 2015, 8:13 a.m.)


Review request for drill and Aman Sinha.


Changes
---

Updated patch to address review comment from Aman to not use hashing as double 
for aggregates and only to use it for joins and distribution.


Repository: drill-git


Description
---

If the return type of the two expressions in the join condition are of 
different types then we need to inject a cast on one of the sides for 
comparison functions to work as expected


Diffs (updated)
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64AsDouble.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64Functions.java
 57154ed 
  
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64FunctionsWithSeed.java
 b9ec956 
  
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64WithSeedAsDouble.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
 84a2956 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java
 f7c144f 
  
exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java 
d8652f2 
  exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java f1005ab 

Diff: https://reviews.apache.org/r/31160/diff/


Testing
---

Added unit tests with hash join and merge join


Thanks,

Mehant Baid



Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

2015-04-03 Thread Aman Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review78788
---

Ship it!


Updated patch looks ok to me.  Couple of suggestions about testing:  the 
testing section says 'added unit tests for HJ and MJ' but I did not see 
those...I am thinking it is already covered by existing HJ and MJ tests as long 
as you set slice_target = 1 to force exchanges. Note that TpchDistributed sets 
slice_target = 1 but most of those plans are HashJoin plans.  So you might want 
to check that out.  Also, it would be useful to have a negative test case where 
we don't expect the hash as double (e.g for aggregates or for non-numeric 
types).

- Aman Sinha


On April 3, 2015, 8:13 a.m., Mehant Baid wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31160/
 ---
 
 (Updated April 3, 2015, 8:13 a.m.)
 
 
 Review request for drill and Aman Sinha.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 If the return type of the two expressions in the join condition are of 
 different types then we need to inject a cast on one of the sides for 
 comparison functions to work as expected
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64AsDouble.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64Functions.java
  57154ed 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64FunctionsWithSeed.java
  b9ec956 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64WithSeedAsDouble.java
  PRE-CREATION 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
  84a2956 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java
  f7c144f 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java
  d8652f2 
   exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java 
 f1005ab 
 
 Diff: https://reviews.apache.org/r/31160/diff/
 
 
 Testing
 ---
 
 Added unit tests with hash join and merge join
 
 
 Thanks,
 
 Mehant Baid
 




Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

2015-02-18 Thread Aman Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review72989
---



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
https://reviews.apache.org/r/31160/#comment119152

This operation was previously done before adding the casts. Doing it after 
the cast seems fine from functional perspective; however if we are joining say 
Int to Bigint and the Int happens to be the build side, we will now create a 
bigint vector, so it will increase the memory footprint. It's probably worth 
noting this in the comments for future reference.  One could argue that if the 
build and probe were reversed, we would have created the bigint vector anyways.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
https://reviews.apache.org/r/31160/#comment119116

Since this method is being used by both HJ and MJ, we should change the 
error message to something more general: Join conditions cannot be compared; 
left expression: , right expression:



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
https://reviews.apache.org/r/31160/#comment119161

It seems like it would be useful to keep this check such that we don't 
materialize either left or right side if the right side returns NONE.


- Aman Sinha


On Feb. 18, 2015, 7:21 p.m., Mehant Baid wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31160/
 ---
 
 (Updated Feb. 18, 2015, 7:21 p.m.)
 
 
 Review request for drill and Aman Sinha.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 If the return type of the two expressions in the join condition are of 
 different types then we need to inject a cast on one of the sides for 
 comparison functions to work as expected
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
  ea19645 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
  04f3bbe 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
  257b93e 
   exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 
 9d3b8dc 
   exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet 
 PRE-CREATION 
   exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/31160/diff/
 
 
 Testing
 ---
 
 Added unit tests with hash join and merge join
 
 
 Thanks,
 
 Mehant Baid
 




Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

2015-02-18 Thread Aman Sinha


 On Feb. 19, 2015, 3:30 a.m., Jacques Nadeau wrote:
  This isn't legal unless we know that the distribution keys were hashed the 
  same way.  Our thinking is that we should make sure that all numeric values 
  hash the same way. For example, all convert to double and then are hashed.  
  We need to evaluate the performance impact.  When that isn't possible, we 
  should throw a run time schema change exception and require the user to 
  express the query with an explicit cast.  Otherwise we'll return wrong 
  result.

Good point..I thought this issue was a follow-up to an older bug DRILL-1125 
which was fixed in 0.6. True, the implicit cast join will work only if the data 
is co-located or for broadcast joins.


- Aman


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review73080
---


On Feb. 19, 2015, 2:23 a.m., Mehant Baid wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31160/
 ---
 
 (Updated Feb. 19, 2015, 2:23 a.m.)
 
 
 Review request for drill and Aman Sinha.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 If the return type of the two expressions in the join condition are of 
 different types then we need to inject a cast on one of the sides for 
 comparison functions to work as expected
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
  ea19645 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
  04f3bbe 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
  257b93e 
   exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 
 55fb59f 
   exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet 
 PRE-CREATION 
   exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/31160/diff/
 
 
 Testing
 ---
 
 Added unit tests with hash join and merge join
 
 
 Thanks,
 
 Mehant Baid
 




Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

2015-02-18 Thread Aman Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review73073
---

Ship it!


Ship It!

- Aman Sinha


On Feb. 19, 2015, 2:23 a.m., Mehant Baid wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31160/
 ---
 
 (Updated Feb. 19, 2015, 2:23 a.m.)
 
 
 Review request for drill and Aman Sinha.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 If the return type of the two expressions in the join condition are of 
 different types then we need to inject a cast on one of the sides for 
 comparison functions to work as expected
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
  ea19645 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
  04f3bbe 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
  257b93e 
   exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 
 55fb59f 
   exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet 
 PRE-CREATION 
   exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/31160/diff/
 
 
 Testing
 ---
 
 Added unit tests with hash join and merge join
 
 
 Thanks,
 
 Mehant Baid
 




Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions

2015-02-18 Thread Jacques Nadeau

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31160/#review73080
---


This isn't legal unless we know that the distribution keys were hashed the same 
way.  Our thinking is that we should make sure that all numeric values hash the 
same way. For example, all convert to double and then are hashed.  We need to 
evaluate the performance impact.  When that isn't possible, we should throw a 
run time schema change exception and require the user to express the query with 
an explicit cast.  Otherwise we'll return wrong result.

- Jacques Nadeau


On Feb. 19, 2015, 2:23 a.m., Mehant Baid wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/31160/
 ---
 
 (Updated Feb. 19, 2015, 2:23 a.m.)
 
 
 Review request for drill and Aman Sinha.
 
 
 Repository: drill-git
 
 
 Description
 ---
 
 If the return type of the two expressions in the join condition are of 
 different types then we need to inject a cast on one of the sides for 
 comparison functions to work as expected
 
 
 Diffs
 -
 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java
  ea19645 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java
  04f3bbe 
   
 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java
  257b93e 
   exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 
 55fb59f 
   exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet 
 PRE-CREATION 
   exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/31160/diff/
 
 
 Testing
 ---
 
 Added unit tests with hash join and merge join
 
 
 Thanks,
 
 Mehant Baid