Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31160/ --- (Updated April 3, 2015, 8:13 a.m.) Review request for drill and Aman Sinha. Changes --- Updated patch to address review comment from Aman to not use hashing as double for aggregates and only to use it for joins and distribution. Repository: drill-git Description --- If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected Diffs (updated) - exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64AsDouble.java PRE-CREATION exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64Functions.java 57154ed exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64FunctionsWithSeed.java b9ec956 exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64WithSeedAsDouble.java PRE-CREATION exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 84a2956 exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java f7c144f exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java d8652f2 exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java f1005ab Diff: https://reviews.apache.org/r/31160/diff/ Testing --- Added unit tests with hash join and merge join Thanks, Mehant Baid
Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31160/#review78788 --- Ship it! Updated patch looks ok to me. Couple of suggestions about testing: the testing section says 'added unit tests for HJ and MJ' but I did not see those...I am thinking it is already covered by existing HJ and MJ tests as long as you set slice_target = 1 to force exchanges. Note that TpchDistributed sets slice_target = 1 but most of those plans are HashJoin plans. So you might want to check that out. Also, it would be useful to have a negative test case where we don't expect the hash as double (e.g for aggregates or for non-numeric types). - Aman Sinha On April 3, 2015, 8:13 a.m., Mehant Baid wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31160/ --- (Updated April 3, 2015, 8:13 a.m.) Review request for drill and Aman Sinha. Repository: drill-git Description --- If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected Diffs - exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64AsDouble.java PRE-CREATION exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64Functions.java 57154ed exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64FunctionsWithSeed.java b9ec956 exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/Hash64WithSeedAsDouble.java PRE-CREATION exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java 84a2956 exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java f7c144f exec/java-exec/src/main/java/org/apache/drill/exec/resolver/TypeCastRules.java d8652f2 exec/java-exec/src/test/java/org/apache/drill/TestFunctionsQuery.java f1005ab Diff: https://reviews.apache.org/r/31160/diff/ Testing --- Added unit tests with hash join and merge join Thanks, Mehant Baid
Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31160/#review72989 --- exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java https://reviews.apache.org/r/31160/#comment119152 This operation was previously done before adding the casts. Doing it after the cast seems fine from functional perspective; however if we are joining say Int to Bigint and the Int happens to be the build side, we will now create a bigint vector, so it will increase the memory footprint. It's probably worth noting this in the comments for future reference. One could argue that if the build and probe were reversed, we would have created the bigint vector anyways. exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java https://reviews.apache.org/r/31160/#comment119116 Since this method is being used by both HJ and MJ, we should change the error message to something more general: Join conditions cannot be compared; left expression: , right expression: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java https://reviews.apache.org/r/31160/#comment119161 It seems like it would be useful to keep this check such that we don't materialize either left or right side if the right side returns NONE. - Aman Sinha On Feb. 18, 2015, 7:21 p.m., Mehant Baid wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31160/ --- (Updated Feb. 18, 2015, 7:21 p.m.) Review request for drill and Aman Sinha. Repository: drill-git Description --- If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected Diffs - exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 9d3b8dc exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION Diff: https://reviews.apache.org/r/31160/diff/ Testing --- Added unit tests with hash join and merge join Thanks, Mehant Baid
Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions
On Feb. 19, 2015, 3:30 a.m., Jacques Nadeau wrote: This isn't legal unless we know that the distribution keys were hashed the same way. Our thinking is that we should make sure that all numeric values hash the same way. For example, all convert to double and then are hashed. We need to evaluate the performance impact. When that isn't possible, we should throw a run time schema change exception and require the user to express the query with an explicit cast. Otherwise we'll return wrong result. Good point..I thought this issue was a follow-up to an older bug DRILL-1125 which was fixed in 0.6. True, the implicit cast join will work only if the data is co-located or for broadcast joins. - Aman --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31160/#review73080 --- On Feb. 19, 2015, 2:23 a.m., Mehant Baid wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31160/ --- (Updated Feb. 19, 2015, 2:23 a.m.) Review request for drill and Aman Sinha. Repository: drill-git Description --- If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected Diffs - exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 55fb59f exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION Diff: https://reviews.apache.org/r/31160/diff/ Testing --- Added unit tests with hash join and merge join Thanks, Mehant Baid
Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31160/#review73073 --- Ship it! Ship It! - Aman Sinha On Feb. 19, 2015, 2:23 a.m., Mehant Baid wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31160/ --- (Updated Feb. 19, 2015, 2:23 a.m.) Review request for drill and Aman Sinha. Repository: drill-git Description --- If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected Diffs - exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 55fb59f exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION Diff: https://reviews.apache.org/r/31160/diff/ Testing --- Added unit tests with hash join and merge join Thanks, Mehant Baid
Re: Review Request 31160: DRILL-2244: Implicit cast for join conditions
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31160/#review73080 --- This isn't legal unless we know that the distribution keys were hashed the same way. Our thinking is that we should make sure that all numeric values hash the same way. For example, all convert to double and then are hashed. We need to evaluate the performance impact. When that isn't possible, we should throw a run time schema change exception and require the user to express the query with an explicit cast. Otherwise we'll return wrong result. - Jacques Nadeau On Feb. 19, 2015, 2:23 a.m., Mehant Baid wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31160/ --- (Updated Feb. 19, 2015, 2:23 a.m.) Review request for drill and Aman Sinha. Repository: drill-git Description --- If the return type of the two expressions in the join condition are of different types then we need to inject a cast on one of the sides for comparison functions to work as expected Diffs - exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java ea19645 exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java 04f3bbe exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java 257b93e exec/java-exec/src/test/java/org/apache/drill/TestExampleQueries.java 55fb59f exec/java-exec/src/test/resources/parquetinput/department_decimal.parquet PRE-CREATION exec/java-exec/src/test/resources/parquetinput/employee_decimal.parquet PRE-CREATION Diff: https://reviews.apache.org/r/31160/diff/ Testing --- Added unit tests with hash join and merge join Thanks, Mehant Baid