[jira] [Created] (HIVE-21408) Disable synthetic join predicates for non-equi joins for unintended cases
Deepak Jaiswal created HIVE-21408:

Summary: Disable synthetic join predicates for non-equi joins for unintended cases
Key: HIVE-21408
URL: https://issues.apache.org/jira/browse/HIVE-21408
Project: Hive
Issue Type: Bug
Reporter: Deepak Jaiswal
Assignee: Deepak Jaiswal

With support for synthetic join predicates on non-equi joins, it is important to make sure those predicates are used only for their intended purpose. Currently, DPP and semi join reduction are not supposed to use them.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 70031: HIVE-21167
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70031/
---

(Updated Feb. 22, 2019, 7:19 a.m.)

Review request for hive, Jason Dere and Vaibhav Gumashta.

Changes
---

Added the union test which identified an issue which is fixed. The followup JIRA to show bucketing version in explain extended is created.
https://issues.apache.org/jira/browse/HIVE-21304

Bugs: HIVE-21167
    https://issues.apache.org/jira/browse/HIVE-21167

Repository: hive-git

Description
---

Bucketing: Bucketing version 1 is incorrectly partitioning data

Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 4b10e8974e
  ql/src/test/queries/clientpositive/murmur_hash_migration.q 2b8da9f683
  ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out 5a2cd47381
  ql/src/test/results/clientpositive/llap/murmur_hash_migration.q.out 5343628252

Diff: https://reviews.apache.org/r/70031/diff/2/

Changes: https://reviews.apache.org/r/70031/diff/1-2/

Testing
---

Thanks,

Deepak Jaiswal
[jira] [Created] (HIVE-21304) Show Bucketing version for ReduceSinkOp in explain extended plan
Deepak Jaiswal created HIVE-21304:

Summary: Show Bucketing version for ReduceSinkOp in explain extended plan
Key: HIVE-21304
URL: https://issues.apache.org/jira/browse/HIVE-21304
Project: Hive
Issue Type: Bug
Reporter: Deepak Jaiswal
Assignee: Deepak Jaiswal

Show Bucketing version for ReduceSinkOp in explain extended plan. This helps identify which hashing algorithm is being used by ReduceSinkOp.

cc [~vgarg]
Re: Review Request 70031: HIVE-21167
> On Feb. 21, 2019, 6:29 p.m., Vineet Garg wrote:
> > ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out
> > Line 1332 (original), 1332 (patched)
> > <https://reviews.apache.org/r/70031/diff/1/?file=2126091#file2126091line1332>
> >
> > Do you know the reason this size changed? This seems strange.

The size of one file went down by 2 and another went up by 2. It looks like this bug was hitting the test case.

- Deepak

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70031/#review213034
---

On Feb. 21, 2019, 8:59 a.m., Deepak Jaiswal wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70031/
> ---
>
> (Updated Feb. 21, 2019, 8:59 a.m.)
>
> Review request for hive, Jason Dere and Vaibhav Gumashta.
>
> Bugs: HIVE-21167
>     https://issues.apache.org/jira/browse/HIVE-21167
>
> Repository: hive-git
>
> Description
> ---
>
> Bucketing: Bucketing version 1 is incorrectly partitioning data
>
> Diffs
> ---
>
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 4b10e8974e
>   ql/src/test/queries/clientpositive/murmur_hash_migration.q 2b8da9f683
>   ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out 5a2cd47381
>   ql/src/test/results/clientpositive/llap/murmur_hash_migration.q.out 5343628252
>
> Diff: https://reviews.apache.org/r/70031/diff/1/
>
> Testing
> ---
>
> Thanks,
>
> Deepak Jaiswal
Review Request 70031: HIVE-21167
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70031/
---

Review request for hive, Jason Dere and Vaibhav Gumashta.

Bugs: HIVE-21167
    https://issues.apache.org/jira/browse/HIVE-21167

Repository: hive-git

Description
---

Bucketing: Bucketing version 1 is incorrectly partitioning data

Diffs
---

  ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 4b10e8974e
  ql/src/test/queries/clientpositive/murmur_hash_migration.q 2b8da9f683
  ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out 5a2cd47381
  ql/src/test/results/clientpositive/llap/murmur_hash_migration.q.out 5343628252

Diff: https://reviews.apache.org/r/70031/diff/1/

Testing
---

Thanks,

Deepak Jaiswal
Re: Review Request 69903: HIVE-21214
> On Feb. 5, 2019, 11:50 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
> > Lines 1876 (patched)
> > <https://reviews.apache.org/r/69903/diff/1/?file=2123940#file2123940line1876>
> >
> > nit: add the filenames to the error message

Will do.

- Deepak

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69903/#review212580
---

On Feb. 5, 2019, 10:10 p.m., Deepak Jaiswal wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69903/
> ---
>
> (Updated Feb. 5, 2019, 10:10 p.m.)
>
> Review request for hive and Jason Dere.
>
> Bugs: HIVE-21214
>     https://issues.apache.org/jira/browse/HIVE-21214
>
> Repository: hive-git
>
> Description
> ---
>
> MoveTask : Use attemptId instead of file size for deduplication of files compareTempOrDuplicateFiles()
>
> Diffs
> ---
>
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 8937b43811
>
> Diff: https://reviews.apache.org/r/69903/diff/1/
>
> Testing
> ---
>
> Thanks,
>
> Deepak Jaiswal
Re: Review Request 69903: HIVE-21214
> On Feb. 5, 2019, 11:53 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
> > Line 1829 (original), 1838 (patched)
> > <https://reviews.apache.org/r/69903/diff/1/?file=2123940#file2123940line1838>
> >
> > No "if" - this dedup strategy does not work with speculative execution enabled.

Based on my understanding, these are the two scenarios:
1. Speculative execution succeeds; it has attempt ID 1. The original attempt ID is 0. The logic picks the speculative one, regardless of the original one's outcome. This works fine.
2. Speculative execution fails, throws exception.

Let me know if I am getting it wrong.

- Deepak

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69903/#review212581
---
Review Request 69903: HIVE-21214
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69903/
---

Review request for hive and Jason Dere.

Bugs: HIVE-21214
    https://issues.apache.org/jira/browse/HIVE-21214

Repository: hive-git

Description
---

MoveTask : Use attemptId instead of file size for deduplication of files compareTempOrDuplicateFiles()

Diffs
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 8937b43811

Diff: https://reviews.apache.org/r/69903/diff/1/

Testing
---

Thanks,

Deepak Jaiswal
[jira] [Created] (HIVE-21214) MoveTask : Use attemptId instead of file size for deduplication of files compareTempOrDuplicateFiles()
Deepak Jaiswal created HIVE-21214:

Summary: MoveTask : Use attemptId instead of file size for deduplication of files compareTempOrDuplicateFiles()
Key: HIVE-21214
URL: https://issues.apache.org/jira/browse/HIVE-21214
Project: Hive
Issue Type: Bug
Reporter: Deepak Jaiswal
Assignee: Deepak Jaiswal

For a given task, if there is more than one attempt, the deduplication logic kicks in.
{noformat}
Utilities.compareTempOrDuplicateFiles()
{noformat}
The logic uses file size and picks the file with the largest size. This logic is very fragile. Ideally, it should pick the successful attempt's file. However, a simpler solution is to pick the newest attempt and also check that the file size for the newest attempt is the largest. If it is not, throw an exception.

cc [~gopalv] [~thejas] [~jdere] [~ekoifman]
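The proposal in HIVE-21214 can be sketched roughly as below. This is only an illustration of the idea as described in the ticket (pick the newest attempt, verify it is also the largest file, otherwise throw), not the actual patch to Utilities.java. The class AttemptDedupSketch, the AttemptFile type, the method pickNewestAttempt and the file names are all hypothetical; the error message lists the file names, as requested in the review.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AttemptDedupSketch {

  /** Hypothetical stand-in for the output file of one task attempt. */
  static final class AttemptFile {
    final String name;
    final int attemptId;
    final long sizeBytes;

    AttemptFile(String name, int attemptId, long sizeBytes) {
      this.name = name;
      this.attemptId = attemptId;
      this.sizeBytes = sizeBytes;
    }
  }

  /**
   * Picks the file written by the newest attempt and checks that it is also
   * the largest candidate. Throws if an older attempt produced a larger file.
   */
  static AttemptFile pickNewestAttempt(List<AttemptFile> candidates) {
    if (candidates.isEmpty()) {
      throw new IllegalArgumentException("no candidate files");
    }
    AttemptFile newest = candidates.get(0);
    long largest = newest.sizeBytes;
    List<String> names = new ArrayList<>();
    for (AttemptFile f : candidates) {
      names.add(f.name);
      if (f.attemptId > newest.attemptId) {
        newest = f;
      }
      largest = Math.max(largest, f.sizeBytes);
    }
    if (newest.sizeBytes < largest) {
      // Include the file names so the failure can be diagnosed from the log.
      throw new IllegalStateException("Newest attempt file " + newest.name
          + " is smaller than an older attempt's file; candidates: " + names);
    }
    return newest;
  }

  public static void main(String[] args) {
    AttemptFile original = new AttemptFile("000000_0", 0, 100L);
    AttemptFile speculative = new AttemptFile("000000_1", 1, 120L);
    // prints 000000_1
    System.out.println(pickNewestAttempt(Arrays.asList(original, speculative)).name);
  }
}
```

Throwing instead of silently falling back to the largest file keeps the failure visible, which matches the "If not, throw an exception" wording in the ticket.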
[jira] [Created] (HIVE-21196) Support semijoin reduction on multiple column join
Deepak Jaiswal created HIVE-21196:

Summary: Support semijoin reduction on multiple column join
Key: HIVE-21196
URL: https://issues.apache.org/jira/browse/HIVE-21196
Project: Hive
Issue Type: Bug
Reporter: Deepak Jaiswal
Assignee: Deepak Jaiswal

Currently, a query involving a join on multiple columns creates a separate semi join edge for each key, each of which in turn creates a bloom filter, as shown below.

EXPLAIN select count(*) from srcpart_date_n7 join srcpart_small_n3 on (srcpart_date_n7.key = srcpart_small_n3.key1 and srcpart_date_n7.value = srcpart_small_n3.value1)

{code:java}
Map 1 <- Reducer 5 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE)
Reducer 5 <- Map 4 (CUSTOM_SIMPLE_EDGE)
A masked pattern was here
Vertices:
  Map 1
    Map Operator Tree:
      TableScan
        alias: srcpart_date_n7
        filterExpr: (key is not null and value is not null and (key BETWEEN DynamicValue(RS_7_srcpart_small_n3_key1_min) AND DynamicValue(RS_7_srcpart_small_n3_key1_max) and in_bloom_filter(key, DynamicValue(RS_7_srcpart_small_n3_key1_bloom_filter (type: boolean)
        Statistics: Num rows: 2000 Data size: 356000 Basic stats: COMPLETE Column stats: COMPLETE
        Filter Operator
          predicate: ((key BETWEEN DynamicValue(RS_7_srcpart_small_n3_key1_min) AND DynamicValue(RS_7_srcpart_small_n3_key1_max) and in_bloom_filter(key, DynamicValue(RS_7_srcpart_small_n3_key1_bloom_filter))) and key is not null and value is not null) (type: boolean)
          Statistics: Num rows: 2000 Data size: 356000 Basic stats: COMPLETE Column stats: COMPLETE
          Select Operator
            expressions: key (type: string), value (type: string)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 2000 Data size: 356000 Basic stats: COMPLETE Column stats: COMPLETE
            Reduce Output Operator
              key expressions: _col0 (type: string), _col1 (type: string)
              sort order: ++
              Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
              Statistics: Num rows: 2000 Data size: 356000 Basic stats: COMPLETE Column stats: COMPLETE
    Execution mode: vectorized, llap
    LLAP IO: all inputs
  Map 4
    Map Operator Tree:
      TableScan
        alias: srcpart_small_n3
        filterExpr: (key1 is not null and value1 is not null) (type: boolean)
        Statistics: Num rows: 20 Data size: 3560 Basic stats: PARTIAL Column stats: PARTIAL
        Filter Operator
          predicate: (key1 is not null and value1 is not null) (type: boolean)
          Statistics: Num rows: 20 Data size: 3560 Basic stats: PARTIAL Column stats: PARTIAL
          Select Operator
            expressions: key1 (type: string), value1 (type: string)
            outputColumnNames: _col0, _col1
            Statistics: Num rows: 20 Data size: 3560 Basic stats: PARTIAL Column stats: PARTIAL
            Reduce Output Operator
              key expressions: _col0 (type: string), _col1 (type: string)
              sort order: ++
              Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
              Statistics: Num rows: 20 Data size: 3560 Basic stats: PARTIAL Column stats: PARTIAL
            Select Operator
              expressions: _col0 (type: string)
              outputColumnNames: _col0
              Statistics: Num rows: 20 Data size: 3560 Basic stats: PARTIAL Column stats: PARTIAL
              Group By Operator
                aggregations: min(_col0), max(_col0), bloom_filter(_col0, expectedEntries=20)
                mode: hash
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 1 Data size: 730 Basic stats: PARTIAL Column stats: PARTIAL
                Reduce Output Operator
                  sort order:
                  Statistics: Num rows: 1 Data size: 730 Basic stats: PARTIAL Column stats: PARTIAL
                  value expressions: _col0 (type: string), _col1 (type: string), _col2 (type: binary)
    Execution mode: vectorized, llap
    LLAP IO: all inputs
  Reducer 2
    Execution mode: llap
    Reduce Operator Tree:
      Merge Join Operator c
Re: Review Request 69663: HIVE-16976
> On Jan. 11, 2019, 6:07 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
> > Lines 35 (patched)
> > <https://reviews.apache.org/r/69663/diff/2/?file=2118652#file2118652line35>
> >
> > This is not needed?

Yes. That is correct. I will remove it before committing.

- Deepak

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69663/#review211896
---
Re: Review Request 69663: HIVE-16976
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69663/
---

(Updated Jan. 9, 2019, 5:50 p.m.)

Review request for hive, Ashutosh Chauhan, Gopal V, Jesús Camacho Rodríguez, and Jason Dere.

Changes
---

Updated the patch with review comments.

Bugs: HIVE-16976
    https://issues.apache.org/jira/browse/HIVE-16976

Repository: hive-git

Description
---

DPP: SyntheticJoinPredicate transitivity for < > and BETWEEN

The patch supports predicates on non-equi joins and provides an interface for storage handler to decide if it can use this optimization.
Work to integrate this with DPP and semijoin will be done in separate JIRA.

Diffs (updated)
---

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d7f069eaa7
  ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java 2ebb149354
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java a1401aac72
  ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java f8c7e18eb1
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDynamicListDesc.java 676dfc9421
  ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java e97e44796f
  ql/src/test/results/clientpositive/llap/cross_prod_1.q.out f900a01be4
  ql/src/test/results/clientpositive/llap/groupby_groupingset_bug.q.out de74af6dff
  ql/src/test/results/clientpositive/llap/semijoin.q.out 00bc6cec55
  ql/src/test/results/clientpositive/llap/subquery_in.q.out 07cc4dbabc
  ql/src/test/results/clientpositive/llap/subquery_notin.q.out 29d8bbfb48
  ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 1cf281afbd
  ql/src/test/results/clientpositive/llap/subquery_select.q.out 6255abdd70

Diff: https://reviews.apache.org/r/69663/diff/2/

Changes: https://reviews.apache.org/r/69663/diff/1-2/

Testing
---

Thanks,

Deepak Jaiswal
Re: Review Request 69663: HIVE-16976
> On Jan. 7, 2019, 4:41 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
> > Lines 284 (patched)
> > <https://reviews.apache.org/r/69663/diff/1/?file=2117432#file2117432line288>
> >
> > We should add a call to extended version here as we did above for equality predicates. The only required change seems to be in _addParentReduceSink_ called from _createDerivatives_, which would receive the comparison operator from here. All the rest should already work as expected.
> >
> > I believe this could be addressed in this JIRA since it is not a lot of code. However, if it is not addressed, please create follow-up and leave a TODO.
>
> Deepak Jaiswal wrote:
>     Will add the extended version. Thanks for bringing this up.

The existing logic for extension works for equality. I am planning to do this later. HIVE-21098 tracks it.

- Deepak

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69663/#review211725
---

On Jan. 3, 2019, 8:39 p.m., Deepak Jaiswal wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69663/
> ---
>
> (Updated Jan. 3, 2019, 8:39 p.m.)
>
> Review request for hive, Ashutosh Chauhan, Gopal V, Jesús Camacho Rodríguez, and Jason Dere.
>
> Bugs: HIVE-16976
>     https://issues.apache.org/jira/browse/HIVE-16976
>
> Repository: hive-git
>
> Description
> ---
>
> DPP: SyntheticJoinPredicate transitivity for < > and BETWEEN
>
> The patch supports predicates on non-equi joins and provides an interface for storage handler to decide if it can use this optimization.
> Work to integrate this with DPP and semijoin will be done in separate JIRA.
>
> Diffs
> ---
>
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java 2ebb149354
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java a1401aac72
>   ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java f8c7e18eb1
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDynamicListDesc.java 676dfc9421
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java e97e44796f
>   ql/src/test/results/clientpositive/llap/cross_prod_1.q.out ac1f4eabd8
>   ql/src/test/results/clientpositive/llap/groupby_groupingset_bug.q.out de74af6dff
>   ql/src/test/results/clientpositive/llap/semijoin.q.out 63a270e57d
>   ql/src/test/results/clientpositive/llap/subquery_in.q.out 07cc4dbabc
>   ql/src/test/results/clientpositive/llap/subquery_notin.q.out 29d8bbfb48
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out e830835445
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out d3cc980ca1
>
> Diff: https://reviews.apache.org/r/69663/diff/1/
>
> Testing
> ---
>
> Thanks,
>
> Deepak Jaiswal
[jira] [Created] (HIVE-21098) DPP: SyntheticJoinPredicate transitivity for < > and BETWEEN needs extension
Deepak Jaiswal created HIVE-21098:

Summary: DPP: SyntheticJoinPredicate transitivity for < > and BETWEEN needs extension
Key: HIVE-21098
URL: https://issues.apache.org/jira/browse/HIVE-21098
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Deepak Jaiswal
Assignee: Deepak Jaiswal

SyntheticJoinPredicates are supported for equality, in both regular and extended format. A similar extended format is needed for non-equi joins too. See HIVE-16976.
Re: Review Request 69663: HIVE-16976
> On Jan. 7, 2019, 4:41 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
> > Lines 330 (patched)
> > <https://reviews.apache.org/r/69663/diff/1/?file=2117432#file2117432line334>
> >
> > Should we still return null if function text is not recognized?
>
> Deepak Jaiswal wrote:
>     Yes, that helps recognize unsupported functions. For example,
>
>         ExprNodeGenericFuncDesc funcDesc = (ExprNodeGenericFuncDesc) filter;
>         // filter should be of type <, >, <= or >=
>         if (getFuncText(funcDesc.getFuncText(), 1) == null) {
>           // unsupported
>           continue;
>         }
>
>     I am open to better ways, hence the TODO.
>
> Jesús Camacho Rodríguez wrote:
>     Sorry, I did not express myself properly. Within the if (srcPos == 0) {, shouldn't we return null if function text is not recognized (similar to what we do below that you pointed out)?
>
> Deepak Jaiswal wrote:
>     That would require verifying the function text, which is done in the switch case anyway.
>     In order for non-equi join's synthetic joins to work properly, if the switch case can't get a valid inversion text then it is not supported.
>     That is why I used "1" to make sure it goes through the switch case. This eliminates duplicating similar logic.
>
> Jesús Camacho Rodríguez wrote:
>     OK, I was getting confused by the semantics of the srcPos parameter (an 'invert' boolean would have been clearer).
>     Tbh, I think it is better to create two methods: one internal in SyntheticJoinPredicate that would return whether a function is supported or not, and a utility method in FunctionRegistry that would return the inverse of a given function. Overhead is negligible and there will be clearly different semantics.

Having two functions could create a maintenance headache. As mentioned above, the function will go to FunctionRegistry. There is already a comment,

    return null; // helps identify unsupported functions

I can expand the comment to make things clearer. Leaving the function as it is keeps things short and sweet and involves much less maintenance.

- Deepak

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69663/#review211725
---
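For readers following the thread, the two-method split suggested in the review can be sketched as below. This is a hedged illustration only, not the code in the patch: the class InvertComparisonSketch and the methods invertFuncText and isSupported are hypothetical names, and the sketch assumes that "inversion" means mirroring the comparison when source and target swap sides (a < b becomes b > a), operating on the operator text as the switch-case approach in the thread does.

```java
public class InvertComparisonSketch {

  /**
   * Returns the operator text to use when the two operands swap sides,
   * or null when the operator is not one of the supported comparisons.
   */
  static String invertFuncText(String funcText) {
    switch (funcText) {
      case "<":
        return ">";
      case "<=":
        return ">=";
      case ">":
        return "<";
      case ">=":
        return "<=";
      default:
        return null; // helps identify unsupported functions
    }
  }

  /** Separate predicate with its own, explicit semantics. */
  static boolean isSupported(String funcText) {
    return invertFuncText(funcText) != null;
  }

  public static void main(String[] args) {
    System.out.println(invertFuncText("<="));  // prints >=
    System.out.println(isSupported("like"));   // prints false
  }
}
```

Here isSupported simply delegates to the switch, so the mapping lives in one place. That keeps the reviewer's clearer semantics while addressing the maintenance concern raised in the reply.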
Re: Review Request 69663: HIVE-16976
> On Jan. 7, 2019, 10:40 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
> > Lines 254 (patched)
> > <https://reviews.apache.org/r/69663/diff/1/?file=2117432#file2117432line258>
> >
> > srcPos and targetPos do not seem to refer to function, but rather the inputs being joined. In addition, they do not change between loop iterations. Below, they are used to retrieve the child expression from the function, which does not seem correct.
>
> Jesús Camacho Rodríguez wrote:
>     OK, seeing your comment above, I understood better the code here. You may be inverting the source and target, that is why you access the function expression using them. Could you leave a comment explaining it?
>     My comment above about the value change for srcPos and targetPos between iterations still seems valid; the check could be done earlier to skip the loop in line 242.

Thanks for the tip. Yes, it makes sense to have the check before the loop begins, as srcPos and targetPos do not change. We can skip the whole logic with this condition even before the if condition.

- Deepak

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69663/#review211745
---
Re: Review Request 69663: HIVE-16976
> On Jan. 7, 2019, 4:41 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
> > Line 182 (original)
> > <https://reviews.apache.org/r/69663/diff/1/?file=2117432#file2117432line183>
> >
> > Can you bring this back and delete 'if (sourceKeys.size() > 0) {' below? This is just a style change and indenting so many lines will just make more difficult following code provenance.
>
> Deepak Jaiswal wrote:
>     The continue is removed so that it reaches the residualFilter logic, otherwise it would skip everything and move on to next target.
>
> Jesús Camacho Rodríguez wrote:
>     You are right, I did not see the extra }. Could the comment '//if (sourceKeys.size() < 1) continue;' below be removed then? No need to leave it there.

Sure. I forgot to remove it.

> On Jan. 7, 2019, 4:41 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
> > Lines 330 (patched)
> > <https://reviews.apache.org/r/69663/diff/1/?file=2117432#file2117432line334>
> >
> > Should we still return null if function text is not recognized?
>
> Deepak Jaiswal wrote:
>     Yes, that helps recognize unsupported functions. For example,
>
>         ExprNodeGenericFuncDesc funcDesc = (ExprNodeGenericFuncDesc) filter;
>         // filter should be of type <, >, <= or >=
>         if (getFuncText(funcDesc.getFuncText(), 1) == null) {
>           // unsupported
>           continue;
>         }
>
>     I am open to better ways, hence the TODO.
>
> Jesús Camacho Rodríguez wrote:
>     Sorry, I did not express myself properly. Within the if (srcPos == 0) {, shouldn't we return null if function text is not recognized (similar to what we do below that you pointed out)?

That would require verifying the function text, which is done in the switch case anyway.
In order for non-equi join's synthetic joins to work properly, if the switch case can't get a valid inversion text then it is not supported.
That is why I used "1" to make sure it goes through the switch case. This eliminates duplicating similar logic.

- Deepak

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69663/#review211725
---
Re: Review Request 69663: HIVE-16976
> On Jan. 7, 2019, 4:41 p.m., Jesús Camacho Rodríguez wrote:
> > Can we add some tests for the new feature?

There is no test yet because the change does nothing end to end: neither the DPP route nor the semijoin reduction route processes the predicate yet.

> On Jan. 7, 2019, 4:41 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
> > Line 182 (original)
> > <https://reviews.apache.org/r/69663/diff/1/?file=2117432#file2117432line183>
> >
> > Can you bring this back and delete 'if (sourceKeys.size() > 0) {'
> > below? This is just a style change and indenting so many lines will just
> > make more difficult following code provenance.

The continue is removed so that execution reaches the residualFilter logic; otherwise it would skip everything and move on to the next target.

> On Jan. 7, 2019, 4:41 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
> > Lines 284 (patched)
> > <https://reviews.apache.org/r/69663/diff/1/?file=2117432#file2117432line288>
> >
> > We should add a call to extended version here as we did above for
> > equality predicates. The only required change seems to be in
> > _addParentReduceSink_ called from _createDerivatives_, which would receive
> > the comparison operator from here. All the rest should already work as
> > expected.
> >
> > I believe this could be addressed in this JIRA since it is not a lot of
> > code. However, if it is not addressed, please create follow-up and leave a
> > TODO.

Will add the extended version. Thanks for bringing this up.

> On Jan. 7, 2019, 4:41 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
> > Lines 318 (patched)
> > <https://reviews.apache.org/r/69663/diff/1/?file=2117432#file2117432line322>
> >
> > return colExprMap.get(rsColName) :|

> On Jan. 7, 2019, 4:41 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
> > Lines 328 (patched)
> > <https://reviews.apache.org/r/69663/diff/1/?file=2117432#file2117432line332>
> >
> > Can we move this method as _invertFunction_ utility method to
> > _FunctionRegistry.java_?
> >
> > In addition, instead of relying on the function text, I believe it
> > would be more robust to have the UDF as the input. In particular, we can
> > use _funcDesc.getGenericUDF();_ when calling this method, then rely in e.g.
> > _udf instanceof GenericUDFOPEqualOrGreaterThan_ for the checks.

Yes, I can move this. I used the function text because it allows a switch statement, which is also much faster. Otherwise, once this is extended in the future, it could become a giant mess of if...else statements. We can discuss this further.

> On Jan. 7, 2019, 4:41 p.m., Jesús Camacho Rodríguez wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java
> > Lines 330 (patched)
> > <https://reviews.apache.org/r/69663/diff/1/?file=2117432#file2117432line334>
> >
> > Should we still return null if function text is not recognized?

Yes, that helps recognize unsupported functions. For example,

    ExprNodeGenericFuncDesc funcDesc = (ExprNodeGenericFuncDesc) filter;
    // filter should be of type <, >, <= or >=
    if (getFuncText(funcDesc.getFuncText(), 1) == null) {
      // unsupported
      continue;
    }

I am open to better ways, hence the TODO.

- Deepak

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69663/#review211725 ---
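The trade-off discussed in this reply (switch on the function text, with null signalling an unsupported operator) can be sketched as below. This is only an illustration, not the code from the patch: the class name ComparisonInverter and the method invert are hypothetical, and it assumes that "inverting" means swapping the operand sides, so that a < b becomes b > a. The patch may define it differently.

```java
/**
 * Hypothetical sketch of an operator-inversion helper keyed on function text.
 * Assumption: inversion means swapping operand sides (a < b  <=>  b > a).
 */
final class ComparisonInverter {
    private ComparisonInverter() {}

    /** Returns the operator for swapped operands, or null if the text is not supported. */
    static String invert(String funcText) {
        if (funcText == null) {
            return null;
        }
        switch (funcText) {
            case "<":
                return ">";
            case ">":
                return "<";
            case "<=":
                return ">=";
            case ">=":
                return "<=";
            default:
                // Unrecognized text: the caller treats null as "unsupported".
                return null;
        }
    }

    public static void main(String[] args) {
        for (String op : new String[] {"<", ">=", "like"}) {
            String inverted = invert(op);
            if (inverted == null) {
                // Mirrors the "unsupported, continue" check quoted above.
                System.out.println(op + " is unsupported, skipping");
                continue;
            }
            System.out.println(op + " inverts to " + inverted);
        }
    }
}
```

The reviewer's alternative, dispatching on the UDF class with instanceof, would avoid depending on operator spelling at the cost of a longer if...else chain; the sketch above only shows the string-based variant.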
Review Request 69663: HIVE-16976
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69663/ --- Review request for hive, Ashutosh Chauhan, Gopal V, Jesús Camacho Rodríguez, and Jason Dere. Bugs: HIVE-16976 https://issues.apache.org/jira/browse/HIVE-16976 Repository: hive-git Description --- DPP: SyntheticJoinPredicate transitivity for < > and BETWEEN The patch supports predicates on non-equi joins and provides an interface for storage handler to decide if it can use this optimization. Work to integrate this with DPP and semijoin will be done in separate JIRA. Diffs - ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStorageHandler.java 2ebb149354 ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java a1401aac72 ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java f8c7e18eb1 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDynamicListDesc.java 676dfc9421 ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java e97e44796f ql/src/test/results/clientpositive/llap/cross_prod_1.q.out ac1f4eabd8 ql/src/test/results/clientpositive/llap/groupby_groupingset_bug.q.out de74af6dff ql/src/test/results/clientpositive/llap/semijoin.q.out 63a270e57d ql/src/test/results/clientpositive/llap/subquery_in.q.out 07cc4dbabc ql/src/test/results/clientpositive/llap/subquery_notin.q.out 29d8bbfb48 ql/src/test/results/clientpositive/llap/subquery_scalar.q.out e830835445 ql/src/test/results/clientpositive/llap/subquery_select.q.out d3cc980ca1 Diff: https://reviews.apache.org/r/69663/diff/1/ Testing --- Thanks, Deepak Jaiswal
[jira] [Created] (HIVE-20868) SMB Join fails intermittently when TezDummyOperator has child op in getFinalOp in MapRecordProcessor
Deepak Jaiswal created HIVE-20868: - Summary: SMB Join fails intermittently when TezDummyOperator has child op in getFinalOp in MapRecordProcessor Key: HIVE-20868 URL: https://issues.apache.org/jira/browse/HIVE-20868 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [ANNOUNCE] New PMC Member : Zoltan
Congratulations Zoltan! On 10/30/18, 10:08 PM, "Ashutosh Chauhan" wrote: Hello Hive community, I'm pleased to announce that Zoltan Haindrich has accepted the Apache Hive PMC's invitation, and is our newest PMC member. Many thanks to Zoltan for all of his hard work. Please join me in congratulating Zoltan! Thanks, Ashutosh
Re: [ANNOUNCE] New committer: Nishant Bangarwa
Congratulations! On 10/19/18, 10:14 AM, "Vineet Garg" wrote: Congrats Nishant! > On Oct 19, 2018, at 8:36 AM, Gunther Hagleitner wrote: > > Congrats Nishant! > > Cheers, > Gunther. > > From: Andrew Sherman > Sent: Friday, October 19, 2018 8:34 AM > To: dev@hive.apache.org > Subject: Re: [ANNOUNCE] New committer: Nishant Bangarwa > > Congratulations Nishant! > > On Fri, Oct 19, 2018 at 4:29 AM Peter Vary > wrote: > >> Congratulations Nishant! >> >>> On Oct 19, 2018, at 07:42, Sankar Hariappan >> wrote: >>> >>> Congrats Nishant! >>> >>> Best regards >>> Sankar >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On 15/10/18, 12:45 PM, "Ashutosh Chauhan" wrote: >>> Apache Hive's Project Management Committee (PMC) has invited Nishant Bangarwa to become a committer, and we are pleased to announce that he has >> accepted. Nishant, welcome, thank you for your contributions, and we look forward >> your further interactions with the community! Ashutosh Chauhan (on behalf of the Apache Hive PMC) >> >> > >
Re: [ANNOUNCE] New committer: Janaki Lahorani
Congratulations Janaki. On 10/9/18, 8:52 AM, "Vihang Karajgaonkar" wrote: Congratulations Janaki! On Tue, Oct 9, 2018 at 8:27 AM Andrew Sherman wrote: > Congratulations Janaki! > > On Mon, Oct 8, 2018 at 10:05 PM Ashutosh Chauhan > wrote: > > > Apache Hive's Project Management Committee (PMC) has invited Janaki > > Lahorani to become a committer, and we are pleased to announce that she > has > > accepted. > > Janaki, welcome, thank you for your contributions, and we look forward to > > your further interactions with the community! > > > > Ashutosh Chauhan (on behalf of the Apache Hive PMC) > > >
[jira] [Created] (HIVE-20641) load_data_using_job is failing
Deepak Jaiswal created HIVE-20641: - Summary: load_data_using_job is failing Key: HIVE-20641 URL: https://issues.apache.org/jira/browse/HIVE-20641 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal load_data_using_job is failing due to result diff. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Review Request 68848: HIVE-20540
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68848/ --- Review request for hive and Gopal V. Bugs: HIVE-20540 https://issues.apache.org/jira/browse/HIVE-20540 Repository: hive-git Description --- Vectorization : Support loading bucketed tables using sorted dynamic partition optimizer - II Followup to HIVE-20510 with remaining issues 1. Avoid using Reflection. 2. In VectorizationContext, use correct place to setup the VectorExpression. It may be missed in certain cases. 3. In BucketNumExpression, make sure that a value is not overwritten before it is processed. Use a flag to achieve this. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 55d2a16f03 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/BucketNumExpression.java d8c696c302 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java 1a8395a71b Diff: https://reviews.apache.org/r/68848/diff/1/ Testing --- Thanks, Deepak Jaiswal
Re: Review Request 68772: HIVE-20593
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68772/ --- (Updated Sept. 24, 2018, 6:56 a.m.) Review request for hive and Eugene Koifman. Changes --- Implemented changes recommended. Got green run on ptests. Bugs: HIVE-20593 https://issues.apache.org/jira/browse/HIVE-20593 Repository: hive-git Description --- Load Data for partitioned ACID tables fails with bucketId out of range: -1 The tempTblObj is inherited from target table. However, the only table property which needs to be inherited is bucketing version. Properties like transactional etc should be ignored. Diffs (updated) - data/files/load_data_job_acid/20180918230307-b382b8c7-271c-4025-be64-4a68f4db32e5_0_0 PRE-CREATION data/files/load_data_job_acid/20180918230307-b382b8c7-271c-4025-be64-4a68f4db32e5_1_0 PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 8d33cf5b23 ql/src/test/queries/clientpositive/load_data_using_job.q b760d9bc7e ql/src/test/results/clientpositive/llap/load_data_using_job.q.out 21fd9334ea Diff: https://reviews.apache.org/r/68772/diff/2/ Changes: https://reviews.apache.org/r/68772/diff/1-2/ Testing --- Thanks, Deepak Jaiswal
Review Request 68772: HIVE-20593
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68772/ --- Review request for hive and Eugene Koifman. Bugs: HIVE-20593 https://issues.apache.org/jira/browse/HIVE-20593 Repository: hive-git Description --- Load Data for partitioned ACID tables fails with bucketId out of range: -1 The tempTblObj is inherited from target table. However, the only table property which needs to be inherited is bucketing version. Properties like transactional etc should be ignored. Diffs - data/files/load_data_job_acid/20180918230307-b382b8c7-271c-4025-be64-4a68f4db32e5_0_0 PRE-CREATION data/files/load_data_job_acid/20180918230307-b382b8c7-271c-4025-be64-4a68f4db32e5_1_0 PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 8d33cf5b23 ql/src/test/queries/clientpositive/load_data_using_job.q b760d9bc7e ql/src/test/results/clientpositive/llap/load_data_using_job.q.out 21fd9334ea Diff: https://reviews.apache.org/r/68772/diff/1/ Testing --- Thanks, Deepak Jaiswal
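The idea in the description, inheriting only the bucketing version and ignoring everything else, can be sketched as an allowlist copy. This is a minimal illustration, not the change made in LoadSemanticAnalyzer.java: the class and method names are hypothetical, and the property key "bucketing_version" is an assumption about how the property is spelled.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: build temp-table properties from the target table's properties. */
final class TempTableProps {
    // Assumed property key; check the real constant before relying on it.
    static final String BUCKETING_VERSION = "bucketing_version";

    private TempTableProps() {}

    /** Copies only the bucketing version; properties such as transactional are dropped. */
    static Map<String, String> inheritForTempTable(Map<String, String> targetProps) {
        Map<String, String> inherited = new HashMap<>();
        String version = targetProps.get(BUCKETING_VERSION);
        if (version != null) {
            inherited.put(BUCKETING_VERSION, version);
        }
        return inherited;
    }

    public static void main(String[] args) {
        Map<String, String> target = new HashMap<>();
        target.put("transactional", "true");
        target.put(BUCKETING_VERSION, "2");
        // Only the bucketing version survives the copy.
        System.out.println(inheritForTempTable(target));
    }
}
```

An allowlist is used here rather than removing known-bad keys, so that any property added to the target table later is not inherited by accident.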
[jira] [Created] (HIVE-20593) Load Data for partitioned ACID tables fails with bucketId out of range: -1
Deepak Jaiswal created HIVE-20593: - Summary: Load Data for partitioned ACID tables fails with bucketId out of range: -1 Key: HIVE-20593 URL: https://issues.apache.org/jira/browse/HIVE-20593 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal Load data for ACID tables is failing to load ORC files when it is converted to IAS job. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20540) Vectorization : Support loading bucketed tables using sorted dynamic partition optimizer - II
Deepak Jaiswal created HIVE-20540: - Summary: Vectorization : Support loading bucketed tables using sorted dynamic partition optimizer - II Key: HIVE-20540 URL: https://issues.apache.org/jira/browse/HIVE-20540 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal Followup to HIVE-20510 with remaining issues, 1. Avoid using Reflection. 2. In VectorizationContext, use correct place to setup the VectorExpression. It may be missed in certain cases. 3. In BucketNumExpression, make sure that a value is not overwritten before it is processed. Use a flag to achieve this. cc [~gopalv] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Jdbc tests failing randomly
Hi All, It seems the jdbc UTs are failing randomly. Any idea who might know about them? https://builds.apache.org/job/PreCommit-HIVE-Build/13661/testReport https://builds.apache.org/job/PreCommit-HIVE-Build/13658/testReport Regards, Deepak
Re: Review Request 68648: HIVE-20510
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68648/ --- (Updated Sept. 8, 2018, 8:38 a.m.) Review request for hive, Gopal V and Matt McCline. Changes --- Updated results for failing tests. Bugs: HIVE-20510 https://issues.apache.org/jira/browse/HIVE-20510 Repository: hive-git Description --- Vectorization : Support loading bucketed tables using sorted dynamic partition optimizer. Added a new VectorExpression BucketNumberExpression to evaluate _bucket_number. Made the loops as tight as possible. Diffs (updated) - itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out 74a9a56f07 itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out ee02c36f03 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 8bf0a9c77d ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java a2a9c8421e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 57f7c0108e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/BucketNumExpression.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java 5ab59c9c61 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java 51010aac85 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBucketNumber.java PRE-CREATION ql/src/test/queries/clientpositive/dynpart_sort_opt_vectorization.q 435cdaddd0 ql/src/test/results/clientpositive/dynpart_sort_optimization_acid2.q.out aea757205f ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out 22f0a31eb3 ql/src/test/results/clientpositive/llap/dynpart_sort_optimization.q.out 21fc2c545a ql/src/test/results/clientpositive/llap/dynpart_sort_optimization_acid.q.out a0a5e0cf32 ql/src/test/results/clientpositive/show_functions.q.out 90608e2905 Diff: https://reviews.apache.org/r/68648/diff/4/ Changes: 
https://reviews.apache.org/r/68648/diff/3-4/ Testing --- Thanks, Deepak Jaiswal
Re: Review Request 68648: HIVE-20510
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68648/ --- (Updated Sept. 7, 2018, 7:57 p.m.) Review request for hive, Gopal V and Matt McCline. Changes --- Missed non-vectorized case and some result updates. Bugs: HIVE-20510 https://issues.apache.org/jira/browse/HIVE-20510 Repository: hive-git Description --- Vectorization : Support loading bucketed tables using sorted dynamic partition optimizer. Added a new VectorExpression BucketNumberExpression to evaluate _bucket_number. Made the loops as tight as possible. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 8bf0a9c77d ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java a2a9c8421e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 57f7c0108e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/BucketNumExpression.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java 5ab59c9c61 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java 51010aac85 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBucketNumber.java PRE-CREATION ql/src/test/queries/clientpositive/dynpart_sort_opt_vectorization.q 435cdaddd0 ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out 22f0a31eb3 ql/src/test/results/clientpositive/llap/dynpart_sort_optimization.q.out 21fc2c545a ql/src/test/results/clientpositive/llap/dynpart_sort_optimization_acid.q.out a0a5e0cf32 Diff: https://reviews.apache.org/r/68648/diff/3/ Changes: https://reviews.apache.org/r/68648/diff/2-3/ Testing --- Thanks, Deepak Jaiswal
Re: Review Request 68648: HIVE-20510
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68648/ --- (Updated Sept. 7, 2018, 6:42 p.m.) Review request for hive, Gopal V and Matt McCline. Changes --- Addressed concerns from Matt's review. Replaced the constant string _bucket_number with a UDF GenericUDFBucketNumber() so that _bucket_number can still be used as a legitimate string in queries. Bugs: HIVE-20510 https://issues.apache.org/jira/browse/HIVE-20510 Repository: hive-git Description --- Vectorization : Support loading bucketed tables using sorted dynamic partition optimizer. Added a new VectorExpression BucketNumberExpression to evaluate _bucket_number. Made the loops as tight as possible. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 8bf0a9c77d ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 57f7c0108e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/BucketNumExpression.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java 5ab59c9c61 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java 51010aac85 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBucketNumber.java PRE-CREATION ql/src/test/queries/clientpositive/dynpart_sort_opt_vectorization.q 435cdaddd0 ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out 22f0a31eb3 Diff: https://reviews.apache.org/r/68648/diff/2/ Changes: https://reviews.apache.org/r/68648/diff/1-2/ Testing --- Thanks, Deepak Jaiswal
Review Request 68648: HIVE-20510
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68648/ --- Review request for hive, Gopal V and Matt McCline. Bugs: HIVE-20510 https://issues.apache.org/jira/browse/HIVE-20510 Repository: hive-git Description --- Vectorization : Support loading bucketed tables using sorted dynamic partition optimizer. Added a new VectorExpression BucketNumberExpression to evaluate _bucket_number. Made the loops as tight as possible. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 57f7c0108e ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/BucketNumExpression.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/vector/reducesink/VectorReduceSinkObjectHashOperator.java 5ab59c9c61 ql/src/test/queries/clientpositive/dynpart_sort_opt_vectorization.q 435cdaddd0 ql/src/test/results/clientpositive/llap/dynpart_sort_opt_vectorization.q.out 22f0a31eb3 Diff: https://reviews.apache.org/r/68648/diff/1/ Testing --- Thanks, Deepak Jaiswal
[jira] [Created] (HIVE-20510) Vectorization : Support loading bucketed tables using sorted dynamic partition optimizer
Deepak Jaiswal created HIVE-20510: - Summary: Vectorization : Support loading bucketed tables using sorted dynamic partition optimizer Key: HIVE-20510 URL: https://issues.apache.org/jira/browse/HIVE-20510 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal sorted dynamic partition optimizer does not work on bucketed tables when vectorization is enabled. cc [~mmccline] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20508) Hive does not support user names of type "user@realm"
Deepak Jaiswal created HIVE-20508: - Summary: Hive does not support user names of type "user@realm" Key: HIVE-20508 URL: https://issues.apache.org/jira/browse/HIVE-20508 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal Hive does not support user names of type "user@realm". This causes authentication problem for user names containing email ids in Kerberos environment. cc [~thejas] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [ANNOUNCE] New committer: Andrew Sherman
Congratulations Andrew. Deepak On 9/3/18, 10:17 PM, "Zoltan Haindrich" wrote: Congratulations Andrew! On 2 September 2018 04:49:00 CEST, Lefty Leverenz wrote: >Congratulations Andrew! > >-- Lefty > > >On Tue, Aug 28, 2018 at 11:36 AM Ashutosh Chauhan > >wrote: > >> Apache Hive's Project Management Committee (PMC) has invited Andrew >Sherman >> to become a committer, and we are pleased to announce that he has >accepted. >> >> Andrew, welcome, thank you for your contributions, and we look >forward to >> your >> further interactions with the community! >> >> Ashutosh Chauhan (on behalf of the Apache Hive PMC) >>
Re: Review Request 68506: HIVE-20187
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68506/ --- (Updated Aug. 25, 2018, 6:22 a.m.) Review request for hive and Gunther Hagleitner. Changes --- Updated results. Bugs: HIVE-20187 https://issues.apache.org/jira/browse/HIVE-20187 Repository: hive-git Description --- Incorrect query results in hive when hive.convert.join.bucket.mapjoin.tez is set to true In some cases, Bucket mapjoin is incorrectly selected which leads to wrong results. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java 89db530f54 ql/src/test/queries/clientpositive/bucket_map_join_tez2.q adcf6962ab ql/src/test/results/clientpositive/llap/bucket_map_join_tez2.q.out 4f042cee50 ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 4fc1419acd ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 2e8d5f375f ql/src/test/results/clientpositive/llap/tez_smb_main.q.out 9929989f0e ql/src/test/results/clientpositive/spark/bucket_map_join_tez2.q.out 243cbc3428 Diff: https://reviews.apache.org/r/68506/diff/2/ Changes: https://reviews.apache.org/r/68506/diff/1-2/ Testing --- Thanks, Deepak Jaiswal
Review Request 68506: HIVE-20187
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68506/ --- Review request for hive and Gunther Hagleitner. Bugs: HIVE-20187 https://issues.apache.org/jira/browse/HIVE-20187 Repository: hive-git Description --- Incorrect query results in hive when hive.convert.join.bucket.mapjoin.tez is set to true In some cases, Bucket mapjoin is incorrectly selected which leads to wrong results. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java 89db530f54 ql/src/test/queries/clientpositive/bucket_map_join_tez2.q adcf6962ab ql/src/test/results/clientpositive/llap/bucket_map_join_tez2.q.out 4f042cee50 ql/src/test/results/clientpositive/llap/tez_smb_main.q.out 9929989f0e Diff: https://reviews.apache.org/r/68506/diff/1/ Testing --- Thanks, Deepak Jaiswal
Re: Review Request 68476: HIVE-20433
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68476/ --- (Updated Aug. 23, 2018, 8:05 p.m.) Review request for hive, Ashutosh Chauhan and Gopal V. Changes --- Fixed ptest failures. Bugs: HIVE-20433 https://issues.apache.org/jira/browse/HIVE-20433 Repository: hive-git Description --- Implicit String to Timestamp conversion is slow Diffs (updated) - serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java 8a057d1dab Diff: https://reviews.apache.org/r/68476/diff/2/ Changes: https://reviews.apache.org/r/68476/diff/1-2/ Testing --- Thanks, Deepak Jaiswal
Review Request 68476: HIVE-20433
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68476/ --- Review request for hive, Ashutosh Chauhan and Gopal V. Bugs: HIVE-20433 https://issues.apache.org/jira/browse/HIVE-20433 Repository: hive-git Description --- Implicit String to Timestamp conversion is slow Diffs (updated) - serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java 8a057d1dab Diff: https://reviews.apache.org/r/68476/diff/1/ Testing --- Thanks, Deepak Jaiswal
[jira] [Created] (HIVE-20433) Implicit String to Timestamp conversion is slow
Deepak Jaiswal created HIVE-20433: - Summary: Implicit String to Timestamp conversion is slow Key: HIVE-20433 URL: https://issues.apache.org/jira/browse/HIVE-20433 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal getTimestampFromString() is slow at casting dates. It throws twice before date conversion can happen. cc [~gopalv] [~ashutoshc] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
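One general way to avoid paying for exceptions on the common path is to check the shape of the string cheaply and route it to the right parser, so that a date-only value never has to fail in a timestamp parser first. The sketch below illustrates that idea only. It is not the fix made in PrimitiveObjectInspectorUtils.java: it uses java.time types instead of Hive's own Timestamp class, and the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;

/** Hypothetical sketch: pick the parser from the string's shape instead of from a failed parse. */
final class TimestampStrings {
    private TimestampStrings() {}

    /** Parses "yyyy-MM-dd" or "yyyy-MM-dd HH:mm:ss[.fraction]"; returns null if neither fits. */
    static LocalDateTime parse(String s) {
        if (s == null) {
            return null;
        }
        String t = s.trim();
        try {
            if (looksLikeDate(t)) {
                // Date-only input goes straight to the date parser.
                return LocalDate.parse(t).atStartOfDay();
            }
            // The ISO parser expects 'T' between date and time; accept a space as well.
            String normalized = (t.length() > 10 && t.charAt(10) == ' ')
                    ? t.substring(0, 10) + "T" + t.substring(11)
                    : t;
            return LocalDateTime.parse(normalized);
        } catch (DateTimeParseException e) {
            // An exception is now raised only for input that is actually malformed.
            return null;
        }
    }

    /** Cheap shape test: ten characters with dashes in the yyyy-MM-dd positions. */
    static boolean looksLikeDate(String t) {
        return t.length() == 10 && t.charAt(4) == '-' && t.charAt(7) == '-';
    }

    public static void main(String[] args) {
        System.out.println(parse("2018-08-23"));
        System.out.println(parse("2018-08-23 20:05:00"));
        System.out.println(parse("not a date"));
    }
}
```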
Re: Review Request 68359: HIVE-20393
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68359/ --- (Updated Aug. 15, 2018, 10:19 p.m.) Review request for hive, Gopal V, Jesús Camacho Rodríguez, and Jason Dere. Changes --- Less reliant on NPE handling. Bugs: HIVE-20393 https://issues.apache.org/jira/browse/HIVE-20393 Repository: hive-git Description --- See Jira. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java f316f09953 Diff: https://reviews.apache.org/r/68359/diff/2/ Changes: https://reviews.apache.org/r/68359/diff/1-2/ Testing --- Thanks, Deepak Jaiswal
Review Request 68359: HIVE-20393
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68359/ --- Review request for hive, Gopal V and Jesús Camacho Rodríguez. Bugs: HIVE-20393 https://issues.apache.org/jira/browse/HIVE-20393 Repository: hive-git Description --- See Jira. Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java f316f09953 Diff: https://reviews.apache.org/r/68359/diff/1/ Testing --- Thanks, Deepak Jaiswal
[jira] [Created] (HIVE-20393) Semijoin Reduction : markSemiJoinForDPP behaves inconsistently
Deepak Jaiswal created HIVE-20393: - Summary: Semijoin Reduction : markSemiJoinForDPP behaves inconsistently Key: HIVE-20393 URL: https://issues.apache.org/jira/browse/HIVE-20393 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal
markSemiJoinForDPP has multiple issues:
* It uses the map tsOps, which is wrong because it disallows going through the same TS more than once, even though that TS may have filters from more than one semijoin edge. This results in inconsistent plans for the same query, as semijoin edges may be processed in a different order each time.
* It uses getColumnExpr(), which is not as robust as extractColumn(), thus resulting in NPEs.
* The logic that marks an edge as useful when an NPE is hit may end up keeping a bad edge.
cc [~gopalv] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 68281: HIVE-20354
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68281/ --- (Updated Aug. 10, 2018, 8:59 p.m.) Review request for hive, Eugene Koifman and Jason Dere. Changes --- A new approach. Bugs: HIVE-20354 https://issues.apache.org/jira/browse/HIVE-20354 Repository: hive-git Description --- Semijoin hints dont work with merge statements. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f4d12ae564 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a63aabed9f ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 8df290435d ql/src/test/queries/clientpositive/semijoin_hint.q de176affd3 ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 679916de07 Diff: https://reviews.apache.org/r/68281/diff/4/ Changes: https://reviews.apache.org/r/68281/diff/3-4/ Testing --- Thanks, Deepak Jaiswal
Re: Review Request 68281: HIVE-20354
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68281/ --- (Updated Aug. 10, 2018, 6:48 a.m.) Review request for hive, Eugene Koifman and Jason Dere. Changes --- Fixed the issue for tables such as "select_table" Bugs: HIVE-20354 https://issues.apache.org/jira/browse/HIVE-20354 Repository: hive-git Description --- Semijoin hints dont work with merge statements. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f4d12ae564 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java a63aabed9f ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 8df290435d ql/src/test/queries/clientpositive/semijoin_hint.q de176affd3 ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 679916de07 Diff: https://reviews.apache.org/r/68281/diff/3/ Changes: https://reviews.apache.org/r/68281/diff/2-3/ Testing --- Thanks, Deepak Jaiswal
Re: Review Request 68281: HIVE-20354
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68281/ --- (Updated Aug. 9, 2018, 7:19 p.m.) Review request for hive, Eugene Koifman and Jason Dere. Changes --- Implemented review comments. Bugs: HIVE-20354 https://issues.apache.org/jira/browse/HIVE-20354 Repository: hive-git Description --- Semijoin hints dont work with merge statements. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f4d12ae564 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 463880587e ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 8df290435d ql/src/test/queries/clientpositive/semijoin_hint.q de176affd3 ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 679916de07 Diff: https://reviews.apache.org/r/68281/diff/2/ Changes: https://reviews.apache.org/r/68281/diff/1-2/ Testing --- Thanks, Deepak Jaiswal
Re: Review Request 68281: HIVE-20354
> On Aug. 9, 2018, 6:33 p.m., Gopal V wrote: > > ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java > > Lines 1000 (patched) > > <https://reviews.apache.org/r/68281/diff/1/?file=2070795#file2070795line1000> > > > > why not save it directly into setHintList()? It has to be first processed before it can be set. Anyway I am going to abandon this approach in favor of what Eugene suggested. - Deepak --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68281/#review207047 ------- On Aug. 9, 2018, 5:44 p.m., Deepak Jaiswal wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68281/ > --- > > (Updated Aug. 9, 2018, 5:44 p.m.) > > > Review request for hive, Eugene Koifman and Jason Dere. > > > Bugs: HIVE-20354 > https://issues.apache.org/jira/browse/HIVE-20354 > > > Repository: hive-git > > > Description > --- > > Semijoin hints dont work with merge statements. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f4d12ae564 > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java > 463880587e > > ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java > 8df290435d > ql/src/test/queries/clientpositive/semijoin_hint.q de176affd3 > ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 679916de07 > > > Diff: https://reviews.apache.org/r/68281/diff/1/ > > > Testing > --- > > > Thanks, > > Deepak Jaiswal > >
Review Request 68281: HIVE-20354
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68281/ --- Review request for hive and Jason Dere. Bugs: HIVE-20354 https://issues.apache.org/jira/browse/HIVE-20354 Repository: hive-git Description --- Semijoin hints dont work with merge statements. Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f4d12ae564 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 463880587e ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 8df290435d ql/src/test/queries/clientpositive/semijoin_hint.q de176affd3 ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 679916de07 Diff: https://reviews.apache.org/r/68281/diff/1/ Testing --- Thanks, Deepak Jaiswal
[jira] [Created] (HIVE-20354) Semijoin hints dont work with merge statements
Deepak Jaiswal created HIVE-20354: - Summary: Semijoin hints dont work with merge statements Key: HIVE-20354 URL: https://issues.apache.org/jira/browse/HIVE-20354 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal When a merge statement is rewritten, any comment in the query is ignored. Such a comment may contain hints, for example a semijoin hint; if it does, it should not be ignored. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
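To make the problem concrete, the sketch below shows one way a hint comment could be pulled out of the original statement and carried into a rewritten one. It is string-based and purely illustrative: the review requests for this issue change the parser and semantic analyzer instead, the class and method names here are hypothetical, and the hint text in the example is made up rather than taken from the test file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: carry a hint comment from an original statement into a rewritten one. */
final class HintCarryOver {
    // Matches a hint comment of the form /*+ ... */ and captures its content.
    private static final Pattern HINT = Pattern.compile("/\\*\\+(.*?)\\*/", Pattern.DOTALL);

    private HintCarryOver() {}

    /** Returns the trimmed content of the first hint comment, or null if there is none. */
    static String extractHint(String query) {
        Matcher m = HINT.matcher(query);
        return m.find() ? m.group(1).trim() : null;
    }

    /** Re-attaches the hint right after the leading SELECT keyword of the rewritten query. */
    static String withHint(String rewrittenSelect, String hint) {
        if (hint == null) {
            return rewrittenSelect;
        }
        // $0 is the matched keyword; quoteReplacement protects any '$' or '\' in the hint.
        String replacement = "$0 /*+ " + Matcher.quoteReplacement(hint) + " */";
        return rewrittenSelect.replaceFirst("(?i)^\\s*select\\b", replacement);
    }

    public static void main(String[] args) {
        String merge = "merge /*+ semi(s, a, t, 1000) */ into t using s on t.a = s.a "
                + "when matched then delete";
        String hint = extractHint(merge);
        System.out.println(withHint("select t.a from t", hint));
    }
}
```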
Re: Review Request 68124: HIVE-20252
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68124/ --- (Updated Aug. 1, 2018, 6:10 p.m.) Review request for hive, Jesús Camacho Rodríguez and Jason Dere. Changes --- Left out a minor change from previous patch. Bugs: HIVE-20252 https://issues.apache.org/jira/browse/HIVE-20252 Repository: hive-git Description --- See Jira. removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. I will eventually remove it and can be ignored. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java 7b2ae40107 ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 538aa5e924 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 Diff: https://reviews.apache.org/r/68124/diff/5/ Changes: https://reviews.apache.org/r/68124/diff/4-5/ Testing --- Thanks, Deepak Jaiswal
Re: Review Request 68124: HIVE-20252
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68124/ --- (Updated Aug. 1, 2018, 6:05 p.m.) Review request for hive, Jesús Camacho Rodríguez and Jason Dere. Changes --- Implemented review comments. Bugs: HIVE-20252 https://issues.apache.org/jira/browse/HIVE-20252 Repository: hive-git Description --- See Jira. removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. I will eventually remove it, so it can be ignored. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java 7b2ae40107 ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 538aa5e924 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 Diff: https://reviews.apache.org/r/68124/diff/4/ Changes: https://reviews.apache.org/r/68124/diff/3-4/ Testing --- Thanks, Deepak Jaiswal
Re: Review Request 68124: HIVE-20252
> On Aug. 1, 2018, 2:39 a.m., Jesús Camacho Rodríguez wrote: > > ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java > > Lines 451 (patched) > > <https://reviews.apache.org/r/68124/diff/3/?file=2065696#file2065696line451> > > > > We can remove this first block, it does not buy us much in terms of > > algorithm performance, and the method would have no restriction on start > > operator (plus more readable). > > Deepak Jaiswal wrote: > No. It won't work without it. It is not for performance, it is for > correctness. The start in our case is the RS2, going up won't work as it will > stop when it encounters RS1. > The more generic one is in SharedWorkOptimizer, this one, I am afraid, is > for this particular case. > > Jesús Camacho Rodríguez wrote: > The block can be part of the caller logic, so if you have the chain: > SEL->GBY1->RS1->GBY2->RS2 > then you end up passing the SEL as the start operator. > > > Then the method in OperatorUtils has no restriction and it is reusable: > given any operator, 1) output all the operators contained in the same work, > 2) gather all the terminal operators of that work, and 3) gather all the > semijoin branches of that work. Thanks. - Deepak --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68124/#review206717 --- On Aug. 1, 2018, 12:27 a.m., Deepak Jaiswal wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68124/ > --- > > (Updated Aug. 1, 2018, 12:27 a.m.) > > > Review request for hive, Jesús Camacho Rodríguez and Jason Dere. > > > Bugs: HIVE-20252 > https://issues.apache.org/jira/browse/HIVE-20252 > > > Repository: hive-git > > > Description > --- > > See Jira. > > removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. > I will eventually remove it, so it can be ignored. 
> > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java 7b2ae40107 > ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 538aa5e924 > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 > > > Diff: https://reviews.apache.org/r/68124/diff/3/ > > > Testing > --- > > > Thanks, > > Deepak Jaiswal > >
Re: Review Request 68124: HIVE-20252
> On Aug. 1, 2018, 2:39 a.m., Jesús Camacho Rodríguez wrote: > > ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java > > Lines 451 (patched) > > <https://reviews.apache.org/r/68124/diff/3/?file=2065696#file2065696line451> > > > > We can remove this first block, it does not buy us much in terms of > > algorithm performance, and the method would have no restriction on start > > operator (plus more readable). No. It won't work without it. It is not for performance, it is for correctness. The start in our case is the RS2, going up won't work as it will stop when it encounters RS1. The more generic one is in SharedWorkOptimizer, this one, I am afraid, is for this particular case. > On Aug. 1, 2018, 2:39 a.m., Jesús Camacho Rodríguez wrote: > > ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java > > Lines 462 (patched) > > <https://reviews.apache.org/r/68124/diff/3/?file=2065696#file2065696line462> > > > > Probably more useful to do the inverse, the private method void and the > > public method returns the operators in the work? Aah, what was I thinking. I meant to do that only. Thanks for pointing this out. - Deepak --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68124/#review206717 ------- On Aug. 1, 2018, 12:27 a.m., Deepak Jaiswal wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68124/ > --- > > (Updated Aug. 1, 2018, 12:27 a.m.) > > > Review request for hive, Jesús Camacho Rodríguez and Jason Dere. > > > Bugs: HIVE-20252 > https://issues.apache.org/jira/browse/HIVE-20252 > > > Repository: hive-git > > > Description > --- > > See Jira. > > removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. > I will eventually remove it, so it can be ignored. 
> > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java 7b2ae40107 > ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 538aa5e924 > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 > > > Diff: https://reviews.apache.org/r/68124/diff/3/ > > > Testing > --- > > > Thanks, > > Deepak Jaiswal > >
Re: Review Request 68124: HIVE-20252
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68124/ --- (Updated Aug. 1, 2018, 12:27 a.m.) Review request for hive, Jesús Camacho Rodríguez and Jason Dere. Changes --- Implemented review comments. Bugs: HIVE-20252 https://issues.apache.org/jira/browse/HIVE-20252 Repository: hive-git Description --- See Jira. removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. I will eventually remove it, so it can be ignored. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java 7b2ae40107 ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 538aa5e924 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 Diff: https://reviews.apache.org/r/68124/diff/3/ Changes: https://reviews.apache.org/r/68124/diff/2-3/ Testing --- Thanks, Deepak Jaiswal
Re: Review Request 68124: HIVE-20252
> On July 31, 2018, 11:38 p.m., Jesús Camacho Rodríguez wrote: > > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java > > Line 914 (original), 917 (patched) > > <https://reviews.apache.org/r/68124/diff/2/?file=2065678#file2065678line983> > > > > Can be collapsed into single line in if condition. I have been asked not to do that in other reviews before, so I kept it that way. Let's keep it this way. - Deepak --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68124/#review206707 --- On July 31, 2018, 11:07 p.m., Deepak Jaiswal wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68124/ > --- > > (Updated July 31, 2018, 11:07 p.m.) > > > Review request for hive, Jesús Camacho Rodríguez and Jason Dere. > > > Bugs: HIVE-20252 > https://issues.apache.org/jira/browse/HIVE-20252 > > > Repository: hive-git > > > Description > --- > > See Jira. > > removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. > I will eventually remove it, so it can be ignored. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 538aa5e924 > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 > > > Diff: https://reviews.apache.org/r/68124/diff/2/ > > > Testing > --- > > > Thanks, > > Deepak Jaiswal > >
Re: Review Request 68124: HIVE-20252
> On July 31, 2018, 11:38 p.m., Jesús Camacho Rodríguez wrote: > > Thanks, I will work on the comments. - Deepak --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68124/#review206707 --- On July 31, 2018, 11:07 p.m., Deepak Jaiswal wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68124/ > --- > > (Updated July 31, 2018, 11:07 p.m.) > > > Review request for hive, Jesús Camacho Rodríguez and Jason Dere. > > > Bugs: HIVE-20252 > https://issues.apache.org/jira/browse/HIVE-20252 > > > Repository: hive-git > > > Description > --- > > See Jira. > > removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. > I will eventually remove it, so it can be ignored. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 538aa5e924 > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 > > > Diff: https://reviews.apache.org/r/68124/diff/2/ > > > Testing > --- > > > Thanks, > > Deepak Jaiswal > >
Re: Review Request 68124: HIVE-20252
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68124/ --- (Updated July 31, 2018, 11:07 p.m.) Review request for hive, Jesús Camacho Rodríguez and Jason Dere. Changes --- New approach where a virtual edge is created from each non-semijoin terminal operator in a task to the semijoin terminal operators within the same task. With these edges in place, a task-level cycle shows up as a cycle in the operator graph. Bugs: HIVE-20252 https://issues.apache.org/jira/browse/HIVE-20252 Repository: hive-git Description --- See Jira. removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. I will eventually remove it, so it can be ignored. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 538aa5e924 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 Diff: https://reviews.apache.org/r/68124/diff/2/ Changes: https://reviews.apache.org/r/68124/diff/1-2/ Testing --- Thanks, Deepak Jaiswal
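The virtual-edge idea in the change note above can be illustrated on a toy graph. This is only a sketch under stated assumptions, not the code in TezCompiler: the class `VirtualEdgeCycleDemo`, its methods, and the operator names (`SEL`, `MJ`, `RS_main`, `RS_sj`, `TS_b`, `RS_b`) are all hypothetical. The operator graph alone has no cycle, but once an edge is added from the non-semijoin terminal to the semijoin terminal of the same task, the task-level dependency becomes an ordinary graph cycle that a depth-first search can find.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model; operator names are plain strings.
final class VirtualEdgeCycleDemo {
  private final Map<String, List<String>> edges = new LinkedHashMap<>();

  void addEdge(String from, String to) {
    edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    edges.computeIfAbsent(to, k -> new ArrayList<>());
  }

  // One virtual edge per (non-semijoin terminal, semijoin terminal) pair of a task.
  void addVirtualEdges(List<String> nonSemiJoinTerminals, List<String> semiJoinTerminals) {
    for (String from : nonSemiJoinTerminals) {
      for (String to : semiJoinTerminals) {
        addEdge(from, to);
      }
    }
  }

  // Standard DFS cycle check: a node reached again while still on the
  // current path closes a cycle.
  boolean hasCycle() {
    Set<String> done = new HashSet<>();
    Set<String> onPath = new HashSet<>();
    for (String node : edges.keySet()) {
      if (visit(node, done, onPath)) {
        return true;
      }
    }
    return false;
  }

  private boolean visit(String node, Set<String> done, Set<String> onPath) {
    if (onPath.contains(node)) {
      return true;
    }
    if (!done.add(node)) {
      return false; // already fully explored
    }
    onPath.add(node);
    for (String next : edges.get(node)) {
      if (visit(next, done, onPath)) {
        return true;
      }
    }
    onPath.remove(node);
    return false;
  }

  // Task A holds SEL, MJ, RS_main and the semijoin terminal RS_sj.
  // RS_sj feeds TS_b in another task, whose output RS_b feeds MJ in task A.
  static VirtualEdgeCycleDemo sample() {
    VirtualEdgeCycleDemo g = new VirtualEdgeCycleDemo();
    g.addEdge("SEL", "MJ");
    g.addEdge("MJ", "RS_main");
    g.addEdge("SEL", "RS_sj");
    g.addEdge("RS_sj", "TS_b");
    g.addEdge("TS_b", "RS_b");
    g.addEdge("RS_b", "MJ");
    return g;
  }

  public static void main(String[] args) {
    VirtualEdgeCycleDemo g = sample();
    System.out.println("before virtual edges: " + g.hasCycle());
    g.addVirtualEdges(Arrays.asList("RS_main"), Arrays.asList("RS_sj"));
    System.out.println("after virtual edges: " + g.hasCycle());
  }
}
```

In this toy case the map join sits in the same task as the semijoin branch, which is why no operator-level path closes the loop until the virtual edge is added.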
Review Request 68124: HIVE-20252
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68124/ --- Review request for hive, Jesús Camacho Rodríguez and Jason Dere. Bugs: HIVE-20252 https://issues.apache.org/jira/browse/HIVE-20252 Repository: hive-git Description --- See Jira. removeSemiJoinCyclesDueToMapsideJoins is deprecated, although it has changes. I will eventually remove it, so it can be ignored. Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 011dadf495 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java c3eb886fd2 Diff: https://reviews.apache.org/r/68124/diff/1/ Testing --- Thanks, Deepak Jaiswal
Re: [ANNOUNCE] New committer: Slim Bouguerra
Congrats Slim! On 7/30/18, 4:03 PM, "Prasanth Jayachandran" wrote: Congratulations Slim! > On Jul 30, 2018, at 4:00 PM, Sergey Shelukhin wrote: > > Congrats! > > On 18/7/30, 12:53, "Gunther Hagleitner" > wrote: > >> Congratulations! >> >> Thanks, >> Gunther. >> >> From: Xuefu Zhang >> Sent: Monday, July 30, 2018 12:11 PM >> To: dev@hive.apache.org >> Subject: Re: [ANNOUNCE] New committer: Slim Bouguerra >> >> congratulations!!! >> >> On Mon, Jul 30, 2018 at 12:10 PM, Jesus Camacho Rodriguez < >> jcamachorodrig...@hortonworks.com> wrote: >> >>> Congrats Slim! >>> >>> On 7/30/18, 10:53 AM, "Andrew Sherman" >>> wrote: >>> >>>Congratulations Slim! >>> >>>On Mon, Jul 30, 2018 at 12:46 AM Peter Vary >>> >>> >>>wrote: >>> Congratulations Slim! > On Jul 30, 2018, at 02:00, Ashutosh Chauhan >>> wrote: > > Apache Hive's Project Management Committee (PMC) has invited >>> Slim Bouguerra > to become a committer, and we are pleased to announce that he >>> has accepted. > > Slim, welcome, thank you for your contributions, and we look >>> forward your > further interactions with the community! > > Ashutosh Chauhan (on behalf of the Apache Hive PMC) >>> >>> >>> >
Re: [ANNOUNCE] New PMC Member : Vineet Garg
Congratulations Vineet! On 7/30/18, 12:45 AM, "Peter Vary" wrote: Congratulations Vineet! > On Jul 30, 2018, at 01:59, Ashutosh Chauhan wrote: > > On behalf of the Hive PMC I am delighted to announce Vineet Garg is joining > Hive PMC. > Thanks Vineet for all your contributions till now. Looking forward to many > more. > > Welcome, Vineet! > > Thanks, > Ashutosh
[jira] [Created] (HIVE-20252) Semijoin Reduction : Cycles due to semi join branch may remain undetected if small table side has a map join upstream.
Deepak Jaiswal created HIVE-20252: - Summary: Semijoin Reduction : Cycles due to semi join branch may remain undetected if small table side has a map join upstream. Key: HIVE-20252 URL: https://issues.apache.org/jira/browse/HIVE-20252 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal

For example,

# 2018-07-26T17:22:14,664 DEBUG [51377701-dc98-424f-82e0-bbb5d6c84316 main] optimizer.SharedWorkOptimizer: Before SharedWorkOptimizer:
# TS[0]-FIL[96]-SEL[2]-MAPJOIN[156]-MAPJOIN[157]-MAPJOIN[161]-MAPJOIN[162]-FIL[47]-SEL[48]-MAPJOIN[163]-FIL[66]-SEL[67]-TNK[105]-GBY[68]-RS[69]-GBY[70]-SEL[71]-RS[72]-SEL[73]-LIM[74]-FS[75]
# -SEL[142]-GBY[143]-RS[144]-GBY[145]-RS[155]
# TS[3]-FIL[97]-SEL[5]-RS[34]-MAPJOIN[156]
# TS[6]-FIL[98]-SEL[8]-RS[37]-MAPJOIN[157]
# TS[9]-FIL[99]-SEL[11]-MAPJOIN[158]-GBY[40]-RS[42]-MAPJOIN[161]
# TS[12]-FIL[100]-SEL[14]-RS[16]-MAPJOIN[158]
# -SEL[131]-GBY[132]-EVENT[133]
# TS[19]-FIL[101]-SEL[21]-MAPJOIN[159]-GBY[29]-RS[30]-GBY[31]-SEL[32]-RS[45]-MAPJOIN[162]
# TS[22]-FIL[102]-SEL[24]-RS[26]-MAPJOIN[159]
# -SEL[139]-GBY[140]-EVENT[141]
# TS[49]-FIL[103]-SEL[51]-MAPJOIN[160]-GBY[59]-RS[60]-GBY[61]-SEL[62]-RS[64]-MAPJOIN[163]
# TS[52]-FIL[104]-SEL[54]-RS[56]-MAPJOIN[160]
# -SEL[147]-GBY[148]-EVENT[149]
#
#
# DPP information stored in the cache: {TS[19]=[EVENT[141]], TS[9]=[EVENT[133]], TS[49]=[RS[155], EVENT[149]]}

The semi join branch in line 3 feeds into TS[49] in line 12, which feeds MAPJOIN[163], going back to the parent of the semi join branch at line 2. The cycle-detection logic may fail because there is a MAPJOIN[160] at line 12, which could cause the logic to look for the wrong TS. The logic to find the TS operator upstream must use findOperatorsUpstream() and examine each TS Op for complete coverage.

cc [~jcamachorodriguez] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
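To make the "examine each TS Op" point concrete, here is a minimal sketch on a trimmed version of the plan lines for TS[49] and TS[52]. It is not Hive's `findOperatorsUpstream()`; the class `UpstreamScanDemo`, the `Op` type, and both helper methods are hypothetical stand-ins, intermediate operators are omitted, and the parent order of MAPJOIN[160] is an assumption chosen to show the failure. A walk that follows a single parent at each step reaches only one table scan, while a full upstream traversal collects every TS feeding the map join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified operator model: each operator only knows its parents.
final class UpstreamScanDemo {
  static final class Op {
    final String name;
    final List<Op> parents = new ArrayList<>();

    Op(String name, Op... parents) {
      this.name = name;
      this.parents.addAll(Arrays.asList(parents));
    }

    boolean isTableScan() {
      return name.startsWith("TS[");
    }
  }

  // Single-path walk: always takes the first parent, so it ends at one root
  // and never sees the other inputs of a map join.
  static String firstParentScan(Op start) {
    Op cur = start;
    while (!cur.parents.isEmpty()) {
      cur = cur.parents.get(0);
    }
    return cur.name;
  }

  // Full traversal: visits every parent branch and records each table scan.
  static Set<String> allUpstreamScans(Op start) {
    Set<String> scans = new LinkedHashSet<>();
    collect(start, scans, new HashSet<>());
    return scans;
  }

  private static void collect(Op op, Set<String> scans, Set<Op> visited) {
    if (!visited.add(op)) {
      return;
    }
    if (op.isTableScan()) {
      scans.add(op.name);
    }
    for (Op parent : op.parents) {
      collect(parent, scans, visited);
    }
  }

  // Trimmed plan: TS[49] -> MAPJOIN[160] -> RS[64], and TS[52] -> RS[56] -> MAPJOIN[160].
  static Op sample() {
    Op ts49 = new Op("TS[49]");
    Op ts52 = new Op("TS[52]");
    Op rs56 = new Op("RS[56]", ts52);
    Op mapJoin160 = new Op("MAPJOIN[160]", rs56, ts49); // parent order is assumed
    return new Op("RS[64]", mapJoin160);
  }

  public static void main(String[] args) {
    Op start = sample();
    System.out.println("single path finds: " + firstParentScan(start));
    System.out.println("full traversal finds: " + allUpstreamScans(start));
  }
}
```

With the full set of upstream table scans in hand, each one can be checked against the semijoin branch target, which is the coverage the issue asks for.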
Review Request 68069: HIVE-20240
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68069/ --- Review request for hive and Jason Dere. Bugs: HIVE-20240 https://issues.apache.org/jira/browse/HIVE-20240 Repository: hive-git Description --- See Jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java caec2c08e9 ql/src/test/queries/clientpositive/dynamic_semijoin_reduction_4.q a04ab666e0 ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_4.q.out 0feb362023 Diff: https://reviews.apache.org/r/68069/diff/1/ Testing --- Thanks, Deepak Jaiswal
[jira] [Created] (HIVE-20240) Semijoin Reduction : Use local variable to check for external table condition
Deepak Jaiswal created HIVE-20240: - Summary: Semijoin Reduction : Use local variable to check for external table condition Key: HIVE-20240 URL: https://issues.apache.org/jira/browse/HIVE-20240 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal This condition, semiJoin = semiJoin && !disableSemiJoinOptDueToExternalTable(parseContext.getConf(), ts, ctx); may set semiJoin to false if an external table is encountered and will remain false for subsequent cases. It should only disable it for that particular case. cc [~jdere] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
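The sticky-flag effect described in this issue can be reproduced in isolation. The following is a minimal sketch, not the code in DynamicPartitionPruningOptimization: `SemiJoinFlagDemo`, `isExternal`, and the table names are hypothetical, with `isExternal` standing in for `disableSemiJoinOptDueToExternalTable(...)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SemiJoinFlagDemo {
  // Hypothetical stand-in for disableSemiJoinOptDueToExternalTable(...).
  static boolean isExternal(String table) {
    return table.startsWith("ext_");
  }

  // Problem pattern: the shared flag is overwritten, so one external table
  // disables semijoin reduction for every table processed after it.
  static List<String> sticky(List<String> tables, boolean semiJoin) {
    List<String> enabled = new ArrayList<>();
    for (String table : tables) {
      semiJoin = semiJoin && !isExternal(table);
      if (semiJoin) {
        enabled.add(table);
      }
    }
    return enabled;
  }

  // Local-variable pattern: the decision is scoped to the current table only.
  static List<String> scoped(List<String> tables, boolean semiJoin) {
    List<String> enabled = new ArrayList<>();
    for (String table : tables) {
      boolean semiJoinForThisTable = semiJoin && !isExternal(table);
      if (semiJoinForThisTable) {
        enabled.add(table);
      }
    }
    return enabled;
  }

  public static void main(String[] args) {
    List<String> tables = Arrays.asList("a", "ext_b", "c");
    System.out.println(sticky(tables, true)); // prints [a]
    System.out.println(scoped(tables, true)); // prints [a, c]
  }
}
```

In the sticky version, table `c` loses the optimization only because `ext_b` happened to be visited first, which matches the "remain false for subsequent cases" behavior reported above.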
Re: Review Request 67974: HIVE-20164
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67974/ --- (Updated July 23, 2018, 6:06 p.m.) Review request for hive, Gopal V and Jason Dere. Changes --- Made the data set smaller for easier verification. Bugs: HIVE-20164 https://issues.apache.org/jira/browse/HIVE-20164 Repository: hive-git Description --- Murmur Hash : Make sure CTAS and IAS use correct bucketing version Diffs (updated) - itests/src/test/resources/testconfiguration.properties 654185d962 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 1661aeccd7 ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java bbce940c2e ql/src/test/queries/clientpositive/murmur_hash_migration.q PRE-CREATION ql/src/test/results/clientpositive/llap/murmur_hash_migration.q.out PRE-CREATION Diff: https://reviews.apache.org/r/67974/diff/4/ Changes: https://reviews.apache.org/r/67974/diff/3-4/ Testing --- Thanks, Deepak Jaiswal
Re: Review Request 67974: HIVE-20164
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67974/ --- (Updated July 23, 2018, 5:48 p.m.) Review request for hive, Gopal V and Jason Dere. Changes --- Sort results for easy verification and to avoid order change due to other possible changes. Bugs: HIVE-20164 https://issues.apache.org/jira/browse/HIVE-20164 Repository: hive-git Description --- Murmur Hash : Make sure CTAS and IAS use correct bucketing version Diffs (updated) - itests/src/test/resources/testconfiguration.properties 654185d962 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 1661aeccd7 ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java bbce940c2e ql/src/test/queries/clientpositive/murmur_hash_migration.q PRE-CREATION ql/src/test/results/clientpositive/llap/murmur_hash_migration.q.out PRE-CREATION Diff: https://reviews.apache.org/r/67974/diff/3/ Changes: https://reviews.apache.org/r/67974/diff/2-3/ Testing --- Thanks, Deepak Jaiswal
Re: Review Request 67974: HIVE-20164
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67974/ --- (Updated July 20, 2018, 11:10 p.m.) Review request for hive, Gopal V and Jason Dere. Changes --- Implemented review comments. Bugs: HIVE-20164 https://issues.apache.org/jira/browse/HIVE-20164 Repository: hive-git Description --- Murmur Hash : Make sure CTAS and IAS use correct bucketing version Diffs (updated) - itests/src/test/resources/testconfiguration.properties d5a33bd8ca ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 1661aeccd7 ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java bbce940c2e ql/src/test/queries/clientpositive/murmur_hash_migration.q PRE-CREATION ql/src/test/results/clientpositive/llap/murmur_hash_migration.q.out PRE-CREATION Diff: https://reviews.apache.org/r/67974/diff/2/ Changes: https://reviews.apache.org/r/67974/diff/1-2/ Testing --- Thanks, Deepak Jaiswal
Review Request 67974: HIVE-20164
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67974/ --- Review request for hive, Gopal V and Jason Dere. Bugs: HIVE-20164 https://issues.apache.org/jira/browse/HIVE-20164 Repository: hive-git Description --- Murmur Hash : Make sure CTAS and IAS use correct bucketing version Diffs - itests/src/test/resources/testconfiguration.properties d08528f319 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 1b433c7498 ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java bbce940c2e ql/src/test/queries/clientpositive/murmur_hash_migration.q PRE-CREATION ql/src/test/results/clientpositive/llap/murmur_hash_migration.q.out PRE-CREATION Diff: https://reviews.apache.org/r/67974/diff/1/ Testing --- Thanks, Deepak Jaiswal
Re: [VOTE] Should we release storage-api 2.7.0 rc1?
Thanks for testing out the RC and your vote. With 3 +1s, the vote passes. I will work on the release now. Regards, Deepak On 7/18/18, 9:14 AM, "Jesus Camacho Rodriguez" wrote: +1 Built from sources and ran tests. -Jesús On 7/16/18, 10:31 AM, "Ashutosh Chauhan" wrote: +1 Built from sources. Ran unit tests. Checksums and sigs matched up. On Mon, Jul 16, 2018 at 8:58 AM Owen O'Malley wrote: > +1 > > built & ran tests > checked checksums & signature > tested with ORC > > On Thu, Jul 12, 2018 at 4:37 PM, Deepak Jaiswal > wrote: > > > Hi, > > > > I have prepared the rc1 off of branch-3.1. > > Artifacts: > > Tag : https://github.com/apache/hive/releases/tag/storage- > > release-2.7.0-rc1 > > Tar Ball : http://home.apache.org/~djaiswal/hive-storage-2.7.0/ > > > > Regards, > > Deepak > > > > On 7/10/18, 10:16 AM, "Deepak Jaiswal" > wrote: > > > > Thanks Owen for finding this out. I will work on the next RC once > this > > blocker is resolved. > > > > Regards, > > Deepak > > > > On 7/10/18, 9:40 AM, "Owen O'Malley" wrote: > > > > Ok, Jesus and I tracked it down and I've filed > > https://issues.apache.org/jira/browse/HIVE-20135 that is a > > blocker on > > storage-api 2.7.0. > > > > The impact was that orc 1.5 and master failed with the RC. orc > 1.4 > > and > > older were fine. > > > > .. Owen > > > > On Tue, Jul 10, 2018 at 8:17 AM, Owen O'Malley < > > owen.omal...@gmail.com> > > wrote: > > > > > I wanted to give an update on this. For now, I'm -1 because the > > ORC > > > (branch-1.5) tests fail with this RC. I'll dig into what is > > wrong, but it > > > looks like something in the timezone changes broke backwards > > compatibility. > > > > > > .. Owen > > > > > > On Mon, Jul 9, 2018 at 11:12 AM, Deepak Jaiswal < > > djais...@hortonworks.com> > > > wrote: > > > > > >> Thanks Alan. > > >> > > >> On 7/9/18, 10:17 AM, "Alan Gates" > wrote: > > >> > > >> +1. Did a build with a clean maven repo, checked the > > signature and > > >> sha > > >> hash, ran RAT. > > >> > > >> Alan. 
> > >> > > >> On Fri, Jul 6, 2018 at 2:21 PM Deepak Jaiswal < > > >> djais...@hortonworks.com> > > >> wrote: > > >> > > >> > Hi, > > >> > > > >> > I would like to make a new release of the storage-api. > It > > contains > > >> changes > > >> > required for Hive 3.1 release. > > >> > > > >> > Artifcats: > > >> > Tag : > > >> > https://github.com/apache/hive/releases/tag/storage- > > release- > > >> 2.7.0-rc0 > > >> > Tar Ball : http://home.apache.org/~ > > djaiswal/hive-storage-2.7.0/ > > >> > > > >> > Regards, > > >> > Deepak > > >> > > > >> > > >> > > >> > > > > > > > > > > > > > >
[jira] [Created] (HIVE-20164) Murmur Hash : Make sure CTAS and IAS use correct bucketing version
Deepak Jaiswal created HIVE-20164: - Summary: Murmur Hash : Make sure CTAS and IAS use correct bucketing version Key: HIVE-20164 URL: https://issues.apache.org/jira/browse/HIVE-20164 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal With the migration to Murmur hash, CTAS and IAS from old table version to new table version does not work as intended and data is hashed using old hash logic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [VOTE] Should we release storage-api 2.7.0 rc1?
Hi, I have prepared the rc1 off of branch-3.1. Artifacts: Tag : https://github.com/apache/hive/releases/tag/storage-release-2.7.0-rc1 Tar Ball : http://home.apache.org/~djaiswal/hive-storage-2.7.0/ Regards, Deepak On 7/10/18, 10:16 AM, "Deepak Jaiswal" wrote: Thanks Owen for finding this out. I will work on the next RC once this blocker is resolved. Regards, Deepak On 7/10/18, 9:40 AM, "Owen O'Malley" wrote: Ok, Jesus and I tracked it down and I've filed https://issues.apache.org/jira/browse/HIVE-20135 that is a blocker on storage-api 2.7.0. The impact was that orc 1.5 and master failed with the RC. orc 1.4 and older were fine. .. Owen On Tue, Jul 10, 2018 at 8:17 AM, Owen O'Malley wrote: > I wanted to give an update on this. For now, I'm -1 because the ORC > (branch-1.5) tests fail with this RC. I'll dig into what is wrong, but it > looks like something in the timezone changes broke backwards compatibility. > > .. Owen > > On Mon, Jul 9, 2018 at 11:12 AM, Deepak Jaiswal > wrote: > >> Thanks Alan. >> >> On 7/9/18, 10:17 AM, "Alan Gates" wrote: >> >> +1. Did a build with a clean maven repo, checked the signature and >> sha >> hash, ran RAT. >> >> Alan. >> >> On Fri, Jul 6, 2018 at 2:21 PM Deepak Jaiswal < >> djais...@hortonworks.com> >> wrote: >> >> > Hi, >> > >> > I would like to make a new release of the storage-api. It contains >> changes >> > required for Hive 3.1 release. >> > >> > Artifacts: >> > Tag : >> > https://github.com/apache/hive/releases/tag/storage-release- >> 2.7.0-rc0 >> > Tar Ball : http://home.apache.org/~djaiswal/hive-storage-2.7.0/ >> > >> > Regards, >> > Deepak >> > >> >> >> >
[jira] [Created] (HIVE-20155) Semijoin Reduction : Put all the min-max filters before all the bloom filters
Deepak Jaiswal created HIVE-20155: - Summary: Semijoin Reduction : Put all the min-max filters before all the bloom filters Key: HIVE-20155 URL: https://issues.apache.org/jira/browse/HIVE-20155 Project: Hive Issue Type: Task Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal If there is more than one semijoin reduction filter, apply all min-max filters before any of the bloom filters, because a bloom filter lookup is expensive. cc [~gopalv] [~jdere] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
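The ordering requested in this task can be sketched as a stable sort on filter kind. This is an illustration only, not how Hive arranges its filter expressions: `FilterOrderingDemo`, `Kind`, `Filter`, and the filter names are hypothetical. Because the filters are combined with AND, a row rejected by a cheap range check never reaches the more expensive bloom filter probe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class FilterOrderingDemo {
  enum Kind { MIN_MAX, BLOOM }

  // Hypothetical descriptor for one semijoin reduction filter.
  static final class Filter {
    final String name;
    final Kind kind;

    Filter(String name, Kind kind) {
      this.name = name;
      this.kind = kind;
    }

    @Override
    public String toString() {
      return name;
    }
  }

  // Returns a copy with every min-max filter ahead of every bloom filter.
  // List.sort is stable, so the relative order within each kind is kept.
  static List<Filter> cheapFirst(List<Filter> filters) {
    List<Filter> ordered = new ArrayList<>(filters);
    ordered.sort(Comparator.comparingInt((Filter f) -> f.kind == Kind.MIN_MAX ? 0 : 1));
    return ordered;
  }

  public static void main(String[] args) {
    List<Filter> filters = Arrays.asList(
        new Filter("bloom_a", Kind.BLOOM),
        new Filter("minmax_a", Kind.MIN_MAX),
        new Filter("bloom_b", Kind.BLOOM),
        new Filter("minmax_b", Kind.MIN_MAX));
    System.out.println(cheapFirst(filters)); // prints [minmax_a, minmax_b, bloom_a, bloom_b]
  }
}
```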
Re: Review Request 67887: HIVE-20090
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67887/#review205976 --- LGTM. I have some minor comments. ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java Lines 417 (patched) <https://reviews.apache.org/r/67887/#comment288929> Thanks! ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java Lines 419 (patched) <https://reviews.apache.org/r/67887/#comment288945> Each of these functions checks whether semijoin reduction is enabled. I think it would be a bit more efficient to do the check once at the beginning of this function and remove it from all the underlying functions. if (!procCtx.conf.getBoolVar(ConfVars.TEZ_DYNAMIC_SEMIJOIN_REDUCTION) || procCtx.parseContext.getRsToSemiJoinBranchInfo().size() == 0) { return; } ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java Lines 1030 (patched) <https://reviews.apache.org/r/67887/#comment288953> This code is very similar to SemiJoinRemovalIfNoStatsProc. If possible, can we refactor it to avoid duplication? ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java Lines 1064 (patched) <https://reviews.apache.org/r/67887/#comment288954> Is the first condition to handle cycles? ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java Lines 1089 (patched) <https://reviews.apache.org/r/67887/#comment288955> Please move this line after the instanceof check. ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java Lines 309 (patched) <https://reviews.apache.org/r/67887/#comment288948> Extreme nit : Can you add a blank line before the numbered comments for better readability? - Deepak Jaiswal On July 11, 2018, 5:26 p.m., Jesús Camacho Rodríguez wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/67887/ > --- > > (Updated July 11, 2018, 5:26 p.m.) > > > Review request for hive, Ashutosh Chauhan, Deepak Jaiswal, and Gopal V. 
> > > Bugs: HIVE-20090 > https://issues.apache.org/jira/browse/HIVE-20090 > > > Repository: hive-git > > > Description > --- > > HIVE-20090 > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java > 6ea68c35000a5dadb7a01db47bbd8183bff966da > itests/src/test/resources/testconfiguration.properties > 9e012ce2f8f789bde3f95acc43052bf4446fccbc > ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java > dfd790853b2f73a465989374e78c01d282d16891 > ql/src/java/org/apache/hadoop/hive/ql/ppd/SyntheticJoinPredicate.java > dec2d1ef38b748a5c9b40d06af491dd168d70b72 > ql/src/test/queries/clientpositive/dynamic_semijoin_reduction_sw2.q > PRE-CREATION > > ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_sw2.q.out > PRE-CREATION > ql/src/test/results/clientpositive/llap/explainuser_1.q.out > f87fe36e11a7c7e535678dbfaaced04f33bbb501 > ql/src/test/results/clientpositive/llap/tez_fixed_bucket_pruning.q.out > 6987a96809e3c3300e1b76ea5df3069b3c1d162f > ql/src/test/results/clientpositive/perf/tez/query1.q.out > 579940c66e25ebf5e7d0635aaedd0c0cc994f4e0 > ql/src/test/results/clientpositive/perf/tez/query16.q.out > 0b64c55b0f4ba036aeba4c49f478e9ee1409087c > ql/src/test/results/clientpositive/perf/tez/query17.q.out > 2e5e254b2ddc3507f962cbc7691db51f1abafbca > ql/src/test/results/clientpositive/perf/tez/query18.q.out > e8585275b4e51a55ce778dd154033fcdf859e617 > ql/src/test/results/clientpositive/perf/tez/query2.q.out > d24899ccf371ad42ef88cebc26cc671c097686da > ql/src/test/results/clientpositive/perf/tez/query23.q.out > 6725bec30106bc3321c2869dfc304d0a4da82cf8 > ql/src/test/results/clientpositive/perf/tez/query24.q.out > 9fcec42c3ab29b898c9c947544a2e29dd08e95e8 > ql/src/test/results/clientpositive/perf/tez/query25.q.out > a885cf344b7e29dcf1b2d93d1914e7f9a8d4b921 > ql/src/test/results/clientpositive/perf/tez/query29.q.out > 46ff49d41a01591f075b2c48ae5a692640fd6eec > ql/src/test/results/clientpositive/perf/tez/query31.q.out > 
c4d717d8680f6ac6f8f8b6ed01742384a84ddcf9 > ql/src/test/results/clientpositive/perf/tez/query32.q.out > 6be6f7aa6e6fc50bcedebe3f4d1b5fc00b52ee86 > ql/src/test/results/clientpositive/perf/tez/query39.q.out > 5966e243ea79b4b884950f34a5b7336e40f92889 > ql/src/test/results/clientpositive/perf/tez/query40.q.out > 2f116f12ebcba44b876508d0d0f0d827e3a8b28d > ql/src/test/results
[jira] [Created] (HIVE-20142) Semijoin Reduction : Perform cost based removal after rule based removal.
Deepak Jaiswal created HIVE-20142: - Summary: Semijoin Reduction : Perform cost based removal after rule based removal. Key: HIVE-20142 URL: https://issues.apache.org/jira/browse/HIVE-20142 Project: Hive Issue Type: Task Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal The semijoin reduction removal logic is spread out over multiple functions. Currently, the cost-based removal logic is applied before the rule-based (dumb) removals. Instead, apply the rule-based removal logic first and then apply the cost-based removal. cc [~jdere] [~jcamachorodriguez] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [VOTE] Should we release storage-api 2.7.0 rc0?
Thanks Owen for finding this out. I will work on the next RC once this blocker is resolved. Regards, Deepak On 7/10/18, 9:40 AM, "Owen O'Malley" wrote: Ok, Jesus and I tracked it down and I've filed https://issues.apache.org/jira/browse/HIVE-20135 that is a blocker on storage-api 2.7.0. The impact was that orc 1.5 and master failed with the RC. orc 1.4 and older were fine. .. Owen On Tue, Jul 10, 2018 at 8:17 AM, Owen O'Malley wrote: > I wanted to give an update on this. For now, I'm -1 because the ORC > (branch-1.5) tests fail with this RC. I'll dig into what is wrong, but it > looks like something in the timezone changes broke backwards compatibility. > > .. Owen > > On Mon, Jul 9, 2018 at 11:12 AM, Deepak Jaiswal > wrote: > >> Thanks Alan. >> >> On 7/9/18, 10:17 AM, "Alan Gates" wrote: >> >> +1. Did a build with a clean maven repo, checked the signature and >> sha >> hash, ran RAT. >> >> Alan. >> >> On Fri, Jul 6, 2018 at 2:21 PM Deepak Jaiswal < >> djais...@hortonworks.com> >> wrote: >> >> > Hi, >> > >> > I would like to make a new release of the storage-api. It contains >> changes >> > required for Hive 3.1 release. >> > >> > Artifacts: >> > Tag : >> > https://github.com/apache/hive/releases/tag/storage-release- >> 2.7.0-rc0 >> > Tar Ball : http://home.apache.org/~djaiswal/hive-storage-2.7.0/ >> > >> > Regards, >> > Deepak >> > >> >> >> >
Re: [VOTE] Should we release storage-api 2.7.0 rc0?
Thanks Alan. On 7/9/18, 10:17 AM, "Alan Gates" wrote: +1. Did a build with a clean maven repo, checked the signature and sha hash, ran RAT. Alan. On Fri, Jul 6, 2018 at 2:21 PM Deepak Jaiswal wrote: > Hi, > > I would like to make a new release of the storage-api. It contains changes > required for Hive 3.1 release. > > Artifacts: > Tag : > https://github.com/apache/hive/releases/tag/storage-release-2.7.0-rc0 > Tar Ball : http://home.apache.org/~djaiswal/hive-storage-2.7.0/ > > Regards, > Deepak >
Re: Hive QA batches timing out
Thanks Zoltan for the analysis. Perhaps we should disable the test in the meantime as it is blocking several people from committing. I can go ahead and create a patch for it. Regards, Deepak On 7/8/18, 11:33 PM, "Zoltan Haindrich" wrote: Hello, Thank you Deepak for taking a closer look! From what you've found, I noticed that the runtime of TestReplicationScenariosAcidTables has jumped to ~2000 sec in the runs which have failed. It seems this problem has been there for a long time; I've found Jira tickets in which this test "timed out" and the HiveQA comment was dated April 03, so it's not entirely new. The problem that prevents this test from completing successfully seems to be that it has difficulty closing down the metastore client, which goes on for a while. I don't know if this is an acid/replication/metastore/? issue, but it seems intermittent; I have a hunch that it might happen more reliably with this test. I've opened HIVE-20121 to investigate this. 2018-07-08T22:07:33,461 DEBUG [main] metastore.HiveMetaStoreClient: Unable to shutdown metastore client. Will try closing transport directly. 
org.apache.thrift.transport.TTransportException: Cannot write to null outputStream some links to more or less recent logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12481/failed/240_UTBatch_itests__hive-unit_9_tests/maven-test.txt the hive.log is ~200M: http://104.198.109.242/logs/PreCommit-HIVE-Build-12481/failed/240_UTBatch_itests__hive-unit_9_tests/logs/hive.log cheers, Zoltan On 07/08/2018 06:49 PM, Deepak Jaiswal wrote: > I am seeing tests timing out in my latest ptest run, > > https://builds.apache.org/job/PreCommit-HIVE-Build/12468/testReport > https://builds.apache.org/job/PreCommit-HIVE-Build/12468/console > > TestAlterTableMetadata - did not produce a TEST-*.xml file (likely timed out) (batchId=240) > TestAutoPurgeTables - did not produce a TEST-*.xml file (likely timed out) (batchId=240) > TestLocationQueries - did not produce a TEST-*.xml file (likely timed out) (batchId=240) > TestReplicationScenariosAcidTables - did not produce a TEST-*.xml file (likely timed out) (batchId=240) > TestSemanticAnalyzerHookLoading - did not produce a TEST-*.xml file (likely timed out) (batchId=240) > TestSparkStatistics - did not produce a TEST-*.xml file (likely timed out) (batchId=240) > > > From the Hive QA homepage, the last stable build was 12444 whereas the current run is 12473. I looked at some of the runs in between and it looks like most of the runs are failing due to the above batch of unit tests. > > Regards, > Deepak >
Hive QA batches timing out
I am seeing tests timing out in my latest ptest run,

https://builds.apache.org/job/PreCommit-HIVE-Build/12468/testReport
https://builds.apache.org/job/PreCommit-HIVE-Build/12468/console

TestAlterTableMetadata - did not produce a TEST-*.xml file (likely timed out) (batchId=240)
TestAutoPurgeTables - did not produce a TEST-*.xml file (likely timed out) (batchId=240)
TestLocationQueries - did not produce a TEST-*.xml file (likely timed out) (batchId=240)
TestReplicationScenariosAcidTables - did not produce a TEST-*.xml file (likely timed out) (batchId=240)
TestSemanticAnalyzerHookLoading - did not produce a TEST-*.xml file (likely timed out) (batchId=240)
TestSparkStatistics - did not produce a TEST-*.xml file (likely timed out) (batchId=240)

From the Hive QA homepage, the last stable build was 12444 whereas the current run is 12473. I looked at some of the runs in between and it looks like most of the runs are failing due to the above batch of unit tests.

Regards,
Deepak
[VOTE] Should we release storage-api 2.7.0 rc0?
Hi, I would like to make a new release of the storage-api. It contains changes required for the Hive 3.1 release. Artifacts: Tag : https://github.com/apache/hive/releases/tag/storage-release-2.7.0-rc0 Tar Ball : http://home.apache.org/~djaiswal/hive-storage-2.7.0/ Regards, Deepak
[jira] [Created] (HIVE-20100) OpTraits : Select Optraits should stop when a mismatch is detected
Deepak Jaiswal created HIVE-20100: - Summary: OpTraits : Select Optraits should stop when a mismatch is detected Key: HIVE-20100 URL: https://issues.apache.org/jira/browse/HIVE-20100 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal

The select operator's optraits logic, as stated in the comment, is:

// For bucket columns
// If all the columns match to the parent, put them in the bucket cols
// else, add empty list.
// For sort columns
// Keep the subset of all the columns as long as order is maintained.

However, this is not happening due to a bug. The boolean found is never reset, so once a single match is found the value remains true and the optraits get populated with a partial list of bucket columns, which is incorrect. This may lead to the creation of an SMB join, which should not happen.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
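To make the flag problem concrete, here is a minimal sketch of the pattern the ticket describes. It is not the Hive source: the class name, the method names and the use of plain strings for columns are assumptions made for illustration. The only point is where the found flag is declared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative stand-in, not Hive's select-operator optraits code.
final class SelectBucketColsSketch {

    // Buggy shape: 'found' is declared once and never reset, so one early
    // match lets every later, unmatched parent column slip past the check.
    static List<String> buggy(List<String> parentBucketCols, List<String> selectCols) {
        List<String> result = new ArrayList<>();
        boolean found = false;
        for (String parentCol : parentBucketCols) {
            for (String selectCol : selectCols) {
                if (selectCol.equals(parentCol)) {
                    result.add(selectCol);
                    found = true;
                    break;
                }
            }
            if (!found) {
                return Collections.emptyList();
            }
        }
        return result;
    }

    // Fixed shape: 'found' is reset for every parent bucket column, so the
    // first mismatch stops the walk and yields an empty list.
    static List<String> fixed(List<String> parentBucketCols, List<String> selectCols) {
        List<String> result = new ArrayList<>();
        for (String parentCol : parentBucketCols) {
            boolean found = false;
            for (String selectCol : selectCols) {
                if (selectCol.equals(parentCol)) {
                    result.add(selectCol);
                    found = true;
                    break;
                }
            }
            if (!found) {
                return Collections.emptyList();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> parent = Arrays.asList("a", "b");
        List<String> select = Arrays.asList("a");
        System.out.println(buggy(parent, select)); // prints [a] (partial list)
        System.out.println(fixed(parent, select)); // prints [] (mismatch detected)
    }
}
```

With a parent bucketed on (a, b) and a select that keeps only a, the buggy variant reports [a] as the bucket columns, which is the partial list the ticket warns about.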
Re: Review Request 67800: HIVE-20039
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67800/ --- (Updated July 3, 2018, 8:20 p.m.) Review request for hive and Gopal V. Changes --- Added test to llap only runs. Bugs: HIVE-20039 https://issues.apache.org/jira/browse/HIVE-20039 Repository: hive-git Description --- Bucket pruning: Left Outer Join on bucketed table gives wrong result. The context was reused by all the predicates. Instead use TS op directly. Diffs (updated) - data/files/bucket_pruning/l3_clarity__l3_monthly_dw_factplan_datajoin_1_s2_2018022300104_1/00_0 PRE-CREATION data/files/bucket_pruning/l3_clarity__l3_monthly_dw_factplan_dw_stg_2018022300104_1/00_0 PRE-CREATION data/files/bucket_pruning/l3_clarity__l3_snap_number_2018022300104/00_0 PRE-CREATION data/files/bucket_pruning/l3_monthly_dw_dimplan/56_0 PRE-CREATION itests/src/test/resources/testconfiguration.properties d02c0fe8ba ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java 2debacacb5 ql/src/test/queries/clientpositive/tez_fixed_bucket_pruning.q PRE-CREATION ql/src/test/results/clientpositive/llap/tez_fixed_bucket_pruning.q.out PRE-CREATION Diff: https://reviews.apache.org/r/67800/diff/3/ Changes: https://reviews.apache.org/r/67800/diff/2-3/ Testing --- Thanks, Deepak Jaiswal
Re: Review Request 67800: HIVE-20039
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67800/ --- (Updated July 3, 2018, 6:49 a.m.) Review request for hive and Gopal V. Changes --- Implemented recommended changes. Added order by in queries for predictable results. Bugs: HIVE-20039 https://issues.apache.org/jira/browse/HIVE-20039 Repository: hive-git Description --- Bucket pruning: Left Outer Join on bucketed table gives wrong result. The context was reused by all the predicates. Instead use TS op directly. Diffs (updated) - data/files/bucket_pruning/l3_clarity__l3_monthly_dw_factplan_datajoin_1_s2_2018022300104_1/00_0 PRE-CREATION data/files/bucket_pruning/l3_clarity__l3_monthly_dw_factplan_dw_stg_2018022300104_1/00_0 PRE-CREATION data/files/bucket_pruning/l3_clarity__l3_snap_number_2018022300104/00_0 PRE-CREATION data/files/bucket_pruning/l3_monthly_dw_dimplan/56_0 PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java 2debacacb5 ql/src/test/queries/clientpositive/tez_fixed_bucket_pruning.q PRE-CREATION ql/src/test/results/clientpositive/llap/tez_fixed_bucket_pruning.q.out PRE-CREATION Diff: https://reviews.apache.org/r/67800/diff/2/ Changes: https://reviews.apache.org/r/67800/diff/1-2/ Testing --- Thanks, Deepak Jaiswal
Re: Review Request 67800: HIVE-20039
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67800/#review205652 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java Lines 90 (patched) <https://reviews.apache.org/r/67800/#comment288542> Is it even possible to change the bucket count for a partition in a table? As far as I can see, the bucket number is a table-wide property. - Deepak Jaiswal On July 3, 2018, 12:23 a.m., Deepak Jaiswal wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/67800/ > --- > > (Updated July 3, 2018, 12:23 a.m.) > > > Review request for hive and Gopal V. > > > Bugs: HIVE-20039 > https://issues.apache.org/jira/browse/HIVE-20039 > > > Repository: hive-git > > > Description > --- > > Bucket pruning: Left Outer Join on bucketed table gives wrong result. > The context was reused by all the predicates. Instead use TS op directly. > > > Diffs > - > > > data/files/bucket_pruning/l3_clarity__l3_monthly_dw_factplan_datajoin_1_s2_2018022300104_1/00_0 > PRE-CREATION > > data/files/bucket_pruning/l3_clarity__l3_monthly_dw_factplan_dw_stg_2018022300104_1/00_0 > PRE-CREATION > data/files/bucket_pruning/l3_clarity__l3_snap_number_2018022300104/00_0 > PRE-CREATION > data/files/bucket_pruning/l3_monthly_dw_dimplan/56_0 PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java > 2debacacb5 > ql/src/test/queries/clientpositive/tez_fixed_bucket_pruning.q PRE-CREATION > ql/src/test/results/clientpositive/llap/tez_fixed_bucket_pruning.q.out > PRE-CREATION > > > Diff: https://reviews.apache.org/r/67800/diff/1/ > > > Testing > --- > > > Thanks, > > Deepak Jaiswal > >
Review Request 67800: HIVE-20039
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67800/ --- Review request for hive and Gopal V. Bugs: HIVE-20039 https://issues.apache.org/jira/browse/HIVE-20039 Repository: hive-git Description --- Bucket pruning: Left Outer Join on bucketed table gives wrong result. The context was reused by all the predicates. Instead use TS op directly. Diffs - data/files/bucket_pruning/l3_clarity__l3_monthly_dw_factplan_datajoin_1_s2_2018022300104_1/00_0 PRE-CREATION data/files/bucket_pruning/l3_clarity__l3_monthly_dw_factplan_dw_stg_2018022300104_1/00_0 PRE-CREATION data/files/bucket_pruning/l3_clarity__l3_snap_number_2018022300104/00_0 PRE-CREATION data/files/bucket_pruning/l3_monthly_dw_dimplan/56_0 PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java 2debacacb5 ql/src/test/queries/clientpositive/tez_fixed_bucket_pruning.q PRE-CREATION ql/src/test/results/clientpositive/llap/tez_fixed_bucket_pruning.q.out PRE-CREATION Diff: https://reviews.apache.org/r/67800/diff/1/ Testing --- Thanks, Deepak Jaiswal
[DISCUSS] Storage-API 2.7 release
All, The upcoming branch-3.1 will need changes from storage-api. I propose creating a new release of storage-api. Please let me know your thoughts on this. I am working on the release candidate. Regards, Deepak
[jira] [Created] (HIVE-20039) Left Outer Join on bucketed table gives wrong result
Deepak Jaiswal created HIVE-20039: - Summary: Left Outer Join on bucketed table gives wrong result Key: HIVE-20039 URL: https://issues.apache.org/jira/browse/HIVE-20039 Project: Hive Issue Type: Bug Affects Versions: 2.3.2, 3.0.0 Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal A left outer join on a bucketed table gives wrong results in certain cases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Unstable Hive QA
:1.8.0_102]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_102]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_102]
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) [junit-4.11.jar:?]
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.11.jar:?]
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) [junit-4.11.jar:?]
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.11.jar:?]
	at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:92) [hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.junit.rules.RunRules.evaluate(RunRules.java:20) [junit-4.11.jar:?]
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) [junit-4.11.jar:?]
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) [junit-4.11.jar:?]
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) [junit-4.11.jar:?]
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) [junit-4.11.jar:?]
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) [junit-4.11.jar:?]
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) [junit-4.11.jar:?]
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) [junit-4.11.jar:?]
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) [junit-4.11.jar:?]
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309) [junit-4.11.jar:?]
	at org.junit.runners.Suite.runChild(Suite.java:127) [junit-4.11.jar:?]
	at org.junit.runners.Suite.runChild(Suite.java:26) [junit-4.11.jar:?]
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) [junit-4.11.jar:?]
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) [junit-4.11.jar:?]
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) [junit-4.11.jar:?]
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) [junit-4.11.jar:?]
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) [junit-4.11.jar:?]
	at org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:73) [hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.junit.rules.RunRules.evaluate(RunRules.java:20) [junit-4.11.jar:?]
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309) [junit-4.11.jar:?]
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) [surefire-junit4-2.21.0.jar:2.21.0]
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) [surefire-junit4-2.21.0.jar:2.21.0]
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) [surefire-junit4-2.21.0.jar:2.21.0]
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) [surefire-junit4-2.21.0.jar:2.21.0]
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) [surefire-booter-2.21.0.jar:2.21.0]
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) [surefire-booter-2.21.0.jar:2.21.0]
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125) [surefire-booter-2.21.0.jar:2.21.0]
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413) [surefire-booter-2.21.0.jar:2.21.0]
2018-06-27T22:24:51,363 INFO [d00e737b-0dde-4230-ae29-d20498bf8332 main] ql.Context: New scratch dir is hdfs://localhost:37593/home/hiveptest

Vineet

On Jun 27, 2018, at 11:47 PM, Deepak Jaiswal <djais...@hortonworks.com> wrote:

    Ptests have become really unstable. The druid tests are failing randomly,
    https://builds.apache.org/job/PreCommit-HIVE-Build/12203/testReport

    Should we disable them?

    Deepak

    On 6/27/18, 10:13 AM, "Deepak Jaiswal" wrote:

        Hi All,

        It seems we are going back to instability in Hive QA runs. In the past few days I saw many runs where the failures were completely independent. When those tests are run locally, they don’t fail, which makes the failures harder to catch.

        On one hand, I think requiring a green run to commit makes sense; on the other hand, development is unnecessarily blocked. Putting the randomly failing tests in the disabled list is also not a good idea, as it brings down the code coverage.

        Any suggestions?

        Regards,
        Deepak
Re: Unstable Hive QA
Ptests have become really unstable. The druid tests are failing randomly, https://builds.apache.org/job/PreCommit-HIVE-Build/12203/testReport Should we disable them? Deepak On 6/27/18, 10:13 AM, "Deepak Jaiswal" wrote: Hi All, It seems we are going back to instability in Hive QA runs. In the past few days I saw many runs where the failures were completely independent. When those tests are run locally, they don’t fail, which makes the failures harder to catch. On one hand, I think requiring a green run to commit makes sense; on the other hand, development is unnecessarily blocked. Putting the randomly failing tests in the disabled list is also not a good idea, as it brings down the code coverage. Any suggestions? Regards, Deepak
Re: Review Request 67698: HIVE-19967
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67698/ --- (Updated June 27, 2018, 6:42 p.m.) Review request for hive, Gunther Hagleitner and Jason Dere. Changes --- Added a missed test from original patch to enable SMB. Bugs: HIVE-19967 https://issues.apache.org/jira/browse/HIVE-19967 Repository: hive-git Description --- SMB Join : Need Optraits for PTFOperator ala GBY Op Diffs (updated) - itests/src/test/resources/testconfiguration.properties 9f25a9bad3 ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/AnnotateWithOpTraits.java 3c8e61d47b ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java dbcbbfd1a6 ql/src/test/queries/clientpositive/llap_smb_ptf.q PRE-CREATION ql/src/test/queries/clientpositive/tez_smb_reduce_side.q PRE-CREATION ql/src/test/results/clientpositive/llap/llap_smb_ptf.q.out PRE-CREATION ql/src/test/results/clientpositive/llap/tez_smb_reduce_side.q.out PRE-CREATION Diff: https://reviews.apache.org/r/67698/diff/2/ Changes: https://reviews.apache.org/r/67698/diff/1-2/ Testing --- Thanks, Deepak Jaiswal
[jira] [Created] (HIVE-20017) Logic to disable SMB/BMJ on external tables is too strict
Deepak Jaiswal created HIVE-20017: - Summary: Logic to disable SMB/BMJ on external tables is too strict Key: HIVE-20017 URL: https://issues.apache.org/jira/browse/HIVE-20017 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal The logic to disable SMB and BMJ on external tables, as done in https://issues.apache.org/jira/browse/HIVE-19336, is too strict. For SMB, if there is a group by, the source table becomes irrelevant because the rows are bucketed and sorted by the group by keys. For BMJ, the small table(s) can be external; the check needs to be done only for the big table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
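The two relaxed rules can be written down as a small sketch. Everything here is hypothetical: JoinInput, its three flags and the two method names are invented for illustration and do not correspond to Hive's optimizer classes. The sketch only encodes the conditions stated in the ticket.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the relaxed checks; not Hive's optimizer code.
final class ExternalTableJoinSketch {

    // One join input: whether its source table is external, whether it is
    // the big table of the join, and whether a group by sits above the scan.
    static final class JoinInput {
        final boolean external;
        final boolean bigTable;
        final boolean groupByUpstream;

        JoinInput(boolean external, boolean bigTable, boolean groupByUpstream) {
            this.external = external;
            this.bigTable = bigTable;
            this.groupByUpstream = groupByUpstream;
        }
    }

    // SMB: an external source only matters when no group by re-buckets and
    // re-sorts the rows before the join.
    static boolean smbAllowed(List<JoinInput> inputs) {
        for (JoinInput in : inputs) {
            if (in.external && !in.groupByUpstream) {
                return false;
            }
        }
        return true;
    }

    // BMJ: only the big table is checked; small tables may be external.
    static boolean bmjAllowed(List<JoinInput> inputs) {
        for (JoinInput in : inputs) {
            if (in.bigTable && in.external) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        JoinInput managedBig = new JoinInput(false, true, false);
        JoinInput externalSmall = new JoinInput(true, false, false);
        System.out.println(bmjAllowed(Arrays.asList(managedBig, externalSmall)));
        System.out.println(smbAllowed(Arrays.asList(managedBig, externalSmall)));
    }
}
```

A stricter check in the style the ticket criticizes would reject both joins as soon as any input is external.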
Unstable Hive QA
Hi All, It seems we are going back to instability in Hive QA runs. In the past few days I saw many runs where the failures were completely independent. When those tests are run locally, they don’t fail, which makes the failures harder to catch. On one hand, I think requiring a green run to commit makes sense; on the other hand, development is unnecessarily blocked. Putting the randomly failing tests in the disabled list is also not a good idea, as it brings down the code coverage. Any suggestions? Regards, Deepak
Re: Hive QA logs not accessible
It was too soon. Looks like it is broken again. One of my runs,
https://builds.apache.org/view/H-L/view/Hive/job/PreCommit-HIVE-Build/12114/console

Regards,
Deepak

On 6/25/18, 1:57 PM, "Deepak Jaiswal" wrote:

    Hi Vihang,

    It took a while but tests started to appear, so all is good now.

    Regards,
    Deepak

    On 6/25/18, 12:24 PM, "Vihang Karajgaonkar" wrote:

        I see there are 6 builds in the queue right now (which is unusually small). What is the JIRA number where you submitted the patch?

        On Mon, Jun 25, 2018 at 11:05 AM, Deepak Jaiswal wrote:

        > Hi Vihang,
        >
        > I am looking for logs of failed test runs. Thanks for optimizing this for successful runs. However, I think there is a problem with Hive QA, the queue is gone and I submitted a patch more than 10 minutes ago and it hasn’t started or enqueued yet.
        >
        > https://builds.apache.org/view/H-L/view/Hive/job/PreCommit-HIVE-Build/
        >
        > Regards,
        > Deepak
        >
        > On 6/25/18, 10:53 AM, "Vihang Karajgaonkar" wrote:
        >
        > Are you looking for logs for successful tests? I had submitted a change recently which skips downloading logs for successful tests to shave ~10 min off each run. I found that the job was spending too much time copying over ~20G of logs from worker nodes to the server. Can you give the JIRA number so that I can take a look?
        >
        > On Mon, Jun 25, 2018 at 10:38 AM, Deepak Jaiswal <djais...@hortonworks.com> wrote:
        >
        > > The Hive QA logs are not accessible for yesterday night’s run. Also, I don’t see any test running.
        > > Is the disk full again?
        > >
        > > Regards,
        > > Deepak
Re: Hive QA logs not accessible
Hi Vihang,

It took a while but tests started to appear, so all is good now.

Regards,
Deepak

On 6/25/18, 12:24 PM, "Vihang Karajgaonkar" wrote:

    I see there are 6 builds in the queue right now (which is unusually small). What is the JIRA number where you submitted the patch?

    On Mon, Jun 25, 2018 at 11:05 AM, Deepak Jaiswal wrote:

    > Hi Vihang,
    >
    > I am looking for logs of failed test runs. Thanks for optimizing this for successful runs. However, I think there is a problem with Hive QA, the queue is gone and I submitted a patch more than 10 minutes ago and it hasn’t started or enqueued yet.
    >
    > https://builds.apache.org/view/H-L/view/Hive/job/PreCommit-HIVE-Build/
    >
    > Regards,
    > Deepak
    >
    > On 6/25/18, 10:53 AM, "Vihang Karajgaonkar" wrote:
    >
    > Are you looking for logs for successful tests? I had submitted a change recently which skips downloading logs for successful tests to shave ~10 min off each run. I found that the job was spending too much time copying over ~20G of logs from worker nodes to the server. Can you give the JIRA number so that I can take a look?
    >
    > On Mon, Jun 25, 2018 at 10:38 AM, Deepak Jaiswal <djais...@hortonworks.com> wrote:
    >
    > > The Hive QA logs are not accessible for yesterday night’s run. Also, I don’t see any test running.
    > > Is the disk full again?
    > >
    > > Regards,
    > > Deepak
Re: Review Request 67710: HIVE-19481
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67710/ --- (Updated June 25, 2018, 6:15 p.m.) Review request for hive, Jason Dere and Sergey Shelukhin. Changes --- Updated results for failed tests. Bugs: HIVE-19481 https://issues.apache.org/jira/browse/HIVE-19481 Repository: hive-git Description --- sample10.q returns wrong results. Multiple issues were fixed 1. Instead of using old MR logic which assumes there is 1 file for each bucket, lookup buckets by name(non-managed tables) 2. Skip bucket pruning for managed tables. Diffs (updated) - itests/src/test/resources/testconfiguration.properties 517b413839 ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 9dbd869d57 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SamplePruner.java 8200e6a237 ql/src/test/queries/clientpositive/sample10_mm.q PRE-CREATION ql/src/test/results/clientpositive/archive_excludeHadoop20.q.out e4b390c9cd ql/src/test/results/clientpositive/beeline/smb_mapjoin_11.q.out 9f946e0b50 ql/src/test/results/clientpositive/llap/sample10.q.out ce3c2880a6 ql/src/test/results/clientpositive/llap/sample10_mm.q.out PRE-CREATION ql/src/test/results/clientpositive/masking_5.q.out 498fc117c7 ql/src/test/results/clientpositive/sample6.q.out 7f853e55c5 ql/src/test/results/clientpositive/sample7.q.out 0e2fc287d4 ql/src/test/results/clientpositive/sample9.q.out 0de49a698a ql/src/test/results/clientpositive/smb_mapjoin_11.q.out a83f3e66c4 ql/src/test/results/clientpositive/spark/infer_bucket_sort_bucketed_table.q.out 8fab7ecbd0 ql/src/test/results/clientpositive/spark/sample10.q.out 555e5f43ec ql/src/test/results/clientpositive/spark/sample2.q.out 8b73fdf874 ql/src/test/results/clientpositive/spark/sample4.q.out 3269b015ec ql/src/test/results/clientpositive/spark/sample6.q.out 36532d7fbe ql/src/test/results/clientpositive/spark/sample7.q.out d0b52bcdce Diff: https://reviews.apache.org/r/67710/diff/2/ Changes: https://reviews.apache.org/r/67710/diff/1-2/ Testing 
--- Thanks, Deepak Jaiswal
Re: Hive QA logs not accessible
Hi Vihang,

I am looking for logs of failed test runs. Thanks for optimizing this for successful runs. However, I think there is a problem with Hive QA: the queue is gone, and a patch I submitted more than 10 minutes ago hasn’t started or been enqueued yet.

https://builds.apache.org/view/H-L/view/Hive/job/PreCommit-HIVE-Build/

Regards,
Deepak

On 6/25/18, 10:53 AM, "Vihang Karajgaonkar" wrote:

    Are you looking for logs for successful tests? I had submitted a change recently which skips downloading logs for successful tests to shave ~10 min off each run. I found that the job was spending too much time copying over ~20G of logs from worker nodes to the server. Can you give the JIRA number so that I can take a look?

    On Mon, Jun 25, 2018 at 10:38 AM, Deepak Jaiswal wrote:

    > The Hive QA logs are not accessible for yesterday night’s run. Also, I don’t see any test running.
    > Is the disk full again?
    >
    > Regards,
    > Deepak
Hive QA logs not accessible
The Hive QA logs are not accessible for last night’s run. Also, I don’t see any tests running. Is the disk full again? Regards, Deepak
Review Request 67710: HIVE-19481
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67710/ --- Review request for hive, Jason Dere and Sergey Shelukhin. Bugs: HIVE-19481 https://issues.apache.org/jira/browse/HIVE-19481 Repository: hive-git Description --- sample10.q returns wrong results. Multiple issues were fixed 1. Instead of using old MR logic which assumes there is 1 file for each bucket, lookup buckets by name(non-managed tables) 2. Skip bucket pruning for managed tables. Diffs - ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 9dbd869d57 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SamplePruner.java 8200e6a237 ql/src/test/queries/clientpositive/sample10_mm.q PRE-CREATION ql/src/test/results/clientpositive/llap/sample10.q.out 1b95314980 ql/src/test/results/clientpositive/llap/sample10_mm.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/sample10.q.out ac28779591 Diff: https://reviews.apache.org/r/67710/diff/1/ Testing --- Thanks, Deepak Jaiswal
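The difference between "one file per bucket" and "look up buckets by name" can be sketched as follows. This is not the SamplePruner or Partition code from the patch. The helper names are hypothetical, and the sketch assumes bucket files carry the bucket id as a zero-padded numeric prefix (for example 000002_0), possibly followed by a copy suffix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper, not Hive's implementation.
final class BucketFileLookupSketch {

    // Assumed naming scheme: <bucketId>_<attempt>[anything], e.g. 000002_0_copy_1.
    private static final Pattern BUCKET_FILE = Pattern.compile("^(\\d+)_\\d+.*$");

    // Returns the bucket id encoded in the file name, or -1 if the name
    // does not follow the assumed scheme.
    static int bucketIdOf(String fileName) {
        Matcher m = BUCKET_FILE.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    // Selects every file that belongs to the requested bucket by name,
    // instead of assuming the N-th file in the listing is bucket N.
    static List<String> filesForBucket(List<String> fileNames, int bucketId) {
        List<String> matches = new ArrayList<>();
        for (String name : fileNames) {
            if (bucketIdOf(name) == bucketId) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Bucket 1 has no file and bucket 2 has two files.
        List<String> files = Arrays.asList("000000_0", "000002_0", "000002_0_copy_1");
        System.out.println(filesForBucket(files, 1));
        System.out.println(filesForBucket(files, 2));
    }
}
```

A positional lookup on the same listing would hand back 000002_0 for bucket 1, which is the kind of mismatch that can produce wrong sampling results when a bucket is empty or has several files.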
[jira] [Created] (HIVE-19972) Followup to HIVE-19928 : Fix the check for managed table
Deepak Jaiswal created HIVE-19972: - Summary: Followup to HIVE-19928 : Fix the check for managed table Key: HIVE-19972 URL: https://issues.apache.org/jira/browse/HIVE-19972 Project: Hive Issue Type: Bug Reporter: Deepak Jaiswal Assignee: Deepak Jaiswal The check for a managed table should use enum comparison rather than string comparison. The check in the patch will always return false, thus maintaining existing behavior. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
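One way such a check can end up always false is comparing an enum value with a string through equals. The sketch below shows that failure mode with a local stand-in enum. It is an assumption about the shape of the bug, not a copy of the HIVE-19928 patch, and the method names are hypothetical.

```java
// Illustrative only; TableType here is a local stand-in enum.
final class ManagedTableCheckSketch {

    enum TableType { MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW }

    // Compiles, but is always false: an enum constant never equals a String.
    static boolean isManagedByString(TableType type) {
        return type.equals("MANAGED_TABLE");
    }

    // Enum comparison: true exactly for managed tables.
    static boolean isManagedByEnum(TableType type) {
        return type == TableType.MANAGED_TABLE;
    }

    public static void main(String[] args) {
        System.out.println(isManagedByString(TableType.MANAGED_TABLE)); // prints false
        System.out.println(isManagedByEnum(TableType.MANAGED_TABLE));   // prints true
    }
}
```

Because the string variant is false for every input, any branch guarded by it never runs, which matches the ticket's note that existing behavior was maintained.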
Review Request 67698: HIVE-19967
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67698/ --- Review request for hive, Gunther Hagleitner and Jason Dere. Bugs: HIVE-19967 https://issues.apache.org/jira/browse/HIVE-19967 Repository: hive-git Description --- SMB Join : Need Optraits for PTFOperator ala GBY Op Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/AnnotateWithOpTraits.java 3c8e61d47b ql/src/java/org/apache/hadoop/hive/ql/optimizer/metainfo/annotation/OpTraitsRulesProcFactory.java dbcbbfd1a6 ql/src/test/queries/clientpositive/llap_smb_ptf.q PRE-CREATION ql/src/test/results/clientpositive/llap/llap_smb_ptf.q.out PRE-CREATION Diff: https://reviews.apache.org/r/67698/diff/1/ Testing --- Thanks, Deepak Jaiswal