Re: Review Request 16728: Implement non-staged MapJoin
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16728/#review32885
---

itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java
https://reviews.apache.org/r/16728/#comment61876

    This would break Tez tests.

itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java
https://reviews.apache.org/r/16728/#comment61875

    This would eliminate Tez unit tests. Was this intentional?

ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java
https://reviews.apache.org/r/16728/#comment61880

    Could you raise a JIRA for this?

ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java
https://reviews.apache.org/r/16728/#comment61882

    Can it be only these 2 operators? Maybe the common join operator can be used?

- Vikram Dixit Kumaraswamy

On Jan. 20, 2014, 5 a.m., Navis Ryu wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16728/
---

(Updated Jan. 20, 2014, 5 a.m.)

Review request for hive.

Bugs: HIVE-6144
    https://issues.apache.org/jira/browse/HIVE-6144

Repository: hive-git

Description
---

For a map join, all data in the small aliases is hashed and stored in a temporary file by MapRedLocalTask. For aliases without a filter or projection, that step does not seem necessary. For example,

{noformat}
select a.* from src a join src b on a.key=b.key;
{noformat}

produces a plan like this:

{noformat}
STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        a
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        a
          TableScan
            alias: a
            HashTable Sink Operator
              condition expressions:
                0 {key} {value}
                1
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              Position of Big Table: 1

  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
        b
          TableScan
            alias: b
            Map Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {key} {value}
                1
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              outputColumnNames: _col0, _col1
              Position of Big Table: 1
              Select Operator
                File Output Operator
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
{noformat}

Table src(a) is fetched and stored as-is in MRLocalTask. With this patch, the plan can look like this:

{noformat}
  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
        b
          TableScan
            alias: b
            Map Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {key} {value}
                1
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              outputColumnNames: _col0, _col1
              Position of Big Table: 1
              Select Operator
                File Output Operator
      Local Work:
        Map Reduce Local Work
          Alias -> Map Local Tables:
            a
              Fetch Operator
                limit: -1
          Alias -> Map Local Operator Tree:
            a
              TableScan
                alias: a
          Has Any Stage Alias: false

  Stage: Stage-0
    Fetch Operator
{noformat}

Diffs
---

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a78b72f
  conf/hive-default.xml.template 7cd8a1f
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 9ad5986
  itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/MapJoinCounterHook.java 1b0d57e
  ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java d8f4eb4
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableLoader.java a080fcc
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java fc08b28
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 1e0314d
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java bdc85b9
  ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 56676df
  ql/src/java/org/apache/hadoop/hive/ql/exec/TemporaryHashSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java
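
The difference between the two plans in the description can be illustrated with a small, self-contained sketch. This is not Hive code: the class `MapJoinSketch` and all of its methods are hypothetical, rows are reduced to (key, value) string pairs, and Java serialization stands in for whatever on-disk format the real local task uses. The staged path hashes the small table, writes it to a temporary file, and reads it back before probing; the non-staged path builds the same hash table directly in the task that performs the join.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapJoinSketch {

    // Hash the small table: join key -> all values seen for that key.
    static HashMap<String, ArrayList<String>> buildHashTable(List<String[]> smallRows) {
        HashMap<String, ArrayList<String>> table = new HashMap<>();
        for (String[] row : smallRows) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return table;
    }

    // Staged step 1: a separate local step writes the hash table to a temp file.
    static Path stageToFile(List<String[]> smallRows) throws IOException {
        Path tmp = Files.createTempFile("hashtable", ".bin");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            out.writeObject(buildHashTable(smallRows));
        }
        return tmp;
    }

    // Staged step 2: the join task loads the hash table back from the file.
    @SuppressWarnings("unchecked")
    static HashMap<String, ArrayList<String>> loadFromFile(Path file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (HashMap<String, ArrayList<String>>) in.readObject();
        }
    }

    // Probe the hash table with each big-table key; emit the small table's columns.
    static List<String> probe(Map<String, ArrayList<String>> table, List<String> bigKeys) {
        List<String> joined = new ArrayList<>();
        for (String key : bigKeys) {
            for (String value : table.getOrDefault(key, new ArrayList<>())) {
                joined.add(key + "\t" + value);
            }
        }
        return joined;
    }

    // Staged join: hash, write to disk, read back, then probe.
    static List<String> stagedJoin(List<String[]> smallRows, List<String> bigKeys)
            throws IOException, ClassNotFoundException {
        Path tmp = stageToFile(smallRows);
        try {
            return probe(loadFromFile(tmp), bigKeys);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    // Non-staged join: hash the small table in the joining task and probe directly.
    static List<String> nonStagedJoin(List<String[]> smallRows, List<String> bigKeys) {
        return probe(buildHashTable(smallRows), bigKeys);
    }

    public static void main(String[] args) throws Exception {
        List<String[]> src = List.of(
                new String[] {"1", "a"}, new String[] {"2", "b"}, new String[] {"2", "c"});
        List<String> bigKeys = List.of("2", "3", "1");
        System.out.println(stagedJoin(src, bigKeys));
        System.out.println(nonStagedJoin(src, bigKeys));
    }
}
```

Both paths produce the same joined rows; the sketch only shows that the non-staged path skips the write and read of the intermediate file, which is the saving the description argues for when the small alias has no filter or projection.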
Re: Review Request 16728: Implement non-staged MapJoin
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16728/
---

(Updated Jan. 28, 2014, 1:37 a.m.)

Review request for hive.

Bugs: HIVE-6144
    https://issues.apache.org/jira/browse/HIVE-6144

Repository: hive-git

Diffs (updated)
---

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 84ee78f
  conf/hive-default.xml.template 66d22f9
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 9ad5986
  itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/MapJoinCounterHook.java 1b0d57e
  ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java d8f4eb4
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableLoader.java a080fcc
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java fc08b28
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 1e0314d
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java bdc85b9
  ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 56676df
  ql/src/java/org/apache/hadoop/hive/ql/exec/TemporaryHashSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 22e5777
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HashTableLoader.java 58484af
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 2d2508d
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 2df8ab9
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java 5a53e15
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java 83b8d6e
  ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java ebccb14
  ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java c30da56
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 709c50e
  ql/src/test/queries/clientpositive/auto_join_without_localtask.q PRE-CREATION
  ql/src/test/results/clientnegative/bucket_mapjoin_mismatch1.q.out 0595cd6
  ql/src/test/results/clientnegative/deletejar.q.out b873e34
  ql/src/test/results/clientnegative/file_with_header_footer_negative.q.out fa261b3
  ql/src/test/results/clientnegative/sortmerge_mapjoin_mismatch_1.q.out bca069a
  ql/src/test/results/clientpositive/auto_join1.q.out b93c10f
Re: Review Request 16728: Implement non-staged MapJoin
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16728/
---

(Updated Jan. 28, 2014, 1:38 a.m.)

Review request for hive.

Bugs: HIVE-6144
    https://issues.apache.org/jira/browse/HIVE-6144

Repository: hive-git

Diffs (updated)
---

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 84ee78f
  conf/hive-default.xml.template 66d22f9
  itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/MapJoinCounterHook.java 1b0d57e
  ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java d8f4eb4
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableLoader.java a080fcc
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java fc08b28
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 1e0314d
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java bdc85b9
  ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 56676df
  ql/src/java/org/apache/hadoop/hive/ql/exec/TemporaryHashSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 22e5777
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HashTableLoader.java 58484af
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 2d2508d
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 2df8ab9
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java 5a53e15
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java 83b8d6e
  ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java ebccb14
  ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java c30da56
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 709c50e
  ql/src/test/queries/clientpositive/auto_join_without_localtask.q PRE-CREATION
  ql/src/test/results/clientnegative/bucket_mapjoin_mismatch1.q.out 0595cd6
  ql/src/test/results/clientnegative/file_with_header_footer_negative.q.out fa261b3
  ql/src/test/results/clientnegative/sortmerge_mapjoin_mismatch_1.q.out bca069a
  ql/src/test/results/clientpositive/auto_join1.q.out b93c10f
  ql/src/test/results/clientpositive/auto_join15.q.out 6bfcfc7
  ql/src/test/results/clientpositive/auto_join17.q.out 698270e
Re: Review Request 16728: Implement non-staged MapJoin
On Jan. 27, 2014, 9:52 p.m., Vikram Dixit Kumaraswamy wrote:
> itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java, line 90
> https://reviews.apache.org/r/16728/diff/4/?file=431458#file431458line90
>
> This would break tez tests.

Included by mistake (described in HIVE-6241). I'll remove this.

On Jan. 27, 2014, 9:52 p.m., Vikram Dixit Kumaraswamy wrote:
> itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java, line 369
> https://reviews.apache.org/r/16728/diff/4/?file=431458#file431458line369
>
> This would eliminate tez unit tests. Was this intentional?

Same as the one above.

On Jan. 27, 2014, 9:52 p.m., Vikram Dixit Kumaraswamy wrote:
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java, line 142
> https://reviews.apache.org/r/16728/diff/4/?file=431471#file431471line142
>
> Could you raise a jira for this.

OK, sure.

On Jan. 27, 2014, 9:52 p.m., Vikram Dixit Kumaraswamy wrote:
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java, line 165
> https://reviews.apache.org/r/16728/diff/4/?file=431471#file431471line165
>
> can it be only these 2 operators? Maybe common join operator can be used?

Logically it might be possible, but in practice it does not seem possible to have a CommonJoinOperator as a parent, since that operator would run in a reducer task. Tez could exploit that, but I don't know Tez well.

- Navis

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16728/#review32885
---

On Jan. 28, 2014, 1:37 a.m., Navis Ryu wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16728/
---

(Updated Jan. 28, 2014, 1:37 a.m.)

Review request for hive.

Bugs: HIVE-6144
    https://issues.apache.org/jira/browse/HIVE-6144

Repository: hive-git

Diffs
---

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 84ee78f
  conf/hive-default.xml.template 66d22f9
  itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/MapJoinCounterHook.java 1b0d57e
Re: Review Request 16728: Implement non-staged MapJoin
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16728/
---

(Updated Jan. 20, 2014, 5 a.m.)

Review request for hive.

Changes
---

Addressed comment

Bugs: HIVE-6144
    https://issues.apache.org/jira/browse/HIVE-6144

Repository: hive-git

Diffs (updated)
---

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a78b72f
  conf/hive-default.xml.template 7cd8a1f
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 9ad5986
  itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/MapJoinCounterHook.java 1b0d57e
  ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java d8f4eb4
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableLoader.java a080fcc
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java fc08b28
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 1e0314d
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java bdc85b9
  ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 56676df
  ql/src/java/org/apache/hadoop/hive/ql/exec/TemporaryHashSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 22e5777
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HashTableLoader.java 58484af
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 2d2508d
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 2df8ab9
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java 5a53e15
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java 83b8d6e
  ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java ebccb14
  ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java c30da56
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 709c50e
  ql/src/test/queries/clientpositive/auto_join_without_localtask.q PRE-CREATION
  ql/src/test/results/clientnegative/bucket_mapjoin_mismatch1.q.out 0595cd6
  ql/src/test/results/clientnegative/deletejar.q.out b873e34
  ql/src/test/results/clientnegative/file_with_header_footer_negative.q.out fa261b3
  ql/src/test/results/clientnegative/sortmerge_mapjoin_mismatch_1.q.out bca069a
Re: Review Request 16728: Implement non-staged MapJoin
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16728/
---

(Updated Jan. 13, 2014, 4:43 a.m.)

Review request for hive.

Changes
---

Rebased to trunk
Added entry for hive-default.xml.template

Bugs: HIVE-6144
    https://issues.apache.org/jira/browse/HIVE-6144

Repository: hive-git

Diffs (updated)
---

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 16d54c6
  conf/hive-default.xml.template d188f2a
  ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java d8f4eb4
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableLoader.java a080fcc
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java aa8f19c
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 1e0314d
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java bdc85b9
  ql/src/java/org/apache/hadoop/hive/ql/exec/TemporaryHashSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 5511bca
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HashTableLoader.java efe5710
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 0cc90d0
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java 5a53e15
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java 010ac54
  ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java 14fced7
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 83a778d
  ql/src/test/queries/clientpositive/auto_join_without_localtask.q PRE-CREATION
  ql/src/test/results/clientpositive/auto_join_without_localtask.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/16728/diff/

Testing
---

Thanks,

Navis Ryu
Re: Review Request 16728: Implement non-staged MapJoin
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16728/
---

(Updated Jan. 13, 2014, 7:10 a.m.)

Review request for hive.

Changes
---

Missed a file

Bugs: HIVE-6144
    https://issues.apache.org/jira/browse/HIVE-6144

Repository: hive-git

Diffs (updated)
---

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 16d54c6
  conf/hive-default.xml.template d188f2a
  ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java d8f4eb4
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableLoader.java a080fcc
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java aa8f19c
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 1e0314d
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java bdc85b9
  ql/src/java/org/apache/hadoop/hive/ql/exec/TemporaryHashSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 5511bca
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HashTableLoader.java efe5710
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 0cc90d0
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 2df8ab9
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java 5a53e15
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java 010ac54
  ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java 14fced7
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 83a778d
  ql/src/test/queries/clientpositive/auto_join_without_localtask.q PRE-CREATION
  ql/src/test/results/clientpositive/auto_join_without_localtask.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/16728/diff/

Testing
---

Thanks,

Navis Ryu
Review Request 16728: Implement non-staged MapJoin
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16728/
---

Review request for hive.

Bugs: HIVE-6144
    https://issues.apache.org/jira/browse/HIVE-6144

Repository: hive-git

Diffs
---

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfd539
  ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java d8f4eb4
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableLoader.java a080fcc
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java aa8f19c
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 1e0314d
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java bdc85b9
  ql/src/java/org/apache/hadoop/hive/ql/exec/TemporaryHashSinkOperator.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 42d764d
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HashTableLoader.java efe5710
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 0cc90d0
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java 5a53e15
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MapJoinResolver.java 010ac54
  ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java 14fced7
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredLocalWork.java 83a778d
  ql/src/test/queries/clientpositive/auto_join_without_localtask.q PRE-CREATION
  ql/src/test/results/clientpositive/auto_join_without_localtask.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/16728/diff/

Testing
---

Thanks,

Navis Ryu