[
https://issues.apache.org/jira/browse/HIVE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100309#comment-14100309
]
Lefty Leverenz commented on HIVE-6144:
--------------------------------------
Review request: *hive.auto.convert.join.use.nonstaged* has been added to the
section "Optimize Auto Join Conversion" in a version-0.13.0 box. Is that the
right place for it? Could we have some examples and guidance on when to use it?
* [Optimize Auto Join Conversion |
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-OptimizeAutoJoinConversion]
Also in that section, I changed the value of
*hive.auto.convert.join.noconditionaltask.size* to match the default (10000000)
-- it had been 10000 which seemed rather small, but if that value was intended
please let me know.
> Implement non-staged MapJoin
> ----------------------------
>
> Key: HIVE-6144
> URL: https://issues.apache.org/jira/browse/HIVE-6144
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Navis
> Assignee: Navis
> Priority: Minor
> Labels: TODOC13
> Fix For: 0.13.0
>
> Attachments: HIVE-6144.1.patch.txt, HIVE-6144.2.patch.txt,
> HIVE-6144.3.patch.txt, HIVE-6144.4.patch.txt, HIVE-6144.5.patch.txt,
> HIVE-6144.6.patch.txt, HIVE-6144.7.patch.txt, HIVE-6144.8.patch.txt,
> HIVE-6144.9.patch.txt
>
>
> For a map join, all data in the small aliases is hashed and stored in a temporary
> file by MapRedLocalTask. For aliases that have no filter or projection, however,
> this staging step does not seem necessary. For example,
> {noformat}
> select a.* from src a join src b on a.key=b.key;
> {noformat}
> produces a plan like this:
> {noformat}
> STAGE PLANS:
>   Stage: Stage-4
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         a
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         a
>           TableScan
>             alias: a
>             HashTable Sink Operator
>               condition expressions:
>                 0 {key} {value}
>                 1
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               Position of Big Table: 1
>   Stage: Stage-3
>     Map Reduce
>       Alias -> Map Operator Tree:
>         b
>           TableScan
>             alias: b
>             Map Join Operator
>               condition map:
>                 Inner Join 0 to 1
>               condition expressions:
>                 0 {key} {value}
>                 1
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               outputColumnNames: _col0, _col1
>               Position of Big Table: 1
>               Select Operator
>                 File Output Operator
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
> {noformat}
> Table src (alias a) is fetched and stored as-is in MRLocalTask. With this patch,
> the plan can instead look like this:
> {noformat}
>   Stage: Stage-3
>     Map Reduce
>       Alias -> Map Operator Tree:
>         b
>           TableScan
>             alias: b
>             Map Join Operator
>               condition map:
>                 Inner Join 0 to 1
>               condition expressions:
>                 0 {key} {value}
>                 1
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               outputColumnNames: _col0, _col1
>               Position of Big Table: 1
>               Select Operator
>                 File Output Operator
>       Local Work:
>         Map Reduce Local Work
>           Alias -> Map Local Tables:
>             a
>               Fetch Operator
>                 limit: -1
>           Alias -> Map Local Operator Tree:
>             a
>               TableScan
>                 alias: a
>           Has Any Stage Alias: false
>   Stage: Stage-0
>     Fetch Operator
> {noformat}
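>
> For reference, a minimal sketch of session settings under which the two plans
> above could be compared. It assumes that *hive.auto.convert.join.use.nonstaged*
> (the property named in the review comment) is the switch for this behavior and
> that automatic map-join conversion is enabled. Whether the non-staged plan is
> actually chosen may depend on the query, the execution engine, and the Hive
> version.
> {noformat}
> -- Assumption: automatic map-join conversion must be on for either plan.
> set hive.auto.convert.join=true;
>
> -- Staged behavior: the small alias is hashed in a separate local stage.
> set hive.auto.convert.join.use.nonstaged=false;
> explain select a.* from src a join src b on a.key=b.key;
>
> -- Non-staged behavior (assumed): the small alias is read directly
> -- inside the map join's local work, with no HashTable Sink stage.
> set hive.auto.convert.join.use.nonstaged=true;
> explain select a.* from src a join src b on a.key=b.key;
> {noformat}
> Comparing the two EXPLAIN outputs should show whether the separate
> Map Reduce Local Work stage is present.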