[jira] [Commented] (HIVE-6144) Implement non-staged MapJoin

Lefty Leverenz (JIRA) Sun, 17 Aug 2014 22:59:35 -0700

    [ https://issues.apache.org/jira/browse/HIVE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100309#comment-14100309 ]


Lefty Leverenz commented on HIVE-6144:
--------------------------------------

Review request:  *hive.auto.convert.join.use.nonstaged* has been added to the 
section "Optimize Auto Join Conversion" in a version-0.13.0 box.  Is that the 
right place for it?  Could we have some examples and guidance on when to use it?

* [Optimize Auto Join Conversion | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-OptimizeAutoJoinConversion]
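
As a starting point for such an example, here is a minimal sketch of a session that turns the setting on and inspects the plan. It reuses the query from the issue description. Whether the local work actually moves into the map stage depends on the small alias having no filter or projection, so the EXPLAIN output should be checked rather than assumed. The first setting is an assumption about the session, not something the issue requires.

{noformat}
-- Assumption: auto join conversion is enabled so the join becomes a map join.
set hive.auto.convert.join=true;
-- Setting added by HIVE-6144.
set hive.auto.convert.join.use.nonstaged=true;
-- Compare this plan with the one produced when the setting above is false.
explain select a.* from src a join src b on a.key=b.key;
{noformat}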

Also in that section, I changed the value of 
*hive.auto.convert.join.noconditionaltask.size* to match the default (10000000). 
It had been 10000, which seemed rather small, but if that value was intended, 
please let me know.
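
For context, a sketch of how the size threshold might appear next to the flag it applies to. The values shown are the defaults as I understand them, not tuning advice, and the size is in bytes.

{noformat}
-- Convert eligible joins directly to map joins, with no conditional task ...
set hive.auto.convert.join.noconditionaltask=true;
-- ... when the combined size of the small tables is below this many bytes (about 10 MB).
set hive.auto.convert.join.noconditionaltask.size=10000000;
{noformat}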

> Implement non-staged MapJoin
> ----------------------------
>
>                 Key: HIVE-6144
>                 URL: https://issues.apache.org/jira/browse/HIVE-6144
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>              Labels: TODOC13
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6144.1.patch.txt, HIVE-6144.2.patch.txt, 
> HIVE-6144.3.patch.txt, HIVE-6144.4.patch.txt, HIVE-6144.5.patch.txt, 
> HIVE-6144.6.patch.txt, HIVE-6144.7.patch.txt, HIVE-6144.8.patch.txt, 
> HIVE-6144.9.patch.txt
>
>
> For a map join, all data in small aliases is hashed and stored in a temporary 
> file in MapRedLocalTask. But for aliases without a filter or projection, 
> that step does not seem necessary. For example:
> {noformat}
> select a.* from src a join src b on a.key=b.key;
> {noformat}
> produces a plan like this:
> {noformat}
> STAGE PLANS:
>   Stage: Stage-4
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         a 
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         a 
>           TableScan
>             alias: a
>             HashTable Sink Operator
>               condition expressions:
>                 0 {key} {value}
>                 1 
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               Position of Big Table: 1
>   Stage: Stage-3
>     Map Reduce
>       Alias -> Map Operator Tree:
>         b 
>           TableScan
>             alias: b
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0 {key} {value}
>                 1 
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               outputColumnNames: _col0, _col1
>               Position of Big Table: 1
>               Select Operator
>                 File Output Operator
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
> {noformat}
> Table src (alias a) is fetched and stored as-is in MRLocalTask. With this patch, 
> the plan can look like this:
> {noformat}
>   Stage: Stage-3
>     Map Reduce
>       Alias -> Map Operator Tree:
>         b 
>           TableScan
>             alias: b
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0 {key} {value}
>                 1 
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               outputColumnNames: _col0, _col1
>               Position of Big Table: 1
>               Select Operator
>                   File Output Operator
>       Local Work:
>         Map Reduce Local Work
>           Alias -> Map Local Tables:
>             a 
>               Fetch Operator
>                 limit: -1
>           Alias -> Map Local Operator Tree:
>             a 
>               TableScan
>                 alias: a
>           Has Any Stage Alias: false
>   Stage: Stage-0
>     Fetch Operator
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
