[jira] [Commented] (HIVE-8700) Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]

Suhas Satish (JIRA) Thu, 06 Nov 2014 15:30:00 -0800

    [ https://issues.apache.org/jira/browse/HIVE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201198#comment-14201198 ]


Suhas Satish commented on HIVE-8700:
------------------------------------

I have a patch that now generates HashTableSinkOperators, as shown in the 
plan below. I will upload it soon.

explain select table1.key, table2.value, table3.value from table1 join table2 
on table1.key=table2.key join table3 on table1.key=table3.key;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Map 3 <- Map 1 (NONE, 0), Map 2 (NONE, 0)
      DagName: ssatish_20141106152828_299c0f54-40a8-4cf5-91f4-ecb1f420955f:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: table1
                  Statistics: Num rows: 1453 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 727 Data size: 2908 Basic stats: COMPLETE Column stats: NONE
                    HashTable Sink Operator
                      condition expressions:
                        0 {key}
                        1 {value}
                        2 {value}
                      keys:
                        0 key (type: int)
                        1 key (type: int)
                        2 key (type: int)
        Map 2 
            Map Operator Tree:
                TableScan
                  alias: table3
                  Statistics: Num rows: 2 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 108 Basic stats: COMPLETE Column stats: NONE
                    HashTable Sink Operator
                      condition expressions:
                        0 {key}
                        1 {value}
                        2 {value}
                      keys:
                        0 key (type: int)
                        1 key (type: int)
                        2 key (type: int)
        Map 3 
            Map Operator Tree:
                TableScan
                  alias: table2
                  Statistics: Num rows: 55 Data size: 5791 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 28 Data size: 2948 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                           Inner Join 0 to 2
                      condition expressions:
                        0 {key}
                        1 {value}
                        2 {value}
                      keys:
                        0 key (type: int)
                        1 key (type: int)
                        2 key (type: int)
                      outputColumnNames: _col0, _col6, _col11
                      input vertices:
                        0 Map 1
                        2 Map 2
                      Statistics: Num rows: 1599 Data size: 6397 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: int), _col6 (type: string), _col11 (type: string)
                        outputColumnNames: _col0, _col1, _col2
                        Statistics: Num rows: 1599 Data size: 6397 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 1599 Data size: 6397 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
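
For readers less familiar with this plan shape, here is a minimal Python sketch of what the plan above appears to express: the branches ending in a HashTable Sink Operator (Map 1 and Map 2) are loaded into in-memory hash tables keyed on `key`, and the Map Join Operator in Map 3 streams table2 and probes both. The helper names `build_hash_table` and `map_join` and the sample rows are hypothetical and only illustrate the idea; this is not Hive code and says nothing about how the patch is implemented.

```python
from collections import defaultdict

def build_hash_table(rows, key_col):
    """Group rows by join key, the role a hash-table sink plays for one input."""
    table = defaultdict(list)
    for row in rows:
        if row[key_col] is not None:  # mirrors the "key is not null" filter
            table[row[key_col]].append(row)
    return table

def map_join(stream_rows, hash_t1, hash_t3):
    """Stream one input and probe both hash tables (two inner joins on key)."""
    out = []
    for row2 in stream_rows:
        key = row2["key"]
        if key is None:
            continue
        for row1 in hash_t1.get(key, []):
            for row3 in hash_t3.get(key, []):
                # table1.key, table2.value, table3.value
                out.append((row1["key"], row2["value"], row3["value"]))
    return out

# Hypothetical sample data, not taken from the issue.
table1 = [{"key": 1}, {"key": 2}, {"key": None}]
table2 = [{"key": 1, "value": "a"}, {"key": 3, "value": "b"}]
table3 = [{"key": 1, "value": "x"}, {"key": 1, "value": "y"}]

result = map_join(
    table2,
    build_hash_table(table1, "key"),
    build_hash_table(table3, "key"),
)
print(result)
```

Only key 1 is present in all three inputs, so the sketch emits one row per matching table3 row. No shuffle or sort on the join key is needed, which is the motivation for dropping the ReduceSink on the hashed branches.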


> Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-8700
>                 URL: https://issues.apache.org/jira/browse/HIVE-8700
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Suhas Satish
>         Attachments: HIVE-8700-spark.patch, HIVE-8700.patch
>
>
> With HIVE-8616 enabled, the new plan has a ReduceSinkOperator for the small 
> tables. For example, the following is the operator plan for the small 
> table dec1, derived from the query {code}explain select /*+ MAPJOIN(dec)*/ * from 
> dec join dec1 on dec.value=dec1.d;{code}
> {code}
>         Map 2 
>             Map Operator Tree:
>                 TableScan
>                   alias: dec1
>                   Statistics: Num rows: 0 Data size: 107 Basic stats: PARTIAL Column stats: NONE
>                   Filter Operator
>                     predicate: d is not null (type: boolean)
>                     Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: d (type: decimal(5,2))
>                       sort order: +
>                       Map-reduce partition columns: d (type: decimal(5,2))
>                       Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>                       value expressions: i (type: int)
> {code}
> With the new design for broadcasting small tables, we need to replace the 
> ReduceSinkOperator with a HashTableSinkOperator or equivalent in the new plan.
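
The rewrite described in the issue can be pictured as a small tree transformation. The Python sketch below uses a toy `Op` class and a hypothetical `replace_reduce_sinks` helper; neither is a Hive API, and a real change would presumably also need to carry over join keys and value expressions and decide which branches count as small tables.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Op:
    """Toy operator node: a kind label plus child operators (not a Hive class)."""
    kind: str
    children: List["Op"] = field(default_factory=list)

def replace_reduce_sinks(op: Op) -> Op:
    """Return a copy of the tree with each ReduceSink swapped for a HashTableSink."""
    kind = "HashTableSink" if op.kind == "ReduceSink" else op.kind
    return Op(kind, [replace_reduce_sinks(child) for child in op.children])

def kinds(op: Op) -> List[str]:
    """Flatten the tree into a pre-order list of operator kinds."""
    return [op.kind] + [k for child in op.children for k in kinds(child)]

# Shape of the small-table branch for dec1 shown in the issue description.
small_table_branch = Op("TableScan", [Op("Filter", [Op("ReduceSink")])])
rewritten = replace_reduce_sinks(small_table_branch)
print(kinds(rewritten))
```

The sketch returns a new tree rather than mutating in place, which keeps the original plan available for comparison; that is a choice made for the illustration only.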



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
