[jira] [Commented] (HIVE-8536) Enable SkewJoinResolver for spark [Spark Branch]

Rui Li (JIRA) Mon, 24 Nov 2014 05:06:32 -0800

    [ https://issues.apache.org/jira/browse/HIVE-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222964#comment-14222964 ]


Rui Li commented on HIVE-8536:
------------------------------

With the latest patch, the query plan for the previous example is:
{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-4 depends on stages: Stage-1 , consists of Stage-3
  Stage-3
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 3 (PARTITION-LEVEL SORT, 1)
      DagName: root_20141124135050_f1804bff-a38d-496e-92de-62d50567da1c:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: tiny
                  Statistics: Num rows: 2 Data size: 241 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: pagerank is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: pagerank (type: int)
                      sort order: +
                      Map-reduce partition columns: pagerank (type: int)
                      Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE Column stats: NONE
                      value expressions: pageurl (type: string), avgduration (type: int)
        Map 3 
            Map Operator Tree:
                TableScan
                  alias: rankings
                  Statistics: Num rows: 8922992 Data size: 963683136 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: pagerank is not null (type: boolean)
                    Statistics: Num rows: 4461496 Data size: 481841568 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      key expressions: pagerank (type: int)
                      sort order: +
                      Map-reduce partition columns: pagerank (type: int)
                      Statistics: Num rows: 4461496 Data size: 481841568 Basic stats: COMPLETE Column stats: NONE
                      value expressions: pageurl (type: string), avgduration (type: int)
        Reducer 2 
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                condition expressions:
                  0 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col1}
                  1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col1}
                handleSkewJoin: true
                outputColumnNames: _col0, _col1, _col2, _col6, _col7, _col8
                Statistics: Num rows: 4907645 Data size: 530025736 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: string), _col1 (type: int), _col2 (type: int), _col6 (type: string), _col7 (type: int), _col8 (type: int)
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                  Statistics: Num rows: 4907645 Data size: 530025736 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 4907645 Data size: 530025736 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-4
    Conditional Operator

  Stage: Stage-3
    Spark
      Edges:
        Map 4 <- Map 5 (NONE, 0)
      DagName: root_20141124135050_f1804bff-a38d-496e-92de-62d50567da1c:2
      Vertices:
        Map 4 
            Map Operator Tree:
                TableScan
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    condition expressions:
                      0 {0_VALUE_0} {0_VALUE_1} {0_VALUE_2}
                      1 {1_VALUE_0} {1_VALUE_1} {1_VALUE_2}
                    keys:
                      0 reducesinkkey0 (type: int)
                      1 reducesinkkey0 (type: int)
                    outputColumnNames: _col0, _col1, _col2, _col6, _col7, _col8
                    Select Operator
                      expressions: _col0 (type: string), _col1 (type: int), _col2 (type: int), _col6 (type: string), _col7 (type: int), _col8 (type: int)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                      File Output Operator
                        compressed: false
                        table:
                            input format: org.apache.hadoop.mapred.TextInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Map 5 
            Map Operator Tree:
                TableScan
                  Spark HashTable Sink Operator
                    condition expressions:
                      0 {0_VALUE_0} {0_VALUE_1} {0_VALUE_2}
                      1 {1_VALUE_0} {1_VALUE_1} {1_VALUE_2}
                    keys:
                      0 reducesinkkey0 (type: int)
                      1 reducesinkkey0 (type: int)

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{noformat}
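
For reference, a plan of this shape could probably be requested with something like the settings below. This is only a sketch and not the exact session: the query text is an assumption inferred from the table aliases (tiny, rankings), the join key (pagerank) and the six output columns in the plan above, and the skew threshold shown is simply Hive's default rather than a value taken from this issue.
{noformat}
-- Run on the Spark engine and turn on runtime skew join handling.
set hive.execution.engine=spark;
set hive.optimize.skewjoin=true;
-- Rows per join key above which the key is treated as skewed
-- (100000 is the default; illustrative here, not from this issue).
set hive.skewjoin.key=100000;

-- Assumed query: inner join of the two tables on pagerank, all columns selected.
explain
select * from tiny join rankings on tiny.pagerank = rankings.pagerank;
{noformat}
With handleSkewJoin set to true on the Join Operator, the conditional Stage-4 decides at runtime whether the map join in Stage-3 needs to run for the skewed keys.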

> Enable SkewJoinResolver for spark [Spark Branch]
> ------------------------------------------------
>
>                 Key: HIVE-8536
>                 URL: https://issues.apache.org/jira/browse/HIVE-8536
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-8536.1-spark.patch, HIVE-8536.2-spark.patch
>
>
> Sub-task of HIVE-8406



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
