[jira] [Commented] (HIVE-17414) HoS DPP + Vectorization generates invalid explain plan due to CombineEquivalentWorkResolver

Rui Li (JIRA) Thu, 31 Aug 2017 00:39:27 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-17414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16148579#comment-16148579
 ]


Rui Li commented on HIVE-17414:
-------------------------------

[~kellyzly], I think another way is to call SparkUtilities#collectOp twice to 
collect vector and non-vector DPP sinks separately. Either way is OK to me, as 
long as the change doesn't cause other issues.
BTW, isAssignableFrom returns true if the two classes are equivalent. So we 
don't need to call equals.

> HoS DPP + Vectorization generates invalid explain plan due to 
> CombineEquivalentWorkResolver
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17414
>                 URL: https://issues.apache.org/jira/browse/HIVE-17414
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-17414.patch
>
>
> Similar to HIVE-16948, the following query generates an invalid explain plan 
> when HoS DPP is enabled + vectorization:
> {code:sql}
> select ds from (select distinct(ds) as ds from srcpart union all select 
> distinct(ds) as ds from srcpart) s where s.ds in (select max(srcpart.ds) from 
> srcpart union all select min(srcpart.ds) from srcpart)
> {code}
> Explain Plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
>     Spark
>       Edges:
>         Reducer 11 <- Map 10 (GROUP, 1)
>         Reducer 13 <- Map 12 (GROUP, 1)
> #### A masked pattern was here ####
>       Vertices:
>         Map 10
>             Map Operator Tree:
>                 TableScan
>                   alias: srcpart
>                   Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: ds (type: string)
>                     outputColumnNames: ds
>                     Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>                     Group By Operator
>                       aggregations: max(ds)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         sort order:
>                         Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>                         value expressions: _col0 (type: string)
>             Execution mode: vectorized
>         Map 12
>             Map Operator Tree:
>                 TableScan
>                   alias: srcpart
>                   Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: ds (type: string)
>                     outputColumnNames: ds
>                     Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>                     Group By Operator
>                       aggregations: min(ds)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         sort order:
>                         Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>                         value expressions: _col0 (type: string)
>             Execution mode: vectorized
>         Reducer 11
>             Execution mode: vectorized
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: max(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
>                 Filter Operator
>                   predicate: _col0 is not null (type: boolean)
>                   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>                   Group By Operator
>                     keys: _col0 (type: string)
>                     mode: hash
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: _col0 (type: string)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                       Group By Operator
>                         keys: _col0 (type: string)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                         Spark Partition Pruning Sink Operator
>                           Target column: ds (string)
>                           partition key expr: ds
>                           Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                           target work: Map 1
>                     Select Operator
>                       expressions: _col0 (type: string)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                       Group By Operator
>                         keys: _col0 (type: string)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                         Spark Partition Pruning Sink Operator
>                           Target column: ds (string)
>                           partition key expr: ds
>                           Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                           target work: Map 4
>         Reducer 13
>             Execution mode: vectorized
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: min(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
>                 Filter Operator
>                   predicate: _col0 is not null (type: boolean)
>                   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>                   Group By Operator
>                     keys: _col0 (type: string)
>                     mode: hash
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: _col0 (type: string)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                       Group By Operator
>                         keys: _col0 (type: string)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                         Spark Partition Pruning Sink Operator
>                           Target column: ds (string)
>                           partition key expr: ds
>                           Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                           target work: Map 1
>                     Select Operator
>                       expressions: _col0 (type: string)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                       Group By Operator
>                         keys: _col0 (type: string)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                         Spark Partition Pruning Sink Operator
>                           Target column: ds (string)
>                           partition key expr: ds
>                           Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                           target work: Map 4
>   Stage: Stage-1
>     Spark
>       Edges:
>         Reducer 2 <- Map 1 (GROUP, 4)
>         Reducer 3 <- Reducer 2 (PARTITION-LEVEL SORT, 4), Reducer 2 
> (PARTITION-LEVEL SORT, 4), Reducer 7 (PARTITION-LEVEL SORT, 4), Reducer 9 
> (PARTITION-LEVEL SORT, 4)
>         Reducer 7 <- Map 6 (GROUP, 1)
>         Reducer 9 <- Map 8 (GROUP, 1)
> #### A masked pattern was here ####
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: srcpart
>                   filterExpr: ds is not null (type: boolean)
>                   Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>                   Group By Operator
>                     keys: ds (type: string)
>                     mode: hash
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: _col0 (type: string)
>                       sort order: +
>                       Map-reduce partition columns: _col0 (type: string)
>                       Statistics: Num rows: 2000 Data size: 21248 Basic 
> stats: COMPLETE Column stats: NONE
>             Execution mode: vectorized
>         Map 6
>             Map Operator Tree:
>                 TableScan
>                   alias: srcpart
>                   Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: ds (type: string)
>                     outputColumnNames: ds
>                     Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>                     Group By Operator
>                       aggregations: max(ds)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         sort order:
>                         Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>                         value expressions: _col0 (type: string)
>             Execution mode: vectorized
>         Map 8
>             Map Operator Tree:
>                 TableScan
>                   alias: srcpart
>                   Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: ds (type: string)
>                     outputColumnNames: ds
>                     Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>                     Group By Operator
>                       aggregations: min(ds)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         sort order:
>                         Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>                         value expressions: _col0 (type: string)
>             Execution mode: vectorized
>         Reducer 2
>             Execution mode: vectorized
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: string)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1000 Data size: 10624 Basic stats: 
> COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col0 (type: string)
>                   sort order: +
>                   Map-reduce partition columns: _col0 (type: string)
>                   Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
> COMPLETE Column stats: NONE
>         Reducer 3
>             Reduce Operator Tree:
>               Join Operator
>                 condition map:
>                      Left Semi Join 0 to 1
>                 keys:
>                   0 _col0 (type: string)
>                   1 _col0 (type: string)
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 2200 Data size: 23372 Basic stats: 
> COMPLETE Column stats: NONE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 2200 Data size: 23372 Basic stats: 
> COMPLETE Column stats: NONE
>                   table:
>                       input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>         Reducer 7
>             Execution mode: vectorized
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: max(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
>                 Filter Operator
>                   predicate: _col0 is not null (type: boolean)
>                   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>                   Group By Operator
>                     keys: _col0 (type: string)
>                     mode: hash
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: _col0 (type: string)
>                       sort order: +
>                       Map-reduce partition columns: _col0 (type: string)
>                       Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>         Reducer 9
>             Execution mode: vectorized
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: min(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE 
> Column stats: NONE
>                 Filter Operator
>                   predicate: _col0 is not null (type: boolean)
>                   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>                   Group By Operator
>                     keys: _col0 (type: string)
>                     mode: hash
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: _col0 (type: string)
>                       sort order: +
>                       Map-reduce partition columns: _col0 (type: string)
>                       Statistics: Num rows: 2 Data size: 368 Basic stats: 
> COMPLETE Column stats: NONE
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-17414) HoS DPP + Vectorization generates invalid explain plan due to CombineEquivalentWorkResolver

Reply via email to