[ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:
--------------------------------
    Status: Patch Available  (was: In Progress)

> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
>                 Key: HIVE-11394
>                 URL: https://issues.apache.org/jira/browse/HIVE-11394
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch, HIVE-11394.07.patch
>
>
> Add detail to the EXPLAIN output showing why Map or Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
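> SUMMARY shows vectorization information for the PLAN (whether vectorization 
> is enabled) and a summary of Map and Reduce work.
> DETAIL adds operator-level sections such as Select Vectorization, Group By 
> Vectorization and Reduce Sink Vectorization.
> By default, ONLY is off and the detail level is SUMMARY.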
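The grammar and its defaults can be sketched as a small statement builder. This is only an illustration: the helper name `explain_vectorization` and the sample query are hypothetical, and it assumes the query text follows the options, as in other EXPLAIN forms.

```python
def explain_vectorization(query, only=False, detail_level="SUMMARY"):
    """Build an EXPLAIN VECTORIZATION [ONLY] [SUMMARY|DETAIL] statement.

    Hypothetical helper for illustration; it is not part of Hive.
    The defaults follow the description above: ONLY off, level SUMMARY.
    """
    level = detail_level.upper()
    if level not in ("SUMMARY", "DETAIL"):
        raise ValueError("detail_level must be SUMMARY or DETAIL")
    parts = ["EXPLAIN", "VECTORIZATION"]
    if only:
        parts.append("ONLY")
    parts.append(level)
    parts.append(query)  # assumption: the query follows the options
    return " ".join(parts)


# Hypothetical query against the table used in the DETAIL example.
query = "SELECT bo, max(b) FROM vectortab2korc GROUP BY bo"
print(explain_vectorization(query))
print(explain_vectorization(query, only=True, detail_level="detail"))
```

Spelling out SUMMARY explicitly is equivalent to omitting it, since SUMMARY is the default level.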
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> …
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
> …
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: decimal_date_test
>                   Statistics: Num rows: 12288 Data size: 2467616 Basic stats: 
> COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: 
> boolean)
>                     Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: cdate (type: date)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: date)
>                         sort order: +
>                         Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Reducer 2 
>             Execution mode: vectorized, llap
>             Reduce Vectorization:
>                 enabled: true
>                 enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
>                 groupByVectorOutput: true
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: KEY.reducesinkkey0 (type: date)
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
>                   table:
>                       input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> EXPLAIN VECTORIZATION DETAIL example:
> (Note the added Select Vectorization, Group By Vectorization, Reduce Sink 
> Vectorization sections in this example)
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> …
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>         Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> …
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: vectortab2korc
>                   Statistics: Num rows: 2000 Data size: 918712 Basic stats: 
> COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: bo (type: boolean), b (type: bigint)
>                     outputColumnNames: bo, b
>                     Select Vectorization:
>                         className: VectorSelectOperator
>                         native: true
>                         nativeConditionsMet: Supported IS true
>                         selectExpressions: IdentityExpression[7:boolean], 
> IdentityExpression[3:bigint]
>                         vectorized: true
>                     Statistics: Num rows: 2000 Data size: 918712 Basic stats: 
> COMPLETE Column stats: NONE
>                     Group By Operator
>                       aggregations: max(b)
>                       Group By Vectorization:
>                           aggregators: 
> VectorUDAFMaxLong(IdentityExpression[3:bigint])
>                           className: VectorGroupByOperator
>                           vectorOutput: true
>                           keyExpressions: IdentityExpression[7:boolean]
>                           native: false
>                           nativeConditionsNotMet: Supported IS false
>                           vectorized: true
>                       keys: bo (type: boolean)
>                       mode: hash
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 2000 Data size: 918712 Basic 
> stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: boolean)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: boolean)
>                         Reduce Sink Vectorization:
>                             className: VectorReduceSinkLongOperator
>                             native: true
>                             nativeConditionsMet: 
> hive.vectorized.execution.reducesink.new.enabled IS true, 
> hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE 
> IS true, No buckets IS true, No TopN IS true, Uniform Hash IS true, No 
> DISTINCT columns IS true, BinarySortableSerDe for keys IS true, 
> LazyBinarySerDe for values IS true
>                             vectorized: true
>                         Statistics: Num rows: 2000 Data size: 918712 Basic 
> stats: COMPLETE Column stats: NONE
>                         value expressions: _col1 (type: bigint)
>             Execution mode: vectorized
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Reducer 2 
>             Execution mode: vectorized
>             Reduce Vectorization:
>                 enabled: true
>                 enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
>                 groupByVectorOutput: true
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: max(VALUE._col0)
>                 Group By Vectorization:
>                     aggregators: 
> VectorUDAFMaxLong(IdentityExpression[1:bigint])
>                     className: VectorGroupByOperator
>                     vectorOutput: true
>                     keyExpressions: IdentityExpression[0:boolean]
>                     native: false
>                     nativeConditionsNotMet: Supported IS false
>                     vectorized: true
>                 keys: KEY._col0 (type: boolean)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1
>                 Statistics: Num rows: 1000 Data size: 459356 Basic stats: 
> COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col0 (type: boolean)
>                   sort order: -
>                   Reduce Sink Vectorization:
>                       className: VectorReduceSinkOperator
>                       native: false
>                       nativeConditionsMet: 
> hive.vectorized.execution.reducesink.new.enabled IS true, 
> hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE 
> IS true, No buckets IS true, No TopN IS true, No DISTINCT columns IS true, 
> BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true
>                       nativeConditionsNotMet: Uniform Hash IS false
>                       vectorized: true
>                   Statistics: Num rows: 1000 Data size: 459356 Basic stats: 
> COMPLETE Column stats: NONE
>                   value expressions: _col1 (type: bigint)
> …
> {code}
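One way to use the DETAIL output is to list the operators that are vectorized but not native, along with the reported reason. The sketch below assumes the text output is available unwrapped (one attribute per line, unlike the mail-wrapped copy above); the function name `non_native_operators` is hypothetical. The reason is kept as a raw string because condition text can itself contain commas, as in `IN [tez, spark]`.

```python
def non_native_operators(explain_text):
    """Return (className, nativeConditionsNotMet) pairs from DETAIL text output.

    Hypothetical helper; assumes each attribute sits on its own line and that
    className appears before nativeConditionsNotMet within a section.
    """
    results = []
    class_name = None
    for raw in explain_text.splitlines():
        line = raw.strip()
        if line.startswith("className:"):
            class_name = line.split(":", 1)[1].strip()
        elif line.startswith("nativeConditionsNotMet:"):
            reason = line.split(":", 1)[1].strip()
            results.append((class_name, reason))
    return results


# Fragment taken from the Reducer 2 sections of the example above.
SAMPLE = """
Group By Vectorization:
    className: VectorGroupByOperator
    native: false
    nativeConditionsNotMet: Supported IS false
    vectorized: true
Reduce Sink Vectorization:
    className: VectorReduceSinkOperator
    native: false
    nativeConditionsNotMet: Uniform Hash IS false
    vectorized: true
"""

for name, reason in non_native_operators(SAMPLE):
    print(name, "->", reason)
```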
> EXPLAIN VECTORIZATION ONLY example:
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       Edges:
>         Map 1 <- Map 2 (BROADCAST_EDGE)
>       Vertices:
>         Map 1 
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: false
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>         Map 2 
>             Map Vectorization:
>                 enabled: true
>                 enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
>                 groupByVectorOutput: true
>                 inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                 allNative: true
>                 usesVectorUDFAdaptor: false
>                 vectorized: true
>   Stage: Stage-0
> {code}
> The standard @Explain annotation type is used. A new 'vectorization' 
> annotation marks each new class and method.
> FORMATTED is also supported, as with other non-vectorization EXPLAIN 
> variations.
> EXPLAIN VECTORIZATION ONLY SUMMARY FORMATTED example:
> {code}
> {"PLAN 
> VECTORIZATION":{"enabled":true,"enabledConditionsMet":["hive.vectorized.execution.enabled
>  IS true"]},"STAGE DEPENDENCIES":{"Stage-1":{"ROOT 
> STAGE":"TRUE"},"Stage-0":{"DEPENDENT STAGES":"Stage-1"}},"STAGE 
> PLANS":{"Stage-1":{"Tez":{"Edges:":{"Map 1":[{"parent":"Map 
> 3","type":"BROADCAST_EDGE"},{"parent":"Map 
> 4","type":"BROADCAST_EDGE"}],"Reducer 2":{"parent":"Map 
> 1","type":"SIMPLE_EDGE"}},"Vertices:":{"Map 1":{"Map 
> Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format
>  IS 
> true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map
>  3":{"Map 
> Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format
>  IS 
> true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map
>  4":{"Map 
> Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format
>  IS 
> true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Reducer
>  2":{"Reduce 
> Vectorization:":{"enabled:":"true","enableConditionsMet:":["hive.vectorized.execution.reduce.enabled
>  IS true","hive.execution.engine tez IN [tez, spark] IS 
> true"],"groupByVectorOutput:":"true","allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}}}}},"Stage-0":{}}}
> {code}
> or pretty printed:
> {code}
> {
>   "PLAN VECTORIZATION": {
>     "enabled": true,
>     "enabledConditionsMet": [
>       "hive.vectorized.execution.enabled IS true"
>     ]
>   },
>   "STAGE DEPENDENCIES": {
>     "Stage-1": {
>       "ROOT STAGE": "TRUE"
>     },
>     "Stage-0": {
>       "DEPENDENT STAGES": "Stage-1"
>     }
>   },
>   "STAGE PLANS": {
>     "Stage-1": {
>       "Tez": {
>         "Edges:": {
>           "Map 1": [
>             {
>               "parent": "Map 3",
>               "type": "BROADCAST_EDGE"
>             },
>             {
>               "parent": "Map 4",
>               "type": "BROADCAST_EDGE"
>             }
>           ],
>           "Reducer 2": {
>             "parent": "Map 1",
>             "type": "SIMPLE_EDGE"
>           }
>         },
>         "Vertices:": {
>           "Map 1": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "false",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Map 3": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "true",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Map 4": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "true",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Reducer 2": {
>             "Reduce Vectorization:": {
>               "enabled:": "true",
>               "enableConditionsMet:": [
>                 "hive.vectorized.execution.reduce.enabled IS true",
>                 "hive.execution.engine tez IN [tez, spark] IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "allNative:": "false",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           }
>         }
>       }
>     },
>     "Stage-0": {}
>   }
> }
> {code}
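Because the FORMATTED variant is JSON, it can be inspected programmatically. The following is a minimal sketch, assuming the key layout shown in the example above (note the trailing colons in keys such as "Vertices:" and "vectorized:", and that the flags are the strings "true"/"false"). The function name `vectorization_report` is hypothetical, and the sample is a trimmed copy of the quoted output.

```python
import json

# Trimmed copy of the FORMATTED output quoted above (two vertices only).
SAMPLE = """
{"STAGE PLANS": {"Stage-1": {"Tez": {"Vertices:": {
  "Map 1": {"Map Vectorization:": {"enabled:": "true", "allNative:": "false",
            "usesVectorUDFAdaptor:": "false", "vectorized:": "true"}},
  "Map 3": {"Map Vectorization:": {"enabled:": "true", "allNative:": "true",
            "usesVectorUDFAdaptor:": "false", "vectorized:": "true"}}}}},
 "Stage-0": {}}}
"""


def vectorization_report(plan):
    """Map (stage, vertex) to its vectorized/allNative flags as booleans."""
    report = {}
    for stage_name, stage in plan.get("STAGE PLANS", {}).items():
        for engine in stage.values():  # e.g. the "Tez" entry
            if not isinstance(engine, dict):
                continue
            for vertex_name, vertex in engine.get("Vertices:", {}).items():
                for key, section in vertex.items():
                    # Matches "Map Vectorization:" and "Reduce Vectorization:".
                    if key.endswith("Vectorization:"):
                        report[(stage_name, vertex_name)] = {
                            "vectorized": section.get("vectorized:") == "true",
                            "allNative": section.get("allNative:") == "true",
                        }
    return report


for (stage, vertex), flags in vectorization_report(json.loads(SAMPLE)).items():
    print(stage, vertex, flags)
```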



