[ https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt McCline updated HIVE-11394:
--------------------------------
    Status: In Progress  (was: Patch Available)

> Enhance EXPLAIN display for vectorization
> -----------------------------------------
>
>          Key: HIVE-11394
>          URL: https://issues.apache.org/jira/browse/HIVE-11394
>      Project: Hive
>   Issue Type: Bug
>   Components: Hive
>     Reporter: Matt McCline
>     Assignee: Matt McCline
>     Priority: Critical
>  Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch,
>               HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch,
>               HIVE-11394.06.patch, HIVE-11394.07.patch
>
>
> Add detail to the EXPLAIN output showing why Map or Reduce work is not
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization
> enabled) and a summary of Map and Reduce work.
> Both clauses are optional; the defaults are no ONLY and SUMMARY.
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization
> sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code} > PLAN VECTORIZATION: > enabled: true > enabledConditionsMet: [hive.vectorized.execution.enabled IS true] > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > … > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE) > … > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: decimal_date_test > Statistics: Num rows: 12288 Data size: 2467616 Basic stats: > COMPLETE Column stats: NONE > Filter Operator > predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: > boolean) > Statistics: Num rows: 6144 Data size: 1233808 Basic > stats: COMPLETE Column stats: NONE > Select Operator > expressions: cdate (type: date) > outputColumnNames: _col0 > Statistics: Num rows: 6144 Data size: 1233808 Basic > stats: COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: date) > sort order: + > Statistics: Num rows: 6144 Data size: 1233808 Basic > stats: COMPLETE Column stats: NONE > Execution mode: vectorized, llap > LLAP IO: all inputs > Map Vectorization: > enabled: true > enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true > groupByVectorOutput: true > inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > allNative: false > usesVectorUDFAdaptor: false > vectorized: true > Reducer 2 > Execution mode: vectorized, llap > Reduce Vectorization: > enabled: true > enableConditionsMet: hive.vectorized.execution.reduce.enabled > IS true, hive.execution.engine tez IN [tez, spark] IS true > groupByVectorOutput: true > allNative: false > usesVectorUDFAdaptor: false > vectorized: true > Reduce Operator Tree: > Select Operator > expressions: KEY.reducesinkkey0 (type: date) > outputColumnNames: _col0 > Statistics: Num rows: 6144 Data size: 1233808 Basic stats: > COMPLETE Column stats: NONE > File Output Operator > compressed: false > Statistics: Num rows: 6144 Data size: 1233808 Basic stats: > COMPLETE Column stats: NONE > table: > input format: > 
org.apache.hadoop.mapred.SequenceFileInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > {code} > EXPLAIN VECTORIZATION DETAIL > (Note the added Select Vectorization, Group By Vectorization, Reduce Sink > Vectorization sections in this example) > {code} > PLAN VECTORIZATION: > enabled: true > enabledConditionsMet: [hive.vectorized.execution.enabled IS true] > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > … > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Reducer 3 <- Reducer 2 (SIMPLE_EDGE) > … > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: vectortab2korc > Statistics: Num rows: 2000 Data size: 918712 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: bo (type: boolean), b (type: bigint) > outputColumnNames: bo, b > Select Vectorization: > className: VectorSelectOperator > native: true > nativeConditionsMet: Supported IS true > selectExpressions: IdentityExpression[7:boolean], > IdentityExpression[3:bigint] > vectorized: true > Statistics: Num rows: 2000 Data size: 918712 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > aggregations: max(b) > Group By Vectorization: > aggregators: > VectorUDAFMaxLong(IdentityExpression[3:bigint]) > className: VectorGroupByOperator > vectorOutput: true > keyExpressions: IdentityExpression[7:boolean] > native: false > nativeConditionsNotMet: Supported IS false > vectorized: true > keys: bo (type: boolean) > mode: hash > outputColumnNames: _col0, _col1 > Statistics: Num rows: 2000 Data size: 918712 Basic > stats: COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: boolean) > sort order: + > Map-reduce partition columns: _col0 (type: boolean) > Reduce Sink Vectorization: > className: VectorReduceSinkLongOperator > native: true > nativeConditionsMet: > 
hive.vectorized.execution.reducesink.new.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE > IS true, No buckets IS true, No TopN IS true, Uniform Hash IS true, No > DISTINCT columns IS true, BinarySortableSerDe for keys IS true, > LazyBinarySerDe for values IS true > vectorized: true > Statistics: Num rows: 2000 Data size: 918712 Basic > stats: COMPLETE Column stats: NONE > value expressions: _col1 (type: bigint) > Execution mode: vectorized > Map Vectorization: > enabled: true > enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true > groupByVectorOutput: true > inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > allNative: false > usesVectorUDFAdaptor: false > vectorized: true > Reducer 2 > Execution mode: vectorized > Reduce Vectorization: > enabled: true > enableConditionsMet: hive.vectorized.execution.reduce.enabled > IS true, hive.execution.engine tez IN [tez, spark] IS true > groupByVectorOutput: true > allNative: false > usesVectorUDFAdaptor: false > vectorized: true > Reduce Operator Tree: > Group By Operator > aggregations: max(VALUE._col0) > Group By Vectorization: > aggregators: > VectorUDAFMaxLong(IdentityExpression[1:bigint]) > className: VectorGroupByOperator > vectorOutput: true > keyExpressions: IdentityExpression[0:boolean] > native: false > nativeConditionsNotMet: Supported IS false > vectorized: true > keys: KEY._col0 (type: boolean) > mode: mergepartial > outputColumnNames: _col0, _col1 > Statistics: Num rows: 1000 Data size: 459356 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: boolean) > sort order: - > Reduce Sink Vectorization: > className: VectorReduceSinkOperator > native: false > nativeConditionsMet: > hive.vectorized.execution.reducesink.new.enabled IS true, > hive.execution.engine tez IN [tez, spark] IS true, Not ACID UPDATE or DELETE > IS true, No buckets IS true, No TopN IS true, No DISTINCT 
columns IS true, > BinarySortableSerDe for keys IS true, LazyBinarySerDe for values IS true > nativeConditionsNotMet: Uniform Hash IS false > vectorized: true > Statistics: Num rows: 1000 Data size: 459356 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col1 (type: bigint) > … > {code} > EXPLAIN VECTORIZATION ONLY example: > {code} > PLAN VECTORIZATION: > enabled: true > enabledConditionsMet: [hive.vectorized.execution.enabled IS true] > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Map 1 <- Map 2 (BROADCAST_EDGE) > Vertices: > Map 1 > Map Vectorization: > enabled: true > enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true > groupByVectorOutput: true > inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > allNative: false > usesVectorUDFAdaptor: false > vectorized: true > Map 2 > Map Vectorization: > enabled: true > enabledConditionsMet: > hive.vectorized.use.vectorized.input.format IS true > groupByVectorOutput: true > inputFileFormats: > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > allNative: true > usesVectorUDFAdaptor: false > vectorized: true > Stage: Stage-0 > {code} > The standard @Explain Annotation Type is used. A new 'vectorization' > annotation marks each new class and method. > Works for FORMATTED, like other non-vectorization EXPLAIN variations. 
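Since the FORMATTED variant emits JSON, the per-vertex summaries can be checked by a script. The following is only a rough sketch, not part of the patch: the function name `non_native_vertices` and the trimmed `SAMPLE` document are made up for illustration, and the key names ("STAGE PLANS", "Vertices:", "allNative:", "vectorized:", with string values "true"/"false") are assumed to match the FORMATTED example quoted in this issue. It lists the vertices that are either not vectorized or not fully native.

```python
import json


def non_native_vertices(explain_json):
    """Return {vertex: summary} for vertices that are not vectorized or not all-native.

    Assumes the key layout of the FORMATTED example in this issue; other Hive
    versions may emit different keys.
    """
    plan = json.loads(explain_json)
    flagged = {}
    for stage in plan.get("STAGE PLANS", {}).values():
        # Each stage maps an engine name (e.g. "Tez") to that engine's plan.
        for engine in stage.values():
            if not isinstance(engine, dict):
                continue
            for vertex, body in engine.get("Vertices:", {}).items():
                for section, summary in body.items():
                    # Only look at "Map Vectorization:" / "Reduce Vectorization:".
                    if not section.endswith("Vectorization:"):
                        continue
                    if not isinstance(summary, dict):
                        continue
                    if (summary.get("vectorized:") != "true"
                            or summary.get("allNative:") != "true"):
                        flagged[vertex] = summary
    return flagged


# Hypothetical input, trimmed from the shape of the example in this issue.
SAMPLE = json.dumps({
    "STAGE PLANS": {
        "Stage-1": {"Tez": {"Vertices:": {
            "Map 1": {"Map Vectorization:": {
                "allNative:": "false", "vectorized:": "true"}},
            "Map 3": {"Map Vectorization:": {
                "allNative:": "true", "vectorized:": "true"}},
            "Reducer 2": {"Reduce Vectorization:": {
                "allNative:": "false", "vectorized:": "true"}},
        }}},
        "Stage-0": {},
    }
})

if __name__ == "__main__":
    for name in sorted(non_native_vertices(SAMPLE)):
        print(name)
```

A check like this could run over the output of EXPLAIN VECTORIZATION ONLY SUMMARY FORMATTED to flag queries whose vertices fall back to non-native operators.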
> EXPLAIN VECTORIZATION ONLY SUMMARY FORMATTED
> {code}
> {"PLAN VECTORIZATION":{"enabled":true,"enabledConditionsMet":["hive.vectorized.execution.enabled IS true"]},"STAGE DEPENDENCIES":{"Stage-1":{"ROOT STAGE":"TRUE"},"Stage-0":{"DEPENDENT STAGES":"Stage-1"}},"STAGE PLANS":{"Stage-1":{"Tez":{"Edges:":{"Map 1":[{"parent":"Map 3","type":"BROADCAST_EDGE"},{"parent":"Map 4","type":"BROADCAST_EDGE"}],"Reducer 2":{"parent":"Map 1","type":"SIMPLE_EDGE"}},"Vertices:":{"Map 1":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map 3":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Map 4":{"Map Vectorization:":{"enabled:":"true","enabledConditionsMet:":["hive.vectorized.use.vectorized.input.format IS true"],"groupByVectorOutput:":"true","inputFileFormats:":["org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"],"allNative:":"true","usesVectorUDFAdaptor:":"false","vectorized:":"true"}},"Reducer 2":{"Reduce Vectorization:":{"enabled:":"true","enableConditionsMet:":["hive.vectorized.execution.reduce.enabled IS true","hive.execution.engine tez IN [tez, spark] IS true"],"groupByVectorOutput:":"true","allNative:":"false","usesVectorUDFAdaptor:":"false","vectorized:":"true"}}}}},"Stage-0":{}}}
> {code}
> or pretty printed:
> {code}
> {
>   "PLAN VECTORIZATION": {
>     "enabled": true,
>     "enabledConditionsMet": [
>       "hive.vectorized.execution.enabled IS true"
>     ]
>   },
>   "STAGE DEPENDENCIES": {
>     "Stage-1": {
>       "ROOT STAGE": "TRUE"
>     },
>     "Stage-0": {
>       "DEPENDENT STAGES": "Stage-1"
>     }
>   },
>   "STAGE PLANS": {
>     "Stage-1": {
>       "Tez": {
>         "Edges:": {
>           "Map 1": [
>             {
>               "parent": "Map 3",
>               "type": "BROADCAST_EDGE"
>             },
>             {
>               "parent": "Map 4",
>               "type": "BROADCAST_EDGE"
>             }
>           ],
>           "Reducer 2": {
>             "parent": "Map 1",
>             "type": "SIMPLE_EDGE"
>           }
>         },
>         "Vertices:": {
>           "Map 1": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "false",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Map 3": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "true",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Map 4": {
>             "Map Vectorization:": {
>               "enabled:": "true",
>               "enabledConditionsMet:": [
>                 "hive.vectorized.use.vectorized.input.format IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "inputFileFormats:": [
>                 "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"
>               ],
>               "allNative:": "true",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           },
>           "Reducer 2": {
>             "Reduce Vectorization:": {
>               "enabled:": "true",
>               "enableConditionsMet:": [
>                 "hive.vectorized.execution.reduce.enabled IS true",
>                 "hive.execution.engine tez IN [tez, spark] IS true"
>               ],
>               "groupByVectorOutput:": "true",
>               "allNative:": "false",
>               "usesVectorUDFAdaptor:": "false",
>               "vectorized:": "true"
>             }
>           }
>         }
>       }
>     },
>     "Stage-0": {
>
>     }
>   }
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)