Chao Sun created HIVE-15492:
-------------------------------

             Summary: Nested column pruning: be more aggressive on RS value expressions
                 Key: HIVE-15492
                 URL: https://issues.apache.org/jira/browse/HIVE-15492
             Project: Hive
          Issue Type: Sub-task
          Components: Query Planning
    Affects Versions: 2.2.0
            Reporter: Chao Sun
            Assignee: Chao Sun

Currently, nested column pruning can still process unnecessary data when handling RS (ReduceSink) operators. For instance, given the following query (the source table can be found in {{nested_column_pruning.q}}):

{code}
SELECT t1.s1.f3.f4
FROM nested_tbl_1 t1 JOIN nested_tbl_2 t2
ON t1.s1.f3.f4 = t2.s1.f6
{code}

The generated plan is:

{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: t1
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: s1 (type: struct<f1:boolean,f2:string,f3:struct<f4:int,f5:double>,f6:int>)
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0.f3.f4 (type: int)
                sort order: +
                Map-reduce partition columns: _col0.f3.f4 (type: int)
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: struct<f1:boolean,f2:string,f3:struct<f4:int,f5:double>,f6:int>)
          TableScan
            alias: t2
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: s1 (type: struct<f1:boolean,f2:string,f3:struct<f4:int,f5:double>,f6:int>)
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0.f6 (type: int)
                sort order: +
                Map-reduce partition columns: _col0.f6 (type: int)
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          keys:
            0 _col0.f3.f4 (type: int)
            1 _col0.f6 (type: int)
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0.f3.f4 (type: int)
            outputColumnNames: _col0
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}

In particular, for table {{t1}} the plan scans the whole {{s1}} struct, because the value expression of the associated RS operator references {{_col0}}, i.e. the entire struct. This can be optimized further, since the only field the query needs from {{t1}} is {{s1.f3.f4}}.
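
To make the difference concrete, here is a minimal Java sketch of the path-collapsing step that decides what a scan must read. It is not Hive's implementation: the class {{NestedPathPruning}} and its methods are hypothetical names introduced for illustration, and paths are modeled as plain dotted strings. The only rule it assumes is that requesting a struct reads its whole subtree, so any path with a requested ancestor is redundant.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hive: models nested column paths as
// dotted strings such as "s1.f3.f4".
class NestedPathPruning {

    // True when `path` is `ancestor` itself or a field nested under it.
    static boolean isAncestorOrSelf(String ancestor, String path) {
        return path.equals(ancestor) || path.startsWith(ancestor + ".");
    }

    // Collapses the requested paths to the minimal set a scan must read.
    // A path is dropped when another requested path is its proper ancestor,
    // because reading the ancestor already reads the whole subtree.
    static List<String> minimalPaths(Collection<String> paths) {
        Set<String> unique = new LinkedHashSet<>(paths); // dedupe, keep order
        List<String> result = new ArrayList<>();
        for (String candidate : unique) {
            boolean covered = false;
            for (String other : unique) {
                if (!other.equals(candidate) && isAncestorOrSelf(other, candidate)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Current plan for t1: RS key needs s1.f3.f4, RS value carries all of s1.
        System.out.println(minimalPaths(List.of("s1.f3.f4", "s1")));
        // Proposed: RS value narrowed to what the downstream Select reads.
        System.out.println(minimalPaths(List.of("s1.f3.f4", "s1.f3.f4")));
    }
}
```

With the RS value expression contributing the whole {{s1}} column, the collapsed set for {{t1}} is just {{s1}}, which matches the plan above. If the value expression were narrowed to the fields the downstream Select actually reads, the collapsed set would be {{s1.f3.f4}} only.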