[ 
https://issues.apache.org/jira/browse/HIVE-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-13200:
--------------------------------
    Attachment: HIVE-13200.1.patch

When select only on partition column with aggregation function, the 
MetadataOnlyOptimizer is used for the metadata is good enough to provide the 
needed values for the partition column. But it is not true, when need skip 
header and footer which should operate on real rows. 
The empty return for the select on column b is because the 
MetadataOnlyOptimizer creates OneNullRowRecordReader which alway return empty 
return after first skip. 
The patch fixes the issue by do not use MetadataOnlyOptimizer when need skip 
headers/footers. 

> Aggregation functions returning empty rows on partitioned columns
> -----------------------------------------------------------------
>
>                 Key: HIVE-13200
>                 URL: https://issues.apache.org/jira/browse/HIVE-13200
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>         Attachments: HIVE-13200.1.patch
>
>
> Running aggregation functions like MAX, MIN, DISTINCT against partitioned 
> columns will return empty rows if table has property: 
> 'skip.header.line.count'='1'
> Reproduce:
> {noformat}
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (a int) 
> PARTITIONED BY (b int) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
> TBLPROPERTIES('skip.header.line.count'='1');
> INSERT OVERWRITE TABLE test PARTITION (b = 1) VALUES (1), (2), (3), (4);
> INSERT OVERWRITE TABLE test PARTITION (b = 2) VALUES (1), (2), (3), (4);
> SELECT * FROM test;
> SELECT DISTINCT b FROM test;
> SELECT MAX(b) FROM test;
> SELECT DISTINCT a FROM test;
> {noformat}
> The output:
> {noformat}
> 0: jdbc:hive2://localhost:10000/default> SELECT * FROM test;
> +---------+---------+--+
> | test.a  | test.b  |
> +---------+---------+--+
> | 2       | 1       |
> | 3       | 1       |
> | 4       | 1       |
> | 2       | 2       |
> | 3       | 2       |
> | 4       | 2       |
> +---------+---------+--+
> 6 rows selected (0.631 seconds)
> 0: jdbc:hive2://localhost:10000/default> SELECT DISTINCT b FROM test;
> +----+--+
> | b  |
> +----+--+
> +----+--+
> No rows selected (47.229 seconds)
> 0: jdbc:hive2://localhost:10000/default> SELECT MAX(b) FROM test;
> +-------+--+
> |  _c0  |
> +-------+--+
> | NULL  |
> +-------+--+
> 1 row selected (49.508 seconds)
> 0: jdbc:hive2://localhost:10000/default> SELECT DISTINCT a FROM test;
> +----+--+
> | a  |
> +----+--+
> | 2  |
> | 3  |
> | 4  |
> +----+--+
> 3 rows selected (46.859 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to