Jason Xu created HIVE-22495:
-------------------------------
Summary: Parquet count(*) read in all data
Key: HIVE-22495
URL: https://issues.apache.org/jira/browse/HIVE-22495
Project: Hive
Issue Type: Bug
Components: Reader
Affects Versions: 2.3.4
Reporter: Jason Xu
Assignee: Jason Xu
Running the following Hive query on a Parquet table:
select count(*) from t
The query reads all data (all columns) instead of just the metadata.
For comparison, Hive 0.13 and Spark read much less data for the same query.
||Engine||HDFS data read||
|Hive 2.3.4| 452.9 MB|
|Hive 0.13| 22.5 KB|
|Spark| 41.6 KB|
The cause seems to be that the Parquet read support falls back to the full
file schema if indexColumnsWanted is empty, so a query that needs no columns
ends up requesting all of them.
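The suspected fallback can be illustrated with a minimal, self-contained sketch. This is not the actual Hive code: the class ProjectionFallbackSketch and the methods projectWithFallback and projectWithoutFallback are hypothetical names, and the schema is modeled as a plain list of column names. Only indexColumnsWanted comes from the report above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; these names are not part of Hive or Parquet.
public class ProjectionFallbackSketch {

    // Suspected behavior: an empty list of wanted column indexes is treated
    // as "no projection", so every column in the file schema is requested.
    static List<String> projectWithFallback(List<String> fileSchema,
                                            List<Integer> indexColumnsWanted) {
        if (indexColumnsWanted.isEmpty()) {
            return new ArrayList<>(fileSchema);
        }
        return pick(fileSchema, indexColumnsWanted);
    }

    // Assumed alternative: an empty list yields an empty projection, so a
    // count(*) style query would not need to read any column data.
    static List<String> projectWithoutFallback(List<String> fileSchema,
                                               List<Integer> indexColumnsWanted) {
        return pick(fileSchema, indexColumnsWanted);
    }

    // Selects the columns at the requested indexes, in the requested order.
    private static List<String> pick(List<String> fileSchema,
                                     List<Integer> indexColumnsWanted) {
        List<String> projected = new ArrayList<>();
        for (int index : indexColumnsWanted) {
            projected.add(fileSchema.get(index));
        }
        return projected;
    }

    public static void main(String[] args) {
        List<String> fileSchema = Arrays.asList("id", "name", "payload");
        List<Integer> wanted = Collections.<Integer>emptyList();

        System.out.println(projectWithFallback(fileSchema, wanted));    // [id, name, payload]
        System.out.println(projectWithoutFallback(fileSchema, wanted)); // []
    }
}
```

If the real read support behaves like projectWithFallback, that would be consistent with the 452.9 MB read on Hive 2.3.4, though the sketch itself does not confirm the cause.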
--
This message was sent by Atlassian Jira
(v8.3.4#803005)