Ashish Sharma created HIVE-19103:
------------------------------------
Summary: Reading only required columns in nested structure schema in ORC
Key: HIVE-19103
URL: https://issues.apache.org/jira/browse/HIVE-19103
Project: Hive
Issue Type: Improvement
Reporter: Ashish Sharma
Assignee: Ashish Sharma
Read only the required columns when the schema contains nested structures.
Example -
*Current state* -
Schema - struct<a:int, b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
Query - select c.e.f from t where c.e.f > 10;
Current state - the entire c struct is read from the file and filtered afterwards,
because only "hive.io.file.readcolumn.ids" is consulted, so all child columns of c
are selected for reading.
Conf -
_hive.io.file.readcolumn.ids = "2"
hive.io.file.readNestedColumn.paths = "c.e.f"_
Result -
boolean[] include = [true,false,false,true,true,true,true,true]
*Expected state* -
Schema - struct<a:int, b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
Query - select c.e.f from t where c.e.f > 10;
Expected state - instead of reading the entire c struct from the file, read
only the f column (and its parent structs c and e) by consulting
"hive.io.file.readNestedColumn.paths".
Conf -
_hive.io.file.readcolumn.ids = "2"
hive.io.file.readNestedColumn.paths = "c.e.f"_
Result -
boolean[] include = [true,false,false,true,false,true,true,false]
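
The two include arrays above can be reproduced with a small sketch. It is not Hive or ORC code. It assumes column ids are assigned by a pre-order walk of the schema with the root struct as id 0, which is what the eight-entry arrays imply. The schema representation and the function names (column_paths, include_by_ids, include_by_paths) are hypothetical and exist only for illustration.

```python
# Illustrative sketch only; not the Hive/ORC implementation.
# The schema is modeled as a list of (name, children) pairs (hypothetical form).
SCHEMA = [
    ("a", []),
    ("b", []),
    ("c", [
        ("d", []),
        ("e", [("f", [])]),
        ("g", []),
    ]),
]


def _flatten(children, prefix=()):
    """Return the field paths under `prefix` in pre-order."""
    paths = []
    for name, sub in children:
        path = prefix + (name,)
        paths.append(path)
        paths.extend(_flatten(sub, path))
    return paths


def column_paths(schema):
    """Index in the returned list is the column id; id 0 is the root struct."""
    return [()] + _flatten(schema)


def include_by_ids(schema, read_column_ids):
    """Current state: a selected top-level column pulls in all of its children."""
    selected = {schema[i][0] for i in read_column_ids}
    return [len(p) == 0 or p[0] in selected for p in column_paths(schema)]


def include_by_paths(schema, nested_paths):
    """Expected state: keep the root, each requested path, its ancestors,
    and its descendants; prune every sibling."""
    wanted = [tuple(p.split(".")) for p in nested_paths]

    def keep(path):
        if len(path) == 0:
            return True
        return any(
            w[:len(path)] == path      # path is the wanted column or an ancestor
            or path[:len(w)] == w      # path is a descendant of the wanted column
            for w in wanted
        )

    return [keep(p) for p in column_paths(schema)]


current = include_by_ids(SCHEMA, [2])
expected = include_by_paths(SCHEMA, ["c.e.f"])
print(current)
print(expected)
```

With this numbering the ids are 0 = root, 1 = a, 2 = b, 3 = c, 4 = d, 5 = e, 6 = f, 7 = g, so pruning by path drops d (4) and g (7) while keeping c, e and f.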