[ https://issues.apache.org/jira/browse/IMPALA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Rawat reassigned IMPALA-2017:
--------------------------------------

    Assignee: Abhishek Rawat

> Lazy materialization of Parquet columns during query
> ----------------------------------------------------
>
>                 Key: IMPALA-2017
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2017
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 1.4, Impala 2.0, Impala 2.1, Impala 2.2
>            Reporter: Lou Bershad
>            Assignee: Abhishek Rawat
>            Priority: Minor
>              Labels: parquet, performance
>
> When I run a query over a 4 billion row table that returns a single row, it
> takes ~30 seconds if I do 'select * ...'. It takes only 3 seconds if I do a
> 'select field1, field2 ...'. This is repeatable.
> Given these times, it would seem that the 'select *' query is materializing
> all the fields for rows whether they match or not.
> Lazy materialization of columns when they are needed could improve
> performance.
>
> These four queries were run back to back. The actual returned data is elided
> (sorry). The table has 35 fields.
> {noformat}
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select * from events where
> event_id=1416403791;
> <elided>
> 1 row selected (33.777 seconds)
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select event_id, client_id
> from events where event_id=1416403791;
> +-------------+------------+--+
> | event_id    | client_id  |
> +-------------+------------+--+
> | 1416403791  | <elided>   |
> +-------------+------------+--+
> 1 row selected (3.363 seconds)
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select * from events where
> event_id=1416403791;
> <elided>
> 1 row selected (33.138 seconds)
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select event_id, client_id
> from events where event_id=1416403791;
> +-------------+------------+--+
> | event_id    | client_id  |
> +-------------+------------+--+
> | 1416403791  | <elided>   |
> +-------------+------------+--+
> 1 row selected (3.074 seconds)
> 0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure>
> {noformat}
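
For illustration only, the difference between eager and lazy materialization described in the issue can be sketched in a few lines of Python. This is a toy in-memory model, not Impala's Parquet scanner: `CountingColumn`, `eager_scan`, `lazy_scan`, `make_table`, and `total_reads` are hypothetical names invented for this sketch, and a "read" here simply stands in for decoding one column value.

```python
from typing import Any, Callable, Dict, List


class CountingColumn:
    """Hypothetical column that counts how many values were decoded."""

    def __init__(self, values):
        self.values = list(values)
        self.reads = 0

    def read(self, row: int) -> Any:
        self.reads += 1
        return self.values[row]

    def __len__(self) -> int:
        return len(self.values)


Table = Dict[str, CountingColumn]


def eager_scan(table: Table, filter_col: str,
               predicate: Callable[[Any], bool]) -> List[Dict[str, Any]]:
    """Materialize every column of every row, then apply the predicate."""
    out = []
    for row in range(len(table[filter_col])):
        record = {name: col.read(row) for name, col in table.items()}
        if predicate(record[filter_col]):
            out.append(record)
    return out


def lazy_scan(table: Table, filter_col: str,
              predicate: Callable[[Any], bool]) -> List[Dict[str, Any]]:
    """Read only the filter column first; read the rest for matching rows."""
    out = []
    for row in range(len(table[filter_col])):
        key = table[filter_col].read(row)
        if not predicate(key):
            continue  # non-matching row: the other columns are never touched
        record = {filter_col: key}
        for name, col in table.items():
            if name != filter_col:
                record[name] = col.read(row)
        out.append(record)
    return out


def make_table(num_rows: int, num_cols: int) -> Table:
    """Build a toy table: an 'event_id' column plus num_cols - 1 payload columns."""
    table: Table = {"event_id": CountingColumn(range(num_rows))}
    for i in range(1, num_cols):
        table[f"col{i}"] = CountingColumn(f"v{i}_{r}" for r in range(num_rows))
    return table


def total_reads(table: Table) -> int:
    """Total number of column values decoded across the whole table."""
    return sum(col.reads for col in table.values())
```

In this toy model, a 1,000-row, 35-column table with one matching row costs 35,000 value reads under `eager_scan` and 1,034 under `lazy_scan` (1,000 for the filter column plus 34 for the single match), while both return the same row. Whether a real scanner sees a comparable gain depends on factors this sketch ignores, such as page-level decoding, compression, and I/O granularity.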