[ https://issues.apache.org/jira/browse/DRILL-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953303#comment-14953303 ]
ASF GitHub Bot commented on DRILL-3921:
---------------------------------------

GitHub user sudheeshkatkam opened a pull request:

    https://github.com/apache/drill/pull/197

    DRILL-3921: Initialize the underlying record reader lazily in HiveRec…

    …ordReader

    @vkorukanti and @jacques-n can you please take a look. I need to add unit tests.

    For my setup with 20K files, LIMIT 1 query now takes 53 seconds (~48 seconds for planning). Previously the query took 1300 seconds (~45 seconds for planning).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sudheeshkatkam/drill DRILL-3921

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/197.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #197

----
commit fdca17f3c223a4f51099616e059c394c8db3974d
Author: Sudheesh Katkam <skat...@maprtech.com>
Date:   2015-10-12T16:32:15Z

    DRILL-3921: Initialize the underlying record reader lazily in HiveRecordReader

----

> Hive LIMIT 1 queries take too long
> ----------------------------------
>
>                 Key: DRILL-3921
>                 URL: https://issues.apache.org/jira/browse/DRILL-3921
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>            Reporter: Sudheesh Katkam
>            Assignee: Sudheesh Katkam
>
> Fragment initialization on a Hive table (that is backed by a directory of
> many files) can take really long. This is evident through LIMIT 1 queries.
> The root cause is that the underlying reader in the HiveRecordReader is
> initialized when the ctor is called, rather than when setup is called.
> Two changes need to be made:
> 1) lazily initialize the underlying record reader in HiveRecordReader
> 2) allow for running a callable as a proxy user within an operator (through
> OperatorContext). This is required as initialization of the underlying record
> reader needs to be done as a proxy user (proxy for owner of the file).
> Previously, this was handled while creating the record batch tree.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
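
The two changes listed in the issue description can be illustrated with a small, self-contained Java sketch. This is not the code from the pull request: `UnderlyingReader`, `ProxyRunner`, `DirectRunner`, and `LazyRecordReader` are hypothetical names invented for illustration, and the "proxy user" step is reduced to a plain call. In a Hadoop-based system that step would typically go through something like `UserGroupInformation.doAs`, but how Drill exposes it through OperatorContext is not shown here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the expensive underlying record reader. */
interface UnderlyingReader {
    String next();
}

/** Hypothetical stand-in for "run this callable as the proxy user". */
interface ProxyRunner {
    <T> T runAsProxyUser(Callable<T> callable) throws Exception;
}

/** Trivial runner for the sketch: it just invokes the callable directly. */
class DirectRunner implements ProxyRunner {
    @Override
    public <T> T runAsProxyUser(Callable<T> callable) throws Exception {
        return callable.call();
    }
}

/** Illustrative reader: the constructor only stores state, setup() does the costly work. */
class LazyRecordReader {
    private final Callable<UnderlyingReader> readerFactory;
    private final ProxyRunner proxyRunner;
    private UnderlyingReader reader; // stays null until setup() is called

    LazyRecordReader(Callable<UnderlyingReader> readerFactory, ProxyRunner proxyRunner) {
        // Cheap: nothing is opened here, so creating many readers stays fast.
        this.readerFactory = readerFactory;
        this.proxyRunner = proxyRunner;
    }

    void setup() throws Exception {
        if (reader == null) {
            // Expensive initialization is deferred to here and run via the proxy runner.
            reader = proxyRunner.runAsProxyUser(readerFactory);
        }
    }

    String next() {
        if (reader == null) {
            throw new IllegalStateException("setup() must be called before next()");
        }
        return reader.next();
    }

    boolean isInitialized() {
        return reader != null;
    }
}

class LazyReaderDemo {
    public static void main(String[] args) throws Exception {
        AtomicInteger opened = new AtomicInteger();
        // The factory counts how many underlying readers are actually opened.
        Callable<UnderlyingReader> factory = () -> {
            opened.incrementAndGet();
            return () -> "row-1";
        };

        // Constructing many readers opens nothing.
        LazyRecordReader[] readers = new LazyRecordReader[1000];
        for (int i = 0; i < readers.length; i++) {
            readers[i] = new LazyRecordReader(factory, new DirectRunner());
        }
        System.out.println("opened after construction: " + opened.get());

        // A LIMIT 1 style consumer only needs to set up the first reader.
        readers[0].setup();
        System.out.println("first row: " + readers[0].next());
        System.out.println("opened after first setup: " + opened.get());
    }
}
```

The point of the sketch is only the ordering: with eager initialization, the cost of opening every underlying reader is paid when the readers are constructed, whereas here a consumer that stops after the first row pays for a single `setup()` call.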