[jira] [Commented] (DRILL-4905) Push down the LIMIT to the parquet reader scan to limit the numbers of records read

ASF GitHub Bot (JIRA) Thu, 13 Oct 2016 17:20:18 -0700

    [ https://issues.apache.org/jira/browse/DRILL-4905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573623#comment-15573623 ]


ASF GitHub Bot commented on DRILL-4905:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/597#discussion_r83335915
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetGroupScan.java ---
    @@ -117,4 +119,18 @@ public void testSelectEmptyNoCache() throws Exception {
             uex.getMessage()), uex.getMessage().contains(expectedMsg));
         }
       }
    +
    +  @Test
    +  public void testLimit() throws Exception {
    +    List<QueryDataBatch> results = testSqlWithResults(String.format("select * from cp.`parquet/limitTest.parquet` limit 1"));
    --- End diff --
    
    Instead of using a new parquet file, you may consider using
    cp.`tpch/nation.parquet`, unless the new file has some properties that
    the existing files do not have. It's preferable to use an existing file,
    since 1) it reduces the Drill package size, and 2) people are more
    familiar with the tpch sample files, making it easier to understand the
    expected result.



> Push down the LIMIT to the parquet reader scan to limit the numbers of 
> records read
> -----------------------------------------------------------------------------------
>
>                 Key: DRILL-4905
>                 URL: https://issues.apache.org/jira/browse/DRILL-4905
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.8.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>             Fix For: 1.9.0
>
>
> Limit the number of records read from disk by pushing down the limit to 
> the Parquet reader.
> For queries like
> select * from <table> limit N; 
> where N < size of Parquet row group, we are reading 32K/64K rows or the 
> entire row group. This needs to be optimized to read only N rows.
>  
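
The effect described in the issue can be illustrated with a small, simplified model. This is only a sketch under stated assumptions, not Drill's reader code: the class and method names are hypothetical, the reader is assumed to emit fixed-size batches, and the LIMIT operator downstream is assumed to stop the scan only between batches. The 32,768 figure used in the usage note comes from the "32K" mentioned in the description.

```java
/**
 * Hypothetical model of a batch-oriented scan over one row group.
 * Names and behavior are illustrative assumptions, not Drill APIs.
 */
final class LimitPushdownSketch {

    private LimitPushdownSketch() {
    }

    /** Rows read when the reader does not know about the LIMIT. */
    static long rowsReadWithoutPushdown(long rowGroupRows, long batchRows, long limit) {
        checkArgs(rowGroupRows, batchRows, limit);
        long read = 0;
        // The reader emits whole batches; the downstream LIMIT can only
        // stop the scan once a batch has already been read.
        while (read < limit && read < rowGroupRows) {
            read += Math.min(batchRows, rowGroupRows - read);
        }
        return read;
    }

    /** Rows read when the LIMIT is pushed down into the reader. */
    static long rowsReadWithPushdown(long rowGroupRows, long batchRows, long limit) {
        checkArgs(rowGroupRows, batchRows, limit);
        // The reader never needs more than min(limit, rows available).
        long target = Math.min(limit, rowGroupRows);
        long read = 0;
        while (read < target) {
            // Each batch is capped by what is still needed.
            read += Math.min(batchRows, target - read);
        }
        return read;
    }

    private static void checkArgs(long rowGroupRows, long batchRows, long limit) {
        if (rowGroupRows < 0 || limit < 0 || batchRows <= 0) {
            throw new IllegalArgumentException(
                "rowGroupRows and limit must be >= 0 and batchRows must be > 0");
        }
    }
}
```

In this model, a `limit 1` query over a 100,000-row row group with 32,768-row batches reads 32,768 rows without pushdown and 1 row with it. When the row group is smaller than one batch, the version without pushdown reads the entire row group, which matches the second case named in the description.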



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
