[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7064:
---------------------------------------
    Labels: ready-to-commit  (was: )

> Leverage the summary's totalRowCount and totalNullCount for COUNT() queries 
> (also prevent eager expansion of files)
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7064
>                 URL: https://issues.apache.org/jira/browse/DRILL-7064
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Metadata
>            Reporter: Venkata Jyothsna Donapati
>            Assignee: Aman Sinha
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.16.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This sub-task leverages the Parquet metadata cache's summary stats, 
> totalRowCount (across all files and row groups) and the per-column 
> totalNullCount (across all files and row groups), to answer plain COUNT 
> aggregation queries that have no GROUP BY. Such queries are currently 
> converted to a DirectScan by ConvertCountToDirectScanRule, which uses the 
> row group metadata. However, that rule is applied to Drill logical rels and 
> converts the logical plan to a physical plan with a DirectScanPrel. That is 
> too late: the DrillScanRel created during logical planning has already read 
> the entire metadata cache file, including its full list of row group 
> entries. The metadata cache file can grow quite large, so this approach does 
> not scale. 
> The solution is to use the Metadata Summary file created in DRILL-7063 and 
> add a new rule that applies early, on the Calcite logical rels rather than 
> the Drill logical rels, so that it prevents eager expansion of the list of 
> files/row groups. 
> We will not remove the existing rule. It will continue to operate as before, 
> because after some transformations we may still want to apply the 
> optimizations for COUNT queries. 
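
For illustration only, the arithmetic the description relies on can be sketched as follows. This is a minimal sketch, not Drill's implementation: the class MetadataSummarySketch and its methods are hypothetical names, and it assumes the summary exposes one totalRowCount and a per-column totalNullCount map. COUNT(*) is the total row count, and COUNT(column) is the total row count minus that column's null count.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical stand-in for the summary stats; not Drill's actual class. */
final class MetadataSummarySketch {
    private final long totalRowCount;
    private final Map<String, Long> totalNullCount;

    MetadataSummarySketch(long totalRowCount, Map<String, Long> totalNullCount) {
        this.totalRowCount = totalRowCount;
        // Defensive copy so later changes to the caller's map do not leak in.
        this.totalNullCount = new HashMap<>(totalNullCount);
    }

    /** COUNT(*): every row is counted, so the summary row count is the answer. */
    long countStar() {
        return totalRowCount;
    }

    /**
     * COUNT(column): rows minus nulls for that column.
     * Returns empty when no null count is recorded, so a caller could fall
     * back to another plan (assumption: a missing entry means "unknown").
     */
    OptionalLong countColumn(String column) {
        Long nulls = totalNullCount.get(column);
        if (nulls == null) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(totalRowCount - nulls);
    }
}
```

Neither method touches a per-file or per-row-group entry, which is why a rule that runs before the file list is expanded could answer these queries from the summary alone.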



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
