[ https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Volodymyr Vysotskyi updated DRILL-7064:
---------------------------------------
    Labels: ready-to-commit  (was: )

> Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)
> -------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-7064
> URL: https://issues.apache.org/jira/browse/DRILL-7064
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Metadata
> Reporter: Venkata Jyothsna Donapati
> Assignee: Aman Sinha
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.16.0
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> This sub-task is meant to leverage the Parquet metadata cache's summary
> stats, totalRowCount (across all files and row groups) and the per-column
> totalNullCount (across all files and row groups), to answer plain COUNT
> aggregation queries without Group-By. These queries are currently converted
> to a DirectScan by the ConvertCountToDirectScanRule, which uses the row group
> metadata. However, this rule is applied on Drill logical rels and converts the
> logical plan to a physical plan with DirectScanPrel, and that is too late: the
> DrillScanRel created during logical planning has already read the entire
> metadata cache file along with its full list of row group entries. The
> metadata cache file can grow quite large, so this approach does not scale.
> The solution is to use the Metadata Summary file that is created in
> DRILL-7063 and to create a new rule that applies early, so that it operates
> on the Calcite logical rels instead of the Drill logical rels and prevents
> eager expansion of the list of files/row groups.
> We will not remove the existing rule. It will continue to operate as before,
> because after some transformations we may still want to apply the
> optimizations for COUNT queries.
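The COUNT arithmetic the issue relies on can be sketched as follows. This is a minimal Java sketch under stated assumptions, not Drill's implementation: the `SummaryCount` class, its constructor, and the convention that a column missing from the null-count map has an unknown null count are all hypothetical. COUNT(*) is answered directly from totalRowCount, and COUNT(col) from totalRowCount minus that column's totalNullCount.

```java
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical holder for summary stats; not Drill's actual metadata API. */
public class SummaryCount {

    // Total number of rows across all files and row groups.
    private final long totalRowCount;
    // Per-column null counts across all files and row groups.
    // Assumption: a column absent from the map has an unknown null count.
    private final Map<String, Long> totalNullCount;

    public SummaryCount(long totalRowCount, Map<String, Long> totalNullCount) {
        this.totalRowCount = totalRowCount;
        this.totalNullCount = totalNullCount;
    }

    /** COUNT(*) counts every row, so it equals the total row count. */
    public long countStar() {
        return totalRowCount;
    }

    /**
     * COUNT(col) counts only non-null values: totalRowCount - totalNullCount(col).
     * Returns empty when the null count is unknown, meaning the query cannot be
     * answered from the summary alone.
     */
    public OptionalLong countColumn(String column) {
        Long nulls = totalNullCount.get(column);
        if (nulls == null) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(totalRowCount - nulls);
    }

    public static void main(String[] args) {
        SummaryCount summary = new SummaryCount(1_000L, Map.of("a", 10L, "b", 0L));
        System.out.println(summary.countStar());      // 1000
        System.out.println(summary.countColumn("a")); // OptionalLong[990]
        System.out.println(summary.countColumn("c")); // OptionalLong.empty
    }
}
```

In this sketch, returning an empty `OptionalLong` rather than guessing models the case where the summary stats are insufficient; in that situation planning would proceed as before, with the existing rule still available.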
--
This message was sent by Atlassian JIRA (v7.6.3#76005)