[ https://issues.apache.org/jira/browse/DRILL-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jinfeng Ni updated DRILL-684:
-----------------------------
Attachment: DRILL-684.1.patch
In addition to the code change for the row count, the patch contains the
following bug fixes:
1) Set the type's nullable property for the extract function and for the 'any'
type in a view DDL or table column list.
2) Fix a bug in the logical/physical Project rules: set up the traits properly.
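The nullability fix in item 1 can be illustrated with a minimal sketch. It assumes a simple rule: a NULL-propagating function such as EXTRACT returns NULL when its operand is NULL, so its result type is nullable whenever an operand is nullable. It also assumes a column declared as 'any' carries no NOT NULL guarantee and is therefore nullable. `SqlType`, `inferReturnType`, and `anyType` are hypothetical names, not Drill's or Optiq's API, and the "BIGINT" result type name is chosen only for illustration.

```java
import java.util.List;

// Hypothetical sketch (not Drill's or Optiq's actual API): nullability inference
// for a NULL-propagating function result, plus a nullable 'any' column type.
public class NullabilitySketch {

    /** Minimal stand-in for a SQL type: a type name plus a nullable flag. */
    record SqlType(String name, boolean nullable) {}

    /** EXTRACT(unit FROM expr) yields NULL when expr is NULL, so the result is
     *  nullable whenever at least one operand type is nullable. */
    static SqlType inferReturnType(String resultName, List<SqlType> operands) {
        boolean nullable = operands.stream().anyMatch(SqlType::nullable);
        return new SqlType(resultName, nullable);
    }

    /** Assumption: an 'any' column in a view DDL or table column list is nullable. */
    static SqlType anyType() {
        return new SqlType("ANY", true);
    }

    public static void main(String[] args) {
        SqlType nullableTs = new SqlType("TIMESTAMP", true);
        SqlType requiredDate = new SqlType("DATE", false);
        // EXTRACT(YEAR FROM <nullable timestamp>): nullable result
        System.out.println(inferReturnType("BIGINT", List.of(nullableTs)).nullable());   // true
        // EXTRACT(YEAR FROM <required date>): non-nullable result
        System.out.println(inferReturnType("BIGINT", List.of(requiredDate)).nullable()); // false
        System.out.println(anyType().nullable());                                        // true
    }
}
```

Deriving nullability from the operands, rather than fixing it per function, keeps non-nullable inputs from being widened unnecessarily.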
> Use parquet row count in cost-based optimization. Use parquet row count and
> column value count to optimize the count() aggregate function.
> -------------------------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-684
> URL: https://issues.apache.org/jira/browse/DRILL-684
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
> Attachments: DRILL-684.1.patch
>
>
> Parquet group scan provides the exact row count and the exact value count for
> each individual column. This information could be leveraged in the following
> two ways:
> 1. Use the counts in cost estimation when a query refers to parquet files.
> 2. Use the row count or column value count to optimize the count() aggregate
> function.
> For instance:
> select count(*) from parquet_file;
> select count(column_a) from parquet_file;
> The first query could be transformed to return the row count directly; the
> second could return the column value count for 'column_a'. In both cases the
> query avoids scanning the whole parquet files, thus improving query
> performance.
>
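The two uses described in the issue can be sketched against per-row-group metadata. Parquet footers record a row count for each row group, so summing them gives an exact row count usable both by the cost model and for `count(*)`. SQL's `count(column_a)` counts only non-null values, so this sketch assumes the per-column count it reads already excludes nulls. If the stored value count includes nulls, the null count would need to be subtracted first. `RowGroupMeta`, `totalRowCount`, and `columnValueCount` are hypothetical names, not Drill's API.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch: answering count(*) and count(col) from per-row-group
// metadata instead of scanning data. Types and names are illustrative only.
public class CountFromMetadata {

    /** Per-row-group metadata: total rows, and non-null value count per column. */
    record RowGroupMeta(long rowCount, Map<String, Long> nonNullCounts) {}

    /** Exact row count for the whole scan: the sum of the row-group row counts.
     *  The same number can feed the cost model in place of an estimate. */
    static long totalRowCount(List<RowGroupMeta> rowGroups) {
        long total = 0;
        for (RowGroupMeta rg : rowGroups) {
            total += rg.rowCount();
        }
        return total;
    }

    /** count(column): sum the per-group non-null counts. Returns empty if any
     *  row group lacks the statistic; the planner would then fall back to a scan. */
    static OptionalLong columnValueCount(List<RowGroupMeta> rowGroups, String column) {
        long total = 0;
        for (RowGroupMeta rg : rowGroups) {
            Long c = rg.nonNullCounts().get(column);
            if (c == null) {
                return OptionalLong.empty();
            }
            total += c;
        }
        return OptionalLong.of(total);
    }

    public static void main(String[] args) {
        List<RowGroupMeta> groups = List.of(
            new RowGroupMeta(1000, Map.of("column_a", 990L)),
            new RowGroupMeta(500, Map.of("column_a", 500L)));
        // select count(*) from parquet_file;
        System.out.println(totalRowCount(groups));                            // 1500
        // select count(column_a) from parquet_file;
        System.out.println(columnValueCount(groups, "column_a").getAsLong()); // 1490
        // Missing statistic: cannot answer from metadata alone
        System.out.println(columnValueCount(groups, "column_b").isPresent()); // false
    }
}
```

Returning an empty result instead of guessing keeps the rewrite safe: the aggregate is replaced by a constant only when every row group supplies the exact count.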