[
https://issues.apache.org/jira/browse/CALCITE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Diveyam Mishra updated CALCITE-7618:
------------------------------------
Description:
The file adapter's CSV implementation currently supports projection pushdown
but does not appear to support filter pushdown.
For example:
{code}
SELECT name, empno
FROM EMPS
WHERE deptno = 20
{code}
The resulting plan is:
{code}
PLAN=EnumerableCalc(expr#0..2=[{inputs}], expr#3=[20], expr#4=[=($t2, $t3)], NAME=[$t1], EMPNO=[$t0], $condition=[$t4])
  CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2]])
{code}
The desired plan could look something like:
{code}
CsvTableScan(..., filters=[deptno=20])
{code}
The filter condition is evaluated in the upper {{EnumerableCalc}} node rather
than during the CSV scan itself. As a result, all rows are read from the
underlying CSV file and filtering occurs afterward.
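The difference can be illustrated with a small self-contained sketch. It is not Calcite code: the class {{ScanFilterSketch}}, its methods, and the sample rows are made up for illustration, and a three-column {{empno,name,deptno}} layout is assumed. It counts how many rows are converted to typed values when the filter runs above the scan versus inside it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Calcite.
class ScanFilterSketch {
  /** Number of rows converted from raw strings to typed values. */
  static int converted = 0;

  /** Converts the raw fields of one row (empno, name, deptno) to typed values. */
  static Object[] convert(String[] fields) {
    converted++;
    return new Object[] {
        Integer.parseInt(fields[0]), fields[1], Integer.parseInt(fields[2])};
  }

  /** Filter above the scan: every row is converted first, then tested. */
  static List<Object[]> filterAfterScan(List<String> lines, int deptno) {
    List<Object[]> out = new ArrayList<>();
    for (String line : lines) {
      Object[] row = convert(line.split(","));
      int rowDeptno = (Integer) row[2];
      if (rowDeptno == deptno) {
        out.add(row);
      }
    }
    return out;
  }

  /** Filter inside the scan: the raw field is tested before conversion. */
  static List<Object[]> filterDuringScan(List<String> lines, int deptno) {
    String wanted = Integer.toString(deptno);
    List<Object[]> out = new ArrayList<>();
    for (String line : lines) {
      String[] fields = line.split(",");
      if (wanted.equals(fields[2])) {
        out.add(convert(fields));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Made-up sample rows.
    List<String> lines = Arrays.asList(
        "100,Fred,10", "110,Eric,20", "120,Wilma,20", "130,Alice,40");

    converted = 0;
    int rowsAfter = filterAfterScan(lines, 20).size();
    System.out.println(rowsAfter + " rows, " + converted + " conversions");

    converted = 0;
    int rowsDuring = filterDuringScan(lines, 20).size();
    System.out.println(rowsDuring + " rows, " + converted + " conversions");
  }
}
```

Both variants still read every line of the file; in this sketch the saving is in type conversion and in the work done by operators above the scan.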
The file adapter already contains infrastructure related to filtering:
* {{CsvEnumerator}} contains {{filterValues}}-based row filtering logic.
* {{CsvTable}} was originally derived from the demo CSV adapter's filterable
implementation.
* The file adapter defines table flavors including {{FILTERABLE}}.
However, the file adapter currently exposes {{CsvTranslatableTable}}, and
there does not appear to be a mechanism that translates pushdown-compatible
filter predicates into the filtering capabilities already present in
{{CsvEnumerator}}.
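One possible shape for the missing translation step is sketched below. This is an assumption about how it might work, not the file adapter's actual code: {{FilterPushdownSketch}}, {{Predicate}}, {{pushDown}} and {{accepts}} are hypothetical names, and the {{Predicate}} class stands in for planner expressions rather than using any Calcite API. Only the idea of one filter value per column (null meaning "no filter") is taken from the {{filterValues}} logic mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; these types are stand-ins, not Calcite APIs.
class FilterPushdownSketch {
  /** A simple predicate "column <op> literal", for example deptno = 20. */
  static final class Predicate {
    final int column;
    final String op;
    final String literal;

    Predicate(int column, String op, String literal) {
      this.column = column;
      this.op = op;
      this.literal = literal;
    }
  }

  /**
   * Moves each "column = literal" conjunct into filterValues (one slot per
   * column, null meaning no filter) and returns the conjuncts that could not
   * be pushed and must still be evaluated above the scan.
   */
  static List<Predicate> pushDown(List<Predicate> conjuncts, String[] filterValues) {
    List<Predicate> remaining = new ArrayList<>();
    for (Predicate p : conjuncts) {
      if ("=".equals(p.op) && filterValues[p.column] == null) {
        filterValues[p.column] = p.literal;
      } else {
        remaining.add(p);
      }
    }
    return remaining;
  }

  /** Scan-side check: a row passes if every non-null filter value matches. */
  static boolean accepts(String[] row, String[] filterValues) {
    for (int i = 0; i < row.length; i++) {
      if (filterValues[i] != null && !filterValues[i].equals(row[i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // WHERE deptno = 20 AND empno > 100, with columns (empno, name, deptno).
    String[] filterValues = new String[3];
    List<Predicate> remaining = pushDown(
        Arrays.asList(new Predicate(2, "=", "20"), new Predicate(0, ">", "100")),
        filterValues);

    System.out.println("pushed: " + Arrays.toString(filterValues));
    System.out.println("left above the scan: " + remaining.size());
    System.out.println(accepts(new String[] {"110", "Eric", "20"}, filterValues));
    System.out.println(accepts(new String[] {"100", "Fred", "10"}, filterValues));
  }
}
```

Comparing raw strings is only equivalent to the SQL predicate when the literal's text matches the stored text exactly (for example {{20}} versus {{020}}), so a real rule would need to restrict which predicates it pushes or keep the original condition above the scan as a safeguard.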
Evidence:
* The physical plan for the query above contains {{EnumerableCalc}} over
{{CsvTableScan}}.
* Planner logs show projection pushdown via {{CsvProjectTableScanRule}}.
* No file-adapter-specific filter pushdown rule appears to fire.
* Filter evaluation remains outside the scan.
This results in unnecessary parsing and processing of rows that could
be eliminated during scanning.
> Add filter pushdown support to the file adapter's CSV table implementation
> --------------------------------------------------------------------------
>
> Key: CALCITE-7618
> URL: https://issues.apache.org/jira/browse/CALCITE-7618
> Project: Calcite
> Issue Type: Improvement
> Components: file-adapter
> Reporter: Diveyam Mishra
> Assignee: Diveyam Mishra
> Priority: Minor