[ 
https://issues.apache.org/jira/browse/CALCITE-7618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diveyam Mishra updated CALCITE-7618:
------------------------------------
    Description: 
The file adapter's CSV implementation currently supports projection pushdown 
but does not appear to support filter pushdown.

For example:

{code}
SELECT name, empno
FROM EMPS
WHERE deptno = 20
{code}

The resulting plan is:

{code}
PLAN=EnumerableCalc(expr#0..2=[{inputs}], expr#3=[20], expr#4=[=($t2, $t3)], NAME=[$t1], EMPNO=[$t0], $condition=[$t4])
  CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2]])
{code}

The desired plan could be something like:
{code}
CsvTableScan(..., filters=[deptno=20])
{code}

In the current plan, the filter condition is evaluated in the upper {{EnumerableCalc}} node rather 
than during the CSV scan itself. As a result, every row is read from the 
underlying CSV file and filtered afterward.

The file adapter already contains infrastructure related to filtering:
 * {{CsvEnumerator}} contains {{filterValues}}-based row filtering logic.
 * {{CsvTable}} was originally derived from the demo CSV adapter's filterable 
implementation.
 * The file adapter defines table flavors including {{FILTERABLE}}.
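
The {{filterValues}} idea can be sketched without any Calcite dependency. The following is a minimal illustration, not the actual {{CsvEnumerator}} code: it assumes a per-column array in which {{null}} means "no constraint" and a non-null entry must equal the raw string field. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for filterValues-style matching; not Calcite code. */
final class FilterValuesSketch {
  private FilterValuesSketch() {}

  /**
   * Returns true if every non-null filter value equals the raw field
   * in the same column. A row that is too short to contain a
   * constrained column is treated as a non-match.
   */
  static boolean matches(String[] row, String[] filterValues) {
    for (int i = 0; i < filterValues.length; i++) {
      String expected = filterValues[i];
      if (expected != null
          && (i >= row.length || !expected.equals(row[i]))) {
        return false;
      }
    }
    return true;
  }

  /** Keeps only matching rows, as a scan could do before type conversion. */
  static List<String[]> scan(List<String[]> rows, String[] filterValues) {
    List<String[]> kept = new ArrayList<>();
    for (String[] row : rows) {
      if (matches(row, filterValues)) {
        kept.add(row);
      }
    }
    return kept;
  }
}
```

Because the comparison is on raw strings, a sketch like this only covers simple equality on values whose textual form is unambiguous; anything else would still need to be evaluated above the scan.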

However, the file adapter currently exposes {{CsvTranslatableTable}}, and 
there does not appear to be a mechanism that translates pushdown-compatible 
filter predicates into the filtering capabilities already present in 
{{CsvEnumerator}}.
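
One possible shape for such a translation step is sketched below. It is an illustration under stated assumptions rather than a proposed patch: predicates are modeled by a hypothetical {{Predicate}} class instead of Calcite's row expressions, only {{column = literal}} is treated as pushable, and the result is a per-column array in which {{null}} means the column is unconstrained. Predicates that are not pushed stay in the list so they can still be evaluated above the scan.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of splitting predicates into pushed and residual parts. */
final class FilterPushdownSketch {
  private FilterPushdownSketch() {}

  /** Hypothetical simplified predicate of the form: column op literal. */
  static final class Predicate {
    final int column;
    final String op;
    final String literal;

    Predicate(int column, String op, String literal) {
      this.column = column;
      this.op = op;
      this.literal = literal;
    }
  }

  /**
   * Moves pushable equality predicates into a per-column array and
   * removes them from the given list. Whatever remains in the list
   * was not pushed down and must be evaluated after the scan.
   */
  static String[] extractFilterValues(List<Predicate> predicates,
      int fieldCount) {
    String[] filterValues = new String[fieldCount];
    for (Iterator<Predicate> it = predicates.iterator(); it.hasNext();) {
      Predicate p = it.next();
      boolean pushable = "=".equals(p.op)
          && p.column >= 0
          && p.column < fieldCount
          // Only one equality per column fits in the array.
          && filterValues[p.column] == null;
      if (pushable) {
        filterValues[p.column] = p.literal;
        it.remove();
      }
    }
    return filterValues;
  }
}
```

For the query above, {{deptno = 20}} refers to column index 2 (it appears as {{$t2}} in the plan), so the sketch would produce {{[null, null, "20"]}} and leave no residual predicate. A range predicate would remain in the list and continue to be handled by the node above the scan.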

Evidence:
 * Physical plan for the query above contains {{EnumerableCalc}} over 
{{CsvTableScan}}.
 * Planner logs show projection pushdown via {{CsvProjectTableScanRule}}.
 * No file-adapter-specific filter pushdown rule appears to fire.
 * Filter evaluation remains outside the scan.

As a result, rows that could be eliminated during the scan are still parsed 
and processed unnecessarily.


> Add filter pushdown support to the file adapter's CSV table implementation
> --------------------------------------------------------------------------
>
>                 Key: CALCITE-7618
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7618
>             Project: Calcite
>          Issue Type: Improvement
>          Components: file-adapter
>            Reporter: Diveyam Mishra
>            Assignee: Diveyam Mishra
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
