GitHub user SteveLauC closed a discussion: How to filter a row depending on 
which parquet file it is in?

Say I have multiple parquet files (`0.parquet`, `1.parquet`, ...). When running 
queries against them, I want to filter out specific rows depending on their 
source file by tweaking the generated physical plan. For example, rows that come 
from `0.parquet` and have `ID` 0 (assume `ID` is a field stored in the parquet 
files) should be filtered out, rows that come from `1.parquet` and have `ID` 1 
should be filtered out, and so on.
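To pin down the intended semantics, here is a minimal plain-Rust sketch that does not use DataFusion at all: each row is paired with the name of the file it came from, and a per-file rule decides whether to drop it. The `Row` type, the `keep_row` function, and the file-to-ID map are hypothetical illustrations, not part of any DataFusion API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified row: only the `ID` field stored in the parquet file.
#[derive(Debug)]
struct Row {
    id: i64,
}

/// Keep a row unless its `ID` equals the value excluded for its source file.
/// (Hypothetical helper illustrating the desired rule, not a DataFusion API.)
fn keep_row(excluded: &HashMap<&str, i64>, file: &str, row: &Row) -> bool {
    match excluded.get(file) {
        Some(&bad_id) => row.id != bad_id,
        None => true, // files without a rule keep every row
    }
}

fn main() {
    // Rule from the question: drop ID 0 from 0.parquet and ID 1 from 1.parquet.
    let excluded: HashMap<&str, i64> = HashMap::from([("0.parquet", 0), ("1.parquet", 1)]);

    // Each row is tagged with the file it was read from.
    let input = vec![
        ("0.parquet", Row { id: 0 }),
        ("0.parquet", Row { id: 1 }),
        ("1.parquet", Row { id: 0 }),
        ("1.parquet", Row { id: 1 }),
    ];

    let kept: Vec<(&str, Row)> = input
        .into_iter()
        .filter(|(file, row)| keep_row(&excluded, file, row))
        .collect();

    // Prints the two surviving rows: (0.parquet, ID 1) and (1.parquet, ID 0).
    println!("{:?}", kept);
}
```

The key point is that the decision needs two inputs, the row and its source file, which is exactly what the question is about.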

I see that we can use 
[`FilterExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/filter/struct.FilterExec.html)
 to do the filtering, but with 
[`PhysicalExpr`](https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.PhysicalExpr.html),
 it seems that we ONLY have access to the rows themselves (a `RecordBatch`), not 
to the parquet file they were read from:

```rust
// `PhysicalExpr::evaluate`: the only input is the batch of rows.
fn evaluate(
    &self,
    batch: &RecordBatch
) -> Result<ColumnarValue, DataFusionError>;
```
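Because `evaluate` only receives the batch, one possible workaround (an assumption, not something DataFusion is claimed to provide here) is to make the source file part of the data itself, as an extra column, so that a batch-only expression can still decide per row. The sketch below models this with plain Rust types: `Batch` stands in for a `RecordBatch`, `file_name` is a hypothetical column that something upstream (such as the scan) would have to populate, and `ExcludeIdPerFile::evaluate` mimics the shape of `PhysicalExpr::evaluate` by returning a boolean mask, which is the kind of predicate result `FilterExec` uses to select rows.

```rust
/// Simplified stand-in for an Arrow `RecordBatch`: equal-length columns.
/// `file_name` is a HYPOTHETICAL column recording each row's source file.
struct Batch {
    file_name: Vec<String>,
    id: Vec<i64>,
}

/// Stand-in for a `PhysicalExpr`: like `evaluate`, it sees only the batch.
struct ExcludeIdPerFile {
    /// (file, excluded ID) pairs, e.g. ("0.parquet", 0).
    rules: Vec<(String, i64)>,
}

impl ExcludeIdPerFile {
    /// Returns a boolean mask over the batch: `true` means keep the row.
    fn evaluate(&self, batch: &Batch) -> Vec<bool> {
        batch
            .file_name
            .iter()
            .zip(&batch.id)
            .map(|(file, id)| {
                // Drop the row if any rule matches both its file and its ID.
                !self
                    .rules
                    .iter()
                    .any(|(rule_file, rule_id)| rule_file == file && rule_id == id)
            })
            .collect()
    }
}

/// What a filter operator does with the mask: keep rows whose entry is `true`.
fn apply_mask(batch: &Batch, mask: &[bool]) -> Vec<i64> {
    batch
        .id
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(id, _)| *id)
        .collect()
}

fn main() {
    let batch = Batch {
        file_name: vec![
            "0.parquet".into(),
            "0.parquet".into(),
            "1.parquet".into(),
            "1.parquet".into(),
        ],
        id: vec![0, 1, 0, 1],
    };
    let expr = ExcludeIdPerFile {
        rules: vec![("0.parquet".to_string(), 0), ("1.parquet".to_string(), 1)],
    };

    let mask = expr.evaluate(&batch);
    println!("mask = {:?}", mask); // mask = [false, true, true, false]
    println!("kept ids = {:?}", apply_mask(&batch, &mask)); // kept ids = [1, 0]
}
```

The design choice here is to move the file information into the batch rather than into the expression, so the existing `evaluate(&self, batch)` shape is enough. Whether DataFusion can attach such a per-file column during the parquet scan depends on the version and table provider in use and would need to be checked against its documentation.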


GitHub link: https://github.com/apache/datafusion/discussions/7979
