GitHub user SteveLauC closed a discussion: How to filter a row depending on which parquet file it is in?

Say I have multiple Parquet files (`0.parquet`, `1.parquet`). When running queries against them, I want to filter out specific rows depending on the source file by tweaking the generated physical plan. For example, rows that come from `0.parquet` and have `ID` 0 (assume `ID` is a field stored in the Parquet file) should be filtered out, rows that come from `1.parquet` and have `ID` 1 should be filtered out, and so on.

I see that we can use [`FilterExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/filter/struct.FilterExec.html) to do the filtering, but with [`PhysicalExpr`](https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.PhysicalExpr.html), it seems that we only know the rows themselves (the `RecordBatch`), not which Parquet file they came from:

```
fn evaluate(
    &self,
    batch: &RecordBatch
) -> Result<ColumnarValue, DataFusionError>;
```

GitHub link: https://github.com/apache/datafusion/discussions/7979
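
To make the intended behavior concrete, here is a minimal sketch in plain Rust (standard library only). It is not DataFusion code: `Row`, `TaggedBatch`, `excluded_id`, and `filter_by_source` are hypothetical names invented for illustration. In particular, the rule that `N.parquet` excludes `ID` N is only an assumption generalized from the two files in the example. The sketch shows that the filter needs the source file name alongside each batch, which is exactly the information `evaluate` does not receive.

```rust
/// A row holding the single `ID` field from the example.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: i64,
}

/// A batch of rows tagged with the file it was read from.
/// Hypothetical type: a `RecordBatch` by itself carries no such tag.
struct TaggedBatch {
    source_file: String,
    rows: Vec<Row>,
}

/// Assumed rule: `N.parquet` excludes rows whose ID equals N.
/// Returns `None` for file names that do not match that pattern.
fn excluded_id(source_file: &str) -> Option<i64> {
    source_file.strip_suffix(".parquet")?.parse().ok()
}

/// Keep every row except those whose ID is excluded for its own source file.
fn filter_by_source(batches: &[TaggedBatch]) -> Vec<Row> {
    batches
        .iter()
        .flat_map(|batch| {
            // The predicate is chosen per batch, based on the file name.
            let excluded = excluded_id(&batch.source_file);
            batch
                .rows
                .iter()
                .filter(move |row| Some(row.id) != excluded)
                .cloned()
        })
        .collect()
}
```

With two batches that each hold IDs 0 and 1, one from `0.parquet` and one from `1.parquet`, only ID 1 survives from the first file and only ID 0 from the second.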
