xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3192595733


##########
datafusion/datasource-parquet/src/opener.rs:
##########
@@ -1075,41 +1076,54 @@ impl RowGroupsPrunedParquetOpen {
         let file_metadata = Arc::clone(reader_metadata.metadata());
         let rg_metadata = file_metadata.row_groups();
 
-        // Filter pushdown: evaluate predicates during scan
-        let row_filter = if let Some(predicate) = prepared
+        // Filter pushdown: evaluate predicates during scan.
+        // Keep the predicate around so we can rebuild RowFilter per decoder run
+        // when fully matched row groups split the scan into multiple decoders.
+        let pushdown_predicate = prepared
             .pushdown_filters
             .then_some(prepared.predicate.clone())
-            .flatten()
-        {
-            let row_filter = row_filter::build_row_filter(
-                &predicate,
-                &prepared.physical_file_schema,
-                file_metadata.as_ref(),
-                prepared.reorder_predicates,
-                &prepared.file_metrics,
-            );
+            .flatten();
 
-            match row_filter {
-                Ok(Some(filter)) => Some(filter),
-                Ok(None) => None,
-                Err(e) => {
-                    debug!("Ignoring error building row filter for '{predicate:?}': {e}");
-                    None
+        let try_build_row_filter =
+            |predicate: &Arc<dyn PhysicalExpr>| -> Option<RowFilter> {
+                match row_filter::build_row_filter(
+                    predicate,
+                    &prepared.physical_file_schema,
+                    file_metadata.as_ref(),
+                    prepared.reorder_predicates,
+                    &prepared.file_metrics,
+                ) {
+                    Ok(Some(filter)) => Some(filter),
+                    Ok(None) => None,
+                    Err(e) => {
+                        debug!(
+                            "Ignoring error building row filter for '{predicate:?}': {e}"
+                        );
+                        None
+                    }
                 }
-            }
-        } else {
-            None
-        };
+            };
+
+        // Build the first RowFilter eagerly; it will be reused for the first

Review Comment:
   Thanks, that makes sense. I agree both would make the code easier to read.
   Since this PR is already focused on the fully matched row group behavior, I’ll keep this as-is here and follow up with a small cleanup PR to introduce helper(s) for RowFilter generation / decoder building.
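
   As a rough illustration only, the follow-up helper could take roughly the shape below. This is a sketch with stand-in types: `Predicate`, `RowFilter`, `build_row_filter` and `filters_for_runs` here are hypothetical placeholders, not the actual DataFusion or parquet APIs, and the real helper would also need the file schema, metadata and metrics that the closure in the diff captures. It shows only the two ideas from the hunk: a build error is logged and treated as "no row filter", and the predicate is kept so a fresh filter can be built for each decoder run.

```rust
// Stand-in types (hypothetical): the real code uses Arc<dyn PhysicalExpr>
// and the parquet crate's RowFilter.
#[derive(Debug, Clone, PartialEq)]
struct Predicate(String);

#[derive(Debug, PartialEq)]
struct RowFilter(String);

// Stand-in for row_filter::build_row_filter (hypothetical behavior):
// Ok(None) means nothing could be pushed down, Err means the build failed.
fn build_row_filter(predicate: &Predicate) -> Result<Option<RowFilter>, String> {
    match predicate.0.as_str() {
        "" => Ok(None),
        "bad" => Err("unsupported expression".to_string()),
        expr => Ok(Some(RowFilter(expr.to_string()))),
    }
}

// Helper mirroring the closure in the diff: an error is logged and
// downgraded to "no row filter" instead of failing the scan.
fn try_build_row_filter(predicate: &Predicate) -> Option<RowFilter> {
    match build_row_filter(predicate) {
        Ok(filter) => filter,
        Err(e) => {
            eprintln!("Ignoring error building row filter for '{predicate:?}': {e}");
            None
        }
    }
}

// Build one filter per decoder run from the retained predicate.
// With no pushdown predicate, every run gets None.
fn filters_for_runs(predicate: Option<&Predicate>, runs: usize) -> Vec<Option<RowFilter>> {
    (0..runs)
        .map(|_| predicate.and_then(try_build_row_filter))
        .collect()
}
```

   Keeping the error handling inside a single helper would let each decoder run call it without repeating the `match`.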



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

