Fokko commented on code in PR #377:
URL: https://github.com/apache/iceberg-rust/pull/377#discussion_r1611209532
##########
crates/iceberg/src/scan.rs:
##########
@@ -463,18 +464,19 @@ impl ManifestEvaluatorCache {
}
/// A task to scan part of file.
-#[derive(Debug)]
+#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FileScanTask {
- data_manifest_entry: ManifestEntryRef,
+ data_file_path: String,
Review Comment:
This change makes a lot of sense to me. The statistics in the manifest entry
are only needed during planning, to filter out files where possible. After
that, the task is handed over to the query engine, which opens the actual file
and can use the Parquet statistics to skip row groups and other data it does
not need.
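A minimal std-only sketch of that split, assuming a single `column = value` filter. All names here (`ColumnBounds`, `ManifestEntry`, `plan_files`, `row_groups_to_read`) are hypothetical simplifications, not the iceberg-rust API. Planning prunes files on manifest bounds and returns tasks that carry only the path. The engine then prunes row groups using the Parquet footer statistics, modelled here as plain bounds.

```rust
// Hypothetical, simplified types; these are not the iceberg-rust definitions.
#[derive(Debug, Clone, PartialEq)]
struct ColumnBounds {
    lower: i64,
    upper: i64,
}

/// What planning sees: a data file path plus statistics for the filtered column.
struct ManifestEntry {
    file_path: String,
    bounds: ColumnBounds,
}

/// What the engine receives: only what it needs to open the file.
#[derive(Debug, Clone, PartialEq)]
struct FileScanTask {
    data_file_path: String,
}

/// Can a column with these bounds contain `value` (for `column = value`)?
fn may_contain(bounds: &ColumnBounds, value: i64) -> bool {
    bounds.lower <= value && value <= bounds.upper
}

/// Planning: prune files with manifest statistics, then drop the statistics.
fn plan_files(entries: &[ManifestEntry], value: i64) -> Vec<FileScanTask> {
    entries
        .iter()
        .filter(|e| may_contain(&e.bounds, value))
        .map(|e| FileScanTask {
            data_file_path: e.file_path.clone(),
        })
        .collect()
}

/// Execution: after opening the file, the engine prunes row groups using the
/// Parquet footer statistics (modelled here as one `ColumnBounds` per row group).
fn row_groups_to_read(row_groups: &[ColumnBounds], value: i64) -> Vec<usize> {
    row_groups
        .iter()
        .enumerate()
        .filter(|(_, b)| may_contain(b, value))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let entries = vec![
        ManifestEntry {
            file_path: "s3://bucket/a.parquet".to_string(),
            bounds: ColumnBounds { lower: 100, upper: 150 },
        },
        ManifestEntry {
            file_path: "s3://bucket/b.parquet".to_string(),
            bounds: ColumnBounds { lower: 200, upper: 300 },
        },
    ];
    // Filter `user_id = 123`: only a.parquet survives planning.
    println!("{:?}", plan_files(&entries, 123));

    // Hypothetical row-group statistics for a.parquet.
    let row_groups = [
        ColumnBounds { lower: 100, upper: 120 },
        ColumnBounds { lower: 121, upper: 150 },
    ];
    println!("{:?}", row_groups_to_read(&row_groups, 123));
}
```

Because the task no longer holds the manifest entry, it stays small and can be cloned or serialized cheaply when it is shipped to an executor.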
The task should also be extended with delete files (for example, using the
upper and lower bounds of their `file_path` column, we can efficiently drop
positional deletes that do not apply to this data file).
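A minimal sketch of attaching positional deletes to a task, assuming the planner already has each delete file's lower/upper bounds for its `file_path` column. `PositionalDeleteFile`, `may_reference`, and `build_task` are hypothetical names for illustration, not iceberg-rust APIs. Sequence-number and partition checks, which a real planner also needs, are deliberately left out.

```rust
// Hypothetical, simplified types; not the iceberg-rust definitions.
/// A positional delete file as seen during planning, with the bounds of
/// its `file_path` column taken from the manifest entry.
#[derive(Debug, Clone, PartialEq)]
struct PositionalDeleteFile {
    path: String,
    file_path_lower: String,
    file_path_upper: String,
}

/// A scan task extended with the delete files the engine must apply.
#[derive(Debug, Clone, PartialEq)]
struct FileScanTask {
    data_file_path: String,
    deletes: Vec<PositionalDeleteFile>,
}

/// Can this delete file contain rows that reference `data_file_path`?
/// Paths compare lexicographically, the same way the bounds are ordered.
fn may_reference(delete: &PositionalDeleteFile, data_file_path: &str) -> bool {
    delete.file_path_lower.as_str() <= data_file_path
        && data_file_path <= delete.file_path_upper.as_str()
}

/// Attach only the delete files whose bounds can cover this data file.
/// Sequence-number and partition checks are intentionally omitted.
fn build_task(data_file_path: &str, candidates: &[PositionalDeleteFile]) -> FileScanTask {
    FileScanTask {
        data_file_path: data_file_path.to_string(),
        deletes: candidates
            .iter()
            .filter(|d| may_reference(d, data_file_path))
            .cloned()
            .collect(),
    }
}

fn main() {
    let deletes = vec![
        PositionalDeleteFile {
            path: "s3://bucket/del-1.parquet".to_string(),
            file_path_lower: "s3://bucket/a.parquet".to_string(),
            file_path_upper: "s3://bucket/a.parquet".to_string(),
        },
        PositionalDeleteFile {
            path: "s3://bucket/del-2.parquet".to_string(),
            file_path_lower: "s3://bucket/c.parquet".to_string(),
            file_path_upper: "s3://bucket/d.parquet".to_string(),
        },
    ];
    // Only del-1.parquet is attached; del-2's bounds cannot cover a.parquet.
    println!("{:?}", build_task("s3://bucket/a.parquet", &deletes));
}
```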
Optional, but nice to have: a residual predicate on the task. For example, if
you filter on `date(created_at) == '2024-03-01' and user_id = 123`, the first
part of the predicate might already be satisfied by the partitioning of the
table, so we only need to filter on `user_id`.
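A deliberately simplified sketch of computing a per-file residual. It assumes the filter is a conjunction of equality terms and that a file's partition values are keyed by the same term text (e.g. `"date(created_at)"`). Real Iceberg residuals are derived by projecting predicates through partition transforms, which this does not model. `EqTerm` and `residual` are hypothetical names.

```rust
use std::collections::HashMap;

// Hypothetical, simplified model: a filter is a conjunction of equality terms,
// and a data file's partition values are keyed by the same term text.
#[derive(Debug, Clone, PartialEq)]
struct EqTerm {
    term: String,  // e.g. "date(created_at)" or "user_id"
    value: String, // e.g. "2024-03-01" or "123"
}

/// Compute the residual filter for one data file:
/// - a conjunct already decided by the file's partition value is dropped;
/// - a conjunct contradicting the partition value means no row can match,
///   signalled by `None` (the file can be skipped);
/// - everything else must still be evaluated by the engine.
fn residual(conjuncts: &[EqTerm], partition: &HashMap<String, String>) -> Option<Vec<EqTerm>> {
    let mut remaining = Vec::new();
    for c in conjuncts {
        match partition.get(&c.term) {
            Some(v) if v == &c.value => {} // already satisfied by the partition
            Some(_) => return None,        // contradicts the partition value
            None => remaining.push(c.clone()),
        }
    }
    Some(remaining)
}

fn main() {
    let filter = vec![
        EqTerm { term: "date(created_at)".to_string(), value: "2024-03-01".to_string() },
        EqTerm { term: "user_id".to_string(), value: "123".to_string() },
    ];
    let mut partition = HashMap::new();
    partition.insert("date(created_at)".to_string(), "2024-03-01".to_string());
    // Only the `user_id` conjunct is left for the engine to evaluate.
    println!("{:?}", residual(&filter, &partition));
}
```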