[PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-04 Thread via GitHub


sdd opened a new pull request, #323:
URL: https://github.com/apache/iceberg-rust/pull/323

   This PR was broken out of https://github.com/apache/iceberg-rust/pull/241 as 
that PR was getting too large.
   
   It depends on https://github.com/apache/iceberg-rust/pull/322, and 
integrates the `ManifestEvaluator` from that PR into `TableScan`, to perform 
the filtering capability provided by `ManfestEvaluator` on the manifest files 
in the manifest for the scan's snapshot, rejecting any manifests that don't 
match the table scan's filter predicate, if present.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-19 Thread via GitHub


marvinlanhenke commented on code in PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#discussion_r1572233768


##
crates/iceberg/src/scan.rs:
##
@@ -186,6 +239,27 @@ impl TableScan {
 .boxed())
 }
 
+fn create_manifest_eval_factory(
+//&self,
+id: &i32,
+schema: SchemaRef,
+table_metadata: Arc,
+case_sensitive: bool,
+filter: &Predicate,
+) -> ManifestEvaluator {
+let bound_predicate = filter.bind(schema.clone(), 
case_sensitive).unwrap();

Review Comment:
   I'll guess this is the place where we want to apply `rewrite_not` - which 
probably resolves our previous discussion in the PR for the 
`InclusiveProjection`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-19 Thread via GitHub


sdd commented on code in PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#discussion_r1572771120


##
crates/iceberg/src/scan.rs:
##
@@ -186,6 +239,27 @@ impl TableScan {
 .boxed())
 }
 
+fn create_manifest_eval_factory(
+//&self,
+id: &i32,
+schema: SchemaRef,
+table_metadata: Arc,
+case_sensitive: bool,
+filter: &Predicate,
+) -> ManifestEvaluator {
+let bound_predicate = filter.bind(schema.clone(), 
case_sensitive).unwrap();

Review Comment:
   We already do it here? 
https://github.com/apache/iceberg-rust/pull/323/files/8c6ed23c7a226e3af594889e70b5dabf6118fe02#diff-bbfbe5e334be6c501ba2ca0ddd84d658ff0f3f84f2b5b532212f4e096a09e09bR76



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-19 Thread via GitHub


marvinlanhenke commented on code in PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#discussion_r1572785208


##
crates/iceberg/src/scan.rs:
##
@@ -186,6 +239,27 @@ impl TableScan {
 .boxed())
 }
 
+fn create_manifest_eval_factory(
+//&self,
+id: &i32,
+schema: SchemaRef,
+table_metadata: Arc,
+case_sensitive: bool,
+filter: &Predicate,
+) -> ManifestEvaluator {
+let bound_predicate = filter.bind(schema.clone(), 
case_sensitive).unwrap();

Review Comment:
   Well I guess finally my search for the dreaded rewrite_not has come to an 
end 😁 thanks for your help



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-19 Thread via GitHub


sdd commented on code in PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#discussion_r1572787996


##
crates/iceberg/src/scan.rs:
##
@@ -186,6 +239,27 @@ impl TableScan {
 .boxed())
 }
 
+fn create_manifest_eval_factory(
+//&self,
+id: &i32,
+schema: SchemaRef,
+table_metadata: Arc,
+case_sensitive: bool,
+filter: &Predicate,
+) -> ManifestEvaluator {
+let bound_predicate = filter.bind(schema.clone(), 
case_sensitive).unwrap();

Review Comment:
   
![image](https://github.com/apache/iceberg-rust/assets/323439/05abda94-649e-4a14-8294-58a7cb59ed4b)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-23 Thread via GitHub


liurenjie1024 commented on code in PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#discussion_r1576369286


##
crates/iceberg/src/scan.rs:
##
@@ -186,6 +239,27 @@ impl TableScan {
 .boxed())
 }
 
+fn create_manifest_eval_factory(
+//&self,
+id: &i32,
+schema: SchemaRef,
+table_metadata: Arc,
+case_sensitive: bool,
+filter: &Predicate,
+) -> ManifestEvaluator {

Review Comment:
   We should return `Result` rather panic here.



##
crates/iceberg/src/scan.rs:
##
@@ -186,6 +239,27 @@ impl TableScan {
 .boxed())
 }
 
+fn create_manifest_eval_factory(

Review Comment:
   ```suggestion
   fn create_manifest_evaluator(
   ```
   I think it's creating `ManifestEvaluator` rather than factory?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-24 Thread via GitHub


sdd commented on code in PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#discussion_r1577443557


##
crates/iceberg/src/scan.rs:
##
@@ -186,6 +239,27 @@ impl TableScan {
 .boxed())
 }
 
+fn create_manifest_eval_factory(

Review Comment:
   Aah yes, the variable name has been left over from the previous name for the 
`ManifestEvaluator`. Will fix



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-24 Thread via GitHub


sdd commented on code in PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#discussion_r1578902001


##
crates/iceberg/src/scan.rs:
##
@@ -186,6 +239,27 @@ impl TableScan {
 .boxed())
 }
 
+fn create_manifest_eval_factory(
+//&self,
+id: &i32,
+schema: SchemaRef,
+table_metadata: Arc,
+case_sensitive: bool,
+filter: &Predicate,
+) -> ManifestEvaluator {

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-25 Thread via GitHub


liurenjie1024 commented on code in PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#discussion_r1579248989


##
crates/iceberg/src/scan.rs:
##
@@ -158,8 +196,24 @@ impl TableScan {
 .await?;
 
 // Generate data file stream
-let mut entries = iter(manifest_list.entries());
-while let Some(entry) = entries.next().await {
+for entry in manifest_list.entries() {
+// If this scan has a filter, check the partition evaluator 
cache for an existing
+// PartitionEvaluator that matches this manifest's partition 
spec ID.
+// Use one from the cache if there is one. If not, create one, 
put it in
+// the cache, and take a reference to it.
+#[allow(clippy::map_entry)]
+if let Some(filter) = filter.as_ref() {
+if 
!manifest_evaluator_cache.contains_key(&entry.partition_spec_id) {

Review Comment:
   TBH I prefer the entry api which make code more concise, but this is not a 
blocker.



##
crates/iceberg/src/scan.rs:
##
@@ -186,6 +240,25 @@ impl TableScan {
 .boxed())
 }
 
+fn create_manifest_evaluator(
+id: i32,
+schema: SchemaRef,
+table_metadata: Arc,
+case_sensitive: bool,
+filter: &Predicate,
+) -> crate::Result {
+let bound_predicate = filter.bind(schema.clone(), 
case_sensitive).unwrap();

Review Comment:
   ```suggestion
   let bound_predicate = filter.bind(schema.clone(), case_sensitive)?;
   ```
   
   We should not panic here.



##
crates/iceberg/src/scan.rs:
##
@@ -186,6 +240,25 @@ impl TableScan {
 .boxed())
 }
 
+fn create_manifest_evaluator(
+id: i32,
+schema: SchemaRef,
+table_metadata: Arc,
+case_sensitive: bool,
+filter: &Predicate,
+) -> crate::Result {
+let bound_predicate = filter.bind(schema.clone(), 
case_sensitive).unwrap();
+
+let partition_spec = table_metadata.partition_spec_by_id(id).unwrap();

Review Comment:
   ```suggestion
   let partition_spec = table_metadata.partition_spec_by_id(id)?;
   ```
   
   Why panic here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-25 Thread via GitHub


sdd commented on code in PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#discussion_r1580038792


##
crates/iceberg/src/scan.rs:
##
@@ -186,6 +240,25 @@ impl TableScan {
 .boxed())
 }
 
+fn create_manifest_evaluator(
+id: i32,
+schema: SchemaRef,
+table_metadata: Arc,
+case_sensitive: bool,
+filter: &Predicate,
+) -> crate::Result {
+let bound_predicate = filter.bind(schema.clone(), 
case_sensitive).unwrap();
+
+let partition_spec = table_metadata.partition_spec_by_id(id).unwrap();

Review Comment:
   Sorry, old code here. Will push an update



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-25 Thread via GitHub


sdd commented on code in PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#discussion_r1580039092


##
crates/iceberg/src/scan.rs:
##
@@ -186,6 +240,25 @@ impl TableScan {
 .boxed())
 }
 
+fn create_manifest_evaluator(
+id: i32,
+schema: SchemaRef,
+table_metadata: Arc,
+case_sensitive: bool,
+filter: &Predicate,
+) -> crate::Result {
+let bound_predicate = filter.bind(schema.clone(), 
case_sensitive).unwrap();

Review Comment:
   sorry, thought I'd pushed a fix for this but it is on the other PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-25 Thread via GitHub


sdd commented on code in PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#discussion_r1580049448


##
crates/iceberg/src/scan.rs:
##
@@ -158,8 +196,24 @@ impl TableScan {
 .await?;
 
 // Generate data file stream
-let mut entries = iter(manifest_list.entries());
-while let Some(entry) = entries.next().await {
+for entry in manifest_list.entries() {
+// If this scan has a filter, check the partition evaluator 
cache for an existing
+// PartitionEvaluator that matches this manifest's partition 
spec ID.
+// Use one from the cache if there is one. If not, create one, 
put it in
+// the cache, and take a reference to it.
+#[allow(clippy::map_entry)]
+if let Some(filter) = filter.as_ref() {
+if 
!manifest_evaluator_cache.contains_key(&entry.partition_spec_id) {

Review Comment:
   I also prefer the entry API, but `Self::create_manifest_evaluator` returns a 
`Result` rather than a `ManifestEvaluator`. The `?` operator 
can't be used inside the closure passed to the Entry API's 
[or_insert_with_key](https://doc.rust-lang.org/std/collections/hash_map/enum.Entry.html#method.or_insert_with_key)
 method, and this was the least ugly way I found of handling that. If you know 
of a nice way of using the `or_insert_with_key` with en entry_generating 
function that returns a `Result`, please share! :-) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-25 Thread via GitHub


liurenjie1024 commented on code in PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#discussion_r1580334216


##
crates/iceberg/src/scan.rs:
##
@@ -158,8 +196,24 @@ impl TableScan {
 .await?;
 
 // Generate data file stream
-let mut entries = iter(manifest_list.entries());
-while let Some(entry) = entries.next().await {
+for entry in manifest_list.entries() {
+// If this scan has a filter, check the partition evaluator 
cache for an existing
+// PartitionEvaluator that matches this manifest's partition 
spec ID.
+// Use one from the cache if there is one. If not, create one, 
put it in
+// the cache, and take a reference to it.
+#[allow(clippy::map_entry)]
+if let Some(filter) = filter.as_ref() {
+if 
!manifest_evaluator_cache.contains_key(&entry.partition_spec_id) {

Review Comment:
   Oh, got it. Thanks for clarification.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-25 Thread via GitHub


liurenjie1024 commented on PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#issuecomment-2078490131

   Let's wait a moment to see if others have comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-25 Thread via GitHub


liurenjie1024 merged PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Implement manifest filtering in `TableScan` [iceberg-rust]

2024-04-25 Thread via GitHub


liurenjie1024 commented on PR #323:
URL: https://github.com/apache/iceberg-rust/pull/323#issuecomment-2078666082

   Thanks @sdd for this pr, and @Xuanwo @marvinlanhenke for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org