Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 merged PR #309: URL: https://github.com/apache/iceberg-rust/pull/309 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2039348517 > Hi, @marvinlanhenke Thanks for pr, it looks great! I have some small suggestion to restructure the code to make it easier for review. Really greatful for these tests! Thanks for the review - I did some minor fixes according to your suggestions. If no other comments, I think we're good to go. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1553275061 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,323 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +match self { +Transform::Identity => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +, +BoundPredicate::Set(expr) => Ok(Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +, +_ => Ok(None), +}, +Transform::Bucket(_) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => self.project_binary(name, expr, ), +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +}, +Transform::Truncate(width) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, , Some(*width)) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +}, +Transform::Year | Transform::Month | Transform::Day | Transform::Hour => { +match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, , None) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +} +} +_ => Ok(None), +} +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(, datum: ) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(::Primitive(input_type)).is_ok() +} + +/// Creates a unary predicate from a given operator and a reference name. +fn project_unary(op: PredicateOperator, name: String) -> Result> { +Ok(Some(Predicate::Unary(UnaryExpression::new( +op, +Reference::new(name), + +} + +/// Attempts to create a binary predicate based on a binary expression, +/// if applicable. +/// +/// This method evaluates a given binary expression and, if the operation +/// is equality (`Eq`) and the literal can be transformed, constructs a +/// `Predicate::Binary`variant representing the binary operation. +fn project_binary( Review Comment: I missed that, we don't need a Generic. Fixed it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1553272539 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,323 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +match self { +Transform::Identity => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +, +BoundPredicate::Set(expr) => Ok(Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +, +_ => Ok(None), +}, +Transform::Bucket(_) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => self.project_binary(name, expr, ), +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +}, +Transform::Truncate(width) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, , Some(*width)) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +}, +Transform::Year | Transform::Month | Transform::Day | Transform::Hour => { +match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, , None) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +} +} +_ => Ok(None), +} +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(, datum: ) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(::Primitive(input_type)).is_ok() +} + +/// Creates a unary predicate from a given operator and a reference name. +fn project_unary(op: PredicateOperator, name: String) -> Result> { +Ok(Some(Predicate::Unary(UnaryExpression::new( +op, +Reference::new(name), + +} + +/// Attempts to create a binary predicate based on a binary expression, +/// if applicable. +/// +/// This method evaluates a given binary expression and, if the operation +/// is equality (`Eq`) and the literal can be transformed, constructs a +/// `Predicate::Binary`variant representing the binary operation. +fn project_binary( +, +name: String, +expr: , +func: , +) -> Result> { +if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) { +return Ok(None); +} + +Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +func.transform_literal_result(expr.literal())?, + +} + +/// Projects a binary expression to a predicate with an adjusted boundary. +/// +/// Checks if the literal within the given binary expression is +/// transformable. If transformable, it proceeds to potentially adjust +/// the boundary of the expression based on the comparison operator (`op`). +/// The potential adjustments involve incrementing or decrementing the +/// literal value and changing the `PredicateOperator` itself to its +/// inclusive variant. +fn project_binary_with_adjusted_boundary( +, +name: String, +expr: , +func: , +
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1553271703 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,323 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +match self { +Transform::Identity => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +, +BoundPredicate::Set(expr) => Ok(Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +, +_ => Ok(None), +}, +Transform::Bucket(_) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => self.project_binary(name, expr, ), +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +}, +Transform::Truncate(width) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, , Some(*width)) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +}, +Transform::Year | Transform::Month | Transform::Day | Transform::Hour => { +match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, , None) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +} +} +_ => Ok(None), +} +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(, datum: ) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(::Primitive(input_type)).is_ok() +} + +/// Creates a unary predicate from a given operator and a reference name. +fn project_unary(op: PredicateOperator, name: String) -> Result> { +Ok(Some(Predicate::Unary(UnaryExpression::new( +op, +Reference::new(name), + +} + +/// Attempts to create a binary predicate based on a binary expression, +/// if applicable. +/// +/// This method evaluates a given binary expression and, if the operation +/// is equality (`Eq`) and the literal can be transformed, constructs a +/// `Predicate::Binary`variant representing the binary operation. +fn project_binary( +, +name: String, +expr: , +func: , +) -> Result> { +if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) { +return Ok(None); +} + +Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +func.transform_literal_result(expr.literal())?, + +} + +/// Projects a binary expression to a predicate with an adjusted boundary. +/// +/// Checks if the literal within the given binary expression is +/// transformable. If transformable, it proceeds to potentially adjust +/// the boundary of the expression based on the comparison operator (`op`). +/// The potential adjustments involve incrementing or decrementing the +/// literal value and changing the `PredicateOperator` itself to its +/// inclusive variant. +fn project_binary_with_adjusted_boundary( +, +name: String, +expr: , +func: , +
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1553262971 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,323 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +match self { +Transform::Identity => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +, +BoundPredicate::Set(expr) => Ok(Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +, +_ => Ok(None), +}, +Transform::Bucket(_) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => self.project_binary(name, expr, ), +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +}, +Transform::Truncate(width) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, , Some(*width)) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +}, +Transform::Year | Transform::Month | Transform::Day | Transform::Hour => { +match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, , None) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +} +} +_ => Ok(None), +} +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(, datum: ) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(::Primitive(input_type)).is_ok() +} + +/// Creates a unary predicate from a given operator and a reference name. +fn project_unary(op: PredicateOperator, name: String) -> Result> { +Ok(Some(Predicate::Unary(UnaryExpression::new( +op, +Reference::new(name), + +} + +/// Attempts to create a binary predicate based on a binary expression, +/// if applicable. +/// +/// This method evaluates a given binary expression and, if the operation +/// is equality (`Eq`) and the literal can be transformed, constructs a +/// `Predicate::Binary`variant representing the binary operation. +fn project_binary( +, +name: String, +expr: , +func: , +) -> Result> { +if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) { +return Ok(None); +} + +Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +func.transform_literal_result(expr.literal())?, + +} + +/// Projects a binary expression to a predicate with an adjusted boundary. +/// +/// Checks if the literal within the given binary expression is +/// transformable. If transformable, it proceeds to potentially adjust +/// the boundary of the expression based on the comparison operator (`op`). +/// The potential adjustments involve incrementing or decrementing the +/// literal value and changing the `PredicateOperator` itself to its +/// inclusive variant. +fn project_binary_with_adjusted_boundary( +, +name: String, +expr: , +func: , +
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2039007641 cc @Fokko Do you have other comments? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1548808318 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,323 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +match self { +Transform::Identity => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +, +BoundPredicate::Set(expr) => Ok(Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +, +_ => Ok(None), +}, +Transform::Bucket(_) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => self.project_binary(name, expr, ), +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +}, +Transform::Truncate(width) => match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, , Some(*width)) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +}, +Transform::Year | Transform::Month | Transform::Day | Transform::Hour => { +match predicate { +BoundPredicate::Unary(expr) => Self::project_unary(expr.op(), name), +BoundPredicate::Binary(expr) => { +self.project_binary_with_adjusted_boundary(name, expr, , None) +} +BoundPredicate::Set(expr) => self.project_set(expr, name, ), +_ => Ok(None), +} +} +_ => Ok(None), +} +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(, datum: ) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(::Primitive(input_type)).is_ok() +} + +/// Creates a unary predicate from a given operator and a reference name. +fn project_unary(op: PredicateOperator, name: String) -> Result> { +Ok(Some(Predicate::Unary(UnaryExpression::new( +op, +Reference::new(name), + +} + +/// Attempts to create a binary predicate based on a binary expression, +/// if applicable. +/// +/// This method evaluates a given binary expression and, if the operation +/// is equality (`Eq`) and the literal can be transformed, constructs a +/// `Predicate::Binary`variant representing the binary operation. +fn project_binary( +, +name: String, +expr: , +func: , +) -> Result> { +if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) { +return Ok(None); +} + +Ok(Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +func.transform_literal_result(expr.literal())?, + +} + +/// Projects a binary expression to a predicate with an adjusted boundary. +/// +/// Checks if the literal within the given binary expression is +/// transformable. If transformable, it proceeds to potentially adjust +/// the boundary of the expression based on the comparison operator (`op`). +/// The potential adjustments involve incrementing or decrementing the +/// literal value and changing the `PredicateOperator` itself to its +/// inclusive variant. +fn project_binary_with_adjusted_boundary( +, +name: String, +expr: , +func: , +
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547819858 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { Review Comment: Yeah, it looks much better now, thanks! I'll take a careful review later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547785953 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { Review Comment: @liurenjie1024 I did a refactor changing the structure (matching order). I also extracted common functionality, renamed those helpers and updated the docs. I hope not only the structure but the overall design is more readable and understandable with those changes applied? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547681855 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { Review Comment: sure, I'm already implementing it - since I wanted to compare for myself. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547676828 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { Review Comment: I think some code duplication is worth so that we can have better readability? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547667177 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { Review Comment: I had the structure you suggested in an earlier version. I changed it the other way around since `predicate` has the smaller cardinality, which allows me to group more transforms into a single predicate match arm. I can change it back, however this would introduce more match arms and some code duplication? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547649993 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { +Transform::Identity +| Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => Some(Predicate::Unary(UnaryExpression::new( +expr.op(), +Reference::new(name), +))), +_ => None, +}, +BoundPredicate::Binary(expr) => match self { +Transform::Identity => Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +))), +Transform::Bucket(_) => { +if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) { +return Ok(None); +} + +Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +func.transform_literal_result(expr.literal())?, +))) +} +Transform::Truncate(width) => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +(), +, +Some(*width), +)? +} +Transform::Year | Transform::Month | Transform::Day | Transform::Hour => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +(), +, +None, +)? +} +_ => None, +}, +BoundPredicate::Set(expr) => match self { +Transform::Identity => Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +))), +Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => { +if expr.op() != PredicateOperator::In +|| expr.literals().iter().any(|d| !self.can_transform(d)) +{ +return Ok(None); +} + +Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +self.transform_set(expr.literals(), )?, +))) +} +_ => None, +}, +_ => None, +}; + +Ok(projection) +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(, datum: ) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(::Primitive(input_type)).is_ok() +} + +/// Transform each literal value of `FnvHashSet` +fn transform_set( +, +literals: , +func: , +) -> Result> { +let mut new_set = FnvHashSet::default(); + +for lit in literals { +let datum = func.transform_literal_result(lit)?; + +if let Some(AdjustedProjection::Single(d)) = +
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547653811 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { +Transform::Identity +| Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => Some(Predicate::Unary(UnaryExpression::new( +expr.op(), +Reference::new(name), +))), +_ => None, +}, +BoundPredicate::Binary(expr) => match self { +Transform::Identity => Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +))), +Transform::Bucket(_) => { +if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) { +return Ok(None); +} + +Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +func.transform_literal_result(expr.literal())?, +))) +} +Transform::Truncate(width) => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +(), +, +Some(*width), +)? +} +Transform::Year | Transform::Month | Transform::Day | Transform::Hour => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +(), +, +None, +)? +} +_ => None, +}, +BoundPredicate::Set(expr) => match self { +Transform::Identity => Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +))), +Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => { +if expr.op() != PredicateOperator::In +|| expr.literals().iter().any(|d| !self.can_transform(d)) +{ +return Ok(None); +} + +Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +self.transform_set(expr.literals(), )?, +))) +} +_ => None, +}, +_ => None, +}; + +Ok(projection) +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(, datum: ) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(::Primitive(input_type)).is_ok() +} + +/// Transform each literal value of `FnvHashSet` +fn transform_set( +, +literals: , +func: , +) -> Result> { +let mut new_set = FnvHashSet::default(); + +for lit in literals { +let datum = func.transform_literal_result(lit)?; + +if let Some(AdjustedProjection::Single(d)) = +
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547639860 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { +Transform::Identity +| Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => Some(Predicate::Unary(UnaryExpression::new( +expr.op(), +Reference::new(name), +))), +_ => None, +}, +BoundPredicate::Binary(expr) => match self { +Transform::Identity => Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +))), +Transform::Bucket(_) => { +if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) { +return Ok(None); +} + +Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +func.transform_literal_result(expr.literal())?, +))) +} +Transform::Truncate(width) => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +(), +, +Some(*width), +)? +} +Transform::Year | Transform::Month | Transform::Day | Transform::Hour => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +(), +, +None, +)? +} +_ => None, +}, +BoundPredicate::Set(expr) => match self { +Transform::Identity => Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +))), +Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => { +if expr.op() != PredicateOperator::In +|| expr.literals().iter().any(|d| !self.can_transform(d)) +{ +return Ok(None); +} + +Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +self.transform_set(expr.literals(), )?, +))) +} +_ => None, +}, +_ => None, +}; + +Ok(projection) +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(, datum: ) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(::Primitive(input_type)).is_ok() +} + +/// Transform each literal value of `FnvHashSet` +fn transform_set( +, +literals: , +func: , +) -> Result> { +let mut new_set = FnvHashSet::default(); + +for lit in literals { +let datum = func.transform_literal_result(lit)?; + +if let Some(AdjustedProjection::Single(d)) = +
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2031700958 > Hi, @marvinlanhenke Thanks for pr, it looks great! I have some small suggestion to restructure the code to make it easier for review. Really greatful for these tests! Thanks for the review, I'll get to your suggestions - those should be easy to fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1547488931 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +269,300 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +let projection = match predicate { +BoundPredicate::Unary(expr) => match self { +Transform::Identity +| Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => Some(Predicate::Unary(UnaryExpression::new( +expr.op(), +Reference::new(name), +))), +_ => None, +}, +BoundPredicate::Binary(expr) => match self { +Transform::Identity => Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +))), +Transform::Bucket(_) => { +if expr.op() != PredicateOperator::Eq || !self.can_transform(expr.literal()) { +return Ok(None); +} + +Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +func.transform_literal_result(expr.literal())?, +))) +} +Transform::Truncate(width) => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +(), +, +Some(*width), +)? +} +Transform::Year | Transform::Month | Transform::Day | Transform::Hour => { +if !self.can_transform(expr.literal()) { +return Ok(None); +} + +self.transform_projected_boundary( +name, +expr.literal(), +(), +, +None, +)? +} +_ => None, +}, +BoundPredicate::Set(expr) => match self { +Transform::Identity => Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +))), +Transform::Bucket(_) +| Transform::Truncate(_) +| Transform::Year +| Transform::Month +| Transform::Day +| Transform::Hour => { +if expr.op() != PredicateOperator::In +|| expr.literals().iter().any(|d| !self.can_transform(d)) +{ +return Ok(None); +} + +Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +self.transform_set(expr.literals(), )?, +))) +} +_ => None, +}, +_ => None, +}; + +Ok(projection) +} + +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(, datum: ) -> bool { +let input_type = datum.data_type().clone(); +self.result_type(::Primitive(input_type)).is_ok() +} + +/// Transform each literal value of `FnvHashSet` +fn transform_set( +, +literals: , +func: , +) -> Result> { +let mut new_set = FnvHashSet::default(); + +for lit in literals { +let datum = func.transform_literal_result(lit)?; + +if let Some(AdjustedProjection::Single(d)) = +
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2029473672 > Hi, @marvinlanhenke I skimmed through the code and it seems that we only need to split tests into separate modules to make it easier to maintain and read. But I agree that we can merge it first to unblock following features, so it's up to you. Take your time while reviewing in the meantime - I'll open another draft with a possible refactor and we can compare whats best? I 'm thinking of splitting the code into Identity, Bucket, Truncate and Temporal logic - including the tests. This however might introduce some minor code duplication (I'll have to try and see for myself though) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2029468475 Hi, @marvinlanhenke I skimmed through the code and it seems that we only need to split tests into separate modules to make it easier to maintain and read. But I agree that we can merge it first to unblock following features, so it's up to you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2029456431 > > @Fokko @liurenjie1024 PTAL > > I ported all the tests - and fixed #311. Now, all tests are passing and align with the Java implementation. I think we only can optimize the design/ structure by perhaps moving the test-suite to the respective transforms (e.g. bucket.rs, etc)? Other than that I think we are good to go? > > Hi, @marvinlanhenke Thanks, I'll take a careful look of this pr. About moving the specific logic into respective transforms, do you plan to do it in this pr or in following prs? ... I haven made up my mind yet. I think we can merge this if its okay, in order to unblock #253? Also I'm not so sure anymore we have to move the logic at all? Since most of the helper functions handle logic that is not dependent of the type of transform? And don't know - perhaps you can outline the refactor high-level what you have in mind? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2029440339 > @Fokko @liurenjie1024 PTAL > > I ported all the tests - and fixed #311. Now, all tests are passing and align with the Java implementation. I think we only can optimize the design/ structure by perhaps moving the test-suite to the respective transforms (e.g. bucket.rs, etc)? Other than that I think we are good to go? Hi, @marvinlanhenke Thanks, I'll take a careful look of this pr. About moving the specific logic into respective transforms, do you plan to do it in this pr or in following prs? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2029426833 @Fokko @liurenjie1024 PTAL I ported all the tests - and fixed #311. Now, all tests are passing and align with the Java implementation. I think we only can optimize the design/ structure by perhaps moving the test-suite to the respective transforms (e.g. bucket.rs, etc)? Other than that I think we are good to go? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2028045351 @Fokko @liurenjie1024 I'm having trouble understanding/ verifiying the test-case for [timestamps_day_inclusive_epoch](https://github.com/apache/iceberg/blob/d350c9b8c995a2953aa8b80a0a1fc7cadc4dd16a/api/src/test/java/org/apache/iceberg/transforms/TestTimestampsProjection.java#L168) for the case `greater than`. If I understood correctly timestamps are handled as [follows](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/transforms/Timestamps.java#L134-L137): - call `truncateLong` -> add +1L, rewrite OP to GT_EQ and apply transformation - then `fixInclusiveTimeProjection` -> for GT_EQ -> do nothing, simply return so this would leave me with a predicate with an OP -> GT_EQ and a new literal (+1L) -> "1970-01-02T00:00:00.00"? Why would the test then check for "1970-01-01" // I can't find the part I'm missing; Is there some part that adjusts the boundary after the fact? Or is the +1L in truncateLong not applied? This is the last part I need to port. Thanks for your help. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2026487818 Hi, @marvinlanhenke Thanks for your contribution, the overall design looks good to me! Please take your time and enjoy time with family -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1543956147 ## crates/iceberg/src/transform/mod.rs: ## @@ -37,6 +36,15 @@ pub trait TransformFunction: Send { fn transform(, input: ArrayRef) -> Result; /// transform_literal will take an input literal and transform it into a new literal. fn transform_literal(, input: ) -> Result>; +/// wrapper Review Comment: ```suggestion /// A thin wrapper around `transform_literal` to return error even when it's `None`. ``` ## crates/iceberg/src/spec/transform.rs: ## @@ -384,6 +394,14 @@ impl Transform { Ok(projection) } +/// Check if `Transform` is applicable on datum's `PrimitiveType` +fn can_transform(, datum: ) -> Result<()> { Review Comment: Why not just return bool directly? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1543730206 ## crates/iceberg/src/spec/transform.rs: ## @@ -398,6 +679,80 @@ mod tests { } } +#[test] +fn test_none_projection() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Void; +let result_unary = transform.project(name.clone(), )?; +assert!(result_unary.is_none()); + +let transform = Transform::Year; +let result_binary = transform.project(name.clone(), )?; +assert!(result_binary.is_none()); + +let transform = Transform::Month; +let result_set = transform.project(name.clone(), )?; +assert!(result_set.is_none()); + +Ok(()) +} + +#[test] +fn test_truncate_project() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Truncate(10); + +let result_unary = transform.project(name.clone(), )?.unwrap(); +let result_binary = transform.project(name.clone(), )?.unwrap(); +let result_set = transform.project(name.clone(), )?.unwrap(); + +assert_eq!(format!("{}", result_unary), "projected_name IS NULL"); +assert_eq!(format!("{}", result_binary), "projected_name = 0"); +assert_eq!(format!("{}", result_set), "projected_name IN (0)"); + +Ok(()) +} + +#[test] +fn test_identity_project() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Identity; + +let result_unary = transform.project(name.clone(), )?.unwrap(); +let result_binary = transform.project(name.clone(), )?.unwrap(); +let result_set = transform.project(name.clone(), )?.unwrap(); + +assert_eq!(format!("{}", result_unary), "projected_name IS NULL"); +assert_eq!(format!("{}", result_binary), "projected_name = 5"); +assert_eq!(format!("{}", result_set), "projected_name IN (5, 6)"); + +Ok(()) +} + +#[test] +fn test_bucket_project() -> Result<()> { Review Comment: @Fokko @liurenjie1024 ...I ported the suite for `Bucket` transforms - with all tests passing. So bucket impl seems to produce the correct results (at least no obvious bugs present). However, PTAL. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2025357488 @liurenjie1024 I think I covered most of your suggestions. PTAL if the overall design and implementation in general is fine? ### Unresolved Issues: - [ ] How to handle PrimitiveType::Time and PrimitiveType::Timestamp for `Dates` projection and boundary adjustment - [ ] Move Transform specific logic into /transform (e.g. logic to handle Transform::Bucket projection -> /transform/bucket.rs) - [ ] Port Java testsuite to 1) I think we need to take PrimitiveType::Time and convert it to PrimitiveType::Date and then adjust the boundary. If this is correct, I still havent found a way to convert between those types. Perhaps someone can point me in the right direction? --- *due to easter-holidays (and my two children) - I can probably do only some minimal work on this over the weekend -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542937628 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +270,174 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. Review Comment: I would suggest to extend the doc. The example doesn't explain why `a !=5`, `bucket(a, 10) as ba` can't be projected to `ba != bucket_apply(5, 10)`, because it may violate the guarantee since there are other values which could be hashed to `bucket_apply(5, 10)`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542923641 ## crates/iceberg/src/spec/values.rs: ## @@ -683,6 +684,47 @@ impl Datum { pub fn data_type() -> { #type } + +/// Create a new `Datum` with adjusted boundary for projection +pub fn boundary(, op: ) -> Result> { Review Comment: yes - but still implemented on `Datum` or as a helper fn on `Transform` with param: `` ? it feels strange to impl `Datum` across two modules. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542923641 ## crates/iceberg/src/spec/values.rs: ## @@ -683,6 +684,47 @@ impl Datum { pub fn data_type() -> { #type } + +/// Create a new `Datum` with adjusted boundary for projection +pub fn boundary(, op: ) -> Result> { Review Comment: yes - but still implemented on `Datum` or as a helper fn on `Transform` with param: `` ? it feels strange to impl `Datum` across two modules. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542916299 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +270,174 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. Review Comment: should I extend the docstring then? or doesn't the example specify this enough? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542902827 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +270,174 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +let projection = match self { +Transform::Bucket(_) => match predicate { +BoundPredicate::Unary(expr) => Some(Predicate::Unary(UnaryExpression::new( +expr.op(), +Reference::new(name), +))), +BoundPredicate::Binary(expr) => { +if expr.op() != PredicateOperator::Eq { +return Ok(None); +} + +let new_datum = func.transform_literal(expr.literal())?.ok_or_else(|| { +Error::new( +ErrorKind::DataInvalid, +"Transformed datum must not be 'None'", +) +})?; + +Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +new_datum, +))) +} +BoundPredicate::Set(expr) => { +if expr.op() != PredicateOperator::In { +return Ok(None); +} + +Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +self.apply_transform_on_set(expr.literals(), )?, +))) +} +_ => None, +}, +Transform::Identity => match predicate { +BoundPredicate::Unary(expr) => Some(Predicate::Unary(UnaryExpression::new( +expr.op(), +Reference::new(name), +))), +BoundPredicate::Binary(expr) => Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +))), +BoundPredicate::Set(expr) => Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +))), +_ => None, +}, +Transform::Truncate(_) => match predicate { +BoundPredicate::Unary(expr) => Some(Predicate::Unary(UnaryExpression::new( +expr.op(), +Reference::new(name), +))), +BoundPredicate::Binary(expr) => { +let op = expr.op(); +let primitive = expr.literal().literal(); + +match primitive { +PrimitiveLiteral::Int(v) => { +self.apply_transform_boundary(name, v, op, )? +} +PrimitiveLiteral::Long(v) => { +self.apply_transform_boundary(name, v, op, )? +} +PrimitiveLiteral::Decimal(v) => { +self.apply_transform_boundary(name, v, op, )? +} +PrimitiveLiteral::Fixed(v) => { +self.apply_transform_boundary(name, v, op, )? +} +_ => return Ok(None), +} +} +BoundPredicate::Set(expr) => { +if expr.op() != PredicateOperator::In { +return Ok(None); +} + +Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +self.apply_transform_on_set(expr.literals(), )?, +))) +} +_ => None, +}, +_ => None, +}; + +Ok(projection) +} + +/// Transform
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542870129 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +270,174 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; Review Comment: You mean, I do the check right before I call `func.transform_literal(...)`. I have to do it in every match arm, since I need access to the `Datum` in order to get the `PrimitiveType` as input for `result_type()`? Do I understand you correctly? ...something like? ```Rust let input_type = expr.literal().data_type(); self.result_type(::Primitive(input_type.clone()))?; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542878163 ## crates/iceberg/src/transform/mod.rs: ## @@ -16,6 +16,7 @@ // under the License. //! Transform function used to compute partition values. + Review Comment: The extra whitespace. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542877714 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +270,174 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; Review Comment: Yes, exactly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542870129 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +270,174 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; Review Comment: You mean, I do the check right before I call `func.transform_literal(...)`. I have to do it in every match arm, since I need access to the `Datum` in order to get the `PrimitiveType` as input for `result_type()`? Do I understand you correctly? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542837809 ## crates/iceberg/src/transform/mod.rs: ## @@ -16,6 +16,7 @@ // under the License. //! Transform function used to compute partition values. + Review Comment: you mean the comment? or the extra whitespace? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542833995 ## crates/iceberg/src/spec/values.rs: ## @@ -683,6 +684,47 @@ impl Datum { pub fn data_type() -> { #type } + +/// Create a new `Datum` with adjusted boundary for projection +pub fn boundary(, op: ) -> Result> { Review Comment: How about moving this to transform module since this is only useful for transform? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2025035027 > Basically, I got rid of the trait and implemented boundary on Datum itself - which makes more sense to me, now that I have a better overall picture with the dates transformations in mind. @marvinlanhenke Thanks for this, I also an boundary trait is a little over design, and implementing them directly on `Datum` seems better to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542630137 ## crates/iceberg/src/spec/transform.rs: ## @@ -261,6 +270,174 @@ impl Transform { _ => self == other, } } + +/// Projects a given predicate according to the transformation +/// specified by the `Transform` instance. +/// +/// This allows predicates to be effectively applied to data +/// that has undergone transformation, enabling efficient querying +/// and filtering based on the original, untransformed data. +/// +/// # Example +/// Suppose, we have row filter `a = 10`, and a partition spec +/// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition +/// value should match `bucket(10, 37) as bs`, and we project `a = 10` to +/// `bs = bucket(10, 37)` +pub fn project(, name: String, predicate: ) -> Result> { +let func = create_transform_function(self)?; + +let projection = match self { +Transform::Bucket(_) => match predicate { +BoundPredicate::Unary(expr) => Some(Predicate::Unary(UnaryExpression::new( +expr.op(), +Reference::new(name), +))), +BoundPredicate::Binary(expr) => { +if expr.op() != PredicateOperator::Eq { +return Ok(None); +} + +let new_datum = func.transform_literal(expr.literal())?.ok_or_else(|| { +Error::new( +ErrorKind::DataInvalid, +"Transformed datum must not be 'None'", +) +})?; + +Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +new_datum, +))) +} +BoundPredicate::Set(expr) => { +if expr.op() != PredicateOperator::In { +return Ok(None); +} + +Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +self.apply_transform_on_set(expr.literals(), )?, +))) +} +_ => None, +}, +Transform::Identity => match predicate { +BoundPredicate::Unary(expr) => Some(Predicate::Unary(UnaryExpression::new( +expr.op(), +Reference::new(name), +))), +BoundPredicate::Binary(expr) => Some(Predicate::Binary(BinaryExpression::new( +expr.op(), +Reference::new(name), +expr.literal().to_owned(), +))), +BoundPredicate::Set(expr) => Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +expr.literals().to_owned(), +))), +_ => None, +}, +Transform::Truncate(_) => match predicate { +BoundPredicate::Unary(expr) => Some(Predicate::Unary(UnaryExpression::new( +expr.op(), +Reference::new(name), +))), +BoundPredicate::Binary(expr) => { +let op = expr.op(); +let primitive = expr.literal().literal(); + +match primitive { +PrimitiveLiteral::Int(v) => { +self.apply_transform_boundary(name, v, op, )? +} +PrimitiveLiteral::Long(v) => { +self.apply_transform_boundary(name, v, op, )? +} +PrimitiveLiteral::Decimal(v) => { +self.apply_transform_boundary(name, v, op, )? +} +PrimitiveLiteral::Fixed(v) => { +self.apply_transform_boundary(name, v, op, )? +} +_ => return Ok(None), +} +} +BoundPredicate::Set(expr) => { +if expr.op() != PredicateOperator::In { +return Ok(None); +} + +Some(Predicate::Set(SetExpression::new( +expr.op(), +Reference::new(name), +self.apply_transform_on_set(expr.literals(), )?, +))) +} +_ => None, +}, +_ => None, +}; + +Ok(projection) +} + +/// Transform each
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2024950370 @Fokko Is it correct to assume, as far as I understood the Java implementation, that we support `dates` projection only for `year, month, and day`? @liurenjie1024 I converted this to a draft - since I had to change the design. I'll push this draft, so you can verify. Basically, I got rid of the trait and implemented `boundary` on `Datum` itself - which makes more sense to me, now that I have a better overall picture with the dates transformations in mind. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2024663637 > Thanks for picking this up @marvinlanhenke > > The most important part of adding projection is making that they are absolutely correct. If rust would generate something different than Java of Python, leads to data incorrectness since it would not correctly evaluate the partition predicates. @Fokko Thanks for the extensive review. Not only did I miss some test but the complete implementation for `Dates` : https://github.com/apache/iceberg/blob/d350c9b8c995a2953aa8b80a0a1fc7cadc4dd16a/api/src/main/java/org/apache/iceberg/transforms/Dates.java#L123 // I was just looking for the transforms for Years, Months, etc. So thank you for hinting me at this - I'll implement this as well - and try to port the Java Testsuite. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
Fokko commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542486911 ## crates/iceberg/src/spec/transform.rs: ## @@ -398,6 +679,80 @@ mod tests { } } +#[test] +fn test_none_projection() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Void; +let result_unary = transform.project(name.clone(), )?; +assert!(result_unary.is_none()); + +let transform = Transform::Year; +let result_binary = transform.project(name.clone(), )?; +assert!(result_binary.is_none()); + +let transform = Transform::Month; +let result_set = transform.project(name.clone(), )?; +assert!(result_set.is_none()); + +Ok(()) +} + +#[test] +fn test_truncate_project() -> Result<()> { Review Comment: Could you port all the tests from Java: https://github.com/apache/iceberg/blob/main/api/src/test/java/org/apache/iceberg/transforms/TestTruncatesProjection.java ## crates/iceberg/src/spec/transform.rs: ## @@ -398,6 +679,80 @@ mod tests { } } +#[test] +fn test_none_projection() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Void; +let result_unary = transform.project(name.clone(), )?; +assert!(result_unary.is_none()); + +let transform = Transform::Year; +let result_binary = transform.project(name.clone(), )?; +assert!(result_binary.is_none()); + +let transform = Transform::Month; +let result_set = transform.project(name.clone(), )?; +assert!(result_set.is_none()); + +Ok(()) +} + +#[test] +fn test_truncate_project() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Truncate(10); + +let result_unary = transform.project(name.clone(), )?.unwrap(); +let result_binary = transform.project(name.clone(), )?.unwrap(); +let result_set = transform.project(name.clone(), )?.unwrap(); + +assert_eq!(format!("{}", result_unary), "projected_name IS NULL"); +assert_eq!(format!("{}", result_binary), "projected_name = 0"); +assert_eq!(format!("{}", result_set), "projected_name IN (0)"); + +Ok(()) +} + +#[test] +fn test_identity_project() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Identity; + +let result_unary = transform.project(name.clone(), )?.unwrap(); +let result_binary = transform.project(name.clone(), )?.unwrap(); +let result_set = transform.project(name.clone(), )?.unwrap(); + +assert_eq!(format!("{}", result_unary), "projected_name IS NULL"); +assert_eq!(format!("{}", result_binary), "projected_name = 5"); +assert_eq!(format!("{}", result_set), "projected_name IN (5, 6)"); + +Ok(()) +} + +#[test] +fn test_bucket_project() -> Result<()> { Review Comment: Can you port the test suite from Java to Rust: https://github.com/apache/iceberg/blob/main/api/src/test/java/org/apache/iceberg/transforms/TestBucketingProjection.java ## crates/iceberg/src/spec/transform.rs: ## @@ -398,6 +679,80 @@ mod tests { } } +#[test] +fn test_none_projection() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Void; +let result_unary = transform.project(name.clone(), )?; +assert!(result_unary.is_none()); + +let transform = Transform::Year; +let result_binary = transform.project(name.clone(), )?; +assert!(result_binary.is_none()); + +let transform = Transform::Month; +let result_set = transform.project(name.clone(), )?; +assert!(result_set.is_none()); + +Ok(()) +} + +#[test] +fn test_truncate_project() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Truncate(10); + +let result_unary = transform.project(name.clone(), )?.unwrap(); +let result_binary = transform.project(name.clone(), )?.unwrap(); +let result_set = transform.project(name.clone(), )?.unwrap(); + +assert_eq!(format!("{}", result_unary), "projected_name IS NULL"); +assert_eq!(format!("{}", result_binary), "projected_name = 0"); +assert_eq!(format!("{}", result_set), "projected_name IN (0)"); + +Ok(()) +} + +#[test] +fn test_identity_project() -> Result<()> { +let name = "projected_name".to_string(); +
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2024617450 > LGTM, with the caveat that I'm not an expert on this part of the spec, but the PR seems well-structured. > > I wouldn't mind seeing more comments in the tests though to explain the "why" aspect of each test. Thanks for your feedback. If the others approve "the correctness" of the PR - I'll add those comments. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542468150 ## crates/iceberg/src/spec/transform.rs: ## @@ -398,6 +679,80 @@ mod tests { } } +#[test] +fn test_none_projection() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Void; +let result_unary = transform.project(name.clone(), )?; +assert!(result_unary.is_none()); + +let transform = Transform::Year; +let result_binary = transform.project(name.clone(), )?; +assert!(result_binary.is_none()); + +let transform = Transform::Month; +let result_set = transform.project(name.clone(), )?; +assert!(result_set.is_none()); + +Ok(()) +} + +#[test] +fn test_truncate_project() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Truncate(10); + +let result_unary = transform.project(name.clone(), )?.unwrap(); +let result_binary = transform.project(name.clone(), )?.unwrap(); +let result_set = transform.project(name.clone(), )?.unwrap(); + +assert_eq!(format!("{}", result_unary), "projected_name IS NULL"); +assert_eq!(format!("{}", result_binary), "projected_name = 0"); +assert_eq!(format!("{}", result_set), "projected_name IN (0)"); Review Comment: Yes, I think your understanding is correct. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
sdd commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542459274 ## crates/iceberg/src/spec/transform.rs: ## @@ -398,6 +679,80 @@ mod tests { } } +#[test] +fn test_none_projection() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Void; +let result_unary = transform.project(name.clone(), )?; +assert!(result_unary.is_none()); + +let transform = Transform::Year; +let result_binary = transform.project(name.clone(), )?; +assert!(result_binary.is_none()); + +let transform = Transform::Month; +let result_set = transform.project(name.clone(), )?; +assert!(result_set.is_none()); + +Ok(()) +} + +#[test] +fn test_truncate_project() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Truncate(10); + +let result_unary = transform.project(name.clone(), )?.unwrap(); +let result_binary = transform.project(name.clone(), )?.unwrap(); +let result_set = transform.project(name.clone(), )?.unwrap(); + +assert_eq!(format!("{}", result_unary), "projected_name IS NULL"); +assert_eq!(format!("{}", result_binary), "projected_name = 0"); +assert_eq!(format!("{}", result_set), "projected_name IN (0)"); Review Comment: Just trying to follow what's happening here so that I understand. So in the case of `result_binary`, the value of 5 gets truncated to 0, since the truncate transform when applied to an int effectively divides by 10, rounding down for all fractional values (ie, 1-9 get truncated to 0, 10-19 get truncated to 1, 20-29 get truncated to 2)? And in the case of `result_set`, since both 5 and 6 get truncated to the same value of 0, a set of (5, 6) becomes a set of (0)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
sdd commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542459274 ## crates/iceberg/src/spec/transform.rs: ## @@ -398,6 +679,80 @@ mod tests { } } +#[test] +fn test_none_projection() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Void; +let result_unary = transform.project(name.clone(), )?; +assert!(result_unary.is_none()); + +let transform = Transform::Year; +let result_binary = transform.project(name.clone(), )?; +assert!(result_binary.is_none()); + +let transform = Transform::Month; +let result_set = transform.project(name.clone(), )?; +assert!(result_set.is_none()); + +Ok(()) +} + +#[test] +fn test_truncate_project() -> Result<()> { +let name = "projected_name".to_string(); +let preds = TestPredicates::new(); + +let transform = Transform::Truncate(10); + +let result_unary = transform.project(name.clone(), )?.unwrap(); +let result_binary = transform.project(name.clone(), )?.unwrap(); +let result_set = transform.project(name.clone(), )?.unwrap(); + +assert_eq!(format!("{}", result_unary), "projected_name IS NULL"); +assert_eq!(format!("{}", result_binary), "projected_name = 0"); +assert_eq!(format!("{}", result_set), "projected_name IN (0)"); Review Comment: Just trying to follow what's happening here so that I understand. So in the case of `result_binary`, the value of 5 gets truncated to 0, since the truncate transform when applied to an int effectively divides by 10, rounding down for all fractional values (ie, 1-9 get truncated to 0, 10-19 get truncated to 1, 20-29 get truncated to 2)? And in the case of the set, since both 5 and 6 get truncated to the same value of 0, a set of (5, 6) becomes a set of (0)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Project transform [iceberg-rust]
marvinlanhenke commented on PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#issuecomment-2023840212 @liurenjie1024 @ZENOTME @sdd PTAL -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] feat: Project transform [iceberg-rust]
marvinlanhenke opened a new pull request, #309: URL: https://github.com/apache/iceberg-rust/pull/309 ### Which issue does this PR close? Closes #264 ### Rationale for this change The ability to project row_filter to `Transform` partition. This will unblock the Manifest & PartitionEvaluator, which will enable the pruning of manifest files in `fn plan_files`. ### What changes are included in this PR? - add `fn project(...)` on `Transform` - mainly based on [Java ](https://github.com/apache/iceberg/tree/d350c9b8c995a2953aa8b80a0a1fc7cadc4dd16a/api/src/main/java/org/apache/iceberg/transforms) implementation ### Are these changes tested? Yes. Unit tests are included. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org