liurenjie1024 commented on code in PR #309: URL: https://github.com/apache/iceberg-rust/pull/309#discussion_r1542630137
########## crates/iceberg/src/spec/transform.rs: ########## @@ -261,6 +270,174 @@ impl Transform { _ => self == other, } } + + /// Projects a given predicate according to the transformation + /// specified by the `Transform` instance. + /// + /// This allows predicates to be effectively applied to data + /// that has undergone transformation, enabling efficient querying + /// and filtering based on the original, untransformed data. + /// + /// # Example + /// Suppose, we have row filter `a = 10`, and a partition spec + /// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition + /// value should match `bucket(10, 37) as bs`, and we project `a = 10` to + /// `bs = bucket(10, 37)` + pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result<Option<Predicate>> { + let func = create_transform_function(self)?; + + let projection = match self { + Transform::Bucket(_) => match predicate { + BoundPredicate::Unary(expr) => Some(Predicate::Unary(UnaryExpression::new( + expr.op(), + Reference::new(name), + ))), + BoundPredicate::Binary(expr) => { + if expr.op() != PredicateOperator::Eq { + return Ok(None); + } + + let new_datum = func.transform_literal(expr.literal())?.ok_or_else(|| { + Error::new( + ErrorKind::DataInvalid, + "Transformed datum must not be 'None'", + ) + })?; + + Some(Predicate::Binary(BinaryExpression::new( + expr.op(), + Reference::new(name), + new_datum, + ))) + } + BoundPredicate::Set(expr) => { + if expr.op() != PredicateOperator::In { + return Ok(None); + } + + Some(Predicate::Set(SetExpression::new( + expr.op(), + Reference::new(name), + self.apply_transform_on_set(expr.literals(), &func)?, + ))) + } + _ => None, + }, + Transform::Identity => match predicate { + BoundPredicate::Unary(expr) => Some(Predicate::Unary(UnaryExpression::new( + expr.op(), + Reference::new(name), + ))), + BoundPredicate::Binary(expr) => Some(Predicate::Binary(BinaryExpression::new( + expr.op(), + Reference::new(name), + expr.literal().to_owned(), + ))), + BoundPredicate::Set(expr) => Some(Predicate::Set(SetExpression::new( + expr.op(), + Reference::new(name), + expr.literals().to_owned(), + ))), + _ => None, + }, + Transform::Truncate(_) => match predicate { + BoundPredicate::Unary(expr) => Some(Predicate::Unary(UnaryExpression::new( + expr.op(), + Reference::new(name), + ))), + BoundPredicate::Binary(expr) => { + let op = expr.op(); + let primitive = expr.literal().literal(); + + match primitive { + PrimitiveLiteral::Int(v) => { + self.apply_transform_boundary(name, v, op, &func)? + } + PrimitiveLiteral::Long(v) => { + self.apply_transform_boundary(name, v, op, &func)? + } + PrimitiveLiteral::Decimal(v) => { + self.apply_transform_boundary(name, v, op, &func)? + } + PrimitiveLiteral::Fixed(v) => { + self.apply_transform_boundary(name, v, op, &func)? + } + _ => return Ok(None), + } + } + BoundPredicate::Set(expr) => { + if expr.op() != PredicateOperator::In { + return Ok(None); + } + + Some(Predicate::Set(SetExpression::new( + expr.op(), + Reference::new(name), + self.apply_transform_on_set(expr.literals(), &func)?, + ))) + } + _ => None, + }, + _ => None, + }; + + Ok(projection) + } + + /// Transform each literal value of `FnvHashSet<Datum>` + fn apply_transform_on_set( + &self, + literals: &FnvHashSet<Datum>, + func: &BoxedTransformFunction, + ) -> Result<FnvHashSet<Datum>> { + literals + .iter() + .map(|lit| { + func.transform_literal(lit).and_then(|d| { + d.ok_or_else(|| { + Error::new( + ErrorKind::DataInvalid, + "Transformed datum must not be 'None'", Review Comment: This appeared several times, how about we put them in a common place? For example in `TransformFunction`'s default method: ``` pub trait TransformFunction { fn transform_literal_result(&self, Datum) -> Result<Datum> { self.transform_literal(datum).ok_or_else(|| ...) } } ``` ########## crates/iceberg/src/transform/mod.rs: ########## @@ -16,6 +16,7 @@ // under the License. //! Transform function used to compute partition values. + Review Comment: Is this unnecessary? ########## crates/iceberg/src/spec/transform.rs: ########## @@ -261,6 +270,174 @@ impl Transform { _ => self == other, } } + + /// Projects a given predicate according to the transformation + /// specified by the `Transform` instance. + /// + /// This allows predicates to be effectively applied to data + /// that has undergone transformation, enabling efficient querying + /// and filtering based on the original, untransformed data. + /// + /// # Example + /// Suppose, we have row filter `a = 10`, and a partition spec + /// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition + /// value should match `bucket(10, 37) as bs`, and we project `a = 10` to + /// `bs = bucket(10, 37)` + pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result<Option<Predicate>> { + let func = create_transform_function(self)?; + + let projection = match self { + Transform::Bucket(_) => match predicate { + BoundPredicate::Unary(expr) => Some(Predicate::Unary(UnaryExpression::new( + expr.op(), + Reference::new(name), + ))), + BoundPredicate::Binary(expr) => { + if expr.op() != PredicateOperator::Eq { + return Ok(None); + } + + let new_datum = func.transform_literal(expr.literal())?.ok_or_else(|| { + Error::new( + ErrorKind::DataInvalid, + "Transformed datum must not be 'None'", + ) Review Comment: ```suggestion Error::new( ErrorKind::Unexpected, format!("{self} returns 'None' for literal {expr.literal}"), ) ``` If we add check as I have mentioned, it's supposed to be an unexpected error. ########## crates/iceberg/src/spec/transform.rs: ########## @@ -261,6 +270,174 @@ impl Transform { _ => self == other, } } + + /// Projects a given predicate according to the transformation + /// specified by the `Transform` instance. + /// + /// This allows predicates to be effectively applied to data + /// that has undergone transformation, enabling efficient querying + /// and filtering based on the original, untransformed data. + /// + /// # Example + /// Suppose, we have row filter `a = 10`, and a partition spec + /// `bucket(a, 37) as bs`, if one row matches `a = 10`, then its partition + /// value should match `bucket(10, 37) as bs`, and we project `a = 10` to + /// `bs = bucket(10, 37)` + pub fn project(&self, name: String, predicate: &BoundPredicate) -> Result<Option<Predicate>> { + let func = create_transform_function(self)?; Review Comment: I think we should first test if this transform is applicable to data type by using [result_type](https://github.com/apache/iceberg-rust/blob/e0e2b1b65ef6d7530a21e29ffda123b26af3cc1f/crates/iceberg/src/spec/transform.rs#L129) method, if not, we should return `None`. ########## crates/iceberg/src/spec/transform.rs: ########## @@ -261,6 +270,174 @@ impl Transform { _ => self == other, } } + + /// Projects a given predicate according to the transformation + /// specified by the `Transform` instance. + /// + /// This allows predicates to be effectively applied to data + /// that has undergone transformation, enabling efficient querying + /// and filtering based on the original, untransformed data. Review Comment: It's important to state that: the projection guaranteed that if `predicate(v)` is true, the `project(apply(v))` is true, here `project` means returned predicate, and `apply(v)` means value applying transformation to v. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org