Fokko commented on code in PR #79:
URL: https://github.com/apache/iceberg-rust/pull/79#discussion_r1423535271
##########
crates/iceberg/src/spec/transform.rs:
##########
@@ -126,17 +126,20 @@ pub enum Transform {
impl Transform {
/// Get the return type of transform given the input type.
/// Returns `None` if it can't be transformed.
- pub fn result_type(&self, input_type: &Type) -> Option<Type> {
+ pub fn result_type(&self, input_type: &Type) -> Result<Type> {
Review Comment:
I think there is a difference, let me explain! In Iceberg it is possible to
have your custom transform, for example if you want to do something very
specific, or you want to influence the distribution when it comes to
partitioning. It can be that the transform is not available in the client, and
then we cast it to an `UnknownTransform`:
https://github.com/apache/iceberg-python/blob/8c8abb5c4c258e32941110a9ce0938e1328290b3/pyiceberg/transforms.py#L688
What this essentially means is that we don't know how it is transformed, and
we cannot apply speedups. For example, `bucket[16](34) = 3` then we know in
which bucket the data is that we're looking for. But if there is some custom
partitioning then we just have to look through all the buckets.
To this specific case of the type, in PyIceberg we return a `StringType()`,
that's borrowed from Java where you just cast it to a string. I don't think
that's used anywhere since we don't call this on the `UnknownTransform`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]