advancedxy commented on code in PR #8579:
URL: https://github.com/apache/iceberg/pull/8579#discussion_r1448606533


##########
format/spec.md:
##########
@@ -322,14 +322,22 @@ The `void` transform may be used to replace the transform in an existing partiti
 
 #### Bucket Transform Details
 
-Bucket partition transforms use a 32-bit hash of the source value. The 32-bit hash implementation is the 32-bit Murmur3 hash, x86 variant, seeded with 0.
+Bucket partition transforms use a 32-bit hash of the source value or source value list. The 32-bit hash implementation is the 32-bit Murmur3 hash, x86 variant, seeded with 0.
 
 Transforms are parameterized by a number of buckets [1], `N`. The hash mod `N` must produce a positive value by first discarding the sign bit of the hash value. In pseudo-code, the function is:
 
 ```
   def bucket_N(x) = (murmur3_x86_32_hash(x) & Integer.MAX_VALUE) % N
 ```
 
+When bucket transforming a list of values(a.k.a. multi-arg bucket), the input is treated as a struct. The struct fields are hashed and the hashes are combined using the same hash function. In pseudo-code, the hash function is:
+
+```
+  def murmur3_x86_hash(struct(x1, x2, ..., xn)) = hasher.put(x1).put(x2)...put(xn).hash().asInt

Review Comment:
   Answered in another thread.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
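
For reference, the `bucket_N` pseudo-code quoted in the hunk above can be sketched in Python roughly as follows. This is an illustration only, not code from the PR or from any Iceberg implementation. The names `murmur3_x86_32`, `bucket`, `long_bytes`, and `bucket_multi` are hypothetical. The single-value part follows the existing spec text (32-bit Murmur3, x86 variant, seed 0, sign bit discarded before the modulo). The multi-arg part is only one possible reading of the proposed pseudo-code: it assumes the chained `put` calls feed each field's serialized bytes into a single hash in order, which the PR discussion may or may not settle on.

```python
import struct


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 (x86 variant); returns an unsigned 32-bit integer."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def rotl(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    h = seed & mask
    n_blocks = len(data) // 4
    # Body: mix each full 4-byte little-endian block into the state.
    for i in range(n_blocks):
        (k,) = struct.unpack_from("<I", data, 4 * i)
        k = rotl((k * c1) & mask, 15)
        k = (k * c2) & mask
        h = rotl(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: the remaining 1 to 3 bytes, read as a little-endian integer.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = rotl((k * c1) & mask, 15)
        k = (k * c2) & mask
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def bucket(data: bytes, num_buckets: int) -> int:
    """bucket_N from the spec: discard the sign bit, then take the modulo."""
    return (murmur3_x86_32(data) & 0x7FFFFFFF) % num_buckets


def long_bytes(value: int) -> bytes:
    """Serialize an integer as 8 little-endian bytes before hashing."""
    return struct.pack("<q", value)


def bucket_multi(fields: list, num_buckets: int) -> int:
    """ASSUMPTION: multi-arg bucket hashes the fields' bytes in order as one input."""
    return bucket(b"".join(fields), num_buckets)


print(bucket(long_bytes(34), 16))
print(bucket_multi([long_bytes(34), "iceberg".encode("utf-8")], 16))
```

Masking with `0x7FFFFFFF` plays the role of `& Integer.MAX_VALUE` in the pseudo-code, so the result always falls in `[0, N)` regardless of the sign of the 32-bit hash.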