Thanks Peter. Judging from its signature, `bindByTransformedValue` would need to encode a conversion matrix covering every pair of transformations. If you think `bindByTransformedValue` is semantically more meaningful as an API, we could also implement it on top of `toSourceTypeValue`.
-Dyno
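The layering Dyno describes can be illustrated with a minimal, self-contained sketch. It does not use Iceberg's real `Transform` interface. `MiniTransform`, its `deriveFrom` helper, and the `HourTransform`/`DayTransform`/`MonthTransform` classes are hypothetical. Source values are assumed to be UTC timestamps in microseconds since the epoch, and the `sourceType` parameter is dropped. `deriveFrom` is not the proposed `bindByTransformedValue` signature. It only shows how a finer transform's output could be mapped to a coarser transform's output through `toSourceTypeValue`, so that each transform needs only its own inverse rather than one conversion per pair of transforms.

```java
import java.time.LocalDate;

public class DeriveSketch {
  private static final long MICROS_PER_HOUR = 3_600_000_000L;
  private static final long MICROS_PER_DAY = 86_400_000_000L;

  /** Hypothetical, simplified stand-in for a partition transform from source type S to T. */
  interface MiniTransform<S, T> {
    T apply(S sourceValue);

    /** Returns a source value that maps to transformedValue (here: the start of the period). */
    S toSourceTypeValue(T transformedValue);

    /**
     * Derives this transform's value from another transform's output by going back to a
     * representative source value and re-applying this transform. Only meaningful when the
     * other transform is at least as fine as this one (finer to coarser); not checked here.
     */
    default <P> T deriveFrom(MiniTransform<S, P> other, P otherValue) {
      return otherValue == null ? null : apply(other.toSourceTypeValue(otherValue));
    }
  }

  /** Hours since 1970-01-01T00:00 UTC. */
  static final class HourTransform implements MiniTransform<Long, Integer> {
    @Override
    public Integer apply(Long micros) {
      return (int) Math.floorDiv(micros, MICROS_PER_HOUR);
    }

    @Override
    public Long toSourceTypeValue(Integer hours) {
      return hours * MICROS_PER_HOUR; // start of the hour
    }
  }

  /** Days since 1970-01-01. */
  static final class DayTransform implements MiniTransform<Long, Integer> {
    @Override
    public Integer apply(Long micros) {
      return (int) Math.floorDiv(micros, MICROS_PER_DAY);
    }

    @Override
    public Long toSourceTypeValue(Integer days) {
      return days * MICROS_PER_DAY; // start of the day
    }
  }

  /** Months since 1970-01. */
  static final class MonthTransform implements MiniTransform<Long, Integer> {
    @Override
    public Integer apply(Long micros) {
      LocalDate date = LocalDate.ofEpochDay(Math.floorDiv(micros, MICROS_PER_DAY));
      return (date.getYear() - 1970) * 12 + (date.getMonthValue() - 1);
    }

    @Override
    public Long toSourceTypeValue(Integer months) {
      // Start of the month, expressed in microseconds.
      return LocalDate.of(1970, 1, 1).plusMonths(months).toEpochDay() * MICROS_PER_DAY;
    }
  }

  public static void main(String[] args) {
    MiniTransform<Long, Integer> hour = new HourTransform();
    MiniTransform<Long, Integer> day = new DayTransform();
    MiniTransform<Long, Integer> month = new MonthTransform();

    int hourValue = 489118; // 2025-10-18 22:00:00 UTC, the example from the thread
    System.out.println(day.deriveFrom(hour, hourValue)); // 20379 (2025-10-18)
    System.out.println(month.deriveFrom(hour, hourValue)); // 669 (2025-10)
  }
}
```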
On Tue, Dec 16, 2025 at 2:57 AM Péter Váry wrote:
> Hi Team,
>
> Thanks Dyno for bringing this up on the dev list!
>
> For everyone else: the original goal is that if we have two transformations
> where `T1.satisfiesOrderOf(T2)`, then given a partition value P1 for T1,
> we should be able to derive the corresponding partition value P2 for T2
> (for example, the day 2025-10-18 exactly determines the month 2025-10). One
> possible approach is the API Dyno proposed, which would be part of the
> Transform interface. I’ve included your suggested Javadoc at the end of
> this message for reference.
>
> The alternative we discussed was something like:
>
>     <P> SerializableFunction<S, T> bindByTransformedValue(Transform<?, P> otherTransform, P otherOutput)
>
> This is a very low-level API, and I’d prefer to extend it only if no
> better alternative exists. If you have other ideas or suggestions, we’d be
> happy to hear them.
>
> Thanks,
> Peter
>
> The Javadoc for the API proposed by Dyno:
>
>     /**
>      * Converts a transformed partition value back to a representative source type value.
>      *
>      * <p>This method returns a source value that would produce the given transformed value when this
>      * transform is applied. For temporal transforms, this returns the start of the period (e.g.,
>      * start of hour, day, month, or year). For truncate transforms, this returns the truncated value
>      * as-is since it preserves the source type.
>      *
>      * <p>This is useful for chaining transforms when {@link #satisfiesOrderOf(Transform)} is true,
>      * allowing conversion from a finer granularity to a coarser one by converting back to source type
>      * and reapplying the coarser transform.
>      *
>      * @param sourceType the source type for this transform
>      * @param transformedValue the transformed partition value
>      * @return a source value that would produce this transformed value, or null if the input is null
>      * @throws UnsupportedOperationException if this transform does not support conversion back to
>      *     source type
>      */
>     default S toSourceTypeValue(Type sourceType, T transformedValue) {
>
> On Mon, Dec 15, 2025 at 20:53, Dyno Fu wrote:
>
>> Hello Iceberg devs,
>>
>> I’d like to reopen the discussion on
>> https://github.com/apache/iceberg/pull/14281 (“Core: Group binpack
>> fileGroup by output partitionSpec”), which was marked as stale last week.
>>
>> This patch introduces an enhancement to the rewrite_data_files action:
>> instead of grouping files by the current table partition spec, it groups
>> them by the output partition spec provided in the rewrite parameters. This
>> enables more efficient bin-packing of small files when rolling data up
>> into a coarser or alternate partition layout.
>>
>> The current concern with the implementation is the introduction of the
>> new API
>>
>>     default S toSourceTypeValue(Type sourceType, T transformedValue)
>>
>> which is used to normalize a partition value back to the source type:
>> for example, converting the hour transform value `489118` to the timestamp
>> `2025-10-18 22:00:00` so that a different partition transform (e.g., the
>> day transform) can be applied to it.
>>
>> What's your opinion: is this the right abstraction, or is there a better
>> alternative?
>> @pvary, please share your thoughts from our discussion on Slack.
>> Much appreciated, thanks.
>>
>> Regards,
>> Dyno
>>
>> --
>> reality, with all its ambiguities, does the job just fine.
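The grouping change described for PR 14281 can be pictured with a rough sketch. Files from different hourly partitions that fall into the same output day partition end up in one group, so they can be bin-packed together. This is not the code in the PR. It assumes that each file carries a single hour partition value, that the output spec partitions by day, and that the hour-to-day conversion goes through a start-of-hour timestamp in microseconds, as `toSourceTypeValue` proposes. `groupByOutputPartition`, `hourToSourceMicros`, and `dayTransform` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupByOutputSpec {
  private static final long MICROS_PER_HOUR = 3_600_000_000L;
  private static final long MICROS_PER_DAY = 86_400_000_000L;

  // Hypothetical inverse of the hour transform: start of the hour, in micros since the epoch.
  static long hourToSourceMicros(int hourValue) {
    return hourValue * MICROS_PER_HOUR;
  }

  // Day transform applied to a timestamp in micros: days since 1970-01-01.
  static int dayTransform(long micros) {
    return (int) Math.floorDiv(micros, MICROS_PER_DAY);
  }

  /** Groups file paths by their partition value under the (coarser) output day spec. */
  static Map<Integer, List<String>> groupByOutputPartition(Map<String, Integer> hourByFile) {
    Map<Integer, List<String>> groups = new TreeMap<>();
    for (Map.Entry<String, Integer> entry : hourByFile.entrySet()) {
      // Normalize the hour value back to a source timestamp, then apply the output transform.
      int outputDay = dayTransform(hourToSourceMicros(entry.getValue()));
      groups.computeIfAbsent(outputDay, k -> new ArrayList<>()).add(entry.getKey());
    }
    return groups;
  }

  public static void main(String[] args) {
    Map<String, Integer> files = new LinkedHashMap<>();
    files.put("a.parquet", 489118); // 2025-10-18 22:00 UTC
    files.put("b.parquet", 489119); // 2025-10-18 23:00 UTC
    files.put("c.parquet", 489120); // 2025-10-19 00:00 UTC
    // Prints {20379=[a.parquet, b.parquet], 20380=[c.parquet]}
    System.out.println(groupByOutputPartition(files));
  }
}
```

Grouping by the current hourly spec would produce three single-file groups here. Grouping by the output day spec gives two groups, which is the kind of consolidation the PR is after.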
