Thanks Peter,

Judging from its signature, `bindByTransformedValue` would need to encode a
conversion matrix between the different transforms. We could also implement
`bindByTransformedValue` on top of `toSourceTypeValue` if you think it is the
semantically more meaningful API.
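
To make the round trip concrete, here is a minimal sketch of "convert back to
the source type, then reapply the coarser transform". It is not the Iceberg
`Transform` interface: `SketchTransform`, `HOUR`, `DAY`, `rollUp`, and `toUtc`
are hypothetical names used only for illustration, and it assumes the source
value is a timestamp in microseconds since the epoch (UTC).

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class RollUpSketch {
  static final long MICROS_PER_HOUR = 3_600_000_000L;
  static final long MICROS_PER_DAY = 24 * MICROS_PER_HOUR;

  /** Hypothetical stand-in for the two Transform methods under discussion. */
  interface SketchTransform {
    int apply(long sourceMicros);

    /** Returns the start of the period, as in the proposed Javadoc. */
    long toSourceTypeValue(int transformedValue);
  }

  /** Hours since epoch (assumed encoding for this sketch). */
  static final SketchTransform HOUR = new SketchTransform() {
    public int apply(long micros) {
      return (int) Math.floorDiv(micros, MICROS_PER_HOUR);
    }

    public long toSourceTypeValue(int hours) {
      return hours * MICROS_PER_HOUR;
    }
  };

  /** Days since epoch (assumed encoding for this sketch). */
  static final SketchTransform DAY = new SketchTransform() {
    public int apply(long micros) {
      return (int) Math.floorDiv(micros, MICROS_PER_DAY);
    }

    public long toSourceTypeValue(int days) {
      return days * MICROS_PER_DAY;
    }
  };

  /** Derives the coarser partition value from the finer one via the source type. */
  static int rollUp(SketchTransform finer, SketchTransform coarser, int finerValue) {
    return coarser.apply(finer.toSourceTypeValue(finerValue));
  }

  /** Renders epoch microseconds as a UTC date-time, for display only. */
  static LocalDateTime toUtc(long micros) {
    return LocalDateTime.ofEpochSecond(Math.floorDiv(micros, 1_000_000L), 0, ZoneOffset.UTC);
  }

  public static void main(String[] args) {
    long source = HOUR.toSourceTypeValue(489118);
    System.out.println(toUtc(source));              // 2025-10-18T22:00
    System.out.println(rollUp(HOUR, DAY, 489118));  // 20379, i.e. 2025-10-18
  }
}
```

A `bindByTransformedValue`-style API could presumably be layered on a helper
like `rollUp`, which is why the lower-level method may be enough on its own.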
-Dyno

On Tue, Dec 16, 2025 at 2:57 AM Péter Váry <[email protected]>
wrote:

> Hi Team,
>
> Thanks Dyno for bringing this up on the dev list!
>
> For the others, the original goal is that if we have two transformations
> where *T1.satisfiesOrderOf(T2)*, then given a partition value P1 for T1,
> we should be able to derive the corresponding partition value P2 for T2
> (for example, the day 2025-10-18 exactly determines the month 2025-10). One
> possible approach is the API Dyno proposed, which would be part of the
> Transform interface. I’ve included Dyno's suggested Javadoc at the end of
> this message for reference.
>
> The alternative we discussed was something like:
>
> *<P> SerializableFunction<S, T> bindByTransformedValue(Transform<?, P>
> otherTransform, P otherOutput)*
>
>
> This is a very low-level API, and I’d prefer to extend it only if no
> better alternative exists. If you have other ideas or suggestions, we’d be
> happy to hear them.
>
> Thanks,
> Peter
>
> The javadoc for the API proposed by Dyno:
>
>
>   /**
>    * Converts a transformed partition value back to a representative source type value.
>    *
>    * <p>This method returns a source value that would produce the given transformed value when this
>    * transform is applied. For temporal transforms, this returns the start of the period (e.g.,
>    * start of hour, day, month, or year). For truncate transforms, this returns the truncated value
>    * as-is since it preserves the source type.
>    *
>    * <p>This is useful for chaining transforms when {@link #satisfiesOrderOf(Transform)} is true,
>    * allowing conversion from a finer granularity to a coarser one by converting back to source type
>    * and reapplying the coarser transform.
>    *
>    * @param sourceType the source type for this transform
>    * @param transformedValue the transformed partition value
>    * @return a source value that would produce this transformed value, or null if the input is null
>    * @throws UnsupportedOperationException if this transform does not support conversion back to
>    *     source type
>    */
>   default S toSourceTypeValue(Type sourceType, T transformedValue) {
>
>
>
> On Mon, Dec 15, 2025 at 8:53 PM Dyno Fu <[email protected]> wrote:
>
>> Hello Iceberg devs,
>>
>> I’d like to reopen the discussion on
>> https://github.com/apache/iceberg/pull/14281 (“Core: Group binpack
>> fileGroup by output partitionSpec”) that was marked as stale last week.
>>
>> This patch introduces an enhancement to the rewrite_data_files action:
>> instead of grouping files by the current table partition spec, it groups
>> them by the output partition spec provided in the rewrite parameters. This
>> behavior enables more efficient bin-packing of small files when rolling
>> data up into a coarser or alternate partition layout.
>>
>> The current concern with the implementation is the introduction of the
>> new API
>>
>> default S toSourceTypeValue(Type sourceType, T transformedValue)
>>
>> which is used to normalize a partition value back to the source type.
>> For example, it converts an hour transform value of `489118` to the
>> timestamp `2025-10-18 22:00:00` so that a different partition transform
>> (e.g. a day transform) can be applied to it.
>>
>> What's your opinion: is this the right abstraction, or is there a better
>> alternative?
>> @pvary, please share your thoughts from our discussion on Slack.
>> Much appreciated, thanks.
>>
>> regards,
>> Dyno
>>
>> --
>> reality, with all its ambiguities, does the job just fine.
>>
>

-- 
reality, with all its ambiguities, does the job just fine.
