This has already been proposed as part of v4; see Edwards' column metrics
expansion proposal.

On Thu, Sep 11, 2025 at 4:54 AM rice Zhang <[email protected]> wrote:

> Hi, Junwang
>
> We're discussing the storage of lower and upper bounds for decimal values
> in manifest files and their compatibility after type evolution. The bounds
> are stored as unscaled values without their original scale, so when the
> decimal type changes, we can't correctly interpret these historical bounds
> even though we know the current type from metadata.
>
> Minglei.
>
> On Thu, Sep 11, 2025 at 5:46 PM Junwang Zhao <[email protected]> wrote:
>
>> Hi Minglei,
>>
>> On Thu, Sep 11, 2025 at 5:35 PM rice Zhang <[email protected]> wrote:
>> >
>> > Hi Ryan,
>> >
>> > Thank you for your detailed response. I've discussed this issue offline
>> with my team lead, and we've investigated the problem more deeply. After
>> reviewing Iceberg's decimal serialization code, we confirmed that only the
>> unscaled value is serialized; the scale is not stored. This indeed makes
>> type evolution more complex than initially anticipated. Regarding your
>> mention of v4 adopting columnar metadata for manifests, I'm not certain
>> which specific format Iceberg will use (perhaps Parquet?), but I agree this
>> is a positive direction. However, to properly support decimal scale
>> evolution, I believe Iceberg would need to fundamentally change how decimal
>> values are serialized, regardless of whether Avro or Parquet is used.
>> Specifically, we'd need to serialize both the unscaled value AND the scale,
>> not just the unscaled value.
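>> >
>> > For illustration, here is a rough sketch of what I mean. This is a
>> > hypothetical encoding, not the current Iceberg format, and the names are
>> > mine: prefix the serialized bytes with the writer's scale so readers can
>> > reinterpret old values under a new type:
>> >
>> > import java.math.BigDecimal;
>> > import java.math.BigInteger;
>> > import java.nio.ByteBuffer;
>> >
>> > // Hypothetical layout: [4-byte scale][big-endian unscaled bytes].
>> > // Today only the unscaled bytes are serialized.
>> > class ScaledDecimalCodec {
>> >   static ByteBuffer encode(BigDecimal value) {
>> >     byte[] unscaled = value.unscaledValue().toByteArray();
>> >     ByteBuffer buf = ByteBuffer.allocate(4 + unscaled.length);
>> >     buf.putInt(value.scale());
>> >     buf.put(unscaled);
>> >     buf.flip();
>> >     return buf;
>> >   }
>> >
>> >   static BigDecimal decode(ByteBuffer buf) {
>> >     int scale = buf.getInt();
>> >     byte[] unscaled = new byte[buf.remaining()];
>> >     buf.get(unscaled);
>> >     return new BigDecimal(new BigInteger(unscaled), scale);
>> >   }
>> >
>> >   public static void main(String[] args) {
>> >     BigDecimal v = new BigDecimal("123.45"); // DECIMAL(5,2)
>> >     System.out.println(decode(encode(v)));   // prints 123.45, scale intact
>> >   }
>> > }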
>> >
>> > Here's an example: Consider a field initially defined as DECIMAL(5,2)
>> with value 123.45 (the serialized unscaled value is 12345). If a user later
>> changes the type to DECIMAL(6,3) - which follows SQL:2011 rules since (p-s)
>> doesn't decrease - reading the old data with the new type would be
>> problematic. Without the original scale being serialized, we can't
>> distinguish whether 12345 represents 123.45 (scale=2) or 12.345 (scale=3),
>> potentially leading to incorrect data interpretation. By serializing the
>> scale alongside the unscaled value, we could correctly read 12345 with
>> scale=2 as 123.450 under the new DECIMAL(6,3) type, avoiding data
>> corruption.
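>> >
>> > To make the ambiguity concrete, here is a tiny JDK-only demonstration
>> > (plain java.math, no Iceberg APIs; the class name is just for
>> > illustration):
>> >
>> > import java.math.BigDecimal;
>> > import java.math.BigInteger;
>> >
>> > class DecimalScaleAmbiguity {
>> >   public static void main(String[] args) {
>> >     BigInteger unscaled = BigInteger.valueOf(12345);
>> >     // The same unscaled value means different numbers at different scales:
>> >     BigDecimal oldRead = new BigDecimal(unscaled, 2); // 123.45 (DECIMAL(5,2))
>> >     BigDecimal badRead = new BigDecimal(unscaled, 3); // 12.345 (misread as DECIMAL(6,3))
>> >     // If the writer's scale is known, widening the scale is lossless:
>> >     BigDecimal widened = oldRead.setScale(3);         // 123.450 under DECIMAL(6,3)
>> >     System.out.println(oldRead + " vs " + badRead + " -> " + widened);
>> >   }
>> > }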
>>
>> The metadata should have the data type, which includes the scale and
>> precision. Isn't that enough to describe the decimal? Correct me if
>> I'm wrong :)
>>
>> >
>> > I'd like to confirm whether you consider serializing the scale value a
>> viable approach, or whether the community has other, better solutions for
>> supporting decimal scale evolution. Also, I'm wondering whether specific
>> implementation approaches for decimal type changes have already been
>> discussed. I'm very interested in understanding how v4 plans to address
>> this issue.
>> >
>> > Minglei
>> >
>> > On Thu, Sep 11, 2025 at 3:53 AM Ryan Blue <[email protected]> wrote:
>> >>
>> >> Hi Minglei, thanks for the proposal.
>> >>
>> >> v3 is now closed, so we can't introduce a breaking change like this
>> until v4. We looked into decimal type evolution in v3 and found that due to
>> the way that we currently store lower and upper bounds for decimal values,
>> we can't safely support this in v3 Iceberg manifests. We will need to wait
>> until v4 manifests are introduced with columnar metadata to make this
>> change.
>> >>
>> >> Ryan
>> >>
>> >> On Wed, Sep 10, 2025 at 12:28 AM rice Zhang <[email protected]>
>> wrote:
>> >>>
>> >>> Hi Iceberg Community,
>> >>>
>> >>> I'd like to propose extending Iceberg's type promotion rules to
>> support DECIMAL type evolution with scale changes, aligning with the
>> SQL:2011 standard.
>> >>>
>> >>> Current Limitation
>> >>>   Currently, Iceberg only supports DECIMAL type promotion when:
>> >>>   - Scale remains the same
>> >>>   - Precision can be increased
>> >>>
>> >>>   This means DECIMAL(10,2) can evolve to DECIMAL(12,2), but not to DECIMAL(12,4).
>> >>>
>> >>> Proposed Change
>> >>>   Allow DECIMAL type evolution when:
>> >>>   1. Target scale >= source scale
>> >>>   2. Target precision >= source precision
>> >>>   3. Integer part capacity is preserved: (target_precision - target_scale) >= (source_precision - source_scale)
>> >>>
>> >>> Examples
>> >>>   With this change:
>> >>>   - DECIMAL(10,2) → DECIMAL(12,4) ✓ (integer part: 8 → 8, scale: 2 → 4)
>> >>>   - DECIMAL(10,2) → DECIMAL(15,5) ✓ (integer part: 8 → 10, scale: 2 → 5)
>> >>>   - DECIMAL(10,2) → DECIMAL(10,4) ✗ (integer part: 8 → 6, would lose integer capacity)
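>> >>>
>> >>>   For concreteness, a minimal sketch of the proposed check (the class
>> >>>   and method names are mine, not an existing Iceberg API):
>> >>>
>> >>>   class DecimalPromotion {
>> >>>     // Proposed rule: the target may not shrink the scale, the precision,
>> >>>     // or the integer-part capacity (precision - scale).
>> >>>     static boolean canPromote(int srcP, int srcS, int tgtP, int tgtS) {
>> >>>       return tgtS >= srcS
>> >>>           && tgtP >= srcP
>> >>>           && (tgtP - tgtS) >= (srcP - srcS);
>> >>>     }
>> >>>
>> >>>     public static void main(String[] args) {
>> >>>       System.out.println(canPromote(10, 2, 12, 4)); // true
>> >>>       System.out.println(canPromote(10, 2, 15, 5)); // true
>> >>>       System.out.println(canPromote(10, 2, 10, 4)); // false: integer part 8 -> 6
>> >>>     }
>> >>>   }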
>> >>>
>> >>> Rationale
>> >>>   1. SQL:2011 Compliance: This behavior aligns with SQL:2011 standard expectations
>> >>>   2. User Experience: Many users coming from traditional databases expect this type evolution to work
>> >>>   3. Data Safety: The proposed rules ensure no data loss; existing values can always be represented in the new type
>> >>>   4. Real-world Use Cases: Common scenarios, like adding more decimal precision for currency calculations, would be supported
>> >>>
>> >>> Implementation
>> >>>   I've created a proof-of-concept implementation:
>> https://github.com/apache/iceberg/issues/14037
>> >>>
>> >>> Questions for Discussion
>> >>>   1. Should this be part of the v3 spec, or wait for a future version?
>> >>>   2. Are there any backward compatibility concerns we should address?
>> >>>
>> >>> Looking forward to your feedback and thoughts on this proposal.
>> >>>
>> >>> Best regards,
>> >>> Minglei
>>
>>
>>
>> --
>> Regards
>> Junwang Zhao
>>
>
