wombatu-kun commented on issue #16238:
URL: https://github.com/apache/iceberg/issues/16238#issuecomment-4533267778

   I looked into this and can confirm it's a divergence between two default 
partition-name generators in core/api, not a Spark-side issue.
   
   - Table creation (including Spark `createOrReplace`, which routes through 
`Spark3Util.toPartitionSpec` → `PartitionSpec.Builder.bucket(...)`) generates 
`col_bucket` / `col_trunc` — without the parameter.
   - `ALTER TABLE ADD PARTITION FIELD` routes through 
`BaseUpdatePartitionSpec.PartitionNameGenerator`, which generates 
`col_bucket_<n>` / `col_trunc_<width>` — with the parameter.
   - Time transforms (`year`/`month`/`day`/`hour`) already agree across both 
paths.
   - For reference, the spec's own example uses the no-parameter form: the 
partition-spec example in `format/spec.md` shows `"name": "id_bucket"` for a 
`bucket[16]` transform.
   
   Both forms are currently asserted by tests on each side, so this looks like 
a deliberate-but-unreconciled difference rather than an accidental bug, and 
standardizing it would be a user-visible, cross-engine behavior change. Before 
anyone implements a fix, it would be good to settle the direction:
   
   1. Align `ALTER` to the creation/spec form (`col_bucket`). Smaller footprint 
and matches the spec example. The `_<n>` suffix in `BaseUpdatePartitionSpec` 
mainly disambiguates adding two different bucket widths on the same source 
column within the current spec (the case pinned by `testAddMultipleBuckets`); 
collisions with previously-dropped (void) fields are already handled separately 
by renaming them to `name_<fieldId>`. So this direction could keep the bare 
`col_bucket` name and append `_<n>` only on an actual name conflict, which 
preserves the multi-width case while making the common `ALTER` match creation.
   2. Align creation to the `ALTER` form (`col_bucket_<n>`), which is the 
preference noted in this issue and makes the builder strictly more expressive. 
The downside is that it changes the default name for the most common path 
across every engine, deviates from the spec example, and touches a large number 
of existing tests.
   
   Separately, the "add multiple widths on the same source column" behavior is 
currently covered by a positive test only for `bucket` 
(`testAddMultipleBuckets`), not for `truncate`, even though both behave 
identically.
   
   Could a maintainer weigh in on the preferred direction (or whether this is 
intended and should just be documented)? Happy to put up the PR once the 
direction is decided.
   
   cc @rdblue @RussellSpitzer
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to