advancedxy commented on code in PR #8579:
URL: https://github.com/apache/iceberg/pull/8579#discussion_r1469025897
##########
format/spec.md:
##########
@@ -1128,12 +1128,17 @@ Each partition field in the fields list is stored as an
object. See the table fo
|**`month`**|`JSON string: "month"`|`"month"`|
|**`day`**|`JSON string: "day"`|`"day"`|
|**`hour`**|`JSON string: "hour"`|`"hour"`|
-|**`Partition Field`**|`JSON object: {`<br /> `"source-id": <id
int>,`<br /> `"field-id": <field id int>,`<br /> `"name":
<name string>,`<br /> `"transform": <transform JSON>`<br
/>`}`|`{`<br /> `"source-id": 1,`<br /> `"field-id":
1000,`<br /> `"name": "id_bucket",`<br /> `"transform":
"bucket[16]"`<br />`}`|
+|**`Partition Field`** [1,2]|`JSON object: {`<br /> `"source-id":
<id int>,`<br /> `"field-id": <field id int>,`<br
/> `"name": <name string>,`<br /> `"transform":
<transform JSON>`<br />`}`|`{`<br /> `"source-id": 1,`<br
/> `"field-id": 1000,`<br /> `"name": "id_bucket",`<br
/> `"transform": "bucket[16]"`<br />`}`|
In some cases partition specs are stored using only the field list instead of
the object format that includes the spec ID, like the deprecated
`partition-spec` field in table metadata. The object format should be used
unless otherwise noted in this spec.
The `field-id` property was added for each partition field in v2. In v1, the
reference implementation assigned field ids sequentially in each spec starting
at 1,000. See Partition Evolution for more details.
+Notes:
+
+1. For partition fields with a transform with a single argument, the ID of the
source field is set on `source-id`, and `source-ids` is omitted.
+2. For partition fields with a transform of multiple arguments, the IDs of the
source fields are set on `source-ids`. To preserve backward compatibility,
`source-id` is set to -1.
Review Comment:
> It seems that it would be better to choose a field ID from the existing
range for reserved field IDs (e.g. MAX_INT-200) then to use -1,
Per my understanding, multi-arg transforms will mostly get a new transform
name rather than the existing ones. Older readers will treat this multi-arg
transform as an `UnknownTransform`, the persisted `source-id` is just to make
old code happy, see this reply as well:
https://github.com/apache/iceberg/pull/8579#discussion_r1454350089. So the
`value` of `source-id` is just a place holder and doesn't make too much sense.
It could be a field ID from the reserved range or a negative one since the
current reference implementation wouldn't produce a negative field id.
I simply choose `-1` as it seems more nature and doesn't need to put a
somehow weird reserved field in the `MetadataColumns.java` , but I think we
make always make follow-up pr if there's valid concerns/solutions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]