laskoviymishka commented on code in PR #1069:
URL: https://github.com/apache/iceberg-go/pull/1069#discussion_r3237322389


##########
table/metadata.go:
##########
@@ -1324,6 +1324,20 @@ var (
        ErrPartitionSpecNotFound        = errors.New("partition spec not found")
 )
 
+func rejectV3OnlyFields(version int, nextRowID *int64, encryptionKeys 
[]EncryptionKey) error {

Review Comment:
   Two concerns stacked here.
   
   First, `version` is only used in the error message and callers pass `1` and 
`2` as literals — no validation, no typed constant. Either drop the parameter 
(the field name already carries the diagnostic weight) or read 
`aux.FormatVersion` to avoid the literal-drift risk.
   
   Second, this signature won't scale. V3 has more fields landing (row-lineage 
tracking, etc.) and each new gate means adding a parameter, updating both call 
sites, and adding four new tests. A variadic `rejectV3OnlyField(version, name, 
present)` called once per field, or a slice of `{name, present}` checks, keeps 
the signature stable as V3 grows.



##########
table/metadata_internal_test.go:
##########
@@ -1854,3 +1854,91 @@ func TestZstdGoldenFixture(t *testing.T) {
 
        assert.True(t, expected.Equals(meta))
 }
+
+func TestV1RejectsNextRowID(t *testing.T) {
+       data := `{
+               "format-version": 1,
+               "table-uuid": "d20125c8-7284-442c-9aea-15fee620737c",
+               "location": "s3://bucket/test/location",
+               "last-updated-ms": 1602638573874,
+               "last-column-id": 3,
+               "schemas": 
[{"type":"struct","schema-id":0,"fields":[{"id":1,"name":"x","required":true,"type":"long"}]}],
+               "current-schema-id": 0,
+               "partition-specs": [{"spec-id": 0, "fields": []}],
+               "default-spec-id": 0,
+               "sort-orders": [{"order-id": 0, "fields": []}],
+               "default-sort-order-id": 0,
+               "next-row-id": 42
+       }`
+
+       _, err := ParseMetadataBytes([]byte(data))
+       require.ErrorIs(t, err, ErrInvalidMetadata)
+       assert.ErrorContains(t, err, "v3-only field 'next-row-id' present in v1 
metadata")
+}
+
+func TestV1RejectsEncryptionKeys(t *testing.T) {
+       data := `{
+               "format-version": 1,
+               "table-uuid": "d20125c8-7284-442c-9aea-15fee620737c",
+               "location": "s3://bucket/test/location",
+               "last-updated-ms": 1602638573874,
+               "last-column-id": 3,
+               "schemas": 
[{"type":"struct","schema-id":0,"fields":[{"id":1,"name":"x","required":true,"type":"long"}]}],
+               "current-schema-id": 0,
+               "partition-specs": [{"spec-id": 0, "fields": []}],
+               "default-spec-id": 0,
+               "sort-orders": [{"order-id": 0, "fields": []}],
+               "default-sort-order-id": 0,
+               "encryption-keys": [{"key-id": "k1", "encrypted-key-metadata": 
"abc123"}]
+       }`
+
+       _, err := ParseMetadataBytes([]byte(data))
+       require.ErrorIs(t, err, ErrInvalidMetadata)
+       assert.ErrorContains(t, err, "v3-only field 'encryption-keys' present 
in v1 metadata")
+}
+
+func TestV2RejectsNextRowID(t *testing.T) {
+       data := `{
+               "format-version": 2,
+               "table-uuid": "d20125c8-7284-442c-9aea-15fee620737c",
+               "location": "s3://bucket/test/location",
+               "last-updated-ms": 1602638573874,
+               "last-column-id": 3,
+               "last-sequence-number": 0,
+               "schemas": 
[{"type":"struct","schema-id":0,"fields":[{"id":1,"name":"x","required":true,"type":"long"}]}],
+               "current-schema-id": 0,
+               "partition-specs": [{"spec-id": 0, "fields": []}],
+               "default-spec-id": 0,
+               "last-partition-id": 1000,
+               "sort-orders": [{"order-id": 0, "fields": []}],
+               "default-sort-order-id": 0,
+               "next-row-id": 42
+       }`
+
+       _, err := ParseMetadataBytes([]byte(data))
+       require.ErrorIs(t, err, ErrInvalidMetadata)
+       assert.ErrorContains(t, err, "v3-only field 'next-row-id' present in v2 
metadata")
+}
+
+func TestV2RejectsEncryptionKeys(t *testing.T) {
+       data := `{
+               "format-version": 2,
+               "table-uuid": "d20125c8-7284-442c-9aea-15fee620737c",
+               "location": "s3://bucket/test/location",
+               "last-updated-ms": 1602638573874,
+               "last-column-id": 3,
+               "last-sequence-number": 0,
+               "schemas": 
[{"type":"struct","schema-id":0,"fields":[{"id":1,"name":"x","required":true,"type":"long"}]}],
+               "current-schema-id": 0,
+               "partition-specs": [{"spec-id": 0, "fields": []}],
+               "default-spec-id": 0,
+               "last-partition-id": 1000,
+               "sort-orders": [{"order-id": 0, "fields": []}],
+               "default-sort-order-id": 0,
+               "encryption-keys": [{"key-id": "k1", "encrypted-key-metadata": 
"abc123"}]
+       }`
+
+       _, err := ParseMetadataBytes([]byte(data))
+       require.ErrorIs(t, err, ErrInvalidMetadata)
+       assert.ErrorContains(t, err, "v3-only field 'encryption-keys' present 
in v2 metadata")
+}

Review Comment:
   Four structurally identical functions with copy-pasted JSON bodies; the only 
variation is version, field name, and expected substring. Collapse into one 
table-driven `TestRejectV3OnlyFields(t *testing.T)` with subtests and a small 
JSON-builder helper — each new V3 field becomes a one-line table entry, which 
matters more once the helper grows (see comment on metadata.go:1327).
   
   Also missing two cases worth adding:
   1. A positive `TestV3AcceptsV3OnlyFields` confirming v3 metadata containing 
`encryption-keys` parses without error — `ExampleTableMetadataV3` doesn't cover 
this, so the "v3 is OK" path is currently untested.
   2. A `TestV1AcceptsEmptyEncryptionKeys` covering `encryption-keys: []` — the 
`len(...) > 0` guard is specifically designed for that boundary but nothing 
exercises it, so a refactor to `encryptionKeys != nil` would silently break 
interop without test failure.



##########
table/metadata.go:
##########
@@ -1867,6 +1881,10 @@ func (m *metadataV1) UnmarshalJSON(b []byte) error {
                return err
        }
 
+       if err := rejectV3OnlyFields(1, aux.NextRowID, aux.EncryptionKeyList); 
err != nil {

Review Comment:
   This is the main concern, and it applies symmetrically to the v2 call site 
at line 1954.
   
   Java's `TableMetadataParser.fromJson` doesn't error on these fields in v1/v2 
— it reads `next-row-id` only when `formatVersion >= 3` (defaulting to 
`INITIAL_ROW_ID` otherwise) and has no version guard on `encryption-keys` at 
all. PyIceberg silently drops unknown fields via Pydantic's model boundary; 
iceberg-rust does the same via serde. A downgrade tool that produces v2 
metadata with a populated `encryption-keys` is readable by every other client 
and rejected only by Go.
   
   If the intent here is genuinely to be stricter than the reference (worth 
confirming in #1006 discussion), at minimum the sentinel should be 
`ErrInvalidMetadataFormatVersion` (or a new dedicated 
`ErrV3FieldInPreV3Metadata`) rather than `ErrInvalidMetadata` — callers 
`errors.Is`-ing on `ErrInvalidMetadata` today expect "corrupt file," not 
"forward-compat version mismatch."
   
   Related: `last-sequence-number` is a v2+ field on the shared 
`commonMetadata` and is silently ignored on v1 reads today. If the project 
keeps strict-rejection here, the same logic should apply there for consistency 
— otherwise the gate isn't principled. (Probably a follow-up rather than 
blocking.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to