peterphitran commented on issue #38925:
URL: https://github.com/apache/beam/issues/38925#issuecomment-4756870724

   **Validated design — moving to PRs**
   
   After reading the Iceberg 1.11.0 sources end-to-end and running a POM-level 
dependency diff against 1.10.0, I consider the migration scoped and tractable. 
I'm splitting it into two PRs for bisectability:
   
   **PR1 — Dependency bump (zero behavior change).** Iceberg 1.10.0 → 1.11.0 in 
`sdks/java/io/iceberg/build.gradle`. Adds `resolutionStrategy.force` pins for 
`parquet-avro` and `parquet-hadoop` to hold them at 1.16.0 (Iceberg 1.11.0 
transitively pulls in 1.17.1, which would conflict with Beam's existing pin). Old 
per-format builders (`Parquet.read()`, `Avro.writeData()`, etc.) are only 
deprecated in 1.11.0, not removed — PR1 compiles unchanged.
   
   **PR2 — FormatModel SPI migration.** Replaces three `switch (fileFormat)` 
blocks with `FormatModelRegistry` lookups:
   - `RecordWriter.java:82-107` — write switch
   - `ScanTaskReader.java:128-181` — read switch
   - `ReadUtils.java:76-143` — CDC read path
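
   As a rough illustration of the shape of this change, the sketch below 
contrasts a per-call-site `switch` with a registry lookup. Everything in it is 
hypothetical: `Format`, `WriterRegistry`, and the string "writers" are 
stand-ins invented for the example and are not the Iceberg `FormatModelRegistry` 
API or Beam's actual `RecordWriter` code.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

final class FormatDispatchSketch {
  /** Stand-in for Iceberg's FileFormat enum (hypothetical, trimmed to three values). */
  enum Format { PARQUET, AVRO, ORC }

  /** Before: a per-call-site switch; any format without a case throws. */
  static String openWriterWithSwitch(Format format, String path) {
    switch (format) {
      case PARQUET:
        return "parquet-writer:" + path;
      case AVRO:
        return "avro-writer:" + path;
      default:
        throw new UnsupportedOperationException("Unsupported format: " + format);
    }
  }

  /** After: a hypothetical registry that maps each format to a writer factory. */
  static final class WriterRegistry {
    private final Map<Format, Function<String, String>> factories =
        new EnumMap<>(Format.class);

    WriterRegistry register(Format format, Function<String, String> factory) {
      factories.put(format, factory);
      return this;
    }

    String openWriter(Format format, String path) {
      Function<String, String> factory = factories.get(format);
      if (factory == null) {
        throw new UnsupportedOperationException("No writer registered for " + format);
      }
      return factory.apply(path);
    }
  }

  /** Registers all three formats once, so call sites need no switch. */
  static WriterRegistry defaultRegistry() {
    return new WriterRegistry()
        .register(Format.PARQUET, path -> "parquet-writer:" + path)
        .register(Format.AVRO, path -> "avro-writer:" + path)
        .register(Format.ORC, path -> "orc-writer:" + path);
  }

  public static void main(String[] args) {
    // The registry path handles ORC; the switch path above would throw for it.
    System.out.println(defaultRegistry().openWriter(Format.ORC, "data/00000.orc"));
  }
}
```

   In this sketch, supporting a new format is a registration rather than another 
`case` at each of the three call sites.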
   
   Free wins: ORC writes start working (`RecordWriter:104` currently throws 
`UnsupportedOperationException`, which the latent `write.format.default=orc` 
case in `IcebergWriteSchemaTransformProviderTest` reaches); ~80 lines of 
dispatch logic deleted; `idToConstants` becomes a 
builder arg instead of a closure capture; `EncryptedOutputFile` no longer needs 
to be unwrapped.
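
   To make the `idToConstants` point concrete, here is a minimal sketch of the 
difference between capturing the map in a closure and passing it as a builder 
argument. The `ReadBuilder` class and its methods are hypothetical names made up 
for this example; they are not taken from Iceberg or Beam.

```java
import java.util.Map;
import java.util.function.Function;

final class ConstantsWiringSketch {
  /** Before: the reader function closes over idToConstants at each call site. */
  static Function<Integer, Object> readerCapturing(Map<Integer, Object> idToConstants) {
    return fieldId -> idToConstants.get(fieldId);
  }

  /** After: a hypothetical builder that takes the map as an explicit argument. */
  static final class ReadBuilder {
    private Map<Integer, Object> idToConstants = Map.of();

    ReadBuilder idToConstants(Map<Integer, Object> constants) {
      // Defensive copy so later changes to the caller's map do not leak in.
      this.idToConstants = Map.copyOf(constants);
      return this;
    }

    Function<Integer, Object> build() {
      Map<Integer, Object> constants = idToConstants;
      return constants::get;
    }
  }

  public static void main(String[] args) {
    Map<Integer, Object> constants = Map.of(1, "2024-01-01");
    // Both wirings resolve the same constant for field id 1.
    System.out.println(readerCapturing(constants).apply(1));
    System.out.println(new ReadBuilder().idToConstants(constants).build().apply(1));
  }
}
```

   With the builder form, the constants are visible in the builder's configuration 
rather than hidden inside a lambda defined at the call site.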
   
   Out of scope: `AddFiles.getFileMetrics` per-format metric extraction 
(deferred — new SPI is read/write-centric), test scaffolding (hardcoded Parquet 
fixtures, no switch to delete), `AppendFilesToTables` manifest format 
(Iceberg-spec-mandated Avro, unrelated). No public API change: `IcebergIO` is 
internal-only and is reached through `Managed.write/read(ICEBERG)`.
   
   Design doc (ASF template, comments open): 
https://docs.google.com/document/d/1rXYP4kgpiIPfZtX5s8bFGkIAbtgYaavi470pEo4E554/edit
   
   Will also announce on [email protected] with `[DISCUSS]` for committer 
feedback before PR2 lands; PR1 will proceed in parallel since it's 
zero-behavior-change.
   

