GitHub user rdblue commented on the issue:
https://github.com/apache/spark/pull/22573
@dongjoon-hyun, Iceberg schema evolution is based on field IDs, not on
names. The names in the current table schema are the runtime names for the
columns in that table, and every read first translates those names to IDs and
then projects those IDs from the data files. That way, renames can never cause
you to get incorrect data.
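
To make the ID-based resolution concrete, here is a minimal sketch in plain Python. It is not Iceberg's API: `table_schema`, `old_file`, `new_file`, and `read` are hypothetical names, and the dict layout is an assumption made only for illustration. It shows a column renamed from `name` to `user_name` still resolving to the same data in a file written before the rename.

```python
# Hypothetical illustration only; this is not Iceberg's actual API.
# Current table schema: runtime column name -> field ID.
# Field 2 was originally called "name" and was later renamed to "user_name".
table_schema = {"id": 1, "user_name": 2}

# Each data file records which name each field ID had when it was written.
old_file = {"fields": {1: "id", 2: "name"},
            "rows": [{"id": 7, "name": "a"}]}
new_file = {"fields": {1: "id", 2: "user_name"},
            "rows": [{"id": 8, "user_name": "b"}]}

def read(data_file, columns):
    """Translate runtime names to field IDs, then project those IDs."""
    ids = [table_schema[c] for c in columns]
    result = []
    for row in data_file["rows"]:
        projected = {}
        for column, field_id in zip(columns, ids):
            # Look up the name this file used for the field ID; a field the
            # file does not contain is read as None.
            file_name = data_file["fields"].get(field_id)
            projected[column] = row.get(file_name)
        result.append(projected)
    return result

rows_before_rename = read(old_file, ["user_name"])
rows_after_rename = read(new_file, ["user_name"])
```

Both reads return the value under the current name `user_name`, because the lookup in each file goes through field ID 2 rather than through the name.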
You're mostly right that Spark has a problem with schema evolution for
HadoopFS tables, but that doesn't affect my suggestion here. If you're
filtering or projecting field `m.n`, then Spark currently handles that by
matching columns by name. If you're matching by name, then `m.n` can't change
across versions, or at least you can always project `m.n` from the data (in the
case of Avro).
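
As a rough sketch of what matching by name means for a nested field, the following plain Python walks the path `m.n` through a record. `project_by_name` and the sample records are hypothetical and are not Spark code; they only illustrate that the projection works as long as the path `m.n` exists under that name in every version of the data.

```python
# Hypothetical illustration only; this is not how Spark is implemented.
def project_by_name(record, path):
    """Resolve a dotted path such as "m.n" by matching field names."""
    value = record
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # the name is absent in this version of the data
        value = value[part]
    return value

# Two versions of the data; the second adds a sibling field under `m`.
record_v1 = {"m": {"n": 1}}
record_v2 = {"m": {"n": 2, "extra": "x"}}

n_from_v1 = project_by_name(record_v1, "m.n")
n_from_v2 = project_by_name(record_v2, "m.n")
```

Adding `extra` next to `n` does not disturb the projection, since the lookup depends only on the names `m` and `n`.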