Hi Iceberg Community,

I'm working with Iceberg v3 and trying to understand the practical use cases for the unknown type, especially in relation to the variant type.

The variant type handles both semi-structured data (JSON, nested objects/arrays) and primitive types (strings, integers, booleans, dates, timestamps, etc.) with efficient binary encoding. It supports schema evolution and provides good query performance. The unknown type is described as being for "evolving schemas without forcing immediate resolution" and must always default to null.

1. Given that variant can store any data type (both structured and primitive), I'm unclear on when unknown would be preferred, since similar behavior could be achieved by adding nullable variant columns. It seems like variant could handle most schema evolution scenarios. Are there specific situations where unknown is the better choice?

2. Also, is unknown intended for explicit use in DDL? Meaning, should users write DDL like:

       CREATE TABLE foo (col1 unknown)
       ALTER TABLE foo ADD COLUMN col2 unknown

   Or is unknown an internal type that engines use automatically during schema evolution?

Cheers,
Joana Hrotkó
