Hi Iceberg Community,

I'm working with Iceberg v3 and trying to understand the practical use
cases for the unknown type, especially in relation to the variant type.

The variant type handles both semi-structured data (JSON, nested
objects/arrays) and primitive types (strings, integers, booleans, dates,
timestamps, etc.) with efficient binary encoding. It supports schema
evolution and provides good query performance.

The unknown type is described as being for "evolving schemas without
forcing immediate resolution" and must always default to null.

1. Given that variant can store any data type (both structured and
primitive), I'm unclear when unknown would be preferred, since similar
behavior could be achieved by adding nullable variant columns. It seems like
variant could handle most schema evolution scenarios. Are there specific
situations where unknown is the better choice?

2. Also, is unknown intended for explicit use in DDL? That is, should users
write DDL like:

CREATE TABLE foo (col1 unknown)
ALTER TABLE foo ADD COLUMN col2 unknown

Or is unknown an internal type that engines use automatically during schema
evolution?

Cheers,

Joana Hrotkó
