Accuracy review on Variant contribution description.

Julien Le Dem Tue, 09 Dec 2025 18:47:56 -0800

Hello all,
I'm writing a post for my personal blog, and one section uses Variant as an
example of collaboration (see the excerpt below). I'm trying to give credit
to everyone involved, but I'm sure I'm forgetting someone. Could you please
tell me if you think I should change something or add someone? Either on
this thread or privately. I'll be happy to fix it.
(NB: This is not a substitute for a Variant post on the Parquet blog, which
some of you could get the fame of writing. Nudge nudge :) )
Thank you!
The excerpt:


> ## Case Study: The Variant Type


> To give an example of how larger changes make their way into Parquet:
> about a year ago, engineers proposed finding a neutral home for the
> [variant type](
> https://github.com/apache/parquet-format/blob/master/VariantEncoding.md),
> which [at the time lived in Spark](
> https://github.com/apache/spark/blob/d84f1a3575c4125009374521d2f179089ebd71ad/common/variant/README.md).
> Variant is akin to a binary representation of JSON. It stores the field
> names in one column and the values in another, and you can selectively
> shred a subset of the fields into their own columns. It is useful when you
> don't know the set of fields in advance or when your data has too many
> sparse fields.
> The big question was [whether this new type should be defined in Spark,
> Arrow, Iceberg or Parquet](
> https://lists.apache.org/thread/6h58hj39lhqtcyd2hlsyvqm4lzdh4b9z). What
> made the most sense, knowing that all of those projects (and more) would
> end up using it?
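For readers who want a concrete picture of the Variant description above, here is a rough conceptual sketch in Python. It is not the byte-level layout defined in VariantEncoding.md; it assumes plain Python dicts and lists stand in for JSON. Field names go into a dictionary (what the spec calls the `metadata` field), the `value` refers to fields by their index in that dictionary, and `shred` pulls one top-level field into its own column, leaving the rest as a residual. The function names `encode_variant`, `decode_variant`, and `shred` are hypothetical.

```python
def _collect_keys(v):
    """Return every object key that appears anywhere in v (nested included)."""
    keys = set()
    if isinstance(v, dict):
        for k, child in v.items():
            keys.add(k)
            keys |= _collect_keys(child)
    elif isinstance(v, list):
        for child in v:
            keys |= _collect_keys(child)
    return keys


def _keys_to_ids(v, ids):
    """Replace each object key with its integer id from the dictionary."""
    if isinstance(v, dict):
        return {ids[k]: _keys_to_ids(c, ids) for k, c in v.items()}
    if isinstance(v, list):
        return [_keys_to_ids(c, ids) for c in v]
    return v  # scalars are kept as-is


def encode_variant(record):
    """Split a JSON-like value into (metadata, value).

    metadata: sorted list of all field names (the "dictionary").
    value:    the same structure, with field names replaced by ids.
    Conceptual only; the real spec uses a binary layout.
    """
    metadata = sorted(_collect_keys(record))
    ids = {name: i for i, name in enumerate(metadata)}
    return metadata, _keys_to_ids(record, ids)


def decode_variant(metadata, value):
    """Inverse of encode_variant: look field ids back up in metadata."""
    if isinstance(value, dict):
        return {metadata[i]: decode_variant(metadata, c) for i, c in value.items()}
    if isinstance(value, list):
        return [decode_variant(metadata, c) for c in value]
    return value


def shred(rows, field):
    """Pull one top-level field out of each row into its own column.

    Returns (typed, residual). None in `typed` means the field was absent;
    unlike the real spec, this sketch does not distinguish absent from null.
    """
    typed, residual = [], []
    for row in rows:
        rest = dict(row)
        typed.append(rest.pop(field, None))
        residual.append(rest)
    return typed, residual


rows = [{"user": "a", "tags": ["x"]}, {"user": "b", "score": 3}]
for row in rows:
    metadata, value = encode_variant(row)
    assert decode_variant(metadata, value) == row  # lossless round trip
print(shred(rows, "user"))  # prints (['a', 'b'], [{'tags': ['x']}, {'score': 3}])
```

In the actual format a shredded field gets its own typed Parquet column, which is the point of shredding: readers can apply ordinary column pruning and statistics to that field.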


> We agreed to put it in Parquet. Then we worked as a community to [reach
> consensus on the spec](
> https://lists.apache.org/thread/obn1yzhgm5zlznwrdpg7f66mswwooxw7), making
> sure everybody was on the same page. We changed a few things, agreed on the
> result, and then implemented it across the ecosystem. (Thanks to Gang,
> Aihua, Gene, Micah, Andrew, Ryan, Yufei, Jiaying, Martin, Aditya, Matt,
> Antoine, Daniel, Russell, and many others.)


> The community produced multiple implementations in multiple systems, open
> source or not, and collaborated on cross-compatibility tests to make sure
> those systems were compatible. This included individuals from Databricks,
> Snowflake, Google, Tabular, Datadog, CMU, InfluxData, Dremio, and more
> (sorry if I forgot you; please reach out and I'll add you here).


> Now we know that when a Variant is written in one system, it will be read
> correctly in another: from Databricks to Snowflake and BigQuery, and from
> DataFusion to DuckDB and Spark. No surprises. (And Dremio, InfluxDB, etc.)