Just talked with Aihua and he's working on the Spec PR now. We can get feedback there from everyone.
On Fri, Jul 12, 2024 at 3:41 PM Ryan Blue <b...@databricks.com.invalid> wrote:

Good idea, but I'm hoping that we can continue to get their feedback in parallel to getting the spec changes started. Piotr didn't seem to object to the encoding from what I read of his comments. Hopefully he (and others) chime in here.

On Fri, Jul 12, 2024 at 1:32 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

I just want to make sure we get Piotr and Peter on board as representatives of the Flink and Trino engines. Also make sure we have anyone else chime in who has experience with Ray, if possible.

Spec changes feel like the right next step.

On Fri, Jul 12, 2024 at 3:14 PM Ryan Blue <b...@databricks.com.invalid> wrote:

Okay, what are the next steps here? This proposal has been out for quite a while and I don't see any major objections to using the Spark encoding. It's quite well designed and fits the need well. It can also be extended to support additional types that are missing, if that's a priority.

Should we move forward by starting a draft of the changes to the table spec? Then we can vote on committing those changes and get moving on an implementation (or possibly do the implementation in parallel).

On Fri, Jul 12, 2024 at 1:08 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

That's fair, I'm sold on an Iceberg module.

On Fri, Jul 12, 2024 at 2:53 PM Ryan Blue <b...@databricks.com.invalid> wrote:

> Feels like eventually the encoding should land in parquet proper, right?

What about using it in ORC? I don't know where it should end up. Maybe Iceberg should make a standalone module from it?

On Fri, Jul 12, 2024 at 12:38 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

Feels like eventually the encoding should land in parquet proper, right?
I'm fine with us just copying it into Iceberg for the time being, though.

On Fri, Jul 12, 2024 at 2:31 PM Ryan Blue <b...@databricks.com.invalid> wrote:

Oops, it looks like I missed where Aihua brought this up in his last email:

> do we have an issue to directly use Spark implementation in Iceberg?

Yes, I think that we do have an issue using the Spark library. What do you think about a Java implementation in Iceberg?

Ryan

On Fri, Jul 12, 2024 at 12:28 PM Ryan Blue <b...@databricks.com> wrote:

I raised the same point from Peter's email in a comment on the doc as well. There is a spark-variant_2.13 artifact that would be a much smaller scope than relying on large portions of Spark, but even then I doubt that it is a good idea for Iceberg to depend on it, because it is a Scala artifact and we would need to bring in a ton of Scala libs. I think what makes the most sense is to have an independent implementation of the spec in Iceberg.

On Fri, Jul 12, 2024 at 11:51 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:

Hi Aihua,

Long time no see :)

Would this mean that every engine which plans to support the Variant data type needs to add Spark as a dependency? Like Flink/Trino/Hive, etc.?

Thanks,
Peter

On Fri, Jul 12, 2024, 19:10 Aihua Xu <aihu...@apache.org> wrote:

Thanks Ryan.

Yeah, that's another reason we want to pursue the Spark encoding: to keep compatibility for the open source engines.

One more question regarding the encoding implementation: do we have an issue to directly use the Spark implementation in Iceberg?
Russell pointed out that Trino doesn't have a Spark dependency, and that could be a problem.

Thanks,
Aihua

On 2024/07/12 15:02:06 Ryan Blue wrote:

Thanks, Aihua!

I think that the encoding choice in the current doc is a good one. I went through the Spark encoding in detail and it looks like a better choice than the other candidate encodings for quickly accessing nested fields.

Another reason to use the Spark type is that this is what Delta's variant type is based on, so Parquet files in tables written by Delta could be converted or used in Iceberg tables without needing to rewrite variant data. (Also, note that I work at Databricks and have an interest in increasing format compatibility.)

Ryan

On Thu, Jul 11, 2024 at 11:21 AM Aihua Xu <aihua...@snowflake.com.invalid> wrote:

[Discuss] Consensus for Variant Encoding

It was great to be able to present the Variant type proposal in the community sync yesterday, and I'm looking to host a meeting next week (targeting 9am, July 17th) to go over any further concerns about the encoding of the Variant type and any other questions on the first phase of the proposal <https://docs.google.com/document/d/1QjhpG_SVNPZh3anFcpicMQx90ebwjL7rmzFYfUP89Iw/edit>. We are hoping that anyone who is interested in the proposal can either join or reply with their comments so we can discuss them.
A summary of the discussion and notes will be sent to the mailing list for further comment there.

- What should the underlying binary representation be?

We have evaluated a few encodings in the doc, including Ion, JSONB, and the Spark encoding. Choosing the underlying encoding is an important first step here, and we believe we have general support for Spark's Variant encoding. We would like to hear if anyone else has strong opinions in this space.

- Should we support multiple logical types or just Variant? Variant vs. Variant + JSON.

This is to discuss which logical data type(s) should be supported in Iceberg: Variant only vs. Variant + JSON. Both types would share the same underlying encoding but would imply different limitations on engines working with those types.

From the sync-up meeting, we are leaning toward supporting Variant only, and we want to reach consensus on the supported type(s).

- How should we move forward with subcolumnarization?

Subcolumnarization is an optimization for the Variant type that separates out subcolumns with their own metadata. This is not critical for choosing the initial encoding of the Variant type, so we were hoping to gain consensus on leaving that for a follow-up spec.
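For anyone skimming the archive who is new to the idea, the core property of a dictionary-based variant encoding like Spark's can be sketched in a few lines. The layout below is invented purely for illustration (the real Spark Variant byte format is defined in its README and is not reproduced here): object field names are stored once in a shared metadata dictionary, and encoded values reference them by integer id, which is what makes nested-field access cheap compared to re-parsing JSON text.

```python
# Toy sketch of a dictionary-based variant encoding, in the spirit of (but
# NOT byte-compatible with) Spark's Variant type: keys are collected once
# into a metadata dictionary; encoded objects refer to keys by integer id.

def build_metadata(value):
    """Collect every object key in the document into a name -> id mapping."""
    keys = set()

    def walk(v):
        if isinstance(v, dict):
            for k, sub in v.items():
                keys.add(k)
                walk(sub)
        elif isinstance(v, list):
            for sub in v:
                walk(sub)

    walk(value)
    return {k: i for i, k in enumerate(sorted(keys))}

def encode(value, meta):
    """Encode a value; objects store {field_id: child} instead of key names."""
    if isinstance(value, dict):
        return ("obj", {meta[k]: encode(v, meta) for k, v in value.items()})
    if isinstance(value, list):
        return ("arr", [encode(v, meta) for v in value])
    return ("prim", value)  # primitives are stored inline

doc = {"user": {"name": "aihua", "tags": ["iceberg", "variant"]}}
meta = build_metadata(doc)   # {'name': 0, 'tags': 1, 'user': 2}
encoded = encode(doc, meta)
# Extracting user.name walks field ids 2 then 0 -- no JSON text re-parsing.
```

Hypothetical field names aside, this is the structural reason an engine can evaluate a path like `user.name` against variant bytes without materializing the whole document.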
Thanks,
Aihua

Meeting invite:

Wednesday, July 17 · 9:00 – 10:00am
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/pbm-ovzn-aoq
Or dial: (US) +1 650-449-9343 PIN: 170 576 525#
More phone numbers: https://tel.meet/pbm-ovzn-aoq?pin=4079632691790

On Tue, May 28, 2024 at 9:21 PM Aihua Xu <aihua...@snowflake.com> wrote:

Hello,

We have drafted the proposal <https://docs.google.com/document/d/1QjhpG_SVNPZh3anFcpicMQx90ebwjL7rmzFYfUP89Iw/edit> for the Variant data type. Please help review and comment.

Thanks,
Aihua

On Thu, May 16, 2024 at 12:45 PM Jack Ye <yezhao...@gmail.com> wrote:

+10000 for a JSON/BSON type. We also had the same discussion internally, and a JSON type would really play well with, for example, the SUPER type in Redshift (https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html), and could also provide better integration with the Trino JSON type.

Looking forward to the proposal!
Best,
Jack Ye

On Wed, May 15, 2024 at 9:37 AM Tyler Akidau <tyler.aki...@snowflake.com.invalid> wrote:

On Tue, May 14, 2024 at 7:58 PM Gang Wu <ust...@gmail.com> wrote:

> > We may need some guidance on just how many we need to look at;
> > we were planning on Spark and Trino, but weren't sure how much
> > further down the rabbit hole we needed to go.
>
> There are some engines living outside the Java world. It would be good
> if the proposal could cover the effort it takes to integrate the variant
> type into them (e.g. Velox, DataFusion, etc.). This is something that
> some proprietary Iceberg vendors also care about.

Ack, makes sense. We can make sure to share some perspective on this.

> > Not necessarily, no. As long as there's a binary type and Iceberg and
> > the query engines are aware that the binary column needs to be
> > interpreted as a variant, that should be sufficient.
>
> From the perspective of interoperability, it would be good to support a
> native type in the file specs. Life will be easier for projects like
> Apache XTable. The file format could also provide finer-grained
> statistics for the variant type, which facilitates data skipping.
Agreed, there can definitely be additional value in native file format integration. Just wanted to highlight that it's not a strict requirement.

-Tyler

On Wed, May 15, 2024 at 6:49 AM Tyler Akidau <tyler.aki...@snowflake.com.invalid> wrote:

Good to see you again as well, JB! Thanks!

-Tyler

On Tue, May 14, 2024 at 1:04 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

Hi Tyler,

Super happy to see you there :) It reminds me of our discussions back at the start of Apache Beam :)

Anyway, the thread is pretty interesting. I remember some discussions about a JSON data type for spec v3. The binary data type is already supported in spec v2.

I'm looking forward to the proposal and happy to help on this!

Regards,
JB

On Sat, May 11, 2024 at 7:06 AM Tyler Akidau <tyler.aki...@snowflake.com.invalid> wrote:

Hello,

We (Tyler, Nileema, Selcuk, Aihua) are working on a proposal for which we'd like to get early feedback from the community.
As you may know, Snowflake has embraced Iceberg as its open data lake format. Having made good progress on our own adoption of the Iceberg standard, we're now in a position where there are features not yet supported in Iceberg which we think would be valuable for our users, and that we would like to discuss with and help contribute to the Iceberg community.

The first two such features we'd like to discuss are in support of efficient querying of dynamically typed, semi-structured data: variant data types, and subcolumnarization of variant columns. In more detail, for anyone who may not already be familiar:

1. Variant data types

Variant types allow for the efficient binary encoding of dynamic semi-structured data such as JSON, Avro, etc. By encoding semi-structured data as a variant column, we retain the flexibility of the source data while allowing query engines to operate on the data more efficiently. Snowflake has supported the variant data type on Snowflake tables for many years [1]. As more and more users utilize Iceberg tables in Snowflake, we're hearing an increasing chorus of requests for variant support. Additionally, other query engines such as Apache Spark have begun adding variant support [2].
As such, we believe it would be beneficial to the Iceberg community as a whole to standardize on the variant data type encoding used across Iceberg tables.

One specific point to make here is that, since an Apache OSS version of variant encoding already exists in Spark, it likely makes sense to simply adopt the Spark encoding as the Iceberg standard as well. The encoding we use internally today in Snowflake is slightly different, but essentially equivalent, and we see no particular value in trying to clutter the space with another equivalent-but-incompatible encoding.

2. Subcolumnarization

Subcolumnarization of variant columns allows query engines to efficiently prune datasets when subcolumns (i.e., nested fields) within a variant column are queried, and also allows optionally materializing some of the nested fields as columns of their own, affording queries on these subcolumns the ability to read less data and spend less CPU on extraction. When subcolumnarizing, the system managing table metadata and data tracks individual pruning statistics (min, max, null counts, etc.) for some subset of the nested fields within a variant, and also manages any optional materialization.
Without subcolumnarization, any query which touches a variant column must read, parse, extract, and filter every row for which that column is non-null. Thus, by providing a standardized way of tracking subcolumn metadata and data for variant columns, Iceberg can make subcolumnar optimizations accessible across various catalogs and query engines.

Subcolumnarization is a non-trivial topic, so we expect any concrete proposal to include not only the set of changes to Iceberg metadata that allow compatible query engines to interoperate on subcolumnarization data for variant columns, but also reference documentation explaining subcolumnarization principles and recommended best practices.

It sounds like the recent Geo proposal [3] may be a good starting point for how to approach this, so our plan is to write something up in that vein that covers the proposed spec changes, backwards compatibility, implementor burdens, etc. But we wanted to first reach out to the community to introduce ourselves and the idea, and see if there's any early feedback we should incorporate before we spend too much time on a concrete proposal.

Thank you!
[1] https://docs.snowflake.com/en/sql-reference/data-types-semistructured
[2] https://github.com/apache/spark/blob/master/common/variant/README.md
[3] https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit

-Tyler, Nileema, Selcuk, Aihua

--
Ryan Blue
Databricks
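As a closing illustration of the pruning benefit the original proposal describes: the sketch below (file names and statistics are hypothetical, and this is not Iceberg's actual metadata model) shows why tracking per-file min/max stats for a shredded subcolumn of a variant lets a planner skip files entirely, rather than reading and parsing every non-null variant value.

```python
# Toy model of data skipping via subcolumn statistics: if each data file
# records min/max for a nested field (say `event.ts`) inside a variant
# column, a range predicate can eliminate files before any I/O happens.
# File paths and stat values below are invented for illustration.

FILES = [
    {"path": "f1.parquet", "min": 100, "max": 199},
    {"path": "f2.parquet", "min": 200, "max": 299},
    {"path": "f3.parquet", "min": 300, "max": 399},
]

def prune(files, lower, upper):
    """Keep only files whose [min, max] range overlaps [lower, upper]."""
    return [f["path"] for f in files if f["max"] >= lower and f["min"] <= upper]

print(prune(FILES, 250, 320))  # only f2 and f3 can contain matching rows
```

Without such statistics, the same query would have to scan all three files and extract the nested field row by row, which is the cost the subcolumnarization proposal aims to avoid.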