Just talked with Aihua and he's working on the Spec PR now. We can get feedback there from everyone.
On Fri, Jul 12, 2024 at 3:41 PM Ryan Blue <b...@databricks.com.invalid> wrote:

Good idea, but I'm hoping that we can continue to get their feedback in parallel to getting the spec changes started. Piotr didn't seem to object to the encoding from what I read of his comments. Hopefully he (and others) chime in here.

On Fri, Jul 12, 2024 at 1:32 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

I just want to make sure we get Piotr and Peter on board as representatives of the Flink and Trino engines. Also make sure we have anyone else chime in who has experience with Ray, if possible.

Spec changes feel like the right next step.

On Fri, Jul 12, 2024 at 3:14 PM Ryan Blue <b...@databricks.com.invalid> wrote:

Okay, what are the next steps here? This proposal has been out for quite a while and I don't see any major objections to using the Spark encoding. It's quite well designed and fits the need well. It can also be extended to support additional types that are missing, if that's a priority.

Should we move forward by starting a draft of the changes to the table spec? Then we can vote on committing those changes and get moving on an implementation (or possibly do the implementation in parallel).

On Fri, Jul 12, 2024 at 1:08 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

That's fair, I'm sold on an Iceberg module.

On Fri, Jul 12, 2024 at 2:53 PM Ryan Blue <b...@databricks.com.invalid> wrote:

> Feels like eventually the encoding should land in parquet proper, right?

What about using it in ORC? I don't know where it should end up. Maybe Iceberg should make a standalone module from it?

On Fri, Jul 12, 2024 at 12:38 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

Feels like eventually the encoding should land in parquet proper, right?
I'm fine with us just copying it into Iceberg for the time being, though.

On Fri, Jul 12, 2024 at 2:31 PM Ryan Blue <b...@databricks.com.invalid> wrote:

Oops, it looks like I missed where Aihua brought this up in his last email:

> do we have an issue to directly use Spark implementation in Iceberg?

Yes, I think that we do have an issue using the Spark library. What do you think about a Java implementation in Iceberg?

Ryan

On Fri, Jul 12, 2024 at 12:28 PM Ryan Blue <b...@databricks.com> wrote:

I raised the same point from Peter's email in a comment on the doc as well. There is a spark-variant_2.13 artifact that would be a much smaller scope than relying on large portions of Spark, but even then I doubt that it is a good idea for Iceberg to depend on it, because it is a Scala artifact and we would need to bring in a ton of Scala libs. I think what makes the most sense is to have an independent implementation of the spec in Iceberg.

On Fri, Jul 12, 2024 at 11:51 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:

Hi Aihua,

Long time no see :)

Would this mean that every engine which plans to support the Variant data type needs to add Spark as a dependency? Like Flink/Trino/Hive, etc.?

Thanks,
Peter

On Fri, Jul 12, 2024, 19:10 Aihua Xu <aihu...@apache.org> wrote:

Thanks Ryan.

Yeah, that's another reason we want to pursue the Spark encoding: to keep compatibility for the open source engines.

One more question regarding the encoding implementation: do we have an issue to directly use the Spark implementation in Iceberg?
Russell pointed out that Trino doesn't have a Spark dependency, and that could be a problem.

Thanks,
Aihua

On 2024/07/12 15:02:06 Ryan Blue wrote:

Thanks, Aihua!

I think that the encoding choice in the current doc is a good one. I went through the Spark encoding in detail and it looks like a better choice than the other candidate encodings for quickly accessing nested fields.

Another reason to use the Spark type is that this is what Delta's variant type is based on, so Parquet files in tables written by Delta could be converted or used in Iceberg tables without needing to rewrite variant data. (Also, note that I work at Databricks and have an interest in increasing format compatibility.)

Ryan

On Thu, Jul 11, 2024 at 11:21 AM Aihua Xu <aihua...@snowflake.com.invalid> wrote:

[Discuss] Consensus for Variant Encoding

It was great to be able to present the Variant type proposal in the community sync yesterday, and I'm looking to host a meeting next week (targeting 9am, July 17th) to go over any further concerns about the encoding of the Variant type and any other questions on the first phase of the proposal <https://docs.google.com/document/d/1QjhpG_SVNPZh3anFcpicMQx90ebwjL7rmzFYfUP89Iw/edit>. We are hoping that anyone who is interested in the proposal can either join or reply with their comments so we can discuss them.
A summary of the discussion and notes will be sent to the mailing list for further comment there.

- What should the underlying binary representation be?

We have evaluated a few encodings in the doc, including Ion, JSONB, and the Spark encoding. Choosing the underlying encoding is an important first step here, and we believe we have general support for Spark's Variant encoding. We would like to hear if anyone else has strong opinions in this space.

- Should we support multiple logical types or just Variant? Variant vs. Variant + JSON.

This is to discuss which logical data type(s) should be supported in Iceberg: Variant only vs. Variant + JSON. Both types would share the same underlying encoding but would imply different limitations on engines working with those types.

From the sync-up meeting, we are leaning toward supporting Variant only, and we want to reach consensus on the supported type(s).

- How should we move forward with subcolumnarization?

Subcolumnarization is an optimization for the Variant type that separates out subcolumns with their own metadata. This is not critical for choosing the initial encoding of the Variant type, so we were hoping to gain consensus on leaving that for a follow-up spec.
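For anyone skimming the archive who is new to the idea, the core property of a dictionary-based variant encoding like Spark's can be sketched in a few lines. The layout below is invented purely for illustration (the real Spark Variant byte format is defined in its README and is not reproduced here): object field names are stored once in a shared metadata dictionary, and encoded values reference them by integer id, which is what makes nested-field access cheap compared to re-parsing JSON text.

```python
# Toy sketch of a dictionary-based variant encoding, in the spirit of (but
# NOT byte-compatible with) Spark's Variant type: keys are collected once
# into a metadata dictionary; encoded objects refer to keys by integer id.

def build_metadata(value):
    """Collect every object key in the document into a name -> id mapping."""
    keys = set()

    def walk(v):
        if isinstance(v, dict):
            for k, sub in v.items():
                keys.add(k)
                walk(sub)
        elif isinstance(v, list):
            for sub in v:
                walk(sub)

    walk(value)
    return {k: i for i, k in enumerate(sorted(keys))}

def encode(value, meta):
    """Encode a value; objects store {field_id: child} instead of key names."""
    if isinstance(value, dict):
        return ("obj", {meta[k]: encode(v, meta) for k, v in value.items()})
    if isinstance(value, list):
        return ("arr", [encode(v, meta) for v in value])
    return ("prim", value)  # primitives are stored inline

doc = {"user": {"name": "aihua", "tags": ["iceberg", "variant"]}}
meta = build_metadata(doc)   # {'name': 0, 'tags': 1, 'user': 2}
encoded = encode(doc, meta)
# Extracting user.name walks field ids 2 then 0 -- no JSON text re-parsing.
```

Hypothetical field names aside, this is the structural reason an engine can evaluate a path like `user.name` against variant bytes without materializing the whole document.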
Thanks,
Aihua

Meeting invite:

Wednesday, July 17 · 9:00 – 10:00am
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/pbm-ovzn-aoq
Or dial: (US) +1 650-449-9343 PIN: 170 576 525#
More phone numbers: https://tel.meet/pbm-ovzn-aoq?pin=4079632691790

On Tue, May 28, 2024 at 9:21 PM Aihua Xu <aihua...@snowflake.com> wrote:

Hello,

We have drafted the proposal <https://docs.google.com/document/d/1QjhpG_SVNPZh3anFcpicMQx90ebwjL7rmzFYfUP89Iw/edit> for the Variant data type. Please help review and comment.

Thanks,
Aihua

On Thu, May 16, 2024 at 12:45 PM Jack Ye <yezhao...@gmail.com> wrote:

+10000 for a JSON/BSON type. We also had the same discussion internally, and a JSON type would really play well with, for example, the SUPER type in Redshift (https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html), and could also provide better integration with the Trino JSON type.

Looking forward to the proposal!
Best,
Jack Ye

On Wed, May 15, 2024 at 9:37 AM Tyler Akidau <tyler.aki...@snowflake.com.invalid> wrote:

On Tue, May 14, 2024 at 7:58 PM Gang Wu <ust...@gmail.com> wrote:

> > We may need some guidance on just how many we need to look at;
> > we were planning on Spark and Trino, but weren't sure how much
> > further down the rabbit hole we needed to go.
>
> There are some engines living outside the Java world. It would be good
> if the proposal could cover the effort it takes to integrate the variant
> type into them (e.g. Velox, DataFusion, etc.). This is something that
> some proprietary Iceberg vendors also care about.

Ack, makes sense. We can make sure to share some perspective on this.

> > Not necessarily, no. As long as there's a binary type and Iceberg and
> > the query engines are aware that the binary column needs to be
> > interpreted as a variant, that should be sufficient.
>
> From the perspective of interoperability, it would be good to support a
> native type in the file specs. Life will be easier for projects like
> Apache XTable. The file format could also provide finer-grained
> statistics for the variant type, which facilitates data skipping.
Agreed, there can definitely be additional value in native file format integration. Just wanted to highlight that it's not a strict requirement.

-Tyler

On Wed, May 15, 2024 at 6:49 AM Tyler Akidau <tyler.aki...@snowflake.com.invalid> wrote:

Good to see you again as well, JB! Thanks!

-Tyler

On Tue, May 14, 2024 at 1:04 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

Hi Tyler,

Super happy to see you there :) It reminds me of our discussions back at the start of Apache Beam :)

Anyway, the thread is pretty interesting. I remember some discussions about a JSON data type for spec v3. The binary data type is already supported in spec v2.

I'm looking forward to the proposal and happy to help on this!

Regards,
JB

On Sat, May 11, 2024 at 7:06 AM Tyler Akidau <tyler.aki...@snowflake.com.invalid> wrote:

Hello,

We (Tyler, Nileema, Selcuk, Aihua) are working on a proposal for which we'd like to get early feedback from the community.
As you may know, Snowflake has embraced Iceberg as its open data lake format. Having made good progress on our own adoption of the Iceberg standard, we're now in a position where there are features not yet supported in Iceberg which we think would be valuable for our users, and that we would like to discuss with and help contribute to the Iceberg community.

The first two such features we'd like to discuss are in support of efficient querying of dynamically typed, semi-structured data: variant data types, and subcolumnarization of variant columns. In more detail, for anyone who may not already be familiar:

1. Variant data types

Variant types allow for the efficient binary encoding of dynamic semi-structured data such as JSON, Avro, etc. By encoding semi-structured data as a variant column, we retain the flexibility of the source data while allowing query engines to operate on the data more efficiently. Snowflake has supported the variant data type on Snowflake tables for many years [1]. As more and more users utilize Iceberg tables in Snowflake, we're hearing an increasing chorus of requests for variant support. Additionally, other query engines such as Apache Spark have begun adding variant support [2].
As such, we believe it would be beneficial to the Iceberg community as a whole to standardize on the variant data type encoding used across Iceberg tables.

One specific point to make here is that, since an Apache OSS version of variant encoding already exists in Spark, it likely makes sense to simply adopt the Spark encoding as the Iceberg standard as well. The encoding we use internally today in Snowflake is slightly different, but essentially equivalent, and we see no particular value in trying to clutter the space with another equivalent-but-incompatible encoding.

2. Subcolumnarization

Subcolumnarization of variant columns allows query engines to efficiently prune datasets when subcolumns (i.e., nested fields) within a variant column are queried, and also allows optionally materializing some of the nested fields as columns of their own, affording queries on these subcolumns the ability to read less data and spend less CPU on extraction. When subcolumnarizing, the system managing table metadata and data tracks individual pruning statistics (min, max, null counts, etc.) for some subset of the nested fields within a variant, and also manages any optional materialization.
Without subcolumnarization, any query which touches a variant column must read, parse, extract, and filter every row for which that column is non-null. Thus, by providing a standardized way of tracking subcolumn metadata and data for variant columns, Iceberg can make subcolumnar optimizations accessible across various catalogs and query engines.

Subcolumnarization is a non-trivial topic, so we expect any concrete proposal to include not only the set of changes to Iceberg metadata that allow compatible query engines to interoperate on subcolumnarization data for variant columns, but also reference documentation explaining subcolumnarization principles and recommended best practices.

It sounds like the recent Geo proposal [3] may be a good starting point for how to approach this, so our plan is to write something up in that vein that covers the proposed spec changes, backwards compatibility, implementor burdens, etc. But we wanted to first reach out to the community to introduce ourselves and the idea, and see if there's any early feedback we should incorporate before we spend too much time on a concrete proposal.

Thank you!
[1] https://docs.snowflake.com/en/sql-reference/data-types-semistructured
[2] https://github.com/apache/spark/blob/master/common/variant/README.md
[3] https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit

-Tyler, Nileema, Selcuk, Aihua

--
Ryan Blue
Databricks
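As a closing illustration of the pruning benefit the original proposal describes: the sketch below (file names and statistics are hypothetical, and this is not Iceberg's actual metadata model) shows why tracking per-file min/max stats for a shredded subcolumn of a variant lets a planner skip files entirely, rather than reading and parsing every non-null variant value.

```python
# Toy model of data skipping via subcolumn statistics: if each data file
# records min/max for a nested field (say `event.ts`) inside a variant
# column, a range predicate can eliminate files before any I/O happens.
# File paths and stat values below are invented for illustration.

FILES = [
    {"path": "f1.parquet", "min": 100, "max": 199},
    {"path": "f2.parquet", "min": 200, "max": 299},
    {"path": "f3.parquet", "min": 300, "max": 399},
]

def prune(files, lower, upper):
    """Keep only files whose [min, max] range overlaps [lower, upper]."""
    return [f["path"] for f in files if f["max"] >= lower and f["min"] <= upper]

print(prune(FILES, 250, 320))  # only f2 and f3 can contain matching rows
```

Without such statistics, the same query would have to scan all three files and extract the nested field row by row, which is the cost the subcolumnarization proposal aims to avoid.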