Here is a PR to change the default format version in the library: https://github.com/apache/iceberg/pull/8381
There are some failing REST catalog tests, which look like bugs. I'd appreciate it if someone could take a look. I will also check the remaining tests later today.

- Anton

On 2023/05/29 17:25:41 Ryan Blue wrote:
> Since the last time we discussed this, we've also updated our default
> version to v2. I definitely like the idea we settled on last time, that
> this is an administrator setting and it can be controlled already by
> catalog deployments. However, I'm coming around on updating the library
> default to v2.
>
> My rationale is that we want people that are setting up Iceberg data
> platforms (administrator roles) to be as successful as possible without
> knowing all the internal details. While you _can_ set this at the catalog
> level, those new platform administrators don't know to do that. So I'd
> probably opt to make this v2 now.
>
> Ryan
>
> On Thu, May 25, 2023 at 2:51 PM Steven Wu <stevenz...@gmail.com> wrote:
>
> > +1. Anton made a good case with the new perspective.
> >
> > On Thu, May 25, 2023 at 2:29 PM Anton Okolnychyi
> > <aokolnyc...@apple.com.invalid> wrote:
> >
> >> Oh, I missed the earlier discussion. Thanks for sharing it, Gabor!
> >>
> >> I am approaching this from a slightly different perspective. Defaulting
> >> to v2 does not mean supporting delete files. My primary concern is that our
> >> default behavior may be either confusing or inefficient. For instance,
> >> using always-null transforms in v1 spec evolution is hard to explain to
> >> users. Not enabling snapshot ID inheritance means rewriting manifests in
> >> huge tables can take hours. Managed catalogs or teams that run forks have
> >> more control over tables and can make better choices, but I also worry about
> >> folks that just start with Iceberg and use built-in catalogs.
> >>
> >> Can we think of potential issues with having a v2 table with no delete
> >> files vs a v1 table?
> >>
> >> - Anton
> >>
> >> On May 24, 2023, at 10:43 PM, Szehon Ho <szehon.apa...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I'm +1 to making v2 the default, say after this release.
> >>
> >> It seems most of the features brought up as concerns on the Spark side in the
> >> thread Gabor linked have been implemented (like position delete lifecycle).
> >>
> >> But Anton's point is also good. Even if some delete file features are
> >> missing, V2 is not only about delete files, which are not produced by
> >> default in Spark, and Flink(?), but rather the fixes for partition spec
> >> evolution / snapshot ID inheritance. Hence it makes sense to me, from that
> >> angle.
> >>
> >> Thanks
> >> Szehon
> >>
> >> On Wed, May 24, 2023 at 12:34 AM Gabor Kaszab <
> >> gaborkas...@cloudera.com.invalid> wrote:
> >>
> >>> Hey Anton,
> >>>
> >>> Just adding a note that back around January the same topic was brought
> >>> up on this mailing list. There the conclusion was to use the 'table-default.'
> >>> catalog level property to create V2 tables by default.
> >>> https://lists.apache.org/thread/9ct0p817qxqqdnv7nb35kghsfygjkqdf
> >>>
> >>> I'm not saying that we shouldn't default to V2, just drawing attention to
> >>> this previous conversation.
> >>>
> >>> Cheers,
> >>> Gabor
> >>>
> >>> On Wed, May 24, 2023 at 12:04 AM Anton Okolnychyi <
> >>> aokolnyc...@apple.com.invalid> wrote:
> >>>
> >>>> Hi folks,
> >>>>
> >>>> Would it be appropriate for us to consider changing the default table
> >>>> format version for new tables from v1 to v2?
> >>>>
> >>>> I don’t think defaulting to v2 tables means all readers have to support
> >>>> delete files. DELETE, UPDATE, MERGE operations will only produce delete
> >>>> files if configured explicitly.
> >>>>
> >>>> The primary reason I am starting this thread is to avoid our
> >>>> workarounds in v1 spec evolution and snapshot ID inheritance. The latter
> >>>> is critical for the performance of rewriting manifests.
> >>>>
> >>>> Any thoughts?
> >>>>
> >>>> - Anton
> >>>
> >>>
> >>
> >
> --
> Ryan Blue
> Tabular
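
For readers following the catalog-level option mentioned in the quoted thread (the 'table-default.' prefix), here is a minimal sketch of what an administrator might set for a Spark-based deployment. It is only an illustration, not part of the PR: the helper name iceberg_catalog_conf and the catalog name "demo" are hypothetical, and the key layout assumes the spark.sql.catalog.<name>.table-default.<property> convention together with the format-version table property.

```python
def iceberg_catalog_conf(catalog_name: str, format_version: int = 2) -> dict:
    """Build Spark conf entries so that tables created through the given
    Iceberg catalog default to the requested format version.

    Hypothetical helper for illustration; it only assembles key/value
    pairs and does not talk to Spark or Iceberg.
    """
    # The thread only discusses v1 and v2, so this sketch accepts just those.
    if format_version not in (1, 2):
        raise ValueError(f"unsupported format version: {format_version}")

    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Register the catalog implementation (assumed: the Iceberg Spark catalog).
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Catalog-level default applied to newly created tables unless
        # the table overrides it explicitly.
        f"{prefix}.table-default.format-version": str(format_version),
    }


conf = iceberg_catalog_conf("demo")
for key, value in conf.items():
    # Each pair could be passed to spark-submit as --conf key=value.
    print(f"{key}={value}")
```

With a setting like this, a deployment gets v2 tables regardless of the library default, which is the behavior the PR would make unnecessary to configure by hand.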