Is there any progress on making the default format version a catalog property?

Thanks,
Manu

On Wed, Jan 18, 2023 at 5:43 PM Gabor Kaszab
<[email protected]> wrote:

> I also ran into this "table-default." setting prefix
> <https://github.com/apache/iceberg/blob/35151fe17b47c0af22787db4e4964b0cfcfdb215/core/src/main/java/org/apache/iceberg/CatalogProperties.java#L30>.
> As far as I can tell, it is a catalog-level config, so it's enough to
> provide e.g. "table-default.format-version" = "2" to each catalog as a
> startup flag. Catalogs derived from BaseMetastoreCatalog appear to use this
> table-default prefix
> <https://github.com/apache/iceberg/blob/35151fe17b47c0af22787db4e4964b0cfcfdb215/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L148>.
>
> Gabor
>
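To illustrate the "table-default." mechanism Gabor describes, here is a minimal Python sketch, assuming the catalog strips the prefix from its own settings and applies the result as default table properties. Properties given explicitly at table creation still take precedence. The function names (`table_default_properties`, `effective_table_properties`) and the sample metastore URI are hypothetical illustrations, not Iceberg's Java API.

```python
# Minimal sketch (hypothetical helpers, not Iceberg's API): how a catalog could
# turn its own "table-default."-prefixed settings into default table properties.
from typing import Dict

TABLE_DEFAULT_PREFIX = "table-default."  # prefix string discussed in the thread


def table_default_properties(catalog_props: Dict[str, str]) -> Dict[str, str]:
    """Return catalog settings that start with the prefix, prefix stripped."""
    return {
        key[len(TABLE_DEFAULT_PREFIX):]: value
        for key, value in catalog_props.items()
        if key.startswith(TABLE_DEFAULT_PREFIX)
    }


def effective_table_properties(
    catalog_props: Dict[str, str], create_props: Dict[str, str]
) -> Dict[str, str]:
    """Apply catalog defaults first; properties given at create time win."""
    merged = table_default_properties(catalog_props)
    merged.update(create_props)
    return merged


# Hypothetical catalog configuration: one unrelated setting plus the default.
catalog = {"uri": "thrift://metastore:9083", "table-default.format-version": "2"}
print(effective_table_properties(catalog, {}))                       # {'format-version': '2'}
print(effective_table_properties(catalog, {"format-version": "1"}))  # {'format-version': '1'}
```

Non-prefixed catalog settings such as `uri` are ignored, so only the intended defaults leak into table metadata.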
> On Wed, Jan 18, 2023 at 12:00 AM Yufei Gu <[email protected]> wrote:
>
>> The functionality is already there if we are talking about setting the
>> default format at the Iceberg catalog level. For example, we can configure
>> a catalog like this, and all tables it creates will be v2 tables:
>> spark.sql.catalog.hive_prod.table-default.format-version = "2"
>>
>> Of course, we need to set it for each Spark app. Setting it in Trino would
>> be easier, since it would be a single catalog-level change.
>>
>> Best,
>>
>> Yufei
>>
>> `This is not a contribution`
>>
>>
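Because the Spark setting has to be repeated for every application, one option is to generate the catalog configuration from a single place. The sketch below builds the keys for a Hive-backed Iceberg catalog named `hive_prod`. It assumes the standard `spark.sql.catalog.<name>` = `org.apache.iceberg.spark.SparkCatalog` and `.type` = `hive` settings. The helper names `iceberg_catalog_conf` and `spark_submit_args` are hypothetical.

```python
# Minimal sketch: build per-application Spark settings for an Iceberg catalog so
# that every app gets the same "table-default.format-version". Helper names are
# hypothetical; the config keys follow the spark.sql.catalog.<name> convention.
from typing import Dict, List


def iceberg_catalog_conf(
    name: str, catalog_type: str = "hive", format_version: str = "2"
) -> Dict[str, str]:
    """Return Spark config entries for one Iceberg catalog."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": catalog_type,
        # Catalog-level default applied to tables created through this catalog.
        f"{prefix}.table-default.format-version": format_version,
    }


def spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Flatten the config into repeated --conf key=value arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


conf = iceberg_catalog_conf("hive_prod")
print(" ".join(spark_submit_args(conf)))
```

In PySpark, the same dictionary could instead be applied entry by entry with `SparkSession.builder.config(key, value)`.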
>> On Mon, Jan 16, 2023 at 1:34 AM Gabor Kaszab
>> <[email protected]> wrote:
>>
>>> It seems we have a consensus on the approach. I can take a look at
>>> implementing this if no one has any objections.
>>>
>>> Gabor
>>>
>>> On Fri, Jan 13, 2023 at 11:28 PM Ryan Blue <[email protected]> wrote:
>>>
>>>> That sounds like a good idea to me.
>>>>
>>>> On Fri, Jan 13, 2023 at 11:04 AM Jack Ye <[email protected]> wrote:
>>>>
>>>>> > I think the issue is that all of the built-in catalogs currently
>>>>> > call the version of `newTableMetadata` that defaults to v1.
>>>>>
>>>>> Yes, this seems to be the key issue for the catalogs that extend
>>>>> BaseMetastoreCatalog. Should we make the default format version a
>>>>> catalog property instead of hard-coding it in TableMetadata?
>>>>>
>>>>> -Jack
>>>>>
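Jack's proposal amounts to a precedence rule for picking the format version of a new table. The sketch below shows that rule under stated assumptions. An explicit `format-version` table property wins. Otherwise the catalog's `table-default.format-version` applies. Otherwise the code falls back to the hard-coded v1 mentioned in the thread. The function name and the supported-version set `(1, 2)` are illustrative assumptions, not Iceberg's implementation.

```python
# Minimal sketch (hypothetical, not Iceberg's implementation) of resolving the
# format version for a new table: table property > catalog default > fallback.
from typing import Dict, Optional

FORMAT_VERSION = "format-version"
CATALOG_DEFAULT_KEY = "table-default." + FORMAT_VERSION
FALLBACK_FORMAT_VERSION = 1         # the hard-coded v1 default discussed above
SUPPORTED_FORMAT_VERSIONS = (1, 2)  # assumption: versions available at the time


def resolve_format_version(
    catalog_props: Dict[str, str], table_props: Dict[str, str]
) -> int:
    """Pick the format version for a new table, validating the result."""
    raw: Optional[str] = table_props.get(FORMAT_VERSION)
    if raw is None:
        raw = catalog_props.get(CATALOG_DEFAULT_KEY)
    version = int(raw) if raw is not None else FALLBACK_FORMAT_VERSION
    if version not in SUPPORTED_FORMAT_VERSIONS:
        raise ValueError(f"Unsupported format version: {version}")
    return version


v2_catalog = {CATALOG_DEFAULT_KEY: "2"}
print(resolve_format_version({}, {}))                            # 1
print(resolve_format_version(v2_catalog, {}))                    # 2
print(resolve_format_version(v2_catalog, {FORMAT_VERSION: "1"}))  # 1
```

Letting the explicit table property win keeps per-table opt-outs possible even after a catalog switches its default to v2.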
>>>>> On Thu, Jan 12, 2023 at 11:47 PM Jean-Baptiste Onofré <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Gabor,
>>>>>>
>>>>>> It makes sense to me. AFAIK, since table creation goes through the
>>>>>> catalog "controller", the catalog can "decide" the version. So it would
>>>>>> be up to each catalog to decide how, and with which version, it creates
>>>>>> tables.
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On Wed, Jan 11, 2023 at 11:11 PM Gabor Kaszab <[email protected]>
>>>>>> wrote:
>>>>>> >
>>>>>> > Naively asking, can't we add a property that tells Iceberg which
>>>>>> > format version to use by default when creating tables? (If there is
>>>>>> > no such setting currently.)
>>>>>> >
>>>>>> > Gabor
>>>>>> >
>>>>>> > On Wed, Jan 11, 2023 at 8:04 PM Jack Ye <[email protected]>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> Should we start a community vote on this?
>>>>>> >>
>>>>>> >> I remember that in today's community sync meeting Russell briefly
>>>>>> >> discussed some compaction support that is not there yet, and that
>>>>>> >> some users are struggling with the small delete files issue; to some
>>>>>> >> extent, this is why Spark still defaults to v1.
>>>>>> >>
>>>>>> >> On the feature side, changelog scan is mostly there in Spark, and
>>>>>> >> there will likely be movement on the Trino side very soon as well.
>>>>>> >>
>>>>>> >> Overall, I think it would be beneficial to move the default to v2,
>>>>>> >> which could incentivize completing those missing parts across
>>>>>> >> engines.
>>>>>> >>
>>>>>> >> Best,
>>>>>> >> Jack Ye
>>>>>> >>
>>>>>> >> On Wed, Jan 11, 2023 at 5:47 AM Piotr Findeisen <
>>>>>> [email protected]> wrote:
>>>>>> >>>
>>>>>> >>> Hi,
>>>>>> >>>
>>>>>> >>> FWIW, Trino already creates v2 tables by default.
>>>>>> >>> I thought it was worth sharing for context.
>>>>>> >>>
>>>>>> >>> Best
>>>>>> >>> PF
>>>>>> >>>
>>>>>> >>> On Tue, Jan 10, 2023 at 10:09 AM Manu Zhang <
>>>>>> [email protected]> wrote:
>>>>>> >>>>
>>>>>> >>>> Hi all,
>>>>>> >>>>
>>>>>> >>>> We maintain an internal fork of Iceberg, and all our use cases
>>>>>> >>>> involve v2 tables with row-level updates and deletes. Our users
>>>>>> >>>> need to remember to create tables with the `'format-version'='2'`
>>>>>> >>>> option or to alter the tables afterwards.
>>>>>> >>>>
>>>>>> >>>> I'm thinking about changing the default format-version of our fork
>>>>>> >>>> to v2. Are there any concerns about this change, or any hidden
>>>>>> >>>> issues I've missed?
>>>>>> >>>>
>>>>>> >>>> Thanks,
>>>>>> >>>> Manu
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>
