Thanks, wj, for driving this! I'd like to share some input:

1. Java API 'createTag': I think it would be better to use 'Duration' as
the parameter type instead of 'String'.
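
To illustrate point 1, here is a minimal sketch of what a 'Duration'-typed
parameter could look like. The interface name 'TagApiSketch', the parameter
names, and the helper 'expireTime' are hypothetical and only for illustration;
they are not the signatures proposed in the PIP.

```java
import java.time.Duration;
import java.time.Instant;

class CreateTagSketch {

    /** Hypothetical signature: the retention is typed, so no string parsing is needed. */
    interface TagApiSketch {
        void createTag(String tagName, long snapshotId, Duration timeRetained);
    }

    /** Computes the instant at which a tag created at createTime would expire. */
    static Instant expireTime(Instant createTime, Duration timeRetained) {
        return createTime.plus(timeRetained);
    }
}
```

With 'Duration', a caller writes something like 'Duration.ofDays(7)', and an
invalid value is rejected where the caller builds it rather than later, when a
string would be parsed.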

2. For the field 'tagCreateTime' in class 'Tag': I think we can just reuse
the 'Snapshot#timeMillis' field. 'timeMillis' is the creation time of the
snapshot, and I don't think that time is used when we read the
corresponding tag, so we can reuse the field for the tag's creation time.
What do you think? If we do so, 'commit_time' in the tags system table can
be renamed to 'create_time', 'tag_create_time', or some other name.

3. Should we add TTL to auto-created tags? I think we should. Users can set
the same TTL for all auto-created tags via table options. My suggestion for
how to handle `tag.num-retained-max` together with TTL is that the TTL has
higher priority: when we try to expire auto-created tags, we first find
candidates by `tag.num-retained-max`; then, if a candidate's survival time
is less than the TTL, we don't expire it.
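
The rule in point 3 could be sketched roughly as follows. This is only an
illustration under my own assumptions: 'AutoTag', 'tagsToExpire', and the
explicit 'now' parameter are hypothetical names, not existing Paimon classes
or methods.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TagExpireSketch {

    /** Hypothetical stand-in for an auto-created tag; not Paimon's Tag class. */
    static final class AutoTag {
        final String name;
        final Instant createTime;

        AutoTag(String name, Instant createTime) {
            this.name = name;
            this.createTime = createTime;
        }
    }

    /** Returns the names of the tags to expire, oldest first. */
    static List<String> tagsToExpire(
            List<AutoTag> tags, int numRetainedMax, Duration ttl, Instant now) {
        // Step 1: candidates are the oldest tags beyond tag.num-retained-max.
        List<AutoTag> sorted = new ArrayList<>(tags);
        sorted.sort(Comparator.comparing((AutoTag t) -> t.createTime));
        int excess = Math.max(0, sorted.size() - numRetainedMax);

        // Step 2: TTL has higher priority, so a candidate younger than the TTL is kept.
        List<String> expired = new ArrayList<>();
        for (AutoTag tag : sorted.subList(0, excess)) {
            Duration age = Duration.between(tag.createTime, now);
            if (age.compareTo(ttl) >= 0) {
                expired.add(tag.name);
            }
        }
        return expired;
    }
}
```

For example, with three tags aged 10, 2, and 1 days, `tag.num-retained-max`
= 1, and a TTL of 5 days, the two oldest tags are candidates, but only the
10-day-old tag is expired; the 2-day-old tag survives because it is younger
than the TTL.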

Best regards,
Zelin Yu


On Mon, Apr 1, 2024 at 9:54 AM <[email protected]> wrote:

> Hi devs:
>
> I would like to start a discussion of PIP-20: Introduce TTL for tags which
> are not auto-created [1]. Currently, Paimon has automatic clearing
> mechanisms for tags created by TagAutoCreation, but not for other tags, so
> it can't meet our demands. For example:
> 1. The current tag cleanup mechanism may lead to wasted resources.
> 2. Tags do not support TTL, so they are not flexible to use.
> This PIP aims to let each tag have its own TTL, so that users can use tags
> more flexibly and reduce the probability of resource waste. It also helps
> Paimon keep up with other data lake products such as Iceberg.
> Looking forward to your feedback, thanks.
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=300026341
>
>
> Best,
> wangwj
