Hi,

This PIP is missing a very important piece: how will compatibility be
maintained? What is the behavior for old tags?

Please note that Paimon is a storage system, so we need to constantly
preserve its existing compatibility (even some backward compatibility)
in every design.
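
To make the question concrete, here is a minimal sketch of the expiration
rule discussed further down the thread (a tag is expired as soon as either
the count limit or the TTL is hit, with no priority between them), together
with one possible treatment of old tags. Everything in it is an assumption
for illustration only: `TagExpireSketch`, `TagInfo`, and `tagsToExpire` are
hypothetical names, not Paimon APIs, and skipping the TTL check for tags
that have no recorded create time is just one candidate behavior, not
something the PIP currently defines.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Paimon code.
final class TagExpireSketch {

    /** Hypothetical tag record. Old tags are assumed to have no create time and no TTL. */
    static final class TagInfo {
        final String name;
        final Instant createTime;    // null for tags written before the change (assumption)
        final Duration timeRetained; // null means no TTL was set

        TagInfo(String name, Instant createTime, Duration timeRetained) {
            this.name = name;
            this.createTime = createTime;
            this.timeRetained = timeRetained;
        }
    }

    /**
     * Returns the names of tags to expire. A tag expires when EITHER condition
     * hits: it falls outside the newest numRetainedMax tags, or its TTL has elapsed.
     * The input list must be ordered oldest first.
     */
    static List<String> tagsToExpire(List<TagInfo> tagsOldestFirst, int numRetainedMax, Instant now) {
        List<String> expired = new ArrayList<>();
        // Number of oldest tags that exceed the count limit.
        int overflow = Math.max(0, tagsOldestFirst.size() - numRetainedMax);
        for (int i = 0; i < tagsOldestFirst.size(); i++) {
            TagInfo tag = tagsOldestFirst.get(i);
            boolean overCount = i < overflow;
            // Old tags without create time or TTL never hit the TTL condition (assumption).
            boolean overTtl = tag.createTime != null
                    && tag.timeRetained != null
                    && tag.createTime.plus(tag.timeRetained).isBefore(now);
            if (overCount || overTtl) {
                expired.add(tag.name);
            }
        }
        return expired;
    }
}
```

Under this assumption an old tag can still be removed by the count limit,
but never by TTL, so existing tables would keep their current behavior
until a TTL is set explicitly.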

Best,
Jingsong

On Wed, Apr 3, 2024 at 4:42 PM wj wang <[email protected]> wrote:
>
> Thank you very much for your valuable suggestions.
> Next, I will provide a PR.
>
> On Wed, Apr 3, 2024 at 4:18 PM yu zelin <[email protected]> wrote:
>
> > Agree.
> >
> > On Wed, Apr 3, 2024 at 11:14 AM Jingsong Li <[email protected]>
> > wrote:
> >
> > > > reuse the 'Snapshot#timeMillis' field
> > >
> > > Don't do this. A tag is just a snapshot reference; it cannot alter snapshot
> > > fields.
> > >
> > > >  the TTL has higher priority
> > >
> > > We should keep the behavior similar to snapshot expiration: as long as
> > > one of the conditions is hit, delete the tag, without giving either
> > > condition priority.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Wed, Apr 3, 2024 at 10:33 AM wj wang <[email protected]> wrote:
> > > >
> > > > Thanks Jingsong Li and Yu Zelin for the replies.
> > > >
> > > > > I think it's similar to the snapshot expire, where both the number and
> > > > time are used to determine whether it should be deleted. This is
> > > > reasonable, and the hit should be deleted.
> > > >
> > > > OK, I will do that.
> > > >
> > > >
> > > > > Java API 'createTag': Use 'Duration' as parameter instead of 'String'. I
> > > > think it's better.
> > > >
> > > > OK
> > > >
> > > >
> > > > > For the field 'tagCreateTime' in class 'Tag': I think we can just use the
> > > > 'Snapshot#timeMillis' field. The 'timeMillis' is the create time of the
> > > > snapshot, I think the time won't be used when we read the corresponding
> > > > tag. So I think we can just reuse the field, what do you think? And if do
> > > > so,
> > > >
> > > > I think it's possible to reuse the 'Snapshot#timeMillis' field for
> > > > auto-created tags, but I don't think the 'Snapshot#timeMillis' field can
> > > > be used for non-auto-created tags.
> > > > What do you think?
> > > >
> > > >
> > > > > in the tags system table, 'commit_time' can be renamed to 'create_time'
> > > > or 'tag_create_time' or other name.
> > > >
> > > > I think create-time and time-retained are good.
> > > >
> > > >
> > > > > Should we add TTL to auto-created tags? I think we should. Users can set
> > > > the same TTL for all auto-created tags by table options. My suggestion of
> > > > how to handle `tag.num-retained-max` and TTL is: the TTL has higher
> > > > priority. When we try to expire an auto-created tag, we first find
> > > > candidates by `tag.num-retained-max`, then if the candidate's survival time
> > > > is less than TTL, we don't expire it.
> > > >
> > > > OK, I will do that.
> > > >
> > > >
> > > >
> > > > On Tue, Apr 2, 2024 at 5:26 PM yu zelin <[email protected]> wrote:
> > > >
> > > > > Thanks wj for driving this! I'd like to give some input:
> > > > >
> > > > > 1. Java API 'createTag': Use 'Duration' as the parameter instead of
> > > > > 'String'. I think it's better.
> > > > >
> > > > > 2. For the field 'tagCreateTime' in class 'Tag': I think we can just use
> > > > > the 'Snapshot#timeMillis' field. The 'timeMillis' is the create time of
> > > > > the snapshot, and I think the time won't be used when we read the
> > > > > corresponding tag. So I think we can just reuse the field, what do you
> > > > > think? And if we do so, in the tags system table, 'commit_time' can be
> > > > > renamed to 'create_time', 'tag_create_time', or some other name.
> > > > >
> > > > > 3. Should we add TTL to auto-created tags? I think we should. Users can
> > > > > set the same TTL for all auto-created tags by table options. My suggestion
> > > > > of how to handle `tag.num-retained-max` and TTL is: the TTL has higher
> > > > > priority. When we try to expire an auto-created tag, we first find
> > > > > candidates by `tag.num-retained-max`, then if the candidate's survival
> > > > > time is less than the TTL, we don't expire it.
> > > > >
> > > > > Best regards,
> > > > > Zelin Yu
> > > > >
> > > > >
> > > > > On Mon, Apr 1, 2024 at 9:54 AM <[email protected]> wrote:
> > > > >
> > > > > > Hi devs:
> > > > > >
> > > > > > I would like to start a discussion of PIP-20: Introduce TTL for tags
> > > > > > which are not auto-created [1]. Currently, Paimon has automatic clearing
> > > > > > mechanisms for tags created by TagAutoCreation, but not for other tags.
> > > > > > It can't meet our demands. For example:
> > > > > > 1. The current tag cleanup mechanism may lead to resource waste.
> > > > > > 2. Tags do not support TTL, so they are not flexible to use.
> > > > > > This PIP aims to let each tag have its own TTL, so that users can use
> > > > > > tags more flexibly and reduce the probability of resource waste. It
> > > > > > would also let Paimon keep up with other data lake products such as
> > > > > > Iceberg.
> > > > > > Looking forward to your feedback, thanks.
> > > > > > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=300026341
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > wangwj
> > > > >
> > >
> >
