Thanks Zelin for the update.

## TAG ID

Is this useful? We already have tag-name and snapshot-id, and now we are introducing a tag id as well? Which one is actually used?

## Time Travel

SELECT * FROM t VERSION AS OF tag-name.<name>

This does not look like standard SQL. Why do we introduce this `tag-name` prefix?

## Tag class

Why not just use the Snapshot class? It looks like we don't need to introduce a Tag class. We can just copy the snapshot file to tag/.
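
For example, something roughly like this should be enough (just a sketch; the paths and file names are illustrative, and a real implementation would go through the table's file system abstraction rather than java.nio so that it also works on DFS and object storage):

    // Sketch only: "creating a tag" is just copying the snapshot file into tag/.
    static void createTag(java.nio.file.Path tablePath, long snapshotId, String tagName)
            throws java.io.IOException {
        java.nio.file.Path snapshotFile =
                tablePath.resolve("snapshot").resolve("snapshot-" + snapshotId);
        java.nio.file.Path tagDir = tablePath.resolve("tag");
        java.nio.file.Files.createDirectories(tagDir);
        java.nio.file.Files.copy(
                snapshotFile,
                tagDir.resolve("tag-" + tagName),
                java.nio.file.StandardCopyOption.REPLACE_EXISTING);
    }
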
## Expiring Snapshot

We should note that "record it in `DataFileMeta`" should be done before "tag", and we should document version compatibility. Also, why not record it in ManifestEntry?

Best,
Jingsong

On Fri, May 26, 2023 at 11:15 AM yu zelin <[email protected]> wrote:
>
> Hi, all,
>
> FYI, I have updated the PIP [1].
>
> Main changes:
> - Use the new name `tag`
> - Enrich the Motivation section
> - New section `Data Files Handling`, describing how to determine whether a data file can be deleted
>
> Best,
> Yu Zelin
>
> [1] https://cwiki.apache.org/confluence/x/NxE0Dw
>
> On May 24, 2023 at 17:18, yu zelin <[email protected]> wrote:
> >
> > Hi, Guojun,
> >
> > I'd like to share my thoughts about your questions.
> >
> > 1. Expiration of savepoints
> > In my opinion, savepoints are created at long intervals, so there will not be too many of them. If users create one savepoint per day, there are 365 savepoints in a year. So I didn't consider expiring them, and I think providing a Flink action like `delete-savepoint id = 1` is enough for now. But if it is really important, we can introduce table options for it; I think we can do it the same way as expiring snapshots.
> >
> > 2.
> >> id of compacted snapshot picked by the savepoint
> > My initial idea was to pick a compacted snapshot, or to do a compaction before creating the savepoint. But after discussing with Jingsong, I found that is difficult. So now I propose to create the savepoint directly from the given snapshot. Maybe we can optimize it later. The changes will be updated soon.
> >> manifest file list in system-table
> > I think the manifest file is not very important for users. Users can find out when a savepoint was created, get the savepoint id, and then query the savepoint by that id. I don't see a scenario in which users need the manifest file information. What do you think?
> >
> > Best,
> > Yu Zelin
> >
> >> On May 24, 2023 at 10:50, Guojun Li <[email protected]> wrote:
> >>
> >> Thanks zelin for bringing up the discussion. I'm thinking about:
> >> 1. How should the savepoints be managed if there is no expiration mechanism? By the TTL management of the storage, or by an external script?
> >> 2. I think the id of the compacted snapshot picked by the savepoint and the manifest file list are also important information for users. Could this information be stored in the system-table?
> >>
> >> Best,
> >> Guojun
> >>
> >> On Mon, May 22, 2023 at 9:13 PM Jingsong Li <[email protected]> wrote:
> >>
> >>> FYI
> >>>
> >>> The PIP lacks a table to show Discussion thread & Vote thread & ISSUE...
> >>>
> >>> Best
> >>> Jingsong
> >>>
> >>> On Mon, May 22, 2023 at 4:48 PM yu zelin <[email protected]> wrote:
> >>>>
> >>>> Hi, all,
> >>>>
> >>>> Thank you all for your suggestions and questions. After reading them, I have adopted some, and I want to share my opinions here.
> >>>>
> >>>> To make my statements clearer, I will still use the word `savepoint`. Once we reach a consensus, the name may be changed.
> >>>>
> >>>> 1. The purposes of savepoint
> >>>>
> >>>> As Shammon mentioned, Flink and databases also have the concept of `savepoint`.
> >>>> So it's better to clarify the purposes of our savepoint. Thanks to Nicholas and Jingsong; I think your explanations are very clear. I'd like to give my summary:
> >>>>
> >>>> (1) Fault recovery (or we can say disaster recovery). Users can ROLL BACK to a savepoint if needed. If a user rolls back to a savepoint, the table will hold the data in the savepoint, and the data committed after the savepoint will be deleted. In this scenario we need savepoints because snapshots may have expired; a savepoint can be kept longer and preserves the user's old data.
> >>>>
> >>>> (2) Recording versions of data at a longer interval (typically daily or weekly). With a savepoint, users can query the old data in batch mode. Compared to copying records to a new table, or merging incremental records with old records (like using MERGE INTO in Hive), the savepoint is more lightweight because we don't copy data files; we just record their metadata.
> >>>>
> >>>> As you can see, savepoint is very similar to snapshot. The differences are:
> >>>>
> >>>> (1) A savepoint lives longer. In most cases, a snapshot's lifetime is several minutes to hours. We expect a savepoint to live for several days, weeks, or even months.
> >>>>
> >>>> (2) A savepoint is mainly used for batch reading of historical data. In this PIP, we don't introduce streaming reading of savepoints.
> >>>>
> >>>> 2. Candidates for the name
> >>>>
> >>>> I agree with Jingsong that we can use a new name. Since the purpose and mechanism of savepoint (it is very similar to snapshot) are similar to `tag` in Iceberg, maybe we can use `tag`.
> >>>>
> >>>> In my opinion, an alternative is `anchor`. All the snapshots are like the navigation path of the streaming data, and an `anchor` can pin it in place.
> >>>>
> >>>> 3. Public table operations and options
> >>>>
> >>>> We propose to expose some operations and table options for users to manage savepoints.
> >>>>
> >>>> (1) Operations (currently for Flink)
> >>>> We provide Flink actions to manage savepoints:
> >>>> create-savepoint: generates a savepoint from the latest snapshot. Creating from a specified snapshot is also supported.
> >>>> delete-savepoint: deletes the specified savepoint.
> >>>> rollback-to: rolls back to a specified savepoint.
> >>>>
> >>>> (2) Table options
> >>>> We propose to provide options for creating savepoints periodically:
> >>>> savepoint.create-time: when to create the savepoint. Example: 00:00.
> >>>> savepoint.create-interval: the interval between the creation of two savepoints. Example: 2 d.
> >>>> savepoint.time-retained: the maximum time savepoints are retained.
> >>>>
> >>>> (3) Procedures (future work)
> >>>> Spark supports SQL extensions. After we support the Spark CALL statement, we can provide procedures for Spark users to create, delete, or roll back to a savepoint.
> >>>>
> >>>> Support for CALL is on the roadmap of Flink. In a future version, we can also support savepoint-related procedures for Flink users.
> >>>>
> >>>> 4. Expiration of data files
> >>>>
> >>>> Currently, when a snapshot expires, the data files that are not used by other snapshots are deleted. After we introduce the savepoint, we must make sure that the data files saved by a savepoint will not be deleted.
> >>>>
> >>>> Conversely, when a savepoint is deleted, the data files that are not used by existing snapshots and other savepoints will be deleted.
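> >>>>
> >>>> Roughly, this is a reachability check over the remaining snapshots and savepoints. A simplified sketch of the rule (the class and method names here are hypothetical, not the real API):
> >>>>
> >>>>     // Hypothetical sketch: a data file may only be deleted when no live snapshot
> >>>>     // and no savepoint still references it.
> >>>>     static boolean canDelete(
> >>>>             String dataFile,
> >>>>             java.util.List<Snapshot> liveSnapshots,
> >>>>             java.util.List<Savepoint> savepoints) {
> >>>>         java.util.Set<String> referenced = new java.util.HashSet<>();
> >>>>         for (Snapshot s : liveSnapshots) {
> >>>>             referenced.addAll(s.dataFiles()); // files reachable from the snapshot's manifests
> >>>>         }
> >>>>         for (Savepoint p : savepoints) {
> >>>>             referenced.addAll(p.dataFiles()); // files pinned by savepoints
> >>>>         }
> >>>>         return !referenced.contains(dataFile);
> >>>>     }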
> >>>>
> >>>> I have written some POC code to implement this. I will update the mechanism in the PIP soon.
> >>>>
> >>>> Best,
> >>>> Yu Zelin
> >>>>
> >>>>> On May 21, 2023 at 20:54, Jingsong Li <[email protected]> wrote:
> >>>>>
> >>>>> Thanks Yun for your information.
> >>>>>
> >>>>> We need to be careful to avoid confusion between the Paimon and Flink concepts of "savepoint".
> >>>>>
> >>>>> Maybe we don't have to insist on using "savepoint"; for example, TAG is also a candidate, just like in Iceberg [1].
> >>>>>
> >>>>> [1] https://iceberg.apache.org/docs/latest/branching/
> >>>>>
> >>>>> Best,
> >>>>> Jingsong
> >>>>>
> >>>>> On Sun, May 21, 2023 at 8:51 PM Jingsong Li <[email protected]> wrote:
> >>>>>>
> >>>>>> Thanks Nicholas for your detailed requirements.
> >>>>>>
> >>>>>> We need to supplement the user requirements in the FLIP, which is mainly aimed at two purposes:
> >>>>>> 1. Fault recovery for data errors (named: restore or rollback-to)
> >>>>>> 2. Recording versions at the day level (for example, one per day), targeting batch queries
> >>>>>>
> >>>>>> Best,
> >>>>>> Jingsong
> >>>>>>
> >>>>>> On Sat, May 20, 2023 at 2:55 PM Yun Tang <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Hi Guys,
> >>>>>>>
> >>>>>>> Since we use Paimon with Flink in most cases, I think we need to clarify what the same word "savepoint" means in the different systems.
> >>>>>>>
> >>>>>>> For Flink, savepoint means:
> >>>>>>>
> >>>>>>> 1. It is triggered by users, not periodically triggered by the system itself. However, this FLIP wants to support creating it periodically.
> >>>>>>> 2. Even the so-called incremental native savepoint [1] does not depend on previous checkpoints or savepoints; it still copies files on DFS into a self-contained savepoint folder. However, from this FLIP's description of the deletion of expired snapshot files, the Paimon savepoint will refer to the previously existing files directly.
> >>>>>>>
> >>>>>>> I don't think we need to make the semantics of Paimon totally the same as Flink's. However, we need to introduce a table that shows the differences compared with Flink's savepoint, and discuss those differences.
> >>>>>>>
> >>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Semantic
> >>>>>>>
> >>>>>>> Best
> >>>>>>> Yun Tang
> >>>>>>> ________________________________
> >>>>>>> From: Nicholas Jiang <[email protected]>
> >>>>>>> Sent: Friday, May 19, 2023 17:40
> >>>>>>> To: [email protected] <[email protected]>
> >>>>>>> Subject: Re: [DISCUSS] PIP-4 Support savepoint
> >>>>>>>
> >>>>>>> Hi Guys,
> >>>>>>>
> >>>>>>> Thanks Zelin for driving the savepoint proposal. I'd like to share some opinions on savepoint:
> >>>>>>>
> >>>>>>> -- About "introduce savepoint for Paimon to persist full data in a time point"
> >>>>>>>
> >>>>>>> The motivation of the savepoint proposal reads more like snapshot TTL management. Actually, disaster recovery is very much mission critical for any software. Especially when it comes to data systems, the impact could be very serious, leading to delayed or even wrong business decisions at times. Savepoint is proposed to assist users in recovering data from a previous state, via two operations: "savepoint" and "restore".
> >>>>>>>
> >>>>>>> "savepoint" saves the Paimon table as of the commit time; therefore, if there is a savepoint, the data generated in the corresponding commit cannot be cleaned up.
> >>>>>>> Meanwhile, a savepoint lets the user restore the table to this savepoint at a later point in time if need be. On similar lines, a savepoint cannot be triggered on a commit that has already been cleaned up. Savepoint is synonymous with taking a backup, except that we don't make a new copy of the table; we just save the state of the table so that we can restore it later when needed.
> >>>>>>>
> >>>>>>> "restore" lets you restore your table to one of the savepoint commits. Meanwhile, it cannot be undone (or reversed), so care should be taken before doing a restore. At this point, Paimon would delete all data files and commit files (timeline files) greater than the savepoint commit to which the table is being restored.
> >>>>>>>
> >>>>>>> BTW, it would be better to introduce a snapshot view based on savepoint, which could improve query performance on historical data for a Paimon table.
> >>>>>>>
> >>>>>>> -- About the Public API of savepoint
> >>>>>>>
> >>>>>>> The savepoint interfaces currently introduced in the Public API are not enough for users; for example, deleteSavepoint, restoreSavepoint, etc. are still needed.
> >>>>>>>
> >>>>>>> -- About "Paimon's savepoint need to be combined with Flink's savepoint":
> >>>>>>>
> >>>>>>> If Paimon supports the savepoint mechanism and provides savepoint interfaces, integration with Flink's savepoint is not blocked by this proposal.
> >>>>>>>
> >>>>>>> In summary, savepoint is not only used to improve query performance on historical data, but also for disaster recovery.
> >>>>>>>
> >>>>>>> On 2023/05/17 09:53:11 Jingsong Li wrote:
> >>>>>>>> What Shammon mentioned is interesting. I agree with what he said about the differences in savepoints between databases and stream computing.
> >>>>>>>>
> >>>>>>>> About "Paimon's savepoint need to be combined with Flink's savepoint":
> >>>>>>>>
> >>>>>>>> I think it is possible, but we may need to handle it with another mechanism, because the snapshots after a savepoint may expire. We would need to compare data between two savepoints to generate incremental data for streaming reads.
> >>>>>>>>
> >>>>>>>> But this does not need to block the FLIP; it looks like the current design does not prevent that future combination?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jingsong
> >>>>>>>>
> >>>>>>>> On Wed, May 17, 2023 at 5:33 PM Shammon FY <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Caizhi,
> >>>>>>>>>
> >>>>>>>>> Thanks for your comments. As you mentioned, I think we may need to discuss the role of savepoint in Paimon.
> >>>>>>>>>
> >>>>>>>>> If I understand correctly, the main feature of savepoint in the current PIP is that the savepoint will not expire, and users can query the savepoint via time travel. Besides that, there are savepoints in databases and in Flink.
> >>>>>>>>>
> >>>>>>>>> 1. Savepoint in a database. The database can roll back table data to the specified 'version' based on a savepoint. So the key point of savepoint in a database is to roll back data.
> >>>>>>>>>
> >>>>>>>>> 2. Savepoint in Flink. Users can trigger a savepoint with a specific 'path', and all state data of the job is saved to the savepoint.
> >>>>>>>>> Then users can create a new job based on the savepoint to continue consuming incremental data. I think the core capabilities are: backing up a job, and resuming a job based on the savepoint.
> >>>>>>>>>
> >>>>>>>>> In addition to the above, Paimon may also face data write corruption and need to recover data based on a specified savepoint. So we may need to consider what abilities Paimon's savepoint needs besides the ones mentioned in the current PIP.
> >>>>>>>>>
> >>>>>>>>> Additionally, as mentioned above, Flink also has a savepoint mechanism. When streaming data from Flink to Paimon, does Paimon's savepoint need to be combined with Flink's savepoint?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Shammon FY
> >>>>>>>>>
> >>>>>>>>> On Wed, May 17, 2023 at 4:02 PM Caizhi Weng <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi developers!
> >>>>>>>>>>
> >>>>>>>>>> Thanks Zelin for bringing up the discussion. The proposal seems good to me overall. However, I'd also like to bring up a few points.
> >>>>>>>>>>
> >>>>>>>>>> 1. As Jingsong mentioned, the Savepoint class should not become a public API, at least for now. What we need to discuss for the public API is how users can create or delete savepoints. For example, what the table option looks like, what commands and options are provided for the Flink action, etc.
> >>>>>>>>>>
> >>>>>>>>>> 2. Currently most Flink actions are related to streaming processing, so only Flink can support them. However, savepoint creation and deletion seem like features for batch processing. So aside from Flink actions, shall we also provide something like Spark actions for savepoints?
> >>>>>>>>>>
> >>>>>>>>>> I would also like to comment on Shammon's views.
> >>>>>>>>>>
> >>>>>>>>>>> Should we introduce an option for the savepoint path, which may be different from 'warehouse'? Then users can back up the data of the savepoint.
> >>>>>>>>>>
> >>>>>>>>>> I don't see that this is necessary. To back up a table, the user just needs to copy all files from the table directory. Savepoint in Paimon, as far as I understand, is mainly for users to review historical data, not for backing up tables.
> >>>>>>>>>>
> >>>>>>>>>>> Will the savepoint copy data files from the snapshot, or only save meta files?
> >>>>>>>>>>
> >>>>>>>>>> It would be a heavy burden if a savepoint copied all its files. As I mentioned above, savepoint is not for backing up tables.
> >>>>>>>>>>
> >>>>>>>>>>> How can users create a new table and restore data from the specified savepoint?
> >>>>>>>>>>
> >>>>>>>>>> This reminds me of savepoints in Flink. Still, savepoint is not for backing up tables, so I guess we don't need to support "restoring data" from a savepoint.
> >>>>>>>>>>
> >>>>>>>>>> On Wed, May 17, 2023 at 10:32, Shammon FY <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks Zelin for initiating this discussion. I have some comments:
> >>>>>>>>>>>
> >>>>>>>>>>> 1. Should we introduce an option for the savepoint path, which may be different from 'warehouse'? Then users can back up the data of the savepoint.
> >>>>>>>>>>>
> >>>>>>>>>>> 2. Will the savepoint copy data files from the snapshot, or only save meta files? The description in the PIP, "After we introduce savepoint, we should also check if the data files are used by savepoints.", suggests that we only save meta files for a savepoint.
> >>>>>>>>>>>
> >>>>>>>>>>> 3. How can users create a new table and restore data from the specified savepoint?
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Shammon FY
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, May 17, 2023 at 10:19 AM Jingsong Li <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks Zelin for driving.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Some comments:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. I think it's possible to move `Proposed Changes` to the top; the Public API has no meaning if I don't know how it is done.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. Regarding the Public API: Savepoint and SavepointManager are not Public API; only the Flink action or configuration options should be public API.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3. Maybe we can have a separate chapter to describe `savepoint.create-interval`, maybe 'Periodic savepoint'? It is not just an interval, because the real use case is a savepoint right after 0:00.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 4. About 'Interaction with Snapshot': to be continued ...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Jingsong
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, May 16, 2023 at 7:07 PM yu zelin <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi, Paimon Devs,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'd like to start a discussion about PIP-4 [1]. In this PIP, I want to talk about why we need savepoint, and share some thoughts about managing and using savepoints. Looking forward to your questions and suggestions.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Yu Zelin
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/NxE0Dw
