Hi, all,

FYI, I have updated the PIP [1].

Main changes:
- Use new name `tag`
- Enrich Motivation
- New section `Data Files Handling` to describe how to determine whether a data 
file can be deleted.

Best,
Yu Zelin

[1] https://cwiki.apache.org/confluence/x/NxE0Dw

> On May 24, 2023, at 17:18, yu zelin <[email protected]> wrote:
> 
> Hi, Guojun,
> 
> I’d like to share my thoughts about your questions.
> 
> 1. Expiration of savepoints
> In my opinion, savepoints are created at long intervals, so there will not 
> be too many of them.
> If users create a savepoint per day, there are 365 savepoints a year. So I 
> didn't consider expiring 
> them, and I think providing a Flink action like `delete-savepoint id = 1` is 
> enough for now. 
> But if it is really important, we can introduce table options to do so. I 
> think we can do it the same way we expire 
> snapshots.
> 
> 2. >   id of compacted snapshot picked by the savepoint
> My initial idea was to pick a compacted snapshot, or do a compaction before 
> creating the savepoint. But 
> after discussing with Jingsong, I found it's difficult. So now I propose to 
> directly create the savepoint from 
> the given snapshot. Maybe we can optimize it later.
> The changes will be updated soon.
> The changes will be updated soon.
>> manifest file list in system-table
> I think the manifest file is not very important for users. Users can find when a 
> savepoint was created, 
> get the savepoint id, and then query the savepoint by that id. I 
> don't see in what scenario 
> users would need the manifest file information. What do you think?
> 
> Best, 
> Yu Zelin
> 
>> On May 24, 2023, at 10:50, Guojun Li <[email protected]> wrote:
>> 
>> Thanks Zelin for bringing up the discussion. I'm thinking about:
>> 1. How do we manage the savepoints if there is no expiration mechanism: by
>> the TTL management of the storage, or by an external script?
>> 2. I think the id of the compacted snapshot picked by the savepoint and
>> the manifest file list are also important information for users. Could this
>> information be stored in the system-table?
>> 
>> Best,
>> Guojun
>> 
>> On Mon, May 22, 2023 at 9:13 PM Jingsong Li <[email protected]> wrote:
>> 
>>> FYI
>>> 
>>> The PIP lacks a table to show Discussion thread & Vote thread & ISSUE...
>>> 
>>> Best
>>> Jingsong
>>> 
>>> On Mon, May 22, 2023 at 4:48 PM yu zelin <[email protected]> wrote:
>>>> 
>>>> Hi, all,
>>>> 
>>>> Thank you all for your suggestions and questions. After reading your
>>> suggestions, I have adopted some of them, and I want to share my opinions here.
>>>> 
>>>> To make my statements clearer, I will still use the word `savepoint`.
>>> Once we reach a consensus, the name may be changed.
>>>> 
>>>> 1. The purposes of savepoint
>>>> 
>>>> As Shammon mentioned, Flink and databases also have the concept of
>>> `savepoint`, so it's better to clarify the purposes of our savepoint.
>>> Thanks to Nicholas and Jingsong, I think your explanations are very clear.
>>> I'd like to give my summary:
>>>> 
>>>> (1) Fault recovery (or we can say disaster recovery). Users can ROLL
>>> BACK to a savepoint if needed. If a user rolls back to a savepoint, the table
>>> will hold the data in the savepoint, and the data committed after the
>>> savepoint will be deleted. In this scenario we need savepoints because
>>> snapshots may have expired; a savepoint is kept longer and preserves the user's
>>> old data.
>>>> 
>>>> (2) Record versions of data at a longer interval (typically daily
>>> or weekly). With a savepoint, users can query the old data in batch
>>> mode. Compared to copying records to a new table or merging incremental records
>>> with old records (like using MERGE INTO in Hive), the savepoint is more
>>> lightweight because we don't copy data files; we just record their metadata.
>>>> 
>>>> As you can see, savepoint is very similar to snapshot. The differences
>>> are:
>>>> 
>>>> (1) A savepoint lives longer. In most cases, a snapshot's lifetime is
>>> about several minutes to hours. We expect a savepoint to live for several
>>> days, weeks, or even months.
>>>> 
>>>> (2) A savepoint is mainly used for batch reading of historical data. In
>>> this PIP, we don't introduce streaming reading of savepoints.
>>>> 
>>>> 2. Candidates of name
>>>> 
>>>> I agree with Jingsong that we can use a new name. Since the purpose and
>>> mechanism of savepoint (it is very similar to snapshot) are similar
>>> to `tag` in Iceberg, maybe we can use `tag`.
>>>> 
>>>> In my opinion, an alternative is `anchor`. All the snapshots are like
>>> the navigation path of the streaming data, and an `anchor` can pin it in
>>> place.
>>>> 
>>>> 3. Public table operations and options
>>>> 
>>>> We propose to expose some operations and table options for users to
>>> manage savepoints.
>>>> 
>>>> (1) Operations (currently for Flink)
>>>> We provide Flink actions to manage savepoints:
>>>>   create-savepoint: To generate a savepoint from the latest snapshot.
>>> Creating from a specified snapshot is also supported.
>>>>   delete-savepoint: To delete the specified savepoint.
>>>>   rollback-to: To roll back to a specified savepoint.
>>>> 
>>>> (2) Table options
>>>> We propose to provide options for creating savepoints periodically:
>>>>   savepoint.create-time: When to create the savepoint. Example: 00:00
>>>>   savepoint.create-interval: Interval between the creation of two
>>> savepoints. Example: 2 d.
>>>>   savepoint.time-retained: The maximum time to retain savepoints.
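
To illustrate how these options could interact, here is a minimal sketch of the
periodic-creation check. The function name and the exact semantics (a daily
anchor time plus a minimum interval between savepoints) are my assumptions, not
the PIP's design:

```python
from datetime import datetime, time, timedelta

def savepoint_due(now, last_created, create_time=time(0, 0),
                  create_interval=timedelta(days=2)):
    """Decide whether a periodic savepoint is due (hypothetical sketch).

    `create_time` mirrors savepoint.create-time and `create_interval`
    mirrors savepoint.create-interval; both names are assumptions.
    """
    if last_created is None:
        # No savepoint yet: create once we pass today's anchor time.
        anchor = datetime.combine(now.date(), create_time)
        return now >= anchor
    # Otherwise wait until a full interval has elapsed since the last one.
    return now - last_created >= create_interval
```

For example, with the defaults above, a savepoint created at 2023-05-22 00:00
makes another one due at any time on or after 2023-05-24 00:00.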
>>>> 
>>>> (3) Procedures (future work)
>>>> Spark supports SQL extensions. After we support the Spark CALL statement, we
>>> can provide procedures for Spark users to create, delete, or roll back to a
>>> savepoint.
>>>> 
>>>> Support for CALL is on the roadmap of Flink. In a future version, we can
>>> also support savepoint-related procedures for Flink users.
>>>> 
>>>> 4. Expiration of data files
>>>> 
>>>> Currently, when a snapshot expires, the data files that are not used by
>>> other snapshots are deleted. After we introduce the savepoint, we must make
>>> sure the data files saved by savepoints will not be deleted.
>>>> 
>>>> Conversely, when a savepoint is deleted, the data files that are not
>>> used by existing snapshots or other savepoints will be deleted.
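
As a rough model of this rule (my own sketch, not code from the PIP; each
reference is simplified to a plain set of data file names, whereas the real
implementation would walk manifests), the check reduces to a set difference
against everything still referenced:

```python
def deletable_files(candidate_files, live_snapshots, live_savepoints):
    """Return the files of an expired snapshot or deleted savepoint that are
    safe to delete: those referenced by no remaining snapshot or savepoint.

    Hypothetical sketch; arguments are iterables of file-name sets.
    """
    still_used = set()
    for ref in list(live_snapshots) + list(live_savepoints):
        still_used |= set(ref)
    return set(candidate_files) - still_used
```

For example, if a deleted savepoint referenced files f1, f2, and f3, but a live
snapshot still uses f1 and another savepoint still uses f2, only f3 may be
deleted.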
>>>> 
>>>> I have written some POC code to implement this. I will update the mechanism
>>> in the PIP soon.
>>>> 
>>>> Best,
>>>> Yu Zelin
>>>> 
>>>>> On May 21, 2023, at 20:54, Jingsong Li <[email protected]> wrote:
>>>>> 
>>>>> Thanks Yun for your information.
>>>>> 
>>>>> We need to be careful to avoid confusion between the Paimon and Flink
>>>>> concepts of "savepoint".
>>>>> 
>>>>> Maybe we don't have to insist on using "savepoint"; for example,
>>>>> TAG is also a candidate, just like in Iceberg [1].
>>>>> 
>>>>> [1] https://iceberg.apache.org/docs/latest/branching/
>>>>> 
>>>>> Best,
>>>>> Jingsong
>>>>> 
>>>>> On Sun, May 21, 2023 at 8:51 PM Jingsong Li <[email protected]>
>>> wrote:
>>>>>> 
>>>>>> Thanks Nicholas for your detailed requirements.
>>>>>> 
>>>>>> We need to supplement the user requirements in the PIP, which is mainly aimed
>>>>>> at two purposes:
>>>>>> 1. Fault recovery for data errors (named: restore or rollback-to)
>>>>>> 2. Recording versions at the day level, targeting
>>> batch queries
>>>>>> 
>>>>>> Best,
>>>>>> Jingsong
>>>>>> 
>>>>>> On Sat, May 20, 2023 at 2:55 PM Yun Tang <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi Guys,
>>>>>>> 
>>>>>>> Since we use Paimon with Flink in most cases, I think we need to
>>> distinguish the meanings of the same word "savepoint" in different systems.
>>>>>>> 
>>>>>>> For Flink, savepoint means:
>>>>>>> 
>>>>>>> 1.  Triggered by users, not periodically triggered by the system
>>> itself. However, this PIP wants to support creating it periodically.
>>>>>>> 2.  Even the so-called incremental native savepoint [1] does
>>> not depend on the previous checkpoints or savepoints; it will still copy
>>> files on DFS to the self-contained savepoint folder. However, from the
>>> description of this PIP about the deletion of expired snapshot files, a
>>> Paimon savepoint will refer to the previously existing files directly.
>>>>>>> 
>>>>>>> I don't think we need to make the semantics of Paimon totally the
>>> same as Flink's. However, we need to introduce a table to tell the
>>> difference compared with Flink and discuss the differences.
>>>>>>> 
>>>>>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Semantic
>>>>>>> 
>>>>>>> Best
>>>>>>> Yun Tang
>>>>>>> ________________________________
>>>>>>> From: Nicholas Jiang <[email protected]>
>>>>>>> Sent: Friday, May 19, 2023 17:40
>>>>>>> To: [email protected] <[email protected]>
>>>>>>> Subject: Re: [DISCUSS] PIP-4 Support savepoint
>>>>>>> 
>>>>>>> Hi Guys,
>>>>>>> 
>>>>>>> Thanks Zelin for driving the savepoint proposal. I'd like to propose some
>>> opinions on savepoint:
>>>>>>> 
>>>>>>> -- About "introduce savepoint for Paimon to persist full data in a
>>> time point"
>>>>>>> 
>>>>>>> The motivation of the savepoint proposal is more like snapshot TTL
>>> management. Actually, disaster recovery is mission critical for
>>> any software. Especially when it comes to data systems, the impact could be
>>> very serious, leading to delays in business decisions or even wrong business
>>> decisions at times. Savepoint is proposed to assist users in recovering
>>> data from a previous state: "savepoint" and "restore".
>>>>>>> 
>>>>>>> "savepoint" saves the Paimon table as of the commit time, therefore
>>> if there is a savepoint, the data generated in the corresponding commit
>>> could not be clean. Meanwhile, savepoint could let user restore the table
>>> to this savepoint at a later point in time if need be. On similar lines,
>>> savepoint cannot be triggered on a commit that is already cleaned up.
>>> Savepoint is synonymous to taking a backup, just that we don't make a new
>>> copy of the table, but just save the state of the table elegantly so that
>>> we can restore it later when in need.
>>>>>>> 
>>>>>>> "restore" lets you restore your table to one of the savepoint
>>> commit. Meanwhile, it cannot be undone (or reversed) and so care should be
>>> taken before doing a restore. At this time, Paimon would delete all data
>>> files and commit files (timeline files) greater than the savepoint commit
>>> to which the table is being restored.
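
A minimal sketch of these restore semantics, assuming snapshots are identified
by monotonically increasing ids (the function and data shapes are mine, not
Paimon's actual layout):

```python
def restore_to_savepoint(snapshots, savepoint_snapshot_id):
    """Model of restore: keep snapshots up to the savepoint's commit and
    report the later ones, whose data and commit files would be deleted.

    `snapshots` maps snapshot id -> metadata. Restore is irreversible, so a
    caller should confirm before acting on `removed`.
    """
    kept = {sid: meta for sid, meta in snapshots.items()
            if sid <= savepoint_snapshot_id}
    removed = sorted(sid for sid in snapshots if sid > savepoint_snapshot_id)
    return kept, removed
```

So restoring a table with snapshots 1..3 to a savepoint taken at snapshot 2
keeps snapshots 1 and 2 and schedules everything from snapshot 3 onward for
deletion.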
>>>>>>> 
>>>>>>> BTW, it's better to introduce a snapshot view based on savepoints,
>>> which could improve the query performance of historical data for Paimon tables.
>>>>>>> 
>>>>>>> -- About the Public API of savepoint
>>>>>>> 
>>>>>>> The currently introduced savepoint interfaces in the Public API are not enough
>>> for users; for example, deleteSavepoint, restoreSavepoint, etc.
>>>>>>> 
>>>>>>> -- About "Paimon's savepoint need to be combined with Flink's
>>> savepoint":
>>>>>>> 
>>>>>>> If Paimon supports the savepoint mechanism and provides savepoint
>>> interfaces, the integration with Flink's savepoint is not blocked by this
>>> proposal.
>>>>>>> 
>>>>>>> In summary, savepoint is not only used to improve the query
>>> performance of historical data, but also for disaster recovery.
>>>>>>> 
>>>>>>> On 2023/05/17 09:53:11 Jingsong Li wrote:
>>>>>>>> What Shammon mentioned is interesting. I agree with what he said
>>> about
>>>>>>>> the differences in savepoints between databases and stream
>>> computing.
>>>>>>>> 
>>>>>>>> About "Paimon's savepoint need to be combined with Flink's
>>> savepoint":
>>>>>>>> 
>>>>>>>> I think it is possible, but we may need to deal with this in another
>>>>>>>> mechanism, because the snapshots after savepoint may expire. We need
>>>>>>>> to compare data between two savepoints to generate incremental data
>>>>>>>> for streaming read.
>>>>>>>> 
>>>>>>>> But this may not need to block the PIP; it looks like the current
>>> design
>>>>>>>> does not break the future combination?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Jingsong
>>>>>>>> 
>>>>>>>> On Wed, May 17, 2023 at 5:33 PM Shammon FY <[email protected]>
>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Caizhi,
>>>>>>>>> 
>>>>>>>>> Thanks for your comments. As you mentioned, I think we may need to
>>> discuss
>>>>>>>>> the role of savepoint in Paimon.
>>>>>>>>> 
>>>>>>>>> If I understand correctly, the main feature of savepoint in the
>>> current PIP
>>>>>>>>> is that the savepoint will not be expired, and users can perform a
>>> query on
>>>>>>>>> the savepoint according to time-travel. Besides that, there is
>>> savepoint in
>>>>>>>>> the database and Flink.
>>>>>>>>> 
>>>>>>>>> 1. Savepoint in database. The database can roll back table data to
>>> the
>>>>>>>>> specified 'version' based on savepoint. So the key point of
>>> savepoint in
>>>>>>>>> the database is to rollback data.
>>>>>>>>> 
>>>>>>>>> 2. Savepoint in Flink. Users can trigger a savepoint with a
>>> specific
>>>>>>>>> 'path', and save all the job's state data to the savepoint. Then
>>> users can
>>>>>>>>> create a new job based on the savepoint to continue consuming
>>> incremental
>>>>>>>>> data. I think the core capabilities are: backing up a job, and
>>> resuming a job
>>>>>>>>> based on the savepoint.
>>>>>>>>> 
>>>>>>>>> In addition to the above, Paimon may also face data write
>>> corruption and
>>>>>>>>> need to recover data based on a specified savepoint. So we may
>>> need to
>>>>>>>>> consider what abilities Paimon's savepoint needs besides the
>>> ones
>>>>>>>>> mentioned in the current PIP.
>>>>>>>>> 
>>>>>>>>> Additionally, as mentioned above, Flink also has
>>>>>>>>> savepoint mechanism. During the process of streaming data from
>>> Flink to
>>>>>>>>> Paimon, does Paimon's savepoint need to be combined with Flink's
>>> savepoint?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Shammon FY
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, May 17, 2023 at 4:02 PM Caizhi Weng <[email protected]>
>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi developers!
>>>>>>>>>> 
>>>>>>>>>> Thanks Zelin for bringing up the discussion. The proposal seems
>>> good to me
>>>>>>>>>> overall. However, I'd also like to bring up a few points.
>>>>>>>>>> 
>>>>>>>>>> 1. As Jingsong mentioned, Savepoint class should not become a
>>> public API,
>>>>>>>>>> at least for now. What we need to discuss for the public API is
>>> how the
>>>>>>>>>> users can create or delete savepoints. For example, what the
>>> table option
>>>>>>>>>> looks like, what commands and options are provided for the Flink
>>> action,
>>>>>>>>>> etc.
>>>>>>>>>> 
>>>>>>>>>> 2. Currently most Flink actions are related to streaming
>>> processing, so
>>>>>>>>>> only Flink can support them. However, savepoint creation and
>>> deletion seems
>>>>>>>>>> like a feature for batch processing. So aside from Flink actions,
>>> shall we
>>>>>>>>>> also provide something like Spark actions for savepoints?
>>>>>>>>>> 
>>>>>>>>>> I would also like to comment on Shammon's views.
>>>>>>>>>> 
>>>>>>>>>> Should we introduce an option for savepoint path which may be
>>> different
>>>>>>>>>>> from 'warehouse'? Then users can backup the data of savepoint.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I don't see why this is necessary. To back up a table, the user just
>>> needs to copy
>>>>>>>>>> all files from the table directory. Savepoint in Paimon, as far
>>> as I
>>>>>>>>>> understand, is mainly for users to review historical data, not
>>> for backing
>>>>>>>>>> up tables.
>>>>>>>>>> 
>>>>>>>>>> Will the savepoint copy data files from snapshot or only save
>>> meta files?
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> It would be a heavy burden if a savepoint copies all its files.
>>> As I
>>>>>>>>>> mentioned above, savepoint is not for backing up tables.
>>>>>>>>>> 
>>>>>>>>>> How can users create a new table and restore data from the
>>> specified
>>>>>>>>>>> savepoint?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> This reminds me of savepoints in Flink. Still, savepoint is not
>>> for backing
>>>>>>>>>> up tables so I guess we don't need to support "restoring data"
>>> from a
>>>>>>>>>> savepoint.
>>>>>>>>>> 
>>>>>>>>>> Shammon FY <[email protected]> wrote on Wed, May 17, 2023 at 10:32:
>>>>>>>>>> 
>>>>>>>>>>> Thanks Zelin for initiating this discussion. I have some
>>> comments:
>>>>>>>>>>> 
>>>>>>>>>>> 1. Should we introduce an option for savepoint path which may be
>>>>>>>>>> different
>>>>>>>>>>> from 'warehouse'? Then users can backup the data of savepoint.
>>>>>>>>>>> 
>>>>>>>>>>> 2. Will the savepoint copy data files from snapshot or only save
>>> meta
>>>>>>>>>>> files? The description in the PIP "After we introduce savepoint,
>>> we
>>>>>>>>>> should
>>>>>>>>>>> also check if the data files are used by savepoints." looks like
>>> we only
>>>>>>>>>>> save meta files for savepoint.
>>>>>>>>>>> 
>>>>>>>>>>> 3. How can users create a new table and restore data from the
>>> specified
>>>>>>>>>>> savepoint?
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Shammon FY
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, May 17, 2023 at 10:19 AM Jingsong Li <
>>> [email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks Zelin for driving.
>>>>>>>>>>>> 
>>>>>>>>>>>> Some comments:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. I think it's possible to move `Proposed Changes` to the
>>> top;
>>>>>>>>>>>> the Public API has no meaning if I don't know how it works.
>>>>>>>>>>>> 
>>>>>>>>>>>> 2. Public API, Savepoint and SavepointManager are not Public
>>> API, only
>>>>>>>>>>>> Flink action or configuration option should be public API.
>>>>>>>>>>>> 
>>>>>>>>>>>> 3. Maybe we can have a separate chapter to describe
>>>>>>>>>>>> `savepoint.create-interval`, maybe 'Periodic savepoint'? It
>>> is not
>>>>>>>>>>>> just an interval, because the real use case is a savepoint right
>>> after 0:00.
>>>>>>>>>>>> 
>>>>>>>>>>>> 4.About 'Interaction with Snapshot', to be continued ...
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Jingsong
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, May 16, 2023 at 7:07 PM yu zelin <[email protected]
>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi, Paimon Devs,
>>>>>>>>>>>>>   I'd like to start a discussion about PIP-4 [1]. In this
>>> PIP, I
>>>>>>>>>> want
>>>>>>>>>>>> to talk about why we need savepoint, and share some thoughts about
>>> managing
>>>>>>>>>> and
>>>>>>>>>>>> using savepoints. Looking forward to your questions and suggestions.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Yu Zelin
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/NxE0Dw
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>> 
>>> 
> 
