Thanks Zelin for the update.

## TAG ID

Is this useful? We already have tag-name and snapshot-id, and now we are introducing a tag id as well? Which one is actually used?

## Time Travel

SELECT * FROM t VERSION AS OF tag-name.<name>

This does not look like standard SQL. Why do we introduce this `tag-name` prefix?

## Tag class

Why not just use the Snapshot class? It looks like we don't need to introduce a Tag class. We can just copy the snapshot file to tag/.
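
For example, something roughly like this should be enough (just a sketch; the paths and file names are illustrative, and a real implementation would go through the table's file system abstraction rather than java.nio so that it also works on DFS and object storage):

    // Sketch only: "creating a tag" is just copying the snapshot file into tag/.
    static void createTag(java.nio.file.Path tablePath, long snapshotId, String tagName)
            throws java.io.IOException {
        java.nio.file.Path snapshotFile =
                tablePath.resolve("snapshot").resolve("snapshot-" + snapshotId);
        java.nio.file.Path tagDir = tablePath.resolve("tag");
        java.nio.file.Files.createDirectories(tagDir);
        java.nio.file.Files.copy(
                snapshotFile,
                tagDir.resolve("tag-" + tagName),
                java.nio.file.StandardCopyOption.REPLACE_EXISTING);
    }
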
## Expiring Snapshot

We should note that "record it in `DataFileMeta`" should be done before "tag", and we should document version compatibility. Also, why not record it in ManifestEntry?

Best,
Jingsong

On Fri, May 26, 2023 at 11:15 AM yu zelin <[email protected]> wrote:
>
> Hi, all,
>
> FYI, I have updated the PIP [1].
>
> Main changes:
> - Use the new name `tag`
> - Enrich the Motivation section
> - New section `Data Files Handling`, describing how to determine whether a data file can be deleted
>
> Best,
> Yu Zelin
>
> [1] https://cwiki.apache.org/confluence/x/NxE0Dw
>
> On May 24, 2023 at 17:18, yu zelin <[email protected]> wrote:
> >
> > Hi, Guojun,
> >
> > I'd like to share my thoughts about your questions.
> >
> > 1. Expiration of savepoints
> > In my opinion, savepoints are created at long intervals, so there will not be too many of them. If users create one savepoint per day, there are 365 savepoints in a year. So I didn't consider expiring them, and I think providing a Flink action like `delete-savepoint id = 1` is enough for now. But if it is really important, we can introduce table options for it; I think we can do it the same way as expiring snapshots.
> >
> > 2.
> >> id of compacted snapshot picked by the savepoint
> > My initial idea was to pick a compacted snapshot, or to do a compaction before creating the savepoint. But after discussing with Jingsong, I found that is difficult. So now I propose to create the savepoint directly from the given snapshot. Maybe we can optimize it later. The changes will be updated soon.
> >> manifest file list in system-table
> > I think the manifest file is not very important for users. Users can find out when a savepoint was created, get the savepoint id, and then query the savepoint by that id. I don't see a scenario in which users need the manifest file information. What do you think?
> >
> > Best,
> > Yu Zelin
> >
> >> On May 24, 2023 at 10:50, Guojun Li <[email protected]> wrote:
> >>
> >> Thanks zelin for bringing up the discussion. I'm thinking about:
> >> 1. How should the savepoints be managed if there is no expiration mechanism? By the TTL management of the storage, or by an external script?
> >> 2. I think the id of the compacted snapshot picked by the savepoint and the manifest file list are also important information for users. Could this information be stored in the system-table?
> >>
> >> Best,
> >> Guojun
> >>
> >> On Mon, May 22, 2023 at 9:13 PM Jingsong Li <[email protected]> wrote:
> >>
> >>> FYI
> >>>
> >>> The PIP lacks a table to show Discussion thread & Vote thread & ISSUE...
> >>>
> >>> Best
> >>> Jingsong
> >>>
> >>> On Mon, May 22, 2023 at 4:48 PM yu zelin <[email protected]> wrote:
> >>>>
> >>>> Hi, all,
> >>>>
> >>>> Thank you all for your suggestions and questions. After reading them, I have adopted some, and I want to share my opinions here.
> >>>>
> >>>> To make my statements clearer, I will still use the word `savepoint`. Once we reach a consensus, the name may be changed.
> >>>>
> >>>> 1. The purposes of savepoint
> >>>>
> >>>> As Shammon mentioned, Flink and databases also have the concept of `savepoint`.
> >>>> So it's better to clarify the purposes of our savepoint. Thanks to Nicholas and Jingsong; I think your explanations are very clear. I'd like to give my summary:
> >>>>
> >>>> (1) Fault recovery (or we can say disaster recovery). Users can ROLL BACK to a savepoint if needed. If a user rolls back to a savepoint, the table will hold the data in the savepoint, and the data committed after the savepoint will be deleted. In this scenario we need savepoints because snapshots may have expired; a savepoint can be kept longer and preserves the user's old data.
> >>>>
> >>>> (2) Recording versions of data at a longer interval (typically daily or weekly). With a savepoint, users can query the old data in batch mode. Compared to copying records to a new table, or merging incremental records with old records (like using MERGE INTO in Hive), the savepoint is more lightweight because we don't copy data files; we just record their metadata.
> >>>>
> >>>> As you can see, savepoint is very similar to snapshot. The differences are:
> >>>>
> >>>> (1) A savepoint lives longer. In most cases, a snapshot's lifetime is several minutes to hours. We expect a savepoint to live for several days, weeks, or even months.
> >>>>
> >>>> (2) A savepoint is mainly used for batch reading of historical data. In this PIP, we don't introduce streaming reading of savepoints.
> >>>>
> >>>> 2. Candidates for the name
> >>>>
> >>>> I agree with Jingsong that we can use a new name. Since the purpose and mechanism of savepoint (it is very similar to snapshot) are similar to `tag` in Iceberg, maybe we can use `tag`.
> >>>>
> >>>> In my opinion, an alternative is `anchor`. All the snapshots are like the navigation path of the streaming data, and an `anchor` can pin it in place.
> >>>>
> >>>> 3. Public table operations and options
> >>>>
> >>>> We propose to expose some operations and table options for users to manage savepoints.
> >>>>
> >>>> (1) Operations (currently for Flink)
> >>>> We provide Flink actions to manage savepoints:
> >>>> create-savepoint: generates a savepoint from the latest snapshot. Creating from a specified snapshot is also supported.
> >>>> delete-savepoint: deletes the specified savepoint.
> >>>> rollback-to: rolls back to a specified savepoint.
> >>>>
> >>>> (2) Table options
> >>>> We propose to provide options for creating savepoints periodically:
> >>>> savepoint.create-time: when to create the savepoint. Example: 00:00.
> >>>> savepoint.create-interval: the interval between the creation of two savepoints. Example: 2 d.
> >>>> savepoint.time-retained: the maximum time savepoints are retained.
> >>>>
> >>>> (3) Procedures (future work)
> >>>> Spark supports SQL extensions. After we support the Spark CALL statement, we can provide procedures for Spark users to create, delete, or roll back to a savepoint.
> >>>>
> >>>> Support for CALL is on the roadmap of Flink. In a future version, we can also support savepoint-related procedures for Flink users.
> >>>>
> >>>> 4. Expiration of data files
> >>>>
> >>>> Currently, when a snapshot expires, the data files that are not used by other snapshots are deleted. After we introduce the savepoint, we must make sure that the data files saved by a savepoint will not be deleted.
> >>>>
> >>>> Conversely, when a savepoint is deleted, the data files that are not used by existing snapshots and other savepoints will be deleted.
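> >>>>
> >>>> Roughly, this is a reachability check over the remaining snapshots and savepoints. A simplified sketch of the rule (the class and method names here are hypothetical, not the real API):
> >>>>
> >>>>     // Hypothetical sketch: a data file may only be deleted when no live snapshot
> >>>>     // and no savepoint still references it.
> >>>>     static boolean canDelete(
> >>>>             String dataFile,
> >>>>             java.util.List<Snapshot> liveSnapshots,
> >>>>             java.util.List<Savepoint> savepoints) {
> >>>>         java.util.Set<String> referenced = new java.util.HashSet<>();
> >>>>         for (Snapshot s : liveSnapshots) {
> >>>>             referenced.addAll(s.dataFiles()); // files reachable from the snapshot's manifests
> >>>>         }
> >>>>         for (Savepoint p : savepoints) {
> >>>>             referenced.addAll(p.dataFiles()); // files pinned by savepoints
> >>>>         }
> >>>>         return !referenced.contains(dataFile);
> >>>>     }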
> >>>>
> >>>> I have written some POC code to implement this. I will update the mechanism in the PIP soon.
> >>>>
> >>>> Best,
> >>>> Yu Zelin
> >>>>
> >>>>> On May 21, 2023 at 20:54, Jingsong Li <[email protected]> wrote:
> >>>>>
> >>>>> Thanks Yun for your information.
> >>>>>
> >>>>> We need to be careful to avoid confusion between the Paimon and Flink concepts of "savepoint".
> >>>>>
> >>>>> Maybe we don't have to insist on using "savepoint"; for example, TAG is also a candidate, just like in Iceberg [1].
> >>>>>
> >>>>> [1] https://iceberg.apache.org/docs/latest/branching/
> >>>>>
> >>>>> Best,
> >>>>> Jingsong
> >>>>>
> >>>>> On Sun, May 21, 2023 at 8:51 PM Jingsong Li <[email protected]> wrote:
> >>>>>>
> >>>>>> Thanks Nicholas for your detailed requirements.
> >>>>>>
> >>>>>> We need to supplement the user requirements in the FLIP, which is mainly aimed at two purposes:
> >>>>>> 1. Fault recovery for data errors (named: restore or rollback-to)
> >>>>>> 2. Recording versions at the day level (for example, one per day), targeting batch queries
> >>>>>>
> >>>>>> Best,
> >>>>>> Jingsong
> >>>>>>
> >>>>>> On Sat, May 20, 2023 at 2:55 PM Yun Tang <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Hi Guys,
> >>>>>>>
> >>>>>>> Since we use Paimon with Flink in most cases, I think we need to clarify what the same word "savepoint" means in the different systems.
> >>>>>>>
> >>>>>>> For Flink, savepoint means:
> >>>>>>>
> >>>>>>> 1. It is triggered by users, not periodically triggered by the system itself. However, this FLIP wants to support creating it periodically.
> >>>>>>> 2. Even the so-called incremental native savepoint [1] does not depend on previous checkpoints or savepoints; it still copies files on DFS into a self-contained savepoint folder. However, from this FLIP's description of the deletion of expired snapshot files, the Paimon savepoint will refer to the previously existing files directly.
> >>>>>>>
> >>>>>>> I don't think we need to make the semantics of Paimon totally the same as Flink's. However, we need to introduce a table that shows the differences compared with Flink's savepoint, and discuss those differences.
> >>>>>>>
> >>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Semantic
> >>>>>>>
> >>>>>>> Best
> >>>>>>> Yun Tang
> >>>>>>> ________________________________
> >>>>>>> From: Nicholas Jiang <[email protected]>
> >>>>>>> Sent: Friday, May 19, 2023 17:40
> >>>>>>> To: [email protected] <[email protected]>
> >>>>>>> Subject: Re: [DISCUSS] PIP-4 Support savepoint
> >>>>>>>
> >>>>>>> Hi Guys,
> >>>>>>>
> >>>>>>> Thanks Zelin for driving the savepoint proposal. I'd like to share some opinions on savepoint:
> >>>>>>>
> >>>>>>> -- About "introduce savepoint for Paimon to persist full data in a time point"
> >>>>>>>
> >>>>>>> The motivation of the savepoint proposal reads more like snapshot TTL management. Actually, disaster recovery is very much mission critical for any software. Especially when it comes to data systems, the impact could be very serious, leading to delayed or even wrong business decisions at times. Savepoint is proposed to assist users in recovering data from a previous state, via two operations: "savepoint" and "restore".
> >>>>>>>
> >>>>>>> "savepoint" saves the Paimon table as of the commit time; therefore, if there is a savepoint, the data generated in the corresponding commit cannot be cleaned up.
> >>>>>>> Meanwhile, a savepoint lets the user restore the table to this savepoint at a later point in time if need be. On similar lines, a savepoint cannot be triggered on a commit that has already been cleaned up. Savepoint is synonymous with taking a backup, except that we don't make a new copy of the table; we just save the state of the table so that we can restore it later when needed.
> >>>>>>>
> >>>>>>> "restore" lets you restore your table to one of the savepoint commits. Meanwhile, it cannot be undone (or reversed), so care should be taken before doing a restore. At this point, Paimon would delete all data files and commit files (timeline files) greater than the savepoint commit to which the table is being restored.
> >>>>>>>
> >>>>>>> BTW, it would be better to introduce a snapshot view based on savepoint, which could improve query performance on historical data for a Paimon table.
> >>>>>>>
> >>>>>>> -- About the Public API of savepoint
> >>>>>>>
> >>>>>>> The savepoint interfaces currently introduced in the Public API are not enough for users; for example, deleteSavepoint, restoreSavepoint, etc. are still needed.
> >>>>>>>
> >>>>>>> -- About "Paimon's savepoint need to be combined with Flink's savepoint":
> >>>>>>>
> >>>>>>> If Paimon supports the savepoint mechanism and provides savepoint interfaces, integration with Flink's savepoint is not blocked by this proposal.
> >>>>>>>
> >>>>>>> In summary, savepoint is not only used to improve query performance on historical data, but also for disaster recovery.
> >>>>>>>
> >>>>>>> On 2023/05/17 09:53:11 Jingsong Li wrote:
> >>>>>>>> What Shammon mentioned is interesting. I agree with what he said about the differences in savepoints between databases and stream computing.
> >>>>>>>>
> >>>>>>>> About "Paimon's savepoint need to be combined with Flink's savepoint":
> >>>>>>>>
> >>>>>>>> I think it is possible, but we may need to handle it with another mechanism, because the snapshots after a savepoint may expire. We would need to compare data between two savepoints to generate incremental data for streaming reads.
> >>>>>>>>
> >>>>>>>> But this does not need to block the FLIP; it looks like the current design does not prevent that future combination?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jingsong
> >>>>>>>>
> >>>>>>>> On Wed, May 17, 2023 at 5:33 PM Shammon FY <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Caizhi,
> >>>>>>>>>
> >>>>>>>>> Thanks for your comments. As you mentioned, I think we may need to discuss the role of savepoint in Paimon.
> >>>>>>>>>
> >>>>>>>>> If I understand correctly, the main feature of savepoint in the current PIP is that the savepoint will not expire, and users can query the savepoint via time travel. Besides that, there are savepoints in databases and in Flink.
> >>>>>>>>>
> >>>>>>>>> 1. Savepoint in a database. The database can roll back table data to the specified 'version' based on a savepoint. So the key point of savepoint in a database is to roll back data.
> >>>>>>>>>
> >>>>>>>>> 2. Savepoint in Flink. Users can trigger a savepoint with a specific 'path', and all state data of the job is saved to the savepoint.
> >>>>>>>>> Then users can create a new job based on the savepoint to continue consuming incremental data. I think the core capabilities are: backing up a job, and resuming a job based on the savepoint.
> >>>>>>>>>
> >>>>>>>>> In addition to the above, Paimon may also face data write corruption and need to recover data based on a specified savepoint. So we may need to consider what abilities Paimon's savepoint needs besides the ones mentioned in the current PIP.
> >>>>>>>>>
> >>>>>>>>> Additionally, as mentioned above, Flink also has a savepoint mechanism. When streaming data from Flink to Paimon, does Paimon's savepoint need to be combined with Flink's savepoint?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Shammon FY
> >>>>>>>>>
> >>>>>>>>> On Wed, May 17, 2023 at 4:02 PM Caizhi Weng <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi developers!
> >>>>>>>>>>
> >>>>>>>>>> Thanks Zelin for bringing up the discussion. The proposal seems good to me overall. However, I'd also like to bring up a few points.
> >>>>>>>>>>
> >>>>>>>>>> 1. As Jingsong mentioned, the Savepoint class should not become a public API, at least for now. What we need to discuss for the public API is how users can create or delete savepoints. For example, what the table option looks like, what commands and options are provided for the Flink action, etc.
> >>>>>>>>>>
> >>>>>>>>>> 2. Currently most Flink actions are related to streaming processing, so only Flink can support them. However, savepoint creation and deletion seem like features for batch processing. So aside from Flink actions, shall we also provide something like Spark actions for savepoints?
> >>>>>>>>>>
> >>>>>>>>>> I would also like to comment on Shammon's views.
> >>>>>>>>>>
> >>>>>>>>>>> Should we introduce an option for the savepoint path, which may be different from 'warehouse'? Then users can back up the data of the savepoint.
> >>>>>>>>>>
> >>>>>>>>>> I don't see that this is necessary. To back up a table, the user just needs to copy all files from the table directory. Savepoint in Paimon, as far as I understand, is mainly for users to review historical data, not for backing up tables.
> >>>>>>>>>>
> >>>>>>>>>>> Will the savepoint copy data files from the snapshot, or only save meta files?
> >>>>>>>>>>
> >>>>>>>>>> It would be a heavy burden if a savepoint copied all its files. As I mentioned above, savepoint is not for backing up tables.
> >>>>>>>>>>
> >>>>>>>>>>> How can users create a new table and restore data from the specified savepoint?
> >>>>>>>>>>
> >>>>>>>>>> This reminds me of savepoints in Flink. Still, savepoint is not for backing up tables, so I guess we don't need to support "restoring data" from a savepoint.
> >>>>>>>>>>
> >>>>>>>>>> On Wed, May 17, 2023 at 10:32, Shammon FY <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks Zelin for initiating this discussion. I have some comments:
> >>>>>>>>>>>
> >>>>>>>>>>> 1. Should we introduce an option for the savepoint path, which may be different from 'warehouse'? Then users can back up the data of the savepoint.
> >>>>>>>>>>>
> >>>>>>>>>>> 2. Will the savepoint copy data files from the snapshot, or only save meta files? The description in the PIP, "After we introduce savepoint, we should also check if the data files are used by savepoints.", suggests that we only save meta files for a savepoint.
> >>>>>>>>>>>
> >>>>>>>>>>> 3. How can users create a new table and restore data from the specified savepoint?
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Shammon FY
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, May 17, 2023 at 10:19 AM Jingsong Li <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks Zelin for driving.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Some comments:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. I think it's possible to move `Proposed Changes` to the top; the Public API has no meaning if I don't know how it is done.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. Regarding the Public API: Savepoint and SavepointManager are not Public API; only the Flink action or configuration options should be public API.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3. Maybe we can have a separate chapter to describe `savepoint.create-interval`, maybe 'Periodic savepoint'? It is not just an interval, because the real use case is a savepoint right after 0:00.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 4. About 'Interaction with Snapshot': to be continued ...
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Jingsong
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, May 16, 2023 at 7:07 PM yu zelin <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi, Paimon Devs,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'd like to start a discussion about PIP-4 [1]. In this PIP, I want to talk about why we need savepoint, and share some thoughts about managing and using savepoints. Looking forward to your questions and suggestions.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Yu Zelin
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/NxE0Dw
