I think we can just throw exceptions for pure numeric tag names. Iceberg's behavior looks confusing.
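The rule discussed in this thread — a numeric `VERSION AS OF` argument resolves to a snapshot id, anything else resolves to a tag, and pure-numeric tag names are rejected at creation time — can be sketched as follows. This is an illustrative sketch only; `resolve_version` and `validate_tag_name` are hypothetical helper names, not Paimon's actual API:

```python
def resolve_version(version: str):
    """Resolve a VERSION AS OF argument: numeric string -> snapshot id, else -> tag name."""
    if version.isdigit():
        return ("snapshot", int(version))
    return ("tag", version)

def validate_tag_name(name: str) -> None:
    """Reject pure-numeric tag names so a tag can never shadow a snapshot id."""
    if name.isdigit():
        raise ValueError(
            f"Tag name '{name}' is numeric; numeric versions are reserved for snapshot ids"
        )

# Usage:
# resolve_version("1")         -> ("snapshot", 1)
# resolve_version("last_year") -> ("tag", "last_year")
# validate_tag_name("2023")    raises ValueError
```

Rejecting numeric names at tag-creation time (rather than at query time, as Iceberg effectively does) keeps the `VERSION AS OF` resolution unambiguous.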
Best,
Jingsong

On Tue, May 30, 2023 at 3:40 PM yu zelin <[email protected]> wrote:
>
> Hi, Shammon,
>
> An intuitive way is to use a numeric string to indicate a snapshot and a
> non-numeric string to indicate a tag. For example:
>
> SELECT * FROM t VERSION AS OF 1 -- to snapshot #1
> SELECT * FROM t VERSION AS OF 'last_year' -- to tag `last_year`
>
> This is also how Iceberg does it [1].
>
> However, with this approach a tag name cannot be a numeric string. I think
> this is acceptable and I will add it to the document.
>
> Best,
> Yu Zelin
>
> [1] https://iceberg.apache.org/docs/latest/spark-queries/#sql
>
> > On May 30, 2023, at 12:17, Shammon FY <[email protected]> wrote:
> >
> > Hi zelin,
> >
> > Thanks for your update. I have one comment about time travel on savepoints.
> >
> > Currently we can use this statement in Spark to query snapshot 1:
> > SELECT * FROM t VERSION AS OF 1;
> >
> > My point is: how can we distinguish between a snapshot and a savepoint
> > when users submit a statement like the following?
> > SELECT * FROM t VERSION AS OF <version value>;
> >
> > Best,
> > Shammon FY
> >
> > On Tue, May 30, 2023 at 11:37 AM yu zelin <[email protected]> wrote:
> >
> >> Hi, Jingsong,
> >>
> >> Thanks for your feedback.
> >>
> >> ## TAG ID
> >> It seems the id is useless currently. I'll remove it.
> >>
> >> ## Time Travel Syntax
> >> Since the tag id is removed, we can just use:
> >>
> >> SELECT * FROM t VERSION AS OF 'tag-name'
> >>
> >> to travel to a tag.
> >>
> >> ## Tag class
> >> I agree with you that we can reuse the Snapshot class. We can introduce
> >> `TagManager` only to manage tags.
> >>
> >> ## Expiring Snapshot
> >>> why not record it in ManifestEntry?
> >> This is because every time Paimon generates a snapshot, it creates new
> >> ManifestEntries for the data files. Consider this scenario: if we record
> >> it in ManifestEntry and commit data file A to snapshot #1, we will get
> >> manifest entry Entry#1 as [ADD, A, committed at #1].
> >> Then we commit -A to snapshot #2 and get manifest entry Entry#2 as
> >> [DELETE, A, ?]. As you can see, we cannot know at which snapshot file A
> >> was committed, so we have to record this information in the data file
> >> meta directly.
> >>
> >>> We should note that "record it in `DataFileMeta`" should be done before
> >>> "tag" and document version compatibility.
> >>
> >> I will add a note about this.
> >>
> >> Best,
> >> Yu Zelin
> >>
> >>> On May 29, 2023, at 10:29, Jingsong Li <[email protected]> wrote:
> >>>
> >>> Thanks Zelin for the update.
> >>>
> >>> ## TAG ID
> >>>
> >>> Is this useful? We have tag-name, snapshot-id, and now we are
> >>> introducing a tag id? What is it used for?
> >>>
> >>> ## Time Travel
> >>>
> >>> SELECT * FROM t VERSION AS OF tag-name.<name>
> >>>
> >>> This does not look like the SQL standard.
> >>>
> >>> Why do we introduce this `tag-name` prefix?
> >>>
> >>> ## Tag class
> >>>
> >>> Why not just use the Snapshot class? It looks like we don't need to
> >>> introduce a Tag class. We can just copy the snapshot file to tag/.
> >>>
> >>> ## Expiring Snapshot
> >>>
> >>> We should note that "record it in `DataFileMeta`" should be done
> >>> before "tag". And document version compatibility.
> >>> And why not record it in ManifestEntry?
> >>>
> >>> Best,
> >>> Jingsong
> >>>
> >>> On Fri, May 26, 2023 at 11:15 AM yu zelin <[email protected]> wrote:
> >>>>
> >>>> Hi, all,
> >>>>
> >>>> FYI, I have updated the PIP [1].
> >>>>
> >>>> Main changes:
> >>>> - Use the new name `tag`
> >>>> - Enrich Motivation
> >>>> - New section `Data Files Handling` to describe how to determine
> >>>>   whether a data file can be deleted
> >>>>
> >>>> Best,
> >>>> Yu Zelin
> >>>>
> >>>> [1] https://cwiki.apache.org/confluence/x/NxE0Dw
> >>>>
> >>>>> On May 24, 2023, at 17:18, yu zelin <[email protected]> wrote:
> >>>>>
> >>>>> Hi, Guojun,
> >>>>>
> >>>>> I'd like to share my thoughts about your questions.
> >>>>>
> >>>>> 1. Expiration of savepoints
> >>>>> In my opinion, savepoints are created at long intervals, so there
> >>>>> will not be too many of them. If users create a savepoint per day,
> >>>>> there are 365 savepoints a year. So I didn't consider their
> >>>>> expiration, and I think providing a Flink action like
> >>>>> `delete-savepoint id = 1` is enough for now. But if it is really
> >>>>> important, we can introduce table options for it, similar to
> >>>>> expiring snapshots.
> >>>>>
> >>>>> 2.
> >>>>>> id of compacted snapshot picked by the savepoint
> >>>>> My initial idea was to pick a compacted snapshot, or to do a
> >>>>> compaction before creating the savepoint. But after discussing with
> >>>>> Jingsong, I found it's difficult. So now I propose to create the
> >>>>> savepoint directly from the given snapshot. Maybe we can optimize
> >>>>> this later. The changes will be updated soon.
> >>>>>
> >>>>>> manifest file list in system-table
> >>>>> I think the manifest files are not very important for users. Users
> >>>>> can find when a savepoint was created, get the savepoint id, and
> >>>>> then query the savepoint by that id. I didn't see in what scenario
> >>>>> users would need the manifest file information. What do you think?
> >>>>>
> >>>>> Best,
> >>>>> Yu Zelin
> >>>>>
> >>>>>> On May 24, 2023, at 10:50, Guojun Li <[email protected]> wrote:
> >>>>>>
> >>>>>> Thanks zelin for bringing up the discussion. I'm thinking about:
> >>>>>> 1. How should the savepoints be managed if there is no expiration
> >>>>>> mechanism: by the TTL management of the storage, or by an external
> >>>>>> script?
> >>>>>> 2. I think the id of the compacted snapshot picked by the savepoint
> >>>>>> and the manifest file list are also important information for
> >>>>>> users. Could this information be stored in the system-table?
> >>>>>>
> >>>>>> Best,
> >>>>>> Guojun
> >>>>>>
> >>>>>> On Mon, May 22, 2023 at 9:13 PM Jingsong Li <[email protected]> wrote:
> >>>>>>
> >>>>>>> FYI
> >>>>>>>
> >>>>>>> The PIP lacks a table to show the Discussion thread & Vote thread
> >>>>>>> & ISSUE...
> >>>>>>>
> >>>>>>> Best
> >>>>>>> Jingsong
> >>>>>>>
> >>>>>>> On Mon, May 22, 2023 at 4:48 PM yu zelin <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Hi, all,
> >>>>>>>>
> >>>>>>>> Thank you all for your suggestions and questions. After reading
> >>>>>>>> them, I adopted some of them, and I want to share my opinions here.
> >>>>>>>>
> >>>>>>>> To make my statements clearer, I will still use the word
> >>>>>>>> `savepoint`. When we reach a consensus, the name may be changed.
> >>>>>>>>
> >>>>>>>> 1. The purposes of savepoint
> >>>>>>>>
> >>>>>>>> As Shammon mentioned, Flink and databases also have the concept of
> >>>>>>>> `savepoint`, so it's better to clarify the purposes of ours.
> >>>>>>>> Thanks to Nicholas and Jingsong; I think your explanations are
> >>>>>>>> very clear. I'd like to give my summary:
> >>>>>>>>
> >>>>>>>> (1) Fault recovery (or we can say disaster recovery). Users can
> >>>>>>>> ROLL BACK to a savepoint if needed. If a user rolls back to a
> >>>>>>>> savepoint, the table will hold the data in the savepoint, and the
> >>>>>>>> data committed after the savepoint will be deleted. In this
> >>>>>>>> scenario we need savepoints because snapshots may have expired;
> >>>>>>>> a savepoint can live longer and preserve the user's old data.
> >>>>>>>>
> >>>>>>>> (2) Recording versions of data at a longer interval (typically
> >>>>>>>> daily or weekly). With savepoints, users can query the old data
> >>>>>>>> in batch mode. Compared to copying records to a new table or
> >>>>>>>> merging incremental records with old records (like using MERGE
> >>>>>>>> INTO in Hive), the savepoint is more lightweight because we don't
> >>>>>>>> copy data files; we just record their metadata.
> >>>>>>>>
> >>>>>>>> As you can see, a savepoint is very similar to a snapshot. The
> >>>>>>>> differences are:
> >>>>>>>>
> >>>>>>>> (1) A savepoint lives longer. In most cases, a snapshot's
> >>>>>>>> lifetime is several minutes to hours. We expect a savepoint to
> >>>>>>>> live several days, weeks, or even months.
> >>>>>>>>
> >>>>>>>> (2) A savepoint is mainly used for batch reading of historical
> >>>>>>>> data. In this PIP, we don't introduce streaming reading for
> >>>>>>>> savepoints.
> >>>>>>>>
> >>>>>>>> 2. Candidates for the name
> >>>>>>>>
> >>>>>>>> I agree with Jingsong that we can use a new name. Since the
> >>>>>>>> purpose and mechanism of savepoint (it is very similar to
> >>>>>>>> snapshot) resemble `tag` in Iceberg, maybe we can use `tag`.
> >>>>>>>>
> >>>>>>>> In my opinion, an alternative is `anchor`. All the snapshots are
> >>>>>>>> like the navigation path of the streaming data, and an `anchor`
> >>>>>>>> can pin it in a place.
> >>>>>>>>
> >>>>>>>> 3. Public table operations and options
> >>>>>>>>
> >>>>>>>> We propose to expose some operations and table options for users
> >>>>>>>> to manage savepoints.
> >>>>>>>>
> >>>>>>>> (1) Operations (currently for Flink)
> >>>>>>>> We provide Flink actions to manage savepoints:
> >>>>>>>> create-savepoint: generate a savepoint from the latest snapshot;
> >>>>>>>> creating from a specified snapshot is also supported.
> >>>>>>>> delete-savepoint: delete the specified savepoint.
> >>>>>>>> rollback-to: roll back to a specified savepoint.
> >>>>>>>>
> >>>>>>>> (2) Table options
> >>>>>>>> We propose to provide options for creating savepoints periodically:
> >>>>>>>> savepoint.create-time: when to create the savepoint. Example: 00:00.
> >>>>>>>> savepoint.create-interval: interval between the creation of two
> >>>>>>>> savepoints. Example: 2 d.
> >>>>>>>> savepoint.time-retained: the maximum time to retain savepoints.
> >>>>>>>>
> >>>>>>>> (3) Procedures (future work)
> >>>>>>>> Spark supports SQL extensions. After we support the Spark CALL
> >>>>>>>> statement, we can provide procedures to create, delete, or roll
> >>>>>>>> back to a savepoint for Spark users.
> >>>>>>>>
> >>>>>>>> Support for CALL is on the roadmap of Flink. In a future version,
> >>>>>>>> we can also support savepoint-related procedures for Flink users.
> >>>>>>>>
> >>>>>>>> 4. Expiration of data files
> >>>>>>>>
> >>>>>>>> Currently, when a snapshot is expired, data files that are not
> >>>>>>>> used by other snapshots are deleted. After we introduce the
> >>>>>>>> savepoint, we must make sure the data files referenced by
> >>>>>>>> savepoints will not be deleted.
> >>>>>>>>
> >>>>>>>> Conversely, when a savepoint is deleted, the data files that are
> >>>>>>>> not used by existing snapshots and other savepoints will be
> >>>>>>>> deleted.
> >>>>>>>>
> >>>>>>>> I have written some POC code to implement this. I will update the
> >>>>>>>> mechanism in the PIP soon.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Yu Zelin
> >>>>>>>>
> >>>>>>>>> On May 21, 2023, at 20:54, Jingsong Li <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks Yun for your information.
> >>>>>>>>>
> >>>>>>>>> We need to be careful to avoid confusion between the Paimon and
> >>>>>>>>> Flink concepts of "savepoint".
> >>>>>>>>>
> >>>>>>>>> Maybe we don't have to insist on using "savepoint"; for example,
> >>>>>>>>> TAG is also a candidate, just like in Iceberg [1].
> >>>>>>>>>
> >>>>>>>>> [1] https://iceberg.apache.org/docs/latest/branching/
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Jingsong
> >>>>>>>>>
> >>>>>>>>> On Sun, May 21, 2023 at 8:51 PM Jingsong Li <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Thanks Nicholas for your detailed requirements.
> >>>>>>>>>>
> >>>>>>>>>> We need to supplement the user requirements in the FLIP, which
> >>>>>>>>>> is mainly aimed at two purposes:
> >>>>>>>>>> 1. Fault recovery for data errors (named: restore or rollback-to)
> >>>>>>>>>> 2. Recording versions at the day level, targeting batch queries
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Jingsong
> >>>>>>>>>>
> >>>>>>>>>> On Sat, May 20, 2023 at 2:55 PM Yun Tang <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Guys,
> >>>>>>>>>>>
> >>>>>>>>>>> Since we use Paimon with Flink in most cases, I think we need
> >>>>>>>>>>> to distinguish the meanings of the same word "savepoint" in
> >>>>>>>>>>> different systems.
> >>>>>>>>>>>
> >>>>>>>>>>> For Flink, savepoint means:
> >>>>>>>>>>>
> >>>>>>>>>>> 1. Triggered by users, not periodically triggered by the
> >>>>>>>>>>> system itself. However, this FLIP wants to support creating
> >>>>>>>>>>> it periodically.
> >>>>>>>>>>> 2. Even the so-called incremental native savepoint [1] does
> >>>>>>>>>>> not depend on previous checkpoints or savepoints; it still
> >>>>>>>>>>> copies files on DFS into a self-contained savepoint folder.
> >>>>>>>>>>> However, from this FLIP's description of deleting expired
> >>>>>>>>>>> snapshot files, the Paimon savepoint will refer to the
> >>>>>>>>>>> previously existing files directly.
> >>>>>>>>>>>
> >>>>>>>>>>> I don't think we need to make Paimon's semantics exactly the
> >>>>>>>>>>> same as Flink's. However, we need to introduce a table that
> >>>>>>>>>>> shows the difference compared with Flink and discuss it.
> >>>>>>>>>>>
> >>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Semantic
> >>>>>>>>>>>
> >>>>>>>>>>> Best
> >>>>>>>>>>> Yun Tang
> >>>>>>>>>>> ________________________________
> >>>>>>>>>>> From: Nicholas Jiang <[email protected]>
> >>>>>>>>>>> Sent: Friday, May 19, 2023 17:40
> >>>>>>>>>>> To: [email protected] <[email protected]>
> >>>>>>>>>>> Subject: Re: [DISCUSS] PIP-4 Support savepoint
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Guys,
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks Zelin for driving the savepoint proposal. I'd like to
> >>>>>>>>>>> offer some opinions on savepoint:
> >>>>>>>>>>>
> >>>>>>>>>>> -- About "introduce savepoint for Paimon to persist full data
> >>>>>>>>>>> at a point in time"
> >>>>>>>>>>>
> >>>>>>>>>>> The motivation of the savepoint proposal reads more like
> >>>>>>>>>>> snapshot TTL management. Actually, disaster recovery is
> >>>>>>>>>>> mission critical for any software. Especially for data
> >>>>>>>>>>> systems, the impact could be very serious, leading to delayed
> >>>>>>>>>>> or even wrong business decisions at times. Savepoint is
> >>>>>>>>>>> proposed to assist users in recovering data from a previous
> >>>>>>>>>>> state: "savepoint" and "restore".
> >>>>>>>>>>>
> >>>>>>>>>>> "savepoint" saves the Paimon table as of the commit time;
> >>>>>>>>>>> therefore if there is a savepoint, the data generated in the
> >>>>>>>>>>> corresponding commit cannot be cleaned. Meanwhile, a savepoint
> >>>>>>>>>>> lets the user restore the table to it at a later point in time
> >>>>>>>>>>> if need be. On similar lines, a savepoint cannot be triggered
> >>>>>>>>>>> on a commit that has already been cleaned up. A savepoint is
> >>>>>>>>>>> synonymous with taking a backup, except that we don't make a
> >>>>>>>>>>> new copy of the table; we just save the state of the table
> >>>>>>>>>>> elegantly so that we can restore it later when in need.
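The "savepoint"/"restore" pairing described here can be summarized in a toy model. This is an illustrative sketch only (the function names and dict-based bookkeeping are not Paimon code): "savepoint" pins an existing, not-yet-cleaned commit; "restore" drops every commit greater than the savepointed one and cannot be undone.

```python
def create_savepoint(live_snapshots: set, snapshot_id: int) -> int:
    # A savepoint cannot be triggered on a commit that is already cleaned up.
    if snapshot_id not in live_snapshots:
        raise ValueError(f"snapshot {snapshot_id} has already been cleaned up")
    return snapshot_id

def restore(snapshots: dict, savepoint_id: int) -> dict:
    # Restoring deletes all data/commit files greater than the savepoint
    # commit; this is irreversible, so care should be taken before calling it.
    return {sid: files for sid, files in snapshots.items() if sid <= savepoint_id}
```

For example, with commits `{1, 2, 3}`, `create_savepoint` on commit 2 succeeds, and `restore(..., 2)` keeps only commits 1 and 2.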
> >>>>>>>>>>>
> >>>>>>>>>>> "restore" lets you restore your table to one of the savepoint
> >>>>>>>>>>> commits. It cannot be undone (or reversed), so care should be
> >>>>>>>>>>> taken before doing a restore. At that time, Paimon would
> >>>>>>>>>>> delete all data files and commit files (timeline files)
> >>>>>>>>>>> greater than the savepoint commit to which the table is being
> >>>>>>>>>>> restored.
> >>>>>>>>>>>
> >>>>>>>>>>> BTW, it would be better to introduce a snapshot view based on
> >>>>>>>>>>> savepoint, which could improve the query performance of
> >>>>>>>>>>> historical data for Paimon tables.
> >>>>>>>>>>>
> >>>>>>>>>>> -- About the Public API of savepoint
> >>>>>>>>>>>
> >>>>>>>>>>> The savepoint interfaces currently introduced in the Public
> >>>>>>>>>>> API are not enough for users, for example deleteSavepoint,
> >>>>>>>>>>> restoreSavepoint, etc.
> >>>>>>>>>>>
> >>>>>>>>>>> -- About "Paimon's savepoint needs to be combined with
> >>>>>>>>>>> Flink's savepoint":
> >>>>>>>>>>>
> >>>>>>>>>>> If Paimon supports a savepoint mechanism and provides
> >>>>>>>>>>> savepoint interfaces, the integration with Flink's savepoint
> >>>>>>>>>>> is not blocked by this proposal.
> >>>>>>>>>>>
> >>>>>>>>>>> In summary, savepoint is not only used to improve the query
> >>>>>>>>>>> performance of historical data, but also for disaster
> >>>>>>>>>>> recovery.
> >>>>>>>>>>>
> >>>>>>>>>>> On 2023/05/17 09:53:11 Jingsong Li wrote:
> >>>>>>>>>>>> What Shammon mentioned is interesting. I agree with what he
> >>>>>>>>>>>> said about the differences in savepoints between databases
> >>>>>>>>>>>> and stream computing.
> >>>>>>>>>>>>
> >>>>>>>>>>>> About "Paimon's savepoint needs to be combined with Flink's
> >>>>>>>>>>>> savepoint":
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think it is possible, but we may need to handle this with
> >>>>>>>>>>>> another mechanism, because the snapshots after a savepoint
> >>>>>>>>>>>> may expire. We would need to compare the data between two
> >>>>>>>>>>>> savepoints to generate incremental data for streaming reads.
> >>>>>>>>>>>>
> >>>>>>>>>>>> But this may not need to block the FLIP; it looks like the
> >>>>>>>>>>>> current design does not break this future combination?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Jingsong
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, May 17, 2023 at 5:33 PM Shammon FY <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Caizhi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks for your comments. As you mentioned, I think we need
> >>>>>>>>>>>>> to discuss the role of savepoint in Paimon.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If I understand correctly, the main feature of savepoint in
> >>>>>>>>>>>>> the current PIP is that the savepoint will not expire, and
> >>>>>>>>>>>>> users can query the savepoint via time travel. Besides
> >>>>>>>>>>>>> that, there are savepoints in databases and in Flink.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1. Savepoint in databases. The database can roll table data
> >>>>>>>>>>>>> back to the specified 'version' based on a savepoint. So
> >>>>>>>>>>>>> the key point of savepoint in the database is rolling back
> >>>>>>>>>>>>> data.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2. Savepoint in Flink. Users can trigger a savepoint with a
> >>>>>>>>>>>>> specific 'path' and save all state data of a job to the
> >>>>>>>>>>>>> savepoint. Then users can create a new job based on the
> >>>>>>>>>>>>> savepoint to continue consuming incremental data. I think
> >>>>>>>>>>>>> the core capabilities are: backing up a job, and resuming a
> >>>>>>>>>>>>> job based on the savepoint.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In addition to the above, Paimon may also face data write
> >>>>>>>>>>>>> corruption and need to recover data based on a specified
> >>>>>>>>>>>>> savepoint. So we may need to consider what abilities Paimon
> >>>>>>>>>>>>> savepoint needs besides the ones mentioned in the current
> >>>>>>>>>>>>> PIP.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Additionally, as mentioned above, Flink also has a
> >>>>>>>>>>>>> savepoint mechanism. During the process of streaming data
> >>>>>>>>>>>>> from Flink to Paimon, does Paimon's savepoint need to be
> >>>>>>>>>>>>> combined with Flink's savepoint?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Shammon FY
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, May 17, 2023 at 4:02 PM Caizhi Weng <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi developers!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks Zelin for bringing up the discussion. The proposal
> >>>>>>>>>>>>>> seems good to me overall. However, I'd also like to bring
> >>>>>>>>>>>>>> up a few points.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 1. As Jingsong mentioned, the Savepoint class should not
> >>>>>>>>>>>>>> become a public API, at least for now. What we need to
> >>>>>>>>>>>>>> discuss for the public API is how users can create or
> >>>>>>>>>>>>>> delete savepoints: for example, what the table option
> >>>>>>>>>>>>>> looks like, and what commands and options are provided
> >>>>>>>>>>>>>> for the Flink action, etc.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 2. Currently most Flink actions are related to streaming
> >>>>>>>>>>>>>> processing, so only Flink can support them. However,
> >>>>>>>>>>>>>> savepoint creation and deletion seems like a feature for
> >>>>>>>>>>>>>> batch processing. So aside from Flink actions, shall we
> >>>>>>>>>>>>>> also provide something like Spark actions for savepoints?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I would also like to comment on Shammon's views.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Should we introduce an option for the savepoint path,
> >>>>>>>>>>>>>>> which may be different from 'warehouse'? Then users can
> >>>>>>>>>>>>>>> back up the savepoint data.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I don't see that this is necessary. To back up a table,
> >>>>>>>>>>>>>> the user just needs to copy all files from the table
> >>>>>>>>>>>>>> directory. Savepoint in Paimon, as far as I understand, is
> >>>>>>>>>>>>>> mainly for users to review historical data, not for
> >>>>>>>>>>>>>> backing up tables.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Will the savepoint copy data files from the snapshot or
> >>>>>>>>>>>>>>> only save meta files?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> It would be a heavy burden if a savepoint copied all its
> >>>>>>>>>>>>>> files. As I mentioned above, savepoint is not for backing
> >>>>>>>>>>>>>> up tables.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> How can users create a new table and restore data from
> >>>>>>>>>>>>>>> the specified savepoint?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This reminds me of savepoints in Flink. Still, savepoint
> >>>>>>>>>>>>>> is not for backing up tables, so I guess we don't need to
> >>>>>>>>>>>>>> support "restoring data" from a savepoint.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, May 17, 2023 at 10:32, Shammon FY <[email protected]> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks Zelin for initiating this discussion. I have some
> >>>>>>>>>>>>>>> comments:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 1. Should we introduce an option for the savepoint path,
> >>>>>>>>>>>>>>> which may be different from 'warehouse'? Then users can
> >>>>>>>>>>>>>>> back up the savepoint data.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 2. Will the savepoint copy data files from the snapshot
> >>>>>>>>>>>>>>> or only save meta files? The description in the PIP,
> >>>>>>>>>>>>>>> "After we introduce savepoint, we should also check if
> >>>>>>>>>>>>>>> the data files are used by savepoints.", makes it look
> >>>>>>>>>>>>>>> like we only save meta files for a savepoint.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 3. How can users create a new table and restore data from
> >>>>>>>>>>>>>>> a specified savepoint?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Shammon FY
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, May 17, 2023 at 10:19 AM Jingsong Li <[email protected]> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks Zelin for driving.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Some comments:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 1. I think we could move `Proposed Changes` to the top;
> >>>>>>>>>>>>>>>> the Public API has no meaning if I don't know how it
> >>>>>>>>>>>>>>>> works.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 2. Public API: Savepoint and SavepointManager are not
> >>>>>>>>>>>>>>>> public APIs; only the Flink action or configuration
> >>>>>>>>>>>>>>>> options should be public API.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 3. Maybe we can have a separate chapter to describe
> >>>>>>>>>>>>>>>> `savepoint.create-interval`, maybe 'Periodic savepoint'?
> >>>>>>>>>>>>>>>> It is not just an interval, because the true use case is
> >>>>>>>>>>>>>>>> a savepoint after 0:00.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 4. About 'Interaction with Snapshot', to be continued...
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Jingsong
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, May 16, 2023 at 7:07 PM yu zelin <[email protected]> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi, Paimon Devs,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I'd like to start a discussion about PIP-4 [1]. In this
> >>>>>>>>>>>>>>>>> PIP, I want to talk about why we need savepoint, and
> >>>>>>>>>>>>>>>>> share some thoughts about managing and using it. I look
> >>>>>>>>>>>>>>>>> forward to your questions and suggestions.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Yu Zelin
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/NxE0Dw
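To close, the data-file handling rule that recurs throughout this thread (a data file may be physically deleted only once no remaining snapshot and no savepoint/tag references it) can be modeled in a few lines. This is a toy model for illustration only; `FileStoreModel` is not a Paimon class, and the real implementation works on manifests and `DataFileMeta`, not in-memory sets:

```python
class FileStoreModel:
    """Toy model: snapshots and tags each reference a set of data files.

    A data file can be physically deleted only when no live snapshot and
    no tag still references it.
    """

    def __init__(self):
        self.snapshots = {}  # snapshot id -> set of data file names
        self.tags = {}       # tag name -> set of data file names

    def _live_files(self):
        # Union of all files still referenced by any snapshot or tag.
        live = set()
        for files in self.snapshots.values():
            live |= files
        for files in self.tags.values():
            live |= files
        return live

    def expire_snapshot(self, snapshot_id):
        """Drop a snapshot and return the files that became unreferenced."""
        files = self.snapshots.pop(snapshot_id)
        return files - self._live_files()

    def delete_tag(self, tag_name):
        """Drop a tag and return the files that became unreferenced."""
        files = self.tags.pop(tag_name)
        return files - self._live_files()


# Usage: file A is shared by snapshot 2 and tag t1, so expiring snapshot 1
# deletes nothing; only deleting the last referencer frees a file.
store = FileStoreModel()
store.snapshots = {1: {"A"}, 2: {"A", "B"}}
store.tags = {"t1": {"A"}}
```

This captures both directions discussed above: expiring a snapshot must not delete files a savepoint/tag still holds, and deleting a savepoint/tag must not delete files a live snapshot still holds.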
