> Why not copy the snapshot files last?

Thanks, I agree with you.
PR: https://github.com/apache/paimon/pull/3230

On Mon, Apr 15, 2024 at 7:51 PM Jingsong Li <[email protected]> wrote:
>
> Thanks wj for driving this PIP.
>
> A question about execution.
>
> Why not copy the snapshot files last?
>
> In this way, the table can be used only once the snapshot file is ready,
> which avoids an incomplete table.
>
> Just like writing a table, the snapshot file will be ready in the committer.
>
> wj wang <[email protected]> wrote on Thu, Apr 11, 2024 at 10:00:
>
> > Hi devs.
> > I have updated PIP-18.
> > Please review it and let us discuss.
> >
> > https://cwiki.apache.org/confluence/display/PAIMON/PIP-18%3A+Introduce+clone+Procedure
> >
> > On Sun, Apr 7, 2024 at 4:53 PM Jingsong Li <[email protected]> wrote:
> > >
> > > > We must ensure that there is at least one complete snapshot in the
> > > > target table after the clone procedure is finished.
> > >
> > > +1
> > >
> > > > We need to discuss whether it is possible for the following corner
> > > > case to occur.
> > >
> > > This example is indeed a problem. We can check if there is a complete
> > > snapshot, but it can also be left out of consideration for now. It
> > > should be improved in the future.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Wed, Apr 3, 2024 at 4:40 PM wj wang <[email protected]> wrote:
> > > >
> > > > Hi Jingsong and Zelin,
> > > >
> > > > My opinion is as follows:
> > > > We must ensure that there is at least one complete snapshot in the
> > > > target table after the clone procedure is finished.
> > > >
> > > > > For cloning specified snapshot or tag, undoubtedly, rollback
> > > > > operation (deleting copied files) and an exception should be thrown.
> > > >
> > > > My opinion is exactly the same as Jingsong's.
> > > >
> > > > > For cloning all snapshots and tags, we should ignore deleted files
> > > > > to keep this clone working. To avoid conflicting with expiring
> > > > > snapshots and deleting files in streaming writing job.
> > > >
> > > > We need to discuss whether it is possible for the following corner
> > > > case to occur.
> > > > 1. There are three snapshots (snapshot-1, snapshot-2, snapshot-3) at
> > > >    the beginning in the source table.
> > > > 2. We start a clone procedure. All files belonging to the snapshots
> > > >    (snapshot-1, snapshot-2, snapshot-3) are selected.
> > > > 3. A Flink batch job is started to copy the files.
> > > > 4. The streaming writing job commits snapshot-4, snapshot-5, snapshot-6.
> > > > 5. snapshot-3 hits the snapshot expire logic and some files of
> > > >    snapshot-3 are deleted.
> > > > 6. The Flink batch job runs for a long time due to the cluster
> > > >    environment and other factors. Now it finishes, ignoring the
> > > >    FileNotFound exceptions.
> > > > 7. Finally there is no complete snapshot in the target table.
> > > > Is it possible for this corner case to occur? Let's discuss it.
> > > >
> > > > On Wed, Apr 3, 2024 at 3:13 PM Jingsong Li <[email protected]> wrote:
> > > > >
> > > > > I want to know that if in the clone procedure, the specified
> > > > > snapshot or tag is being deleted, how do we handle the exception?
> > > > > Should we stop the procedure and clean the temporary target table
> > > > > directory?
> > > > >
> > > > > - For cloning specified snapshot or tag, undoubtedly, rollback
> > > > >   operation (deleting copied files) and an exception should be thrown.
> > > > >
> > > > > - For cloning all snapshots and tags, we should ignore deleted files
> > > > >   to keep this clone working. To avoid conflicting with expiring
> > > > >   snapshots and deleting files in streaming writing job.
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Wed, Apr 3, 2024 at 3:08 PM yu zelin <[email protected]> wrote:
> > > > > >
> > > > > > Hi Jingsong,
> > > > > >
> > > > > > I want to know that if in the clone procedure, the specified
> > > > > > snapshot or tag is being deleted, how do we handle the exception?
> > > > > > Should we stop the procedure and clean the temporary target table
> > > > > > directory?
> > > > > >
> > > > > > Best regards,
> > > > > > Zelin Yu
> > > > > >
> > > > > > On Mon, Mar 18, 2024 at 1:30 PM Jingsong Li <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi devs,
> > > > > > >
> > > > > > > I have heard many times that there is a need to copy the entire
> > > > > > > table, and my advice to them is often to use file system file
> > > > > > > copying.
> > > > > > >
> > > > > > > But there are a few issues:
> > > > > > > 1. It is necessary to copy a large number of files, and it is
> > > > > > >    likely that some files will be deleted due to ongoing work,
> > > > > > >    resulting in copying failure.
> > > > > > > 2. The target table may need to synchronize Hive metadata, which
> > > > > > >    means using HiveCatalog, which cannot be solved by copying files.
> > > > > > >
> > > > > > > So I suggest we have a clone procedure. [1]
> > > > > > >
> > > > > > > Also, welcome contributors to develop this PIP together, and I
> > > > > > > will help you review your code.
> > > > > > >
> > > > > > > [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-18%3A+Introduce+clone+Procedure
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong
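
For reference, the ordering discussed in this thread (copy data files first, ignore files removed by a concurrent expire, publish snapshot files last, and fail if no complete snapshot remains) could be sketched roughly as below. This is only an illustration in Python, not the code in the PR and not Paimon's API: `Snapshot`, `clone_all_snapshots`, and the dict-based "file systems" are hypothetical names made up for the example, and the target is assumed to start empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a snapshot: an id plus the data files it references."""
    snapshot_id: int
    data_files: frozenset


def clone_all_snapshots(snapshots, source_files, target_files, target_snapshots):
    """Copy data files first and publish snapshot files last.

    source_files / target_files are dicts (file name -> bytes) standing in for
    the source and target file systems; target_snapshots is a dict
    (snapshot id -> Snapshot) standing in for the target snapshot directory.
    """
    # Phase 1: copy data files. A file that has already been deleted by a
    # concurrent snapshot expiration is skipped (the "ignore FileNotFound" policy).
    wanted = set().union(*(s.data_files for s in snapshots))
    copied = set()
    for name in sorted(wanted):
        if name in source_files:
            target_files[name] = source_files[name]
            copied.add(name)

    # Phase 2: a snapshot is complete only if every file it references was copied.
    complete = [s for s in snapshots if s.data_files <= copied]
    if not complete:
        # No usable snapshot: roll back the copied files and fail loudly.
        for name in copied:
            target_files.pop(name, None)
        raise RuntimeError("clone finished without any complete snapshot")

    # Phase 3: publish snapshot files last, so readers never see a half-copied table.
    for s in complete:
        target_snapshots[s.snapshot_id] = s
    return [s.snapshot_id for s in complete]


# Example: file "c" was deleted by expiration while the clone was running,
# so snapshot-3 is incomplete and only snapshot-1 and snapshot-2 are published.
snaps = [
    Snapshot(1, frozenset({"a"})),
    Snapshot(2, frozenset({"a", "b"})),
    Snapshot(3, frozenset({"b", "c"})),
]
target_files, target_snapshots = {}, {}
published = clone_all_snapshots(snaps, {"a": b"A", "b": b"B"}, target_files, target_snapshots)
print(published)  # -> [1, 2]
```

With this ordering the corner case from steps 1 to 7 above ends in an explicit failure instead of a target table that silently has no complete snapshot.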
