Hi devs,

I have updated PIP-18. Please review it and let us discuss:
https://cwiki.apache.org/confluence/display/PAIMON/PIP-18%3A+Introduce+clone+Procedure
On Sun, Apr 7, 2024 at 4:53 PM Jingsong Li wrote:
>
> > We must ensure that there is at least one complete snapshot in the target
> > table after the clone procedure is finished.
>
> +1
>
> > We need to discuss whether it is possible for the following corner case to
> > occur.
>
> This example is indeed a problem. We can check whether there is a complete
> snapshot, but it can also be left out of consideration for now. It
> should be improved in the future.
>
> Best,
> Jingsong
>
> On Wed, Apr 3, 2024 at 4:40 PM wj wang wrote:
> >
> > Hi Jingsong and Zelin,
> >
> > My opinion is as follows:
> > We must ensure that there is at least one complete snapshot in the target
> > table after the clone procedure is finished.
> >
> > > For cloning a specified snapshot or tag, undoubtedly, a rollback
> > > operation (deleting copied files) should be performed and an exception
> > > should be thrown.
> >
> > My opinion is exactly the same as Jingsong's.
> >
> > > For cloning all snapshots and tags, we should ignore deleted files to
> > > keep this clone working. This avoids conflicts with snapshot expiration
> > > and file deletion in the streaming write job.
> >
> > We need to discuss whether it is possible for the following corner case to
> > occur.
> > 1. Initially, the source table has three snapshots (snapshot-1,
> >    snapshot-2, snapshot-3).
> > 2. We start a clone procedure. All files belonging to snapshot-1,
> >    snapshot-2 and snapshot-3 are selected.
> > 3. A Flink batch job is started to copy the files.
> > 4. The streaming write job commits snapshot-4, snapshot-5 and snapshot-6.
> > 5. snapshot-3 hits the snapshot expiration logic and some of its files
> >    are deleted.
> > 6. The Flink batch job runs for a long time due to the cluster
> >    environment and other factors. When it finally finishes, it has
> >    ignored the FileNotFound exceptions.
> > 7. In the end, there is no complete snapshot in the target table.
> > Is it possible for this corner case to occur? Let's discuss it.
> >
> > On Wed, Apr 3, 2024 at 3:13 PM Jingsong Li wrote:
> > >
> > > I want to know: if the specified snapshot or tag is deleted during the
> > > clone procedure, how do we handle the exception?
> > > Should we stop the procedure and clean up the temporary target table
> > > directory?
> > >
> > > - For cloning a specified snapshot or tag, undoubtedly, a rollback
> > >   operation (deleting copied files) should be performed and an
> > >   exception should be thrown.
> > >
> > > - For cloning all snapshots and tags, we should ignore deleted files
> > >   to keep this clone working. This avoids conflicts with snapshot
> > >   expiration and file deletion in the streaming write job.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Wed, Apr 3, 2024 at 3:08 PM yu zelin wrote:
> > > >
> > > > Hi Jingsong,
> > > >
> > > > I want to know: if the specified snapshot or tag is deleted during the
> > > > clone procedure, how do we handle the exception?
> > > > Should we stop the procedure and clean up the temporary target table
> > > > directory?
> > > >
> > > > Best regards,
> > > > Zelin Yu
> > > >
> > > > On Mon, Mar 18, 2024 at 1:30 PM Jingsong Li wrote:
> > > > >
> > > > > Hi devs,
> > > > >
> > > > > I have heard many times that there is a need to copy an entire
> > > > > table, and my advice is usually to copy the files at the file
> > > > > system level.
> > > > >
> > > > > But there are a few issues:
> > > > > 1. A large number of files must be copied, and it is likely that
> > > > >    some of them will be deleted by ongoing jobs, causing the copy
> > > > >    to fail.
> > > > > 2. The target table may need to synchronize Hive metadata, which
> > > > >    means using HiveCatalog; copying files cannot solve this.
> > > > >
> > > > > So I suggest we introduce a clone procedure. [1]
> > > > >
> > > > > Contributors are also welcome to develop this PIP together, and I
> > > > > will help review your code.
> > > > >
> > > > > [1]
> > > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-18%3A+Introduce+clone+Procedure
> > > > >
> > > > > Best,
> > > > > Jingsong
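The missing-file policy discussed in this thread can be illustrated with a small sketch. It is a hypothetical Python model, not Paimon's implementation or API: each snapshot is a plain list of file names in a flat directory, and `CloneError`, `clone_specified` and `clone_all` are illustrative names. Cloning a specified snapshot or tag rolls back the files it already copied and raises an error when a source file has vanished. Cloning all snapshots skips vanished files and then, following the requirement that the target keep at least one complete snapshot, fails if no snapshot was copied in full.

```python
import os
import shutil
from typing import Dict, List, Set


class CloneError(Exception):
    """Hypothetical error raised when a clone cannot produce a usable target."""


def _copy(src_dir: str, dst_dir: str, name: str) -> None:
    # Raises FileNotFoundError if the source file was deleted in the meantime,
    # e.g. by snapshot expiration in a concurrent streaming write job.
    os.makedirs(dst_dir, exist_ok=True)
    shutil.copyfile(os.path.join(src_dir, name), os.path.join(dst_dir, name))


def clone_specified(files: List[str], src_dir: str, dst_dir: str) -> None:
    """Clone one specified snapshot or tag: all files or nothing."""
    copied: List[str] = []
    try:
        for name in files:
            _copy(src_dir, dst_dir, name)
            copied.append(name)
    except FileNotFoundError as e:
        # Roll back: delete every file this clone already copied, then fail.
        for name in copied:
            os.remove(os.path.join(dst_dir, name))
        raise CloneError(f"source file deleted during clone: {e.filename}") from e


def clone_all(snapshots: Dict[int, List[str]], src_dir: str, dst_dir: str) -> List[int]:
    """Clone all snapshots, ignoring deleted files; return complete snapshot ids."""
    missing: Set[str] = set()
    all_files = sorted({name for files in snapshots.values() for name in files})
    for name in all_files:
        try:
            _copy(src_dir, dst_dir, name)
        except FileNotFoundError:
            missing.add(name)  # ignored so the clone keeps working
    # Assumed safety check: the target must keep at least one complete snapshot.
    complete = sorted(sid for sid, files in snapshots.items() if missing.isdisjoint(files))
    if not complete:
        raise CloneError("no complete snapshot in the target table")
    return complete
```

For example, with snapshots `{1: ["a", "b"], 2: ["b", "c"], 3: ["c", "d"]}` where `d` was deleted before it could be copied, `clone_all` returns `[1, 2]`. The final check is what would detect the corner case from steps 1 to 7, which Jingsong suggested can be left for a later improvement.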
