> We must ensure that there is at least one complete snapshot in the target 
> table after the clone procedure is finished.

+1

> We need to discuss whether it is possible for the following corner case to 
> occur.

This corner case is indeed a problem. We could check after the copy that
the target table contains at least one complete snapshot, but that check
can be left out of scope for now and added as a future improvement.
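
A minimal sketch of what such a check could look like, in plain Java. This
is only an illustration under assumptions: the class and method names are
hypothetical, nothing here is Paimon API, and it assumes we can list the
files each planned snapshot references and the files that were actually
copied.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of Paimon): post-copy completeness check. */
class CloneCompletenessCheck {

    /**
     * Returns the ids of snapshots whose files were all copied, in ascending order.
     *
     * @param snapshotFiles snapshot id -> files the snapshot referenced when the clone was planned
     * @param copiedFiles files that actually arrived in the target table
     */
    static List<Long> completeSnapshots(
            Map<Long, Set<String>> snapshotFiles, Set<String> copiedFiles) {
        List<Long> complete = new ArrayList<>();
        for (Map.Entry<Long, Set<String>> entry : snapshotFiles.entrySet()) {
            // A snapshot is complete only if none of its files was skipped
            // (for example because the source file had already expired).
            if (copiedFiles.containsAll(entry.getValue())) {
                complete.add(entry.getKey());
            }
        }
        Collections.sort(complete);
        return complete;
    }

    /** Fails the clone if no planned snapshot is fully present in the target. */
    static void ensureAtLeastOneComplete(
            Map<Long, Set<String>> snapshotFiles, Set<String> copiedFiles) {
        if (completeSnapshots(snapshotFiles, copiedFiles).isEmpty()) {
            throw new IllegalStateException(
                    "Clone produced no complete snapshot in the target table");
        }
    }
}
```

In the corner case described above, every planned snapshot would be missing
at least one file, so the check would fail the clone instead of leaving a
target table without any usable snapshot.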

Best,
Jingsong

On Wed, Apr 3, 2024 at 4:40 PM wj wang <[email protected]> wrote:
>
> Hi Jingsong and Zelin,
>
> My opinion is as follows:
> We must ensure that there is at least one complete snapshot in the target
> table after the clone procedure is finished.
>
> > For cloning a specified snapshot or tag, undoubtedly, we should roll back
> > (delete the copied files) and throw an exception.
>
> My opinion is exactly the same as Jingsong's.
>
>
>
> > For cloning all snapshots and tags, we should ignore deleted files to
> > keep the clone working and to avoid conflicts with snapshot expiration
> > and file deletion in the streaming write job.
>
> We need to discuss whether it is possible for the following corner case to
> occur.
>     1. There are three snapshots (snapshot-1, snapshot-2, snapshot-3) in the
> source table at the beginning.
>     2. We start a clone procedure. All files belonging to these snapshots
> (snapshot-1, snapshot-2, snapshot-3) are selected.
>     3. A Flink batch job starts copying the files.
>     4. The streaming write job commits snapshot-4, snapshot-5, and snapshot-6.
>     5. snapshot-3 hits the snapshot expiration logic, and some of its files
> are deleted.
>     6. The Flink batch job runs for a long time due to the cluster
> environment and other factors. It eventually finishes, ignoring the
> FileNotFound exceptions.
>     7. In the end, there is no complete snapshot in the target table.
> Is it possible for this corner case to occur? Let's discuss it.
>
>
>
> On Wed, Apr 3, 2024 at 3:13 PM Jingsong Li <[email protected]> wrote:
>
> > > I want to know: if the specified snapshot or tag is deleted during the
> > > clone procedure, how do we handle the exception?
> > > Should we stop the procedure and clean up the temporary target table
> > > directory?
> >
> > - For cloning a specified snapshot or tag, undoubtedly, we should roll
> > back (delete the copied files) and throw an exception.
> >
> > - For cloning all snapshots and tags, we should ignore deleted files
> > to keep the clone working and to avoid conflicts with snapshot
> > expiration and file deletion in the streaming write job.
> >
> > Best,
> > Jingsong
> >
> > On Wed, Apr 3, 2024 at 3:08 PM yu zelin <[email protected]> wrote:
> > >
> > > Hi Jingsong,
> > >
> > > > I want to know: if the specified snapshot or tag is deleted during the
> > > > clone procedure, how do we handle the exception?
> > > > Should we stop the procedure and clean up the temporary target table
> > > > directory?
> > >
> > > Best regards,
> > > Zelin Yu
> > >
> > > On Mon, Mar 18, 2024 at 1:30 PM Jingsong Li <[email protected]> wrote:
> > >
> > > > Hi devs,
> > > >
> > > > I have heard many times that there is a need to copy the entire table,
> > > > and my advice to them is often to use file system file copying.
> > > >
> > > > But there are a few issues:
> > > > 1. It is necessary to copy a large number of files, and it is likely
> > > > that some files will be deleted due to ongoing work, resulting in
> > > > copying failure.
> > > > 2. The target table may need to synchronize Hive metadata, which means
> > > > using HiveCatalog, which cannot be solved by copying files.
> > > >
> > > > So I suggest we have a clone procedure. [1]
> > > >
> > > > Contributors are also welcome to develop this PIP together, and I will
> > > > help review your code.
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-18%3A+Introduce+clone+Procedure
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> >
