> Why not put snapshot copying last?
Thanks, I agree with you.
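As a rough illustration of this ordering, here is a minimal Python sketch (not the code in the PR). It copies data files, manifests and schema files first and snapshot files last, so the target table only exposes a snapshot once the files it references are in place. It assumes a local file system and that snapshot files live under a `snapshot/` subdirectory of the table path. The helper name `clone_table_files` is hypothetical, and tag files get no special treatment here.

```python
import shutil
from pathlib import Path


def clone_table_files(source: Path, target: Path) -> list[str]:
    """Hypothetical helper: copy every file under `source` into `target`.

    Assumes snapshot files live in a `snapshot/` subdirectory. They are
    copied after all other files (data files, manifests, schema), so by the
    time any snapshot file appears in `target`, all other files are present.
    Returns the relative paths in the order they were copied.
    """

    def is_snapshot(path: Path) -> bool:
        return path.relative_to(source).parts[0] == "snapshot"

    files = sorted(p for p in source.rglob("*") if p.is_file())
    # sorted() is stable: non-snapshot files (False) first, snapshot files (True) last.
    ordered = sorted(files, key=is_snapshot)

    copied = []
    for src in ordered:
        rel = src.relative_to(source)
        dst = target / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied
```

This mirrors a normal write, where the committer makes the snapshot file ready only after the data files are written.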

PR:
https://github.com/apache/paimon/pull/3230
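For the deleted-file handling discussed further down in this thread, here is a minimal Python sketch of the two policies, assuming a local file system (the helper names `copy_files` and `has_complete_snapshot` are hypothetical). When cloning a specific snapshot or tag, a missing source file rolls back the files already copied and re-raises. When cloning all snapshots and tags, missing files are skipped. A final check then verifies that at least one snapshot is complete in the target.

```python
import shutil
from pathlib import Path


def copy_files(source: Path, target: Path, rel_paths: list[str], clone_all: bool) -> set[str]:
    """Hypothetical helper: copy `rel_paths` from `source` to `target`.

    clone_all=False (one snapshot or tag): a missing source file deletes the
    files copied so far and re-raises FileNotFoundError.
    clone_all=True (all snapshots and tags): files already deleted in the
    source, e.g. by snapshot expiration, are skipped.
    Returns the set of relative paths that were copied.
    """
    copied: set[str] = set()
    for rel in rel_paths:
        dst = target / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            shutil.copy2(source / rel, dst)
        except FileNotFoundError:
            if clone_all:
                continue
            # Roll back: remove every file this call has copied.
            for done in copied:
                (target / done).unlink()
            raise
        copied.add(rel)
    return copied


def has_complete_snapshot(snapshot_files: dict[int, list[str]], copied: set[str]) -> bool:
    """True if at least one snapshot has all of its files in `copied`."""
    return any(set(files) <= copied for files in snapshot_files.values())
```

In the corner case described in the thread, `has_complete_snapshot` would return False, so the procedure could fail instead of silently leaving a target table with no complete snapshot.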

On Mon, Apr 15, 2024 at 7:51 PM Jingsong Li <[email protected]> wrote:
>
> Thanks wj for driving this PIP.
>
> A question about execution.
>
> Why not put snapshot copying last?
>
> In this way, the table can only be used once the snapshot file is ready,
> which avoids an incomplete table.
>
> This is just like writing a table, where the snapshot file is made ready
> by the committer.
>
> wj wang <[email protected]> wrote on Thu, Apr 11, 2024 at 10:00:
>
> > Hi devs.
> > I have updated PIP-18.
> > Please review it and let us discuss.
> >
> >
> > https://cwiki.apache.org/confluence/display/PAIMON/PIP-18%3A+Introduce+clone+Procedure
> >
> > On Sun, Apr 7, 2024 at 4:53 PM Jingsong Li <[email protected]> wrote:
> > >
> > > > We must ensure that there is at least one complete snapshot in the
> > > > target table after the clone procedure is finished.
> > >
> > > +1
> > >
> > > > We need to discuss whether it is possible for the following corner
> > > > case to occur.
> > >
> > > This example is indeed a problem. We could check whether a complete
> > > snapshot exists, but it can also be left out of consideration for now
> > > and improved in the future.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Wed, Apr 3, 2024 at 4:40 PM wj wang <[email protected]> wrote:
> > > >
> > > > Hi Jingsong and Zelin,
> > > >
> > > > My opinion is as follows:
> > > > We must ensure that there is at least one complete snapshot in the
> > > > target table after the clone procedure is finished.
> > > >
> > > > > For cloning a specified snapshot or tag, undoubtedly, a rollback
> > > > > operation (deleting copied files) should be performed and an
> > > > > exception should be thrown.
> > > >
> > > > My opinion is exactly the same as Jingsong's.
> > > >
> > > >
> > > >
> > > > > For cloning all snapshots and tags, we should ignore deleted files to
> > > > > keep this clone working, to avoid conflicting with the snapshot
> > > > > expiration and file deletion performed by the streaming write job.
> > > >
> > > > We need to discuss whether it is possible for the following corner
> > > > case to occur.
> > > >     1. At the beginning, the source table has three snapshots
> > > >        (snapshot-1, snapshot-2, snapshot-3).
> > > >     2. We start a clone procedure. All files belonging to
> > > >        snapshot-1, snapshot-2 and snapshot-3 are selected.
> > > >     3. A Flink batch job is started to copy the files.
> > > >     4. The streaming write job commits snapshot-4, snapshot-5 and
> > > >        snapshot-6.
> > > >     5. snapshot-3 hits the snapshot expiration logic, and some of its
> > > >        files are deleted.
> > > >     6. The Flink batch job runs for a long time due to the cluster
> > > >        environment and other factors. It finally finishes, ignoring the
> > > >        FileNotFound exception.
> > > >     7. In the end, there is no complete snapshot in the target table.
> > > > Is it possible for this corner case to occur? Let's discuss it.
> > > >
> > > >
> > > >
> > > > On Wed, Apr 3, 2024 at 3:13 PM Jingsong Li <[email protected]> wrote:
> > > >
> > > > > > I want to know: if the specified snapshot or tag is being deleted
> > > > > > during the clone procedure, how do we handle the exception?
> > > > > > Should we stop the procedure and clean up the temporary target
> > > > > > table directory?
> > > > >
> > > > > - For cloning a specified snapshot or tag, undoubtedly, a rollback
> > > > > operation (deleting copied files) should be performed and an
> > > > > exception should be thrown.
> > > > >
> > > > > - For cloning all snapshots and tags, we should ignore deleted files
> > > > > to keep this clone working, to avoid conflicting with the snapshot
> > > > > expiration and file deletion performed by the streaming write job.
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Wed, Apr 3, 2024 at 3:08 PM yu zelin <[email protected]> wrote:
> > > > > >
> > > > > > Hi Jingsong,
> > > > > >
> > > > > > I want to know: if the specified snapshot or tag is being deleted
> > > > > > during the clone procedure, how do we handle the exception?
> > > > > > Should we stop the procedure and clean up the temporary target
> > > > > > table directory?
> > > > > >
> > > > > > Best regards,
> > > > > > Zelin Yu
> > > > > >
> > > > > > On Mon, Mar 18, 2024 at 1:30 PM Jingsong Li <[email protected]> wrote:
> > > > > >
> > > > > > > Hi devs,
> > > > > > >
> > > > > > > I have heard many times that there is a need to copy the entire
> > > > > > > table, and my advice to them is often to use file system file copying.
> > > > > > >
> > > > > > > But there are a few issues:
> > > > > > > 1. It is necessary to copy a large number of files, and it is likely
> > > > > > > that some files will be deleted due to ongoing work, resulting in
> > > > > > > copying failure.
> > > > > > > 2. The target table may need to synchronize Hive metadata, which means
> > > > > > > using HiveCatalog, which cannot be solved by copying files.
> > > > > > >
> > > > > > > So I suggest we have a clone procedure. [1]
> > > > > > >
> > > > > > > Also, contributors are welcome to develop this PIP together, and I
> > > > > > > will help review your code.
> > > > > > >
> > > > > > > [1]
> > > > > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-18%3A+Introduce+clone+Procedure
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong
> > > > > > >
> > > > >
> >
