Re: [DISCUSS] PIP-18: Introduce clone Procedure

Jingsong Li Wed, 20 Mar 2024 00:25:17 -0700

Hi Aitozi,

Thanks for your feedback.


> 1. Is the `clone` procedure equal to the following process?
>      - create the target table in target catalog
>      - Insert into the target_table select * from source_table
>        /*+ OPTIONS('scan.tag-name' = 'tag-1') */;

Yes.

> 2. What do you mean by
>      > "The target table may need to synchronize Hive metadata, which
>      > means using HiveCatalog, which cannot be solved by copying files."
>      Does it mean cloning a Paimon table to a Hive table?

No, I mean using HiveCatalog (metastore=hive). I just want to clarify
that simple file copying cannot fully handle schema migration, because
the schema also has to be synchronized to HMS (the Hive Metastore).

> If we clone a Paimon table to another Paimon table, can we use the
> file copy solution?

In terms of implementation, yes, it is file copying, but it differs
from full directory copying in two ways:
1. Only part of the files are copied.
2. The table is also created in the catalog at the same time.
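
To make the contrast with full directory copying concrete, here is a
minimal Python sketch. It is only an illustration under assumptions, not
the PIP's implementation or Paimon's API: `clone_table`,
`referenced_files`, the file names, and the dict standing in for the
catalog are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def clone_table(source_dir, target_dir, referenced_files,
                catalog, table_name, schema):
    """Copy only the referenced files, then register the table.

    Hypothetical helper: `referenced_files` stands for the relative
    paths that the chosen snapshot or tag points at.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    copied = []
    # Point 1: copy only part of the files, not the whole directory.
    for rel in sorted(referenced_files):
        dst = target_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_dir / rel, dst)
        copied.append(rel)
    # Point 2: also create the table in the catalog
    # (a plain dict stands in for the catalog here).
    catalog[table_name] = {"location": str(target_dir), "schema": schema}
    return copied


def demo():
    """Clone a toy table that has one live and one unreferenced file."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "src", Path(tmp) / "dst"
        (src / "data").mkdir(parents=True)
        (src / "data" / "live.bin").write_text("kept")
        (src / "data" / "stale.bin").write_text("not referenced")
        catalog = {}
        copied = clone_table(src, dst, {"data/live.bin"},
                             catalog, "db.t", ["id INT"])
        target_files = sorted(
            p.relative_to(dst).as_posix()
            for p in dst.rglob("*") if p.is_file()
        )
        return copied, target_files, sorted(catalog)
```

In this toy run the unreferenced `stale.bin` is never touched, so it does
not matter if ongoing work deletes it during the copy, and the target
table becomes visible through the catalog rather than only on disk.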

> I guess it may be useful when we have to move the data between clusters.

You can use HiveCatalog here: different databases can be on different
clusters, so the clone can complete data migration across clusters.

Best,
Jingsong

On Tue, Mar 19, 2024 at 11:25 PM Aitozi <[email protected]> wrote:
>
> Hi Jingsong,
>
>     Clone table is a useful feature. I have two questions here.
>
> 1. Is the `clone` procedure equal to the following process ?
>        - create the target table in target catalog
>        - Insert into the target_table select * from source_table /*+
> OPTIONS('scan.tag-name' = 'tag-1') */;
> 2. What do you mean by
>
> > "The target table may need to synchronize Hive metadata, which means
> using HiveCatalog, which cannot be solved by copying files."
>
> Does it mean cloning a Paimon table to a Hive table?
>
> If we clone a Paimon table to another Paimon table, can we use the
> file copy solution?
> I guess it may be useful when we have to move the data between clusters.
>
> Best,
> Atiozi
>
> On Mon, Mar 18, 2024 at 13:30, Jingsong Li <[email protected]> wrote:
>
> > Hi devs,
> >
> > I have heard many times that there is a need to copy the entire table,
> > and my advice to them is often to use file system file copying.
> >
> > But there are a few issues:
> > 1. It is necessary to copy a large number of files, and it is likely
> > that some files will be deleted due to ongoing work, resulting in
> > copying failure.
> > 2. The target table may need to synchronize Hive metadata, which means
> > using HiveCatalog, which cannot be solved by copying files.
> >
> > So I suggest we have a clone procedure. [1]
> >
> > Contributors are also welcome to develop this PIP together, and I
> > will help review your code.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/PAIMON/PIP-18%3A+Introduce+clone+Procedure
> >
> > Best,
> > Jingsong
> >
