> But in this case, we may have to copy all the snapshots for data migration.

We need an API to describe how to copy snapshots. I will think about this.

Best,
Jingsong

On Wed, Mar 20, 2024 at 4:06 PM Aitozi <[email protected]> wrote:
>
> Hi Jingsong,
>
> > No, I mean using metastore=hive, using HiveCatalog. I just want to
> > clarify that simple file copying cannot fully handle schema migration,
> > as the schema should be synchronized to HMS.
>
> So in terms of implementation, is the actual workflow like this?
> 1. Pick the file list of one snapshot.
> 2. Copy the files to the target table's path.
> 3. Create the table with the table schema.
>
> > You can use HiveCatalog here, so that different databases can be on
> > different clusters, which enables data migration across clusters.
>
> Yes. But in this case, we may have to copy all the snapshots for data
> migration.
> So I think we may not need to limit the clone to a single snapshot, as
> the doc currently describes.
>
> Best,
> Aitozi.
>
>
> Jingsong Li <[email protected]> wrote on Wed, Mar 20, 2024 at 15:25:
>
> > Hi Aitozi,
> >
> > Thanks for your feedback.
> >
> > > 1. Is the `clone` procedure equivalent to the following process?
> > > - create the target table in the target catalog
> > > - insert into target_table select * from source_table
> > > /*+ OPTIONS('scan.tag-name' = 'tag-1') */;
> >
> > Yes.
> >
> > > 2. What do you mean by
> > > "The target table may need to synchronize Hive metadata, which
> > > means using HiveCatalog, which cannot be solved by copying files."
> > > Does it mean cloning a Paimon table to a Hive table?
> >
> > No, I mean using metastore=hive, using HiveCatalog. I just want to
> > clarify that simple file copying cannot fully handle schema migration,
> > as the schema should be synchronized to HMS.
> >
> > > If we clone a Paimon table to another Paimon table, can we use a
> > > file-copy solution?
> >
> > In terms of implementation, yes, it is file copying, but compared to
> > full directory copying:
> > 1. Only a subset of the files is copied.
> > 2. At the same time, the table is also created in the catalog.
> >
> > > I guess it may be useful when we have to move the data between clusters.
> >
> > You can use HiveCatalog here, so that different databases can be on
> > different clusters, which enables data migration across clusters.
> >
> > Best,
> > Jingsong
> >
> > On Tue, Mar 19, 2024 at 11:25 PM Aitozi <[email protected]> wrote:
> > >
> > > Hi Jingsong,
> > >
> > > Clone table is a useful feature. I have two questions here.
> > >
> > > 1. Is the `clone` procedure equivalent to the following process?
> > > - create the target table in the target catalog
> > > - insert into target_table select * from source_table /*+
> > > OPTIONS('scan.tag-name' = 'tag-1') */;
> > > 2. What do you mean by
> > >
> > > > "The target table may need to synchronize Hive metadata, which means
> > > > using HiveCatalog, which cannot be solved by copying files."
> > >
> > > Does it mean cloning a Paimon table to a Hive table?
> > >
> > > If we clone a Paimon table to another Paimon table, can we use a
> > > file-copy solution?
> > > I guess it may be useful when we have to move the data between clusters.
> > >
> > > Best,
> > > Aitozi
> > >
> > > Jingsong Li <[email protected]> wrote on Mon, Mar 18, 2024 at 13:30:
> > >
> > > > Hi devs,
> > > >
> > > > I have heard many times that there is a need to copy the entire table,
> > > > and my advice has often been to use file-system file copying.
> > > >
> > > > But there are a few issues:
> > > > 1. A large number of files must be copied, and some of them are likely
> > > > to be deleted by ongoing jobs during the copy, causing it to fail.
> > > > 2. The target table may need to synchronize Hive metadata, which means
> > > > using HiveCatalog, which cannot be solved by copying files.
> > > >
> > > > So I suggest we have a clone procedure. [1]
> > > >
> > > > Also, contributors are welcome to develop this PIP together, and I will
> > > > help review your code.
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-18%3A+Introduce+clone+Procedure
> > > >
> > > > Best,
> > > > Jingsong
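
The file-copy workflow discussed in this thread combines Aitozi's three steps (pick the file list of one snapshot, copy the files to the target path, create the table with its schema) with Jingsong's points that only a subset of files is copied and that the table must also be registered in the catalog. The toy Python sketch below illustrates that flow. It is not Paimon's implementation: the snapshot-file layout (a JSON file holding "schema" and "files"), the helper names pick_snapshot_files, copy_files, clone and demo, and the plain dict standing in for a catalog such as HiveCatalog are all hypothetical.

```python
# Toy sketch of the clone-by-file-copy workflow from this thread.
# HYPOTHETICAL layout, not Paimon's real format or API: a snapshot is a JSON file
# <table>/snapshot/snapshot-<id> holding {"schema": {...}, "files": [relative paths]},
# and a plain dict stands in for the target catalog (e.g. HiveCatalog/HMS).
from __future__ import annotations

import json
import shutil
import tempfile
from pathlib import Path


def pick_snapshot_files(table: Path, snapshot_id: int) -> tuple[dict, list[str]]:
    """Step 1: read one snapshot and return its schema and the data files it references."""
    meta = json.loads((table / "snapshot" / f"snapshot-{snapshot_id}").read_text())
    return meta["schema"], meta["files"]


def copy_files(src: Path, dst: Path, files: list[str]) -> None:
    """Step 2: copy only the referenced files, not the whole table directory."""
    for rel in files:
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        try:
            shutil.copy2(src / rel, target)
        except FileNotFoundError as err:
            # A concurrent job may delete a file mid-copy (the first issue in the proposal).
            raise RuntimeError(f"source file vanished during clone: {rel}") from err


def clone(src: Path, dst: Path, snapshot_id: int, catalog: dict, name: str) -> None:
    """Steps 1-3: pick the files, copy them, then register the table and its schema."""
    schema, files = pick_snapshot_files(src, snapshot_id)
    copy_files(src, dst, files)
    # Step 3: a real HiveCatalog would also sync this schema to HMS; here we only record it.
    catalog[name] = {"path": str(dst), "schema": schema}


def demo() -> tuple[dict, list[str]]:
    """Build a toy source table, clone snapshot 1, and report what landed in the target."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "src"), Path(tmp, "dst")
        (src / "snapshot").mkdir(parents=True)
        (src / "bucket-0").mkdir()
        (src / "bucket-0" / "data-1.orc").write_text("rows of snapshot 1")
        (src / "bucket-0" / "data-0.orc").write_text("not referenced by snapshot 1")
        (src / "snapshot" / "snapshot-1").write_text(json.dumps(
            {"schema": {"id": "INT", "name": "STRING"}, "files": ["bucket-0/data-1.orc"]}))
        catalog: dict = {}
        clone(src, dst, 1, catalog, "target_db.target_table")
        copied = sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file())
        return catalog, copied
```

Calling demo() returns a catalog entry for target_db.target_table and reports only bucket-0/data-1.orc as copied, because data-0.orc is not referenced by snapshot 1. Under the same toy assumptions, copying all snapshots for a migration, as Aitozi suggests, could be modeled by repeating step 1 for every snapshot id and copying the union of the referenced files.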
