lichaohao opened a new issue, #6330:
URL: https://github.com/apache/iceberg/issues/6330
### Query engine
iceberg:1.0.0
spark:3.2.0
flink:1.13.2
catalog:hive-catalog
### Question
source table: MySQL CDC table `mysql_cdc_source`
sink table: Iceberg table `my_iceberg_sink` with primary key `id`, `format-version=2`, and `write.upsert.enabled=true`
SQL executed (checkpoint interval: 1 min):
INSERT INTO my_iceberg_sink SELECT * FROM mysql_cdc_source;
Note: the MySQL source receives both insert and update operations.
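For reference, a minimal Flink SQL sketch of the sink setup described above. The column names, catalog options, and `hive_prod` catalog wiring are illustrative assumptions, not taken from the actual job; verify the connector options against the Iceberg Flink documentation for these versions (Iceberg 1.0.0 / Flink 1.13.2):

```sql
-- Hypothetical upsert-enabled v2 sink table (columns are placeholders).
CREATE TABLE my_iceberg_sink (
  id   BIGINT,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED   -- a primary key is required for upsert mode
) WITH (
  'connector'            = 'iceberg',
  'catalog-name'         = 'hive_prod',
  'catalog-type'         = 'hive',
  'format-version'       = '2',
  'write.upsert.enabled' = 'true'
);

-- Upsert writes in Flink SQL are issued as a plain INSERT into the
-- upsert-enabled table; there is no separate UPSERT statement.
INSERT INTO my_iceberg_sink SELECT * FROM mysql_cdc_source;
```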
After the job has been running for some time, I want to compact the small Iceberg data files into larger ones, so from Spark I execute:
CALL hive_prod.system.rewrite_data_files(table => 'my_iceberg_sink')
This fails with the following exception:
Cannot commit, found new position delete for replaced data file: GenericDataFile....hdfs://xxxxdata.parquet
What should I do so that `CALL hive_prod.system.rewrite_data_files(...)` runs correctly while the upsert job is still writing?
Thank you for your answer!
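A sketch of the procedure invocation using the documented named-argument form. The `db` prefix is a placeholder for the actual database, and whether `partial-progress.enabled` helps under concurrent upserts should be verified against the Iceberg 1.0.0 release docs; it lets successfully rewritten file groups commit even if other groups fail validation:

```sql
CALL hive_prod.system.rewrite_data_files(
  table   => 'db.my_iceberg_sink',
  options => map('partial-progress.enabled', 'true')
);
```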
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]