Re: [I] Concern about possible consistency issue in HiveCatalog's _commit_table [iceberg-python]

2024-04-15 Thread via GitHub


Fokko closed issue #588: Concern about possible consistency issue in 
HiveCatalog's _commit_table
URL: https://github.com/apache/iceberg-python/issues/588


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Concern about possible consistency issue in HiveCatalog's _commit_table [iceberg-python]

2024-04-15 Thread via GitHub


Fokko commented on issue #588:
URL: https://github.com/apache/iceberg-python/issues/588#issuecomment-2056289982

   @HonahX Thanks for spotting this, and I agree that we should include the 
refreshing and updating of the metadata in the transaction. 





[I] Concern about possible consistency issue in HiveCatalog's _commit_table [iceberg-python]

2024-04-06 Thread via GitHub


HonahX opened a new issue, #588:
URL: https://github.com/apache/iceberg-python/issues/588

   ### Question
   
   Currently, the HiveCatalog's `_commit_table` workflow looks like:
   
   1. load current table metadata via `load_table`
   2. construct updated metadata
   3. lock the hive table
   4. alter the hive table
   5. unlock the hive table
   
   Suppose two processes, A and B, try to commit changes to the same Iceberg 
table. The code could happen to execute in the following order:
   
   1. process A loads the current table metadata
   2. process A constructs the updated metadata
   3. process B starts and finishes the **whole** `_commit_table`
   4. process A locks the hive table
   5. process A alters the hive table
   6. process A unlocks the hive table
   
   In this specific scenario, both processes successfully commit their changes 
because process B releases the lock before A tries to acquire it. But if 
`alter_table` does not support the [transactional 
check](https://issues.apache.org/jira/browse/HIVE-26882), process A's commit, 
which was built on metadata loaded before B committed, will overwrite the 
changes made by process B. 
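
To make the interleaving concrete, here is a minimal sketch of the lost update. It does not use PyIceberg's or Hive's real APIs: `FakeMetastore`, `commit_load_then_lock`, and the `before_lock` hook are hypothetical names invented for illustration, and the in-memory `alter_table` is assumed to overwrite blindly, i.e. without the transactional check.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class FakeMetastore:
    """Hypothetical in-memory stand-in for a Hive metastore (not a real API)."""

    metadata: tuple = ()  # committed changes, in commit order
    lock: threading.Lock = field(default_factory=threading.Lock)

    def load_table(self):
        return self.metadata

    def alter_table(self, new_metadata):
        # Assumption: no transactional check, so this overwrites blindly.
        self.metadata = new_metadata


def commit_load_then_lock(store, change, before_lock=None):
    base = store.load_table()        # 1. load current table metadata
    updated = base + (change,)       # 2. construct updated metadata
    if before_lock is not None:
        before_lock()                # hook: lets another commit run right here
    with store.lock:                 # 3. lock the table
        store.alter_table(updated)   # 4. alter the table
    # 5. the lock is released when the `with` block exits


store = FakeMetastore()
# Process B runs its whole commit after A has loaded but before A locks.
commit_load_then_lock(
    store, "A", before_lock=lambda: commit_load_then_lock(store, "B")
)
print(store.metadata)  # ('A',)
```

Both calls return without error, yet only A's change survives, because A wrote metadata derived from the state it loaded before B committed.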
   
   Since in Python we do not know which Hive version we are connecting to, I 
wonder if we need to update the code to lock the table before loading the 
current table metadata, as the [Java 
implementation](https://github.com/apache/iceberg/blob/main/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L184)
 does.
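
The lock-before-load ordering can be sketched the same way. This is only an illustration under stated assumptions, not PyIceberg's or the Java implementation's actual code: `FakeMetastore` and `commit_lock_then_load` are hypothetical names, and a `threading.Lock` stands in for the Hive table lock.

```python
import threading


class FakeMetastore:
    """Hypothetical in-memory stand-in for a Hive metastore (not a real API)."""

    def __init__(self):
        self.metadata = ()            # committed changes, in commit order
        self.lock = threading.Lock()  # stands in for the Hive table lock

    def load_table(self):
        return self.metadata

    def alter_table(self, new_metadata):
        # Assumption: still no transactional check on the metastore side.
        self.metadata = new_metadata


def commit_lock_then_load(store, change):
    with store.lock:                 # lock first
        base = store.load_table()    # load while holding the lock
        updated = base + (change,)   # construct updated metadata
        store.alter_table(updated)   # alter, then unlock on exit


store = FakeMetastore()
threads = [
    threading.Thread(target=commit_lock_then_load, args=(store, name))
    for name in ("A", "B")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(store.metadata))  # ['A', 'B']
```

Because the load happens inside the locked section, whichever process commits second always builds on the first one's result, so neither change is lost even when `alter_table` overwrites blindly.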
   
   BTW, it seems there is also a consistency issue with 
https://issues.apache.org/jira/browse/HIVE-26882, and there is an open 
fix for it: https://github.com/apache/hive/pull/5129
   
   Please correct me if I misunderstand something here. Thanks!
   
   cc: @Fokko 

