lirui-apache commented on issue #10006:
URL: https://github.com/apache/iceberg/issues/10006#issuecomment-2011212269
@szehon-ho @pvary I'm using this test case to demonstrate the issue:
```java
final int parallel = 2;
ThreadPoolExecutor pool =
    (ThreadPoolExecutor)
        Executors.newFixedThreadPool(
            parallel,
            new ThreadFactoryBuilder().setDaemon(true).setNameFormat("committer-%d").build());
pool.prestartAllCoreThreads();
try {
  List<Future<Void>> futures = Lists.newArrayListWithCapacity(parallel);
  // thread-1: disables the Hive lock
  futures.add(pool.submit(() -> {
    Table table = catalog.loadTable(tableId);
    table.updateProperties().set(HIVE_LOCK_ENABLED, "false").commit();
    return null;
  }));
  // thread-2: changes an unrelated property
  futures.add(pool.submit(() -> {
    Table table = catalog.loadTable(tableId);
    table.updateProperties().set(GC_ENABLED, "false").commit();
    return null;
  }));
  for (Future<Void> future : futures) {
    future.get();
  }
  // Both updates should be visible after both commits succeed
  Table table = catalog.loadTable(tableId);
  assertEquals("false", table.properties().get(HIVE_LOCK_ENABLED));
  assertEquals("false", table.properties().get(GC_ENABLED));
} finally {
  pool.shutdownNow();
}
```
The test fails intermittently: after both commits succeed, HIVE_LOCK_ENABLED is
sometimes not set. BTW, the test connects to a local standalone HMS that has
the required changes in Hive.
Let's call the threads setting HIVE_LOCK_ENABLED and GC_ENABLED `thread-1`
and `thread-2` respectively. I think the issue happens like this:
1. `thread-1` decides to use NoLock because its new table properties have
`(HIVE_LOCK_ENABLED, "false")`
2. `thread-2` decides to use MetastoreLock
3. `thread-2` acquires the HMS lock and verifies the metadata location has
not been changed
4. `thread-1` commits via the no-lock path
5. `thread-2` commits, and the change made by `thread-1` is lost
So my question is: should we use the old table metadata (i.e. the `base`) to
decide which lock to use, so that both threads use MetastoreLock in the above
example? Or do you think this use case is a known limitation that should be
avoided in the first place?
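
To make the interleaving easier to reason about, here is a minimal single-threaded sketch that replays the five steps in a fixed order and compares the two ways of choosing the lock. It is a toy model, not Iceberg or Hive code: `LockChoiceSketch`, `Store`, `replay`, and the property key strings are all hypothetical. It also assumes that the no-lock path behaves like a compare-and-swap on the table version, and that a locking committer whose base is stale refreshes and reapplies its change.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy, single-threaded replay of the interleaving; not Iceberg or Hive code. */
final class LockChoiceSketch {
  // Placeholder keys for illustration only, not the real property names.
  static final String HIVE_LOCK_ENABLED = "hive-lock-enabled";
  static final String GC_ENABLED = "gc-enabled";

  /** Stand-in for the HMS table entry; version plays the role of the metadata location. */
  static final class Store {
    int version = 0;
    Map<String, String> props = new HashMap<>();

    void write(Map<String, String> newProps) {
      props = newProps;
      version++;
    }
  }

  static Map<String, String> with(Map<String, String> base, String key, String value) {
    Map<String, String> copy = new HashMap<>(base);
    copy.put(key, value);
    return copy;
  }

  /** Locking is on unless the inspected properties explicitly turn it off. */
  static boolean usesLock(Map<String, String> props) {
    return !"false".equals(props.get(HIVE_LOCK_ENABLED));
  }

  /** Replays the steps; decideFromBase selects which metadata picks thread-1's lock. */
  static Map<String, String> replay(boolean decideFromBase) {
    Store store = new Store();

    // Both threads load the same base and prepare their updates.
    int baseVersion = store.version;
    Map<String, String> base = new HashMap<>(store.props);
    Map<String, String> new1 = with(base, HIVE_LOCK_ENABLED, "false"); // thread-1
    Map<String, String> new2 = with(base, GC_ENABLED, "false"); // thread-2

    // Steps 1-2: thread-1 picks its lock type. thread-2 locks in both modes,
    // because neither the base nor its new properties disable the lock.
    boolean t1Locks = usesLock(decideFromBase ? base : new1);

    // Step 3: thread-2 holds the lock and verifies its base is still current.
    boolean t2Verified = store.version == baseVersion;

    // Step 4: thread-1 commits now only if it bypasses the lock (assumed CAS).
    boolean t1Done = false;
    if (!t1Locks && store.version == baseVersion) {
      store.write(new1);
      t1Done = true;
    }

    // Step 5: thread-2 writes on the strength of its earlier check, then unlocks.
    if (t2Verified) {
      store.write(new2);
    }

    // If thread-1 had to wait for the lock, it runs only after thread-2 released it.
    if (!t1Done) {
      if (store.version != baseVersion) {
        // Stale base: refresh and reapply the change (assumed retry behavior).
        new1 = with(store.props, HIVE_LOCK_ENABLED, "false");
      }
      store.write(new1);
    }
    return store.props;
  }

  public static void main(String[] args) {
    System.out.println("lock chosen from new metadata:  " + replay(false));
    System.out.println("lock chosen from base metadata: " + replay(true));
  }
}
```

In this model, `replay(false)` ends without `HIVE_LOCK_ENABLED`, which matches the failing assertion in the test, while `replay(true)` keeps both properties because the two committers are serialized by the same lock.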