wzx140 commented on issue #10493:
URL: https://github.com/apache/iceberg/issues/10493#issuecomment-3744162191
@singhpk234 @findepi Here's a concrete case that illustrates this race
condition: In a Spark task, I execute insert overwrite table followed
immediately by select table. If an asynchronous Spark listener calls loadTable
during the insert overwrite commit process, the select query will inevitably
return stale data, which would be very confusing to users. Here's the timeline:
```
1. spark_write calls loadTable for table^1 with snapshot1 ⇒
cache.put(table^1[snapshot1])
2. Time passes, table^1 expires in cache ⇒ cache.expire(table^1[snapshot1])
3. Listener calls loadTable for table^2 with snapshot1 ⇒
cache.put(table^2[snapshot1])
4. spark_write commits using table^1, advancing to snapshot2
5. spark_scan calls loadTable ⇒ cache.get() returns table^2[snapshot1] ⚠️
The snapshot is stale!
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]