Hi,

Today I investigated failures in the failover suite and found an issue with near cache updates. Currently, when a near cache entry is initialized we store the primary node ID, and when a value is requested from the near cache entry we check that the stored node is still primary (NearCacheEntry.valid()).

The following scenario is possible (it reproduces in our test):
- there are two nodes: A is primary, B is near
- a near cache entry is initialized on B, and A is stored in the near cache entry as primary
- a new node C joins the grid and becomes the new primary
- the value is updated from C; C is not aware of near reader B, so the value in the near cache on B is not updated
- node C leaves the grid, and A becomes primary again
- the value is requested from the near cache entry on B; it sees that the stored node A is still primary and returns the outdated value
As a simple fix I changed GridNearCacheEntry to store the topology version that was current at the moment the entry was initialized from the primary, and NearCacheEntry.valid() now checks that the topology version has not changed. Assuming the topology does not change often, this fix should not impact near cache performance. The only case in which the topology can change often is when client nodes are used. Once support for client nodes is fully implemented, we will need some way to check that the cache affinity topology has not changed. Thoughts?
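
To illustrate the difference between the two checks, here is a minimal, self-contained sketch. It is not the actual GridNearCacheEntry code: the class NearEntrySketch, the fields topVer and primary, and the methods validByPrimary(), validByTopVer() and changeTopology() are hypothetical names made up for this example, and the topology is modeled as a simple counter that is bumped on every join or leave.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, simplified model of the problem; not the real Ignite classes. */
public class NearEntrySketch {
    /** Stand-in for the grid topology version (assumption: bumped on every join/leave). */
    static final AtomicLong topVer = new AtomicLong(1);

    /** Stand-in for the ID of the node that is currently primary for the key. */
    static volatile String primary = "A";

    /** Near cache entry that remembers both the primary and the topology version at init time. */
    static final class Entry {
        final String val;
        final String primaryAtInit;
        final long topVerAtInit;

        Entry(String val) {
            this.val = val;
            this.primaryAtInit = primary;
            this.topVerAtInit = topVer.get();
        }

        /** Old check: valid while the stored node is still primary. */
        boolean validByPrimary() {
            return primaryAtInit.equals(primary);
        }

        /** New check: valid only while the topology version is unchanged. */
        boolean validByTopVer() {
            return topVerAtInit == topVer.get();
        }
    }

    /** Simulates a node joining or leaving: primary may change, version always grows. */
    static void changeTopology(String newPrimary) {
        primary = newPrimary;
        topVer.incrementAndGet();
    }

    public static void main(String[] args) {
        Entry e = new Entry("v1"); // initialized on B while A is primary

        changeTopology("C");       // C joins and becomes primary; an update via C never reaches B
        changeTopology("A");       // C leaves, A is primary again

        System.out.println(e.validByPrimary()); // true:  stale value would be returned
        System.out.println(e.validByTopVer());  // false: entry is re-read from the primary
    }
}
```

The sketch also shows the trade-off: the version check invalidates the entry on any topology change, even one that does not affect the key's primary, which is why frequent joins and leaves of client nodes would matter.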
