[ 
https://issues.apache.org/jira/browse/IMPALA-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741698#comment-16741698
 ] 

Paul Rogers commented on IMPALA-7534:
-------------------------------------

Examined {{CatalogdMetaProvider.loadWithCaching()}}, which seems to be the 
primary area of concern for this ticket.

Simulated the behavior of the load/invalidate pair with 10 loading threads and 
1 invalidate thread, each doing a random load (or invalidate) of 20 simulated 
catalog objects. The results show that, while load and invalidate are mutually 
exclusive, they do work as expected. If I force 500 invalidates, then there are 
500 new loads. Bottom line: invalidates do not cause the loss of a load.
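
A minimal sketch of this kind of simulation, not the harness used for the 
numbers above. It substitutes a plain {{ConcurrentHashMap}} for the real 
cache, and the class {{LoadInvalidateSim}} and all of its members are 
hypothetical names. It counts only invalidates that actually evict a cached 
object, then reloads every object once at the end so the totals can be compared.

```java
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the cache; not Impala's CatalogdMetaProvider. */
class LoadInvalidateSim {
  static final int NUM_OBJECTS = 20;
  static final int NUM_LOADERS = 10;

  final ConcurrentHashMap<Integer, String> cache = new ConcurrentHashMap<>();
  final AtomicInteger loads = new AtomicInteger();
  final AtomicInteger invalidates = new AtomicInteger();

  /** Returns the cached object, loading it (and counting the load) on a miss. */
  String load(int key) {
    return cache.computeIfAbsent(key, k -> {
      loads.incrementAndGet();
      return "object-" + k;
    });
  }

  /** Evicts the object; counted only if something was actually cached. */
  void invalidate(int key) {
    if (cache.remove(key) != null) invalidates.incrementAndGet();
  }

  /** Runs loaders and one invalidator until targetInvalidates evictions occur. */
  static LoadInvalidateSim run(int targetInvalidates) {
    LoadInvalidateSim sim = new LoadInvalidateSim();
    Thread[] loaders = new Thread[NUM_LOADERS];
    for (int i = 0; i < NUM_LOADERS; i++) {
      final long seed = i;
      loaders[i] = new Thread(() -> {
        Random rand = new Random(seed);
        while (sim.invalidates.get() < targetInvalidates) {
          sim.load(rand.nextInt(NUM_OBJECTS));
        }
      });
      loaders[i].start();
    }
    Thread invalidator = new Thread(() -> {
      Random rand = new Random(42);
      while (sim.invalidates.get() < targetInvalidates) {
        sim.invalidate(rand.nextInt(NUM_OBJECTS));
      }
    });
    invalidator.start();
    try {
      invalidator.join();
      for (Thread t : loaders) t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("simulation interrupted", e);
    }
    // Final pass: make sure every object is cached again before comparing.
    for (int k = 0; k < NUM_OBJECTS; k++) sim.load(k);
    return sim;
  }

  public static void main(String[] args) {
    LoadInvalidateSim sim = run(500);
    System.out.println("invalidates=" + sim.invalidates.get()
        + " loads=" + sim.loads.get());
  }
}
```

In this sketch each object alternates between absent and cached, so once every 
object has been reloaded the load count equals the eviction count plus one 
initial load per object. That holds for any thread interleaving, because 
{{computeIfAbsent}} and {{remove}} are atomic per key.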

The next issue is the one identified in this bug: is there a problem with a 
simultaneous load/invalidate? The description assumes that the load is for the 
old version. But, it could just as well be for the new version.

It appears we make order guarantees only when the actions are done by a single 
session. Run a query, invalidate, run another query and the events have a 
guaranteed order.

But do the same events in two sessions (one runs queries, the other 
invalidates) and there is no guarantee of ordering. Due to delays in each 
session, the invalidate might arrive a few ms before or a few ms after the 
query (which loads metadata into the cache).

That is, it would be impossible to test the scenario outlined in the 
description because there are no ordering guarantees to build upon.

It is hard to prove, but I suspect that, as long as cache entries have version 
numbers and are immutable, we get the ordering guarantees we need.
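
To make the idea concrete, here is a minimal sketch of one way versioned, 
immutable entries can reject a stale load. It is not how 
{{CatalogdMetaProvider}} is implemented: {{VersionedCacheSketch}}, {{Entry}}, 
and the {{minVersion}} map are hypothetical, and it assumes each invalidation 
carries the object's new version.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; not Impala's CatalogdMetaProvider. */
class VersionedCacheSketch {
  /** Immutable entry tagged with the catalog version it was loaded at. */
  static final class Entry {
    final long version;
    final String payload;
    Entry(long version, String payload) {
      this.version = version;
      this.payload = payload;
    }
  }

  private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();
  // Lowest version still acceptable per key; raised by invalidations.
  // (Assumption: this map is allowed to grow; a real cache would bound it.)
  private final ConcurrentHashMap<String, Long> minVersion = new ConcurrentHashMap<>();

  /** The object changed to newVersion, so anything older is stale. */
  void invalidate(String key, long newVersion) {
    minVersion.merge(key, newVersion, Long::max);
    // Evict the cached entry only if it is older than newVersion.
    entries.computeIfPresent(key, (k, e) -> e.version < newVersion ? null : e);
  }

  /** Called when a load completes; returns true if the candidate was cached. */
  boolean put(String key, Entry candidate) {
    Entry kept = entries.compute(key, (k, cur) -> {
      if (candidate.version < minVersion.getOrDefault(k, 0L)) return cur; // stale
      if (cur != null && cur.version >= candidate.version) return cur;    // not newer
      return candidate;
    });
    return kept == candidate;
  }

  Entry get(String key) {
    return entries.get(key);
  }

  public static void main(String[] args) {
    VersionedCacheSketch cache = new VersionedCacheSketch();
    Entry v1 = new Entry(1, "table@v1");  // thread 1 fetched v1, then stalled
    cache.invalidate("tbl", 2);           // thread 2: object is now at v2
    System.out.println("late v1 cached? " + cache.put("tbl", v1));
    System.out.println("v2 cached? " + cache.put("tbl", new Entry(2, "table@v2")));
  }
}
```

The {{main}} method replays the interleaving from the issue description in a 
single thread: the late put of v1 is refused because the invalidation already 
raised the minimum version to 2, while a later load of v2 is accepted.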

> Handle invalidation races in CatalogdMetaProvider cache
> -------------------------------------------------------
>
>                 Key: IMPALA-7534
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7534
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Todd Lipcon
>            Assignee: Paul Rogers
>            Priority: Major
>
> There is a well-known race in Guava's LoadingCache that we are using for 
> CatalogdMetaProvider which we are not currently handling:
> - thread 1 gets a cache miss and makes a request to fetch some data from the 
> catalogd. It fetches the catalog object with version 1 and then gets context 
> switched out or otherwise slow
> - thread 2 receives an invalidation for the same object, because it has 
> changed to v2. It calls 'invalidate' on the cache, but nothing is yet cached.
> - thread 1 puts back v1 of the object into the cache
> In essence we've "missed" an invalidation. This is also described in this 
> nice post: https://softwaremill.com/race-condition-cache-guava-caffeine/
> The race is quite unlikely but could cause some unexpected results that are 
> hard to reason about, so we should look into a fix.



