On Oct 26, 2011, at 9:46 AM, Dan Berindei wrote:

> Hi Galder, sorry it took so long to reply.
> 
> On Mon, Oct 24, 2011 at 4:16 PM, Galder Zamarreño <gal...@redhat.com> wrote:
>> Btw, forgot to attach the log:
>> 
>> On Oct 24, 2011, at 3:13 PM, Galder Zamarreño wrote:
>> 
>>> Hi Dan,
>>> 
>>> Re: http://goo.gl/TGwrP
>>> 
>>> There are a few of these in the Hot Rod server+client testsuites. It's easy to
>>> replicate locally. Seems like cache operations right after a cache has
>>> started are rather problematic.
>>> 
>>> In a local run of HotRodReplicationTest, I was able to replicate the
>>> issue when trying to test topology changes. Please find attached the log
>>> file, but here are the interesting bits:
>>> 
>>> 1. A new view installation is being prepared with NodeA and NodeB:
>>> 2011-10-24 14:36:09,046 4221  TRACE 
>>> [org.infinispan.cacheviews.CacheViewsManagerImpl] 
>>> (OOB-1,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) 
>>> ___hotRodTopologyCache: Preparing cache view CacheView{viewId=4, 
>>> members=[NodeA-63227, NodeB-15806]}, committed view is CacheView{viewId=3, 
>>> members=[NodeA-63227, NodeB-15806, NodeC-17654]}
>>> …
>>> 2011-10-24 14:36:09,047 4222  DEBUG 
>>> [org.infinispan.statetransfer.StateTransferLockImpl] 
>>> (OOB-1,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) Blocking new 
>>> transactions
>>> 2011-10-24 14:36:09,047 4222  TRACE 
>>> [org.infinispan.statetransfer.StateTransferLockImpl] 
>>> (OOB-1,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) Acquiring 
>>> exclusive state transfer shared lock, shared holders: 0
>>> 2011-10-24 14:36:09,047 4222  TRACE 
>>> [org.infinispan.statetransfer.StateTransferLockImpl] 
>>> (OOB-1,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) Acquired 
>>> state transfer lock in exclusive mode
>>> 
>>> 2. The cluster coordinator discovers a view change and requests NodeA and 
>>> NodeB to remove NodeC from the topology view:
>>> 2011-10-24 14:36:09,048 4223  TRACE 
>>> [org.infinispan.interceptors.InvocationContextInterceptor] 
>>> (OOB-3,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) Invoked with 
>>> command RemoveCommand{key=NodeC-17654, value=null, flags=null} and 
>>> InvocationContext [NonTxInvocationContext{flags=null}]
>>> 
>>> 3. NodeB has not yet finished installing the cache view, so that remove 
>>> times out:
>>> 2011-10-24 14:36:09,049 4224  ERROR 
>>> [org.infinispan.interceptors.InvocationContextInterceptor] 
>>> (OOB-3,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) ISPN000136: 
>>> Execution error
>>> org.infinispan.distribution.RehashInProgressException: Timed out waiting 
>>> for the transaction lock
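
For illustration only, steps 1-3 can be reproduced with a plain java.util.concurrent.locks.StampedLock. This is not the actual StateTransferLockImpl code: the class and method names below are made up for the example, and the real implementation may work differently. It only shows why an operation that needs the lock in shared mode fails when the exclusive holder does not release it within the timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.StampedLock;

// Hypothetical stand-in for illustration; NOT Infinispan's StateTransferLockImpl.
final class StateTransferLockSketch {
    private final StampedLock lock = new StampedLock();
    private long exclusiveStamp; // 0 while the lock is not held exclusively

    // View installation starts: block new operations by taking the lock exclusively.
    void blockNewOperations() {
        exclusiveStamp = lock.writeLock();
    }

    // View installation finished: let operations through again.
    void unblockNewOperations() {
        lock.unlockWrite(exclusiveStamp);
        exclusiveStamp = 0L;
    }

    // A cache operation needs the lock in shared mode and gives up after the timeout.
    // Returns true if the operation ran, false if it timed out or was interrupted.
    boolean tryRunOperation(Runnable operation, long timeoutMillis) {
        long stamp;
        try {
            stamp = lock.tryReadLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (stamp == 0L) {
            return false; // timed out while the exclusive lock was still held
        }
        try {
            operation.run();
            return true;
        } finally {
            lock.unlockRead(stamp);
        }
    }
}
```

In this sketch the remove from step 2 corresponds to a tryRunOperation call made between blockNewOperations and unblockNewOperations, which returns false once the timeout expires.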
>>> 
>>> A way to solve this is to avoid relying on cluster view changes and
>>> instead wait for the cache view to be installed before doing the
>>> operations. Is there any way to wait until then?
>>> 
>>> One way would be to have some CacheView installed callbacks or similar.
>>> This could be a good option because I could have a CacheView listener for
>>> the Hot Rod topology cache whose callbacks I can check for isPre=false and
>>> then do the cache ops safely.
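
A rough sketch of what such a callback could look like, purely as an assumption: the CacheViewCallback interface, its method signature, and DeferredTopologyUpdater are invented for this example and are not an existing Infinispan API. The idea is just to queue the topology cache operations and only run them once a callback with isPre=false arrives.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Hypothetical callback interface, invented for this sketch.
interface CacheViewCallback {
    void cacheViewChanged(int viewId, List<String> members, boolean isPre);
}

// Queues cache operations and runs them only after the view has been installed.
final class DeferredTopologyUpdater implements CacheViewCallback {
    private final Queue<Runnable> pending = new ArrayDeque<>();

    // Called when a cluster view change is noticed; the operation is not run yet.
    synchronized void submit(Runnable cacheOperation) {
        pending.add(cacheOperation);
    }

    @Override
    public synchronized void cacheViewChanged(int viewId, List<String> members, boolean isPre) {
        if (isPre) {
            return; // view is still being installed, keep the operations queued
        }
        // isPre == false: the view is installed, so drain the queue in order.
        Runnable operation;
        while ((operation = pending.poll()) != null) {
            operation.run();
        }
    }
}
```

With something like this, the remove of NodeC-17654 would be submitted when the cluster view changes and executed only after the post-install callback for the new cache view.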
>>> 
> 
> Initially I was thinking of allowing multiple cache view listeners for
> each cache and making StateTransferManager one of them, but I decided
> against it because I realized it needs a different interface than our
> regular listeners. I knew it was only a matter of time until
> someone needed it...
> 
> An alternative solution would be to retry all operations, like we do
> with commits now, when we receive a RehashInProgressException from the
> remote node. That's what I was planning to do first,
> as it helps in other use cases as well.

Ok, do you have time to include this today ahead of the BETA3 release? 

I think this is a very important fix because, as you can see in the testsuite,
it's very easy to hit this error with Hot Rod servers.
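
Just to make sure we mean the same thing by the retry idea quoted above, here is a minimal sketch. It is an assumption on my side, not how the commit retry is actually implemented: RehashRetry, its parameters, and the backoff are made up, and the exception class is a local stand-in for org.infinispan.distribution.RehashInProgressException so that the example compiles on its own.

```java
import java.util.function.Supplier;

// Local stand-in for org.infinispan.distribution.RehashInProgressException,
// declared here only so the sketch is self-contained.
class RehashInProgressException extends RuntimeException {
    RehashInProgressException(String message) {
        super(message);
    }
}

// Hypothetical helper, not part of Infinispan.
final class RehashRetry {
    private RehashRetry() {
    }

    // Runs the operation and retries it while a rehash is reported in progress,
    // up to maxAttempts attempts, sleeping backoffMillis between attempts.
    static <T> T retry(Supplier<T> operation, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RehashInProgressException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RehashInProgressException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts, rethrow below
                }
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break; // stop retrying if the caller was interrupted
                }
            }
        }
        throw last;
    }
}
```

The Hot Rod topology update would then wrap its cache call, e.g. RehashRetry.retry(() -> cache.remove(key), 5, 100), where cache, key and the numbers are placeholders. Any other exception is propagated immediately, without retrying.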

> 
>>> Otherwise, code like the one I used for maintaining the Hot Rod topology
>>> is going to race against your cache view installation code.
>>> 
>>> You seem to have some pieces in place for this, i.e. CacheViewListener, but 
>>> it seems only designed for internal core/ work.
>>> 
>>> Any other suggestions?
>>> 
>>> Cheers,
>>> --
>>> Galder Zamarreño
>>> Sr. Software Engineer
>>> Infinispan, JBoss Cache
>>> 
>>> 
>>> _______________________________________________
>>> infinispan-dev mailing list
>>> infinispan-dev@lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> 
>> --
>> Galder Zamarreño
>> Sr. Software Engineer
>> Infinispan, JBoss Cache
>> 
>> 
> 

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache


