[ https://issues.apache.org/jira/browse/GEODE-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607484#comment-16607484 ]
ASF subversion and git services commented on GEODE-4650:
--------------------------------------------------------

Commit 52bd3fc63970e2929fc8df76a7621d5147a6e393 in geode's branch refs/heads/develop from [~balesh2]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=52bd3fc ]

GEODE-4650: Refactor Elder selection (#2393)

GEODE-4650: Resolve race condition in selection of the elder

* no longer cache the elder, re-compute the elder when needed
* extract elder logic to a new class to make unit testing possible
* adds tests for elder selection
* adds tests of DLock Grantor failover
* removes isAdam() - isAdam used to mean that the member was alone
  (that there were no non-surprise, non-admin members in the cluster)
  when it joined. This was only used in two places. The first, in the
  DLockService, protected against recovering dlocks when there isn't a
  cluster. This usage is replaced with a check for isLoner(). The other
  use of isAdam was in ElderInitProcessor and was redundant with an
  inner check if there were other members in the distributed system.
* fix testFairness so that it can be run repeatedly in the same JVM

Signed-off-by: Dan Smith <dsm...@pivotal.io>
Signed-off-by: Galen O'Sullivan <gosulli...@pivotal.io>
Signed-off-by: Ken Howe <kh...@pivotal.io>


> DLockService.clearGrantor can potentially hang
> ----------------------------------------------
>
>                 Key: GEODE-4650
>                 URL: https://issues.apache.org/jira/browse/GEODE-4650
>             Project: Geode
>          Issue Type: Bug
>          Components: distributed lock service
>            Reporter: Jason Huynh
>            Assignee: Helena Bales
>            Priority: Major
>              Labels: pull-request-available, swat
>         Attachments: callstacks-2018-02-10-05-25-15.txt, callstacks-2018-02-10-05-25-23.txt, callstacks-2018-02-10-05-25-30.txt
>
>          Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> There was a test run in the precheckin pipeline that hung with the following stack:
>
> {code:java}
> "RMI TCP Connection(1)-172.17.0.3" #30 daemon prio=5 os_prio=0 tid=0x00007f4560001800 nid=0x191 waiting on condition [0x00007f45771c0000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000e082d298> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:853)
>     at org.apache.geode.distributed.internal.locks.ElderInitProcessor.init(ElderInitProcessor.java:72)
>     at org.apache.geode.distributed.internal.locks.ElderState.<init>(ElderState.java:56)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager.getElderStateWithTryLock(ClusterDistributionManager.java:3359)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager.getElderState(ClusterDistributionManager.java:3309)
>     at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.startElderCall(GrantorRequestProcessor.java:238)
>     at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.basicOp(GrantorRequestProcessor.java:347)
>     at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.basicOp(GrantorRequestProcessor.java:327)
>     at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.clearGrantor(GrantorRequestProcessor.java:318)
>     at org.apache.geode.distributed.internal.locks.DLockService.clearGrantor(DLockService.java:872)
>     at org.apache.geode.distributed.internal.locks.DLockGrantor.destroy(DLockGrantor.java:1227)
>     - locked <0x00000000e0837ff0> (a org.apache.geode.distributed.internal.locks.DLockGrantor)
>     at org.apache.geode.distributed.internal.locks.DLockService.nullLockGrantorId(DLockService.java:646)
>     at org.apache.geode.distributed.internal.locks.DLockService.basicDestroy(DLockService.java:2358)
>     at org.apache.geode.distributed.internal.locks.DLockService.destroyAndRemove(DLockService.java:2276)
>     - locked <0x00000000e05c7468> (a java.lang.Object)
>     at org.apache.geode.distributed.internal.locks.DLockService.destroyServiceNamed(DLockService.java:2214)
>     at org.apache.geode.distributed.DistributedLockService.destroy(DistributedLockService.java:84)
>     at org.apache.geode.internal.cache.GemFireCacheImpl.destroyGatewaySenderLockService(GemFireCacheImpl.java:2043)
>     at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2180)
>     - locked <0x00000000e04653e0> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
>     at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1960)
>     at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1950)
>     at org.apache.geode.test.junit.rules.ServerStarterRule.stopMember(ServerStarterRule.java:99)
>     at org.apache.geode.test.junit.rules.MemberStarterRule.after(MemberStarterRule.java:81)
>     at org.apache.geode.test.dunit.rules.ClusterStartupRule.stopElementInsideVM(ClusterStartupRule.java:412)
>     at org.apache.geode.test.junit.rules.VMProvider.lambda$stopVM$fe0d42dc$1(VMProvider.java:35)
>     at org.apache.geode.test.junit.rules.VMProvider$$Lambda$53/208982926.run(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at hydra.MethExecutor.executeObject(MethExecutor.java:244)
>     at org.apache.geode.test.dunit.standalone.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:70)
>     at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
>     at sun.rmi.transport.Transport$1.run(Transport.java:200)
>     at sun.rmi.transport.Transport$1.run(Transport.java:197)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>     at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$7/1394836008.run(Unknown Source)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>     - <0x00000000e0332230> (a java.util.concurrent.ThreadPoolExecutor$Worker)
>     - <0x00000000e08499b0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>     - <0x00000000e08520f0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> {code}
> It looks like the cache is shutting down and we are unable to destroy the lock service for the gateway sender.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
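
The commit message above says the elder is no longer cached and is re-computed when needed. A minimal sketch of that idea follows, assuming the elder is the oldest non-admin member of an ordered membership view. `ElderSketch`, `Member`, `MembershipView`, and `computeElder` are hypothetical names used only for illustration; they are not the classes added by the commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these names are Geode classes.
class ElderSketch {

  // Stand-in for a cluster member. "admin" marks members assumed ineligible to be elder.
  static final class Member {
    final String name;
    final boolean admin;

    Member(String name, boolean admin) {
      this.name = name;
      this.admin = admin;
    }
  }

  // Stand-in for a membership view: members kept in join order, oldest first.
  static final class MembershipView {
    private final List<Member> members = new ArrayList<>();

    synchronized void join(Member m) {
      members.add(m);
    }

    synchronized void leave(Member m) {
      members.remove(m);
    }

    // Return a copy so callers can iterate without holding the view's lock.
    synchronized List<Member> snapshot() {
      return new ArrayList<>(members);
    }
  }

  // Derive the elder from the current view on every call instead of caching it.
  // Assumption: the elder is the oldest member that is not an admin member.
  static Optional<Member> computeElder(MembershipView view) {
    for (Member m : view.snapshot()) {
      if (!m.admin) {
        return Optional.of(m);
      }
    }
    return Optional.empty();
  }
}
```

Because the answer is derived from the current view on each call, the departure of the old elder shows up on the next lookup, and there is no cached value that could go stale between a membership change and its invalidation.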