[ https://issues.apache.org/jira/browse/IGNITE-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590264#comment-16590264 ]
Ilya Kasnacheev commented on IGNITE-9068:
-----------------------------------------

I don't really have any ideas on how to make it more correct. This test reproduces the same thread dump that was observed on a problematic node.

> Node fails to stop when CacheObjectBinaryProcessor.addMeta() is executed inside guard()/unguard()
> -------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-9068
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9068
>             Project: Ignite
>          Issue Type: Bug
>          Components: binary, managed services
>    Affects Versions: 2.5
>            Reporter: Ilya Kasnacheev
>            Assignee: Ilya Lantukh
>            Priority: Blocker
>              Labels: test
>             Fix For: 2.7
>
>         Attachments: GridServiceDeadlockTest.java, MyService.java
>
>
> When addMeta is called, e.g. during service deployment, it is executed inside guard()/unguard().
> If the node is stopped at this point, Ignite.stop() will hang.
> Consider the following thread dump:
> {code}
> "Thread-1" #57 prio=5 os_prio=0 tid=0x00007f7780005000 nid=0x7f26 runnable [0x00007f766cbef000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000005cb7b0468> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
>     at org.apache.ignite.internal.util.StripedCompositeReadWriteLock$WriteLock.tryLock(StripedCompositeReadWriteLock.java:220)
>     at org.apache.ignite.internal.GridKernalGatewayImpl.tryWriteLock(GridKernalGatewayImpl.java:143)
>     // Waiting for lock to cancel futures of BinaryMetadataTransport
>     at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2171)
>     at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2094)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2545)
>     - locked <0x00000005cb423f00> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2508)
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.run(IgnitionEx.java:2033)
> "test-runner-#1%service.GridServiceDeadlockTest%" #13 prio=5 os_prio=0 tid=0x00007f77b87d5800 nid=0x7eb8 waiting on condition [0x00007f778cdfc000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>     // May never return if there are discovery problems
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
>     at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.addMeta(CacheObjectBinaryProcessorImpl.java:463)
>     at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$2.addMeta(CacheObjectBinaryProcessorImpl.java:188)
>     at org.apache.ignite.internal.binary.BinaryContext.registerUserClassDescriptor(BinaryContext.java:802)
>     at org.apache.ignite.internal.binary.BinaryContext.registerClassDescriptor(BinaryContext.java:761)
>     at org.apache.ignite.internal.binary.BinaryContext.descriptorForClass(BinaryContext.java:627)
>     at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:174)
>     at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:157)
>     at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:144)
>     at org.apache.ignite.internal.binary.GridBinaryMarshaller.marshal(GridBinaryMarshaller.java:254)
>     at org.apache.ignite.internal.binary.BinaryMarshaller.marshal0(BinaryMarshaller.java:82)
>     at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.marshal(AbstractNodeNameAwareMarshaller.java:58)
>     at org.apache.ignite.internal.util.IgniteUtils.marshal(IgniteUtils.java:10069)
>     at org.apache.ignite.internal.processors.service.GridServiceProcessor.prepareServiceConfigurations(GridServiceProcessor.java:570)
>     at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployAll(GridServiceProcessor.java:622)
>     at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployAll(GridServiceProcessor.java:610)
>     // Lock held here:
>     at org.apache.ignite.internal.processors.service.GridServiceProcessor.deployMultiple(GridServiceProcessor.java:498)
>     at org.apache.ignite.internal.IgniteServicesImpl.deployMultiple(IgniteServicesImpl.java:153)
>     at org.apache.ignite.internal.processors.service.GridServiceDeadlockTest.testMetadataDeadlock(GridServiceDeadlockTest.java:48)
> {code}
> It seems that waiting for futures inside addMeta (and, for that matter, inside guard()/unguard() for service deploy) is not safe. Some kind of continuation / compound future is desired here.
> I am attaching a reproducing test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
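
For readers who want to see the hang in isolation, the following is a minimal sketch of the pattern shown in the thread dump, written against plain java.util.concurrent rather than Ignite internals. Every name in it (GuardedWaitSketch, gateway, metaFut, deployBlocking, deployWithContinuation, tryStop) is hypothetical and chosen only for illustration. A ReentrantReadWriteLock stands in for the kernal gateway and a CompletableFuture that never completes stands in for the metadata update future. The continuation variant is one possible reading of the "continuation / compound future" suggestion in the description, not the fix that was applied for this issue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedWaitSketch {
    /** Stand-in for the gateway: operations hold the read lock, stop needs the write lock. */
    private final ReentrantReadWriteLock gateway = new ReentrantReadWriteLock();

    /** Stand-in for a metadata future that never completes (assumed: discovery is stuck). */
    private final CompletableFuture<Void> metaFut = new CompletableFuture<>();

    /** Problematic pattern: block on the future while the read lock is still held. */
    void deployBlocking() {
        gateway.readLock().lock();
        try {
            metaFut.join(); // parks until the future is completed or cancelled
        } catch (RuntimeException ignored) {
            // join() throws CancellationException once the future is cancelled
        } finally {
            gateway.readLock().unlock();
        }
    }

    /** Alternative: register a continuation and release the read lock without waiting. */
    CompletableFuture<String> deployWithContinuation() {
        gateway.readLock().lock();
        try {
            return metaFut.thenApply(v -> "deployed");
        } finally {
            gateway.readLock().unlock();
        }
    }

    /** Stop needs the write lock before it can cancel the pending future. */
    boolean tryStop(long timeoutMs) throws InterruptedException {
        if (!gateway.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS))
            return false; // a reader is parked inside the guarded section
        try {
            metaFut.cancel(false);
            return true;
        } finally {
            gateway.writeLock().unlock();
        }
    }

    /** Returns whether stop got the write lock while a worker blocks under the read lock. */
    static boolean stopSucceedsWhileBlocked() throws Exception {
        GuardedWaitSketch node = new GuardedWaitSketch();
        Thread worker = new Thread(node::deployBlocking);
        worker.setDaemon(true);
        worker.start();
        while (node.gateway.getReadLockCount() == 0)
            Thread.sleep(10); // wait until the worker holds the read lock
        boolean stopped = node.tryStop(200);
        node.metaFut.cancel(false); // unblock the worker so the sketch can exit cleanly
        worker.join();
        return stopped;
    }

    /** Returns whether stop got the write lock and cancelled the continuation. */
    static boolean stopSucceedsWithContinuation() throws Exception {
        GuardedWaitSketch node = new GuardedWaitSketch();
        CompletableFuture<String> res = node.deployWithContinuation();
        boolean stopped = node.tryStop(200);
        return stopped && res.isCompletedExceptionally();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocking wait, stop succeeded: " + stopSucceedsWhileBlocked());
        System.out.println("continuation, stop succeeded: " + stopSucceedsWithContinuation());
    }
}
```

In the blocking variant the stop path cannot cancel the future because cancellation happens only after the write lock is acquired, and the write lock is unavailable until the waiting reader returns. Returning a continuation lets the guarded section exit immediately, so stop can take the write lock and cancel whatever is still pending.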