Hi all,


We have observed a peculiar case in which Ignite shutdown gets stuck
indefinitely.



We have deployed two Ignite nodes in embedded mode.

   1. On Node1, some cache operations are performed, which triggers a
   BinaryMetadata transfer/sync with the other node (Node2).
      1. This wait is indefinite; there is no timeout.
      2. As this is a cache put, a write lock is also acquired.
   2. While Node1 is waiting for the response from Node2, Node1 gets
   segmented and loses connectivity with Node2.
      1. Because of this, Node1 never receives the expected response from
      Node2 and never stops waiting.
   3. On node segmentation we close Ignite, but because some threads are
   still performing cache operations and are stuck in the binary metadata
   transfer, Ignite is not able to close.
      1. Ignite close waits for the write lock acquired in step 1 to be
      released, which never happens in this case.
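
The sequence above can be modeled with plain JDK classes. This is only a
minimal sketch of the suspected mechanism, not Ignite code: the class name,
the single lock, and the future standing in for the metadata acknowledgement
are all hypothetical, and the stop attempt is bounded here only so that the
demo terminates.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the hang; none of these names are Ignite classes.
final class ShutdownHangSketch {

    /**
     * Simulates one put thread and one stopping thread.
     * Returns true if the stopper obtained the lock within maxAttempts polls.
     */
    static boolean simulate(boolean peerResponds, int maxAttempts) {
        ReentrantLock opLock = new ReentrantLock();            // stands in for the lock taken by the put
        CompletableFuture<Void> metadataAck = new CompletableFuture<>(); // stands in for Node2's response
        CountDownLatch lockTaken = new CountDownLatch(1);

        Thread putThread = new Thread(() -> {
            opLock.lock();
            try {
                lockTaken.countDown();
                metadataAck.join();                            // untimed wait, as in step 1.1
            } finally {
                opLock.unlock();
            }
        }, "put-thread");
        putThread.setDaemon(true);
        putThread.start();

        try {
            lockTaken.await();                                 // make sure the put holds the lock first
            if (peerResponds) {
                metadataAck.complete(null);                    // Node2 answers: the put can finish
            }
            // The stopper polls for the lock; when Node2 never answers, every poll fails.
            boolean stopped = false;
            for (int i = 0; i < maxAttempts && !stopped; i++) {
                stopped = opLock.tryLock(50, TimeUnit.MILLISECONDS);
            }
            if (stopped) {
                opLock.unlock();
            }
            metadataAck.complete(null);                        // release the put thread so the demo exits
            putThread.join();
            return stopped;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("peer responds, stop completed: " + simulate(true, 5));
        System.out.println("peer silent,   stop completed: " + simulate(false, 5));
    }
}
```

With an unbounded number of attempts and a peer that never responds, the
stopping thread in this model would poll forever, which matches the behavior
described in step 3.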



The following is the thread dump:



     java.lang.Thread.State: WAITING (parking)   -> <<THREAD WAITING FOR BINARY METADATA TRANSFER>>
               at jdk.internal.misc.Unsafe.park(java.base@11.0.19/Native Method)
               at java.util.concurrent.locks.LockSupport.park(java.base@11.0.19/LockSupport.java:323)
               at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:179)
               at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:142)
               at org.apache.ignite.internal.processors.cache.binary.BinaryMetadataTransport.putAndWaitPendingUpdate(BinaryMetadataTransport.java:281)
               at org.apache.ignite.internal.processors.cache.binary.BinaryMetadataTransport.requestMetadataUpdate(BinaryMetadataTransport.java:221)
               at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.addMeta(CacheObjectBinaryProcessorImpl.java:638)
               at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.addMeta(CacheObjectBinaryProcessorImpl.java:292)
               at org.apache.ignite.internal.binary.BinaryContext.updateMetadata(BinaryContext.java:1337)
               at org.apache.ignite.internal.binary.BinaryClassDescriptor.write(BinaryClassDescriptor.java:862)
               at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal0(BinaryWriterExImpl.java:232)
               at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:165)
               at org.apache.ignite.internal.binary.BinaryWriterExImpl.marshal(BinaryWriterExImpl.java:152)
               at org.apache.ignite.internal.binary.GridBinaryMarshaller.marshal(GridBinaryMarshaller.java:251)
               at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.marshalToBinary(CacheObjectBinaryProcessorImpl.java:583)
               at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.toBinary(CacheObjectBinaryProcessorImpl.java:1492)
               at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.toCacheObject(CacheObjectBinaryProcessorImpl.java:1329)
               at org.apache.ignite.internal.processors.cache.GridCacheContext.toCacheObject(GridCacheContext.java:1822)
               at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.enlistWriteEntry(GridNearTxLocal.java:1546)
               at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.enlistWrite(GridNearTxLocal.java:1083)
               at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.putAsync0(GridNearTxLocal.java:635)
               at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.putAsync(GridNearTxLocal.java:484)
               at org.apache.ignite.internal.processors.cache.GridCacheAdapter$20.op(GridCacheAdapter.java:2511)
               at org.apache.ignite.internal.processors.cache.GridCacheAdapter$20.op(GridCacheAdapter.java:2509)
               at org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:4284)
               at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put0(GridCacheAdapter.java:2509)
               at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2487)
               at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2466)
               at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1332)
               at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:867)



     java.lang.Thread.State: TIMED_WAITING (sleeping)   -> <<IGNITE WAITING TO BE STOPPED>>
               at java.lang.Thread.sleep(java.base@11.0.19/Native Method)
               at org.apache.ignite.internal.util.IgniteUtils.sleep(IgniteUtils.java:8270)
               at org.apache.ignite.internal.processors.cache.GridCacheGateway.onStopped(GridCacheGateway.java:324)
               at org.apache.ignite.internal.processors.cache.GridCacheProcessor.blockGateways(GridCacheProcessor.java:806)
               at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:1916)
               at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:1806)
               at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2340)
               - locked <0x0000000685a014c0> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
               at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2163)
               at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:351)
               at org.apache.ignite.Ignition.stop(Ignition.java:230)
               at org.apache.ignite.internal.IgniteKernal.close(IgniteKernal.java:2776)
               at org.apache.ignite.cache.CacheManager.close(CacheManager.java:411)



*I would like to know whether this analysis is correct, and which
alternatives, workarounds, or configuration options I can use to avoid this
issue.*
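
One application-level option would be to bound the wait on the caller's
side by running the blocking call on a separate thread and interrupting it
after a timeout. The sketch below uses only JDK classes; the class and
method names are hypothetical, and it assumes the blocked call reacts to
thread interruption, which is not confirmed here for the Ignite metadata
wait.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper; not part of Ignite.
final class BoundedOp {

    /**
     * Runs op on a helper thread and waits at most timeoutMs for it.
     * Returns true if op finished in time, false if it was interrupted after the timeout.
     */
    static boolean runBounded(Runnable op, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-op");
            t.setDaemon(true);                 // do not keep the JVM alive if op ignores interrupts
            return t;
        });
        Future<?> result = pool.submit(op);
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true);               // interrupts the helper thread
            return false;
        } catch (InterruptedException e) {
            result.cancel(true);
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    /** Stand-in for a call that blocks until its thread is interrupted. */
    static void blockUntilInterrupted() {
        while (!Thread.currentThread().isInterrupted()) {
            LockSupport.park();                // returns on interrupt (or spuriously, hence the loop)
        }
    }

    public static void main(String[] args) {
        System.out.println("fast op completed:    " + runBounded(() -> { }, 1000));
        System.out.println("blocked op completed: " + runBounded(BoundedOp::blockUntilInterrupted, 200));
    }
}
```

In the real application the `Runnable` would wrap the cache put. If the wait
does not respond to interruption, this guard only unblocks the caller and
does not release whatever the put thread is holding.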


-- 
Thanks and Regards,
Atul Dhatrak
