[ https://issues.apache.org/jira/browse/IGNITE-13298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164776#comment-17164776 ]
Vipul Thakur commented on IGNITE-13298:
---------------------------------------

The cluster memory/persistence config is in the Environment section at the top.

> Found long running cache at client end
> ---------------------------------------
>
>                 Key: IGNITE-13298
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13298
>             Project: Ignite
>          Issue Type: Task
>    Affects Versions: 2.7.6
>         Environment: ========cluster memory config/persistence================
> <property name="gridLogger">
>   <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>     <constructor-arg type="java.lang.String" value="${IGNITE_SCRIPT}/ignite-log4j2.xml" />
>   </bean>
> </property>
> <property name="dataStorageConfiguration">
>   <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>     <property name="defaultDataRegionConfiguration">
>       <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>         <property name="metricsEnabled" value="true"/>
>         <property name="persistenceEnabled" value="true" />
>         <!--<property name="maxSize" value="#{10L * 1024 * 1024 * 1024}"/> -->
>         <property name="maxSize" value="400Gb" />
>         <!-- Increasing the buffer size to 4 GB. -->
>         <property name="checkpointPageBufferSize" value="${checkpointPageBufferSize}" />
>       </bean>
>     </property>
>     <property name="storagePath" value="${storagePath}" />
>     <property name="walPath" value="${walPath}" />
>     <property name="walArchivePath" value="${walArchivePath}" />
>     <property name="walMode" value="LOG_ONLY" />
>     <property name="pageSize" value="${pageSize}" />
>     <!-- Enable write throttling. -->
>     <property name="writeThrottlingEnabled" value="true" />
>     <property name="walHistorySize" value="1" />
>     <property name="metricsEnabled" value="true"/>
>   </bean>
> </property>
> ==================Client thread dump ===========================
> 2020-07-20 12:14:43
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.211-b12 mixed mode):
>
> "Attach Listener" #788 daemon prio=9 os_prio=0 tid=0x00007fe7f4001000 nid=0x32d waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>
>    Locked ownable synchronizers:
>         - None
>
> "Context_6_jms_314_ConsumerDispatcher" #787 daemon prio=5 os_prio=0 tid=0x00007fe6e805e000 nid=0x31a waiting on condition [0x00007fe2e5bdd000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00000000cb87d9d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
>         at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
>         at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
>         at java.lang.Thread.run(Thread.java:748)
>
>    Locked ownable synchronizers:
>         - None
>
> "DefaultMessageListenerContainer-35" #786 prio=5 os_prio=0 tid=0x00007fe460013800 nid=0x319 in Object.wait() [0x00007fe2e5cde000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at com.solacesystems.jcsmp.impl.XMLMessageQueue.dequeue(XMLMessageQueue.java:130)
>         at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:845)
>         - locked <0x00000000cb8cce50> (a com.solacesystems.jcsmp.impl.XMLMessageQueueList)
>         at com.solacesystems.jms.SolMessageConsumer.receive(SolMessageConsumer.java:253)
>         at org.springframework.jms.connection.CachedMessageConsumer.receive(CachedMessageConsumer.java:86)
>         at org.springframework.jms.support.destination.JmsDestinationAccessor.receiveFromConsumer(JmsDestinationAccessor.java:132)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveMessage(AbstractPollingMessageListenerContainer.java:418)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:303)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
>         at java.lang.Thread.run(Thread.java:748)
>
>    Locked ownable synchronizers:
>         - None
>
> "Context_4_jms_313_ConsumerDispatcher" #785 daemon prio=5 os_prio=0 tid=0x00007fe6f8028000 nid=0x318 waiting on condition [0x00007fe2e5ddf000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00000000cb8cf8d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
>         at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
>         at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
>         at java.lang.Thread.run(Thread.java:748)
>
>    Locked ownable synchronizers:
>         - None
>
> "DefaultMessageListenerContainer-27" #784 prio=5 os_prio=0 tid=0x00007fe45800f800 nid=0x317 in Object.wait() [0x00007fe2e5ee0000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at com.solacesystems.jcsmp.impl.XMLMessageQueue.dequeue(XMLMessageQueue.java:130)
>         at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:845)
>         - locked <0x00000000cb8cffc8> (a com.solacesystems.jcsmp.impl.XMLMessageQueueList)
>         at com.solacesystems.jms.SolMessageConsumer.receive(SolMessageConsumer.java:253)
>         at org.springframework.jms.connection.CachedMessageConsumer.receive(CachedMessageConsumer.java:86)
>         at org.springframework.jms.support.destination.JmsDestinationAccessor.receiveFromConsumer(JmsDestinationAccessor.java:132)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveMessage(AbstractPollingMessageListenerContainer.java:418)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:303)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
>         at java.lang.Thread.run(Thread.java:748)
>
>    Locked ownable synchronizers:
>         - None
>
> "Context_6_jms_312_ConsumerDispatcher" #780 daemon prio=5 os_prio=0 tid=0x00007fe6e805c800 nid=0x313 waiting on condition [0x00007fe2e62e4000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00000000cb751ad0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
>         at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
>         at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
>         at java.lang.Thread.run(Thread.java:748)
>
>    Locked ownable synchronizers:
>         - None
>
> "DefaultMessageListenerContainer-34" #779 prio=5 os_prio=0 tid=0x00007fe450003800 nid=0x312 waiting on condition [0x00007fe2e63e5000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>         at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get0(GridCacheAdapter.java:4723)
>         at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4697)
>         at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1415)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:928)
>         at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:640)
>         at com.jio.digitalapi.cacheservice.client.impl.DigitalApiIgniteCache.get(DigitalApiIgniteCache.java:87)
>         at com.jio.digitalapi.eventprocessing.service.dataservice.EventManagementApacheIgniteDataService.getCustomerEntity(EventManagementApacheIgniteDataService.java:101)
>         at com.jio.digitalapi.ep.dataservice.impl.AbstractEventManagementDataService.getCustomer(AbstractEventManagementDataService.java:38)
>         at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventActionProcessor.getCustomerEntity(AbstractMessageEventActionProcessor.java:154)
>         at com.jio.digitalapi.eventprocessing.service.event.action.processor.PrimeMemberUpdateEventActionProcessor.processEvent(PrimeMemberUpdateEventActionProcessor.java:31)
>         at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventActionProcessor.processMessageEvent(AbstractMessageEventActionProcessor.java:112)
>         at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventHandler.processMessage(AbstractMessageEventHandler.java:66)
>         at com.jio.digitalapi.platform.core.messaging.jms.receiver.DigitalApiAsyncJmsMessageReceiver.onMessage(DigitalApiAsyncJmsMessageReceiver.java:106)
>         at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:761)
>         at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:699)
>         at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:674)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:318)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
>         at java.lang.Thread.run(Thread.java:748)
>
>    Locked ownable synchronizers:
>         - None
>
>            Reporter: Vipul Thakur
>            Priority: Blocker
>         Attachments: Ignite_10.143.75.24_threaddump.txt, Ignite_10.143.75.24_threaddump_1.txt, Ignite_10.143.75.24_threaddump_2.txt, Ignite_10.143.75.24_threaddump_3.txt, Ignite_10.143.75.24_threaddump_4.txt
>
>
> Hi,
> We have an Ignite cluster with four nodes, each with 400 GB of memory.
> Nearly 2 months after the cluster and clients are deployed in an
> environment, the cluster hangs: at first a few clients get stuck with some
> processing pending, and after some time everything gets stuck.
> After restarting everything (both clients and cluster) it works fine again,
> processing around 1 crore (10 million) records in 10-15 minutes, including
> both creating and updating data.
>
> We use transactions on all caches for create and update, and no transaction
> for get calls.
> We have already faced this issue twice in 4-5 months of deployment.
> I am attaching the cluster thread dump and the client thread dump.
> I have seen one existing Jira ticket about *Found long running caches* that
> was moved to 2.8.1; is that the solution? Please confirm.
> 2020-06-04 20:05:55.889 WARN 1 --- [c7fd8b84-d8sdl%] org.apache.ignite.internal.diagnostic : Found long running cache future [startTime=19:59:30.288, curTime=20:05:55.882, fut=GridPartitionedSingleGetFuture [topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0], key=UserKeyCacheObjectImpl [part=105, val=7701112105, hasValBytes=true], readThrough=true, forcePrimary=false, futId=681978f7271-55010ba8-d8d5-475f-97be-ff1c1916cea1, trackable=true, subjId=59d5e3cf-d09c-44d3-82d6-84dd35b64e10, taskName=null, deserializeBinary=true, skipVals=false, expiryPlc=null, canRemap=true, needVer=false, keepCacheObjects=false, recovery=false, node=TcpDiscoveryNode [id=14718baa-35e7-4d61-bde8-1e9c61978e8f, addrs=[10.135.34.67, 127.0.0.1], sockAddrs=[/10.135.34.67:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1591277613188, loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=false], postProcessingClos=null]]
>
> I have also seen this issue come up in our environment without any load or
> heavy traffic (the cluster held just 100 MB of data).
>
> We don't have any transaction timeout set as of now; should we set one?
>
>
> Thanks
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
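
On the transaction-timeout question in the description: the fragment below is a minimal sketch of how a default transaction timeout could be declared in the same Spring XML style as the Environment section. It assumes the fragment is placed inside the `IgniteConfiguration` bean alongside `dataStorageConfiguration`. The 30000 ms and 20000 ms values are illustrative placeholders, not values taken from the issue, and nothing in this thread establishes that a timeout resolves the reported hang. Note also that a transaction timeout bounds transactional operations only; the non-transactional `get` shown waiting in the client thread dump would not be bounded by it.

```xml
<!-- Sketch only: timeout values are placeholders, not recommendations. -->
<property name="transactionConfiguration">
  <bean class="org.apache.ignite.configuration.TransactionConfiguration">
    <!-- Default timeout, in milliseconds, for transactions that do not set their own. -->
    <property name="defaultTxTimeout" value="30000" />
    <!-- Timeout, in milliseconds, applied to transactions that block a partition map exchange. -->
    <property name="txTimeoutOnPartitionMapExchange" value="20000" />
  </bean>
</property>
```

Both properties correspond to setters on `org.apache.ignite.configuration.TransactionConfiguration`; a value of 0 leaves the respective timeout disabled.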