[ https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801646#comment-17801646 ]
Vipul Thakur commented on IGNITE-21059:
---------------------------------------

Hi [~zstan],

Thank you for the observation. We have also observed a new exception related to the striped pool:

2023-12-29 16:41:09.426 ERROR 1 --- [api.endpoint-22] b.b.EventProcessingErrorHandlerJmsSender : >>>>>>>>>>>>>>> Published error message ......EventProcessingErrorHandlerJmsSender ..
*2023-12-29 16:41:09.569 WARN 1 --- [85b8d7f7-ntw27%] o.a.i.i.processors.pool.PoolProcessor : >>> Possible starvation in striped pool.*
*Thread name: sys-stripe-0-#1%DIGITALAPI__PRIMARY_digiapi-eventprocessing-app-zone1-6685b8d7f7-ntw27%*
Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_TX, topicOrd=20, ordered=false, timeout=0, skipOnTimeout=false, msg=TxLocksResponse [futId=2236, nearTxKeyLocks=HashMap {}, txKeys=null]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearLockResponse [pending=ArrayList [], miniId=1, dhtVers=GridCacheVersion[] [GridCacheVersion [topVer=312674347, order=1703970204663, nodeOrder=2, dataCenterId=0]], mappedVers=GridCacheVersion[] [GridCacheVersion [topVer=315266949, order=1703839756326, nodeOrder=2, dataCenterId=0]], clientRemapVer=null, compatibleRemapVer=false, super=GridDistributedLockResponse [futId=b9a9f75bc81-870cf83b-d2dd-4aa0-9d9f-bffdb8d46b1a, err=null, vals=ArrayList [BinaryObjectImpl [arr= true, ctx=false, start=0]], super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=315266949, order=1703839751829, nodeOrder=11, dataCenterId=0], commit ...

PFB for detailed logs: [^digiapi-eventprocessing-app-zone1-6685b8d7f7-ntw27.log]

Could it be that our write services are getting affected because we have too many read clients? Should we try to decrease the number of read services?
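For reference, a minimal sketch of the node-level settings we understand are related to the striped pool warning above; the stripe count and blocked-worker timeout values here are illustrative assumptions only, not recommendations and not our actual configuration:

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StripedPoolSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // By default the striped pool has one stripe per CPU core; raising it
        // spreads cache/tx message processing across more threads.
        // 16 is an illustrative value only.
        cfg.setStripedPoolSize(16);

        // Report system workers (including stripes) that stay blocked longer
        // than this, so starvation shows up quickly in the logs.
        cfg.setSystemWorkerBlockedTimeout(30_000);

        Ignite ignite = Ignition.start(cfg);
    }
}
{code}

The queued TxLocksResponse / GridNearLockResponse messages suggest the stripe is busy with lock traffic, so pool size alone may not be the root cause; it may be lock contention on the write path.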
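And since the issue description below mentions transaction and socket timeouts on both the server and client end, this is roughly how we understand those knobs map onto the Ignite API. A minimal, self-contained sketch with illustrative timeout values; the cache name and key/value types are made up for the example and are not our production settings:

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.TransactionConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class TxTimeoutSketch {
    public static void main(String[] args) {
        // Transaction-level timeouts (node configuration).
        TransactionConfiguration txCfg = new TransactionConfiguration()
            .setDefaultTxTimeout(10_000)                    // cap any single transaction
            .setTxTimeoutOnPartitionMapExchange(20_000);    // roll back txs that block PME

        // Communication-level timeouts, so writes to dead connections fail fast.
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
        commSpi.setConnectTimeout(10_000);
        commSpi.setSocketWriteTimeout(15_000);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setTransactionConfiguration(txCfg)
            .setCommunicationSpi(commSpi);

        try (Ignite ignite = Ignition.start(cfg)) {
            // Hypothetical transactional cache, for illustration only.
            CacheConfiguration<String, String> cacheCfg =
                new CacheConfiguration<String, String>("exampleCache")
                    .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
            IgniteCache<String, String> cache = ignite.getOrCreateCache(cacheCfg);

            // A per-transaction timeout can also be passed explicitly.
            try (Transaction tx = ignite.transactions().txStart(
                    TransactionConcurrency.PESSIMISTIC,
                    TransactionIsolation.REPEATABLE_READ,
                    5_000,   // timeout, ms
                    0)) {    // expected number of entries, 0 = unknown
                cache.put("key", "value");
                tx.commit();
            }
        }
    }
}
{code}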
> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running cache operations
> ---------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-21059
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21059
>             Project: Ignite
>          Issue Type: Bug
>          Components: binary, clients
>    Affects Versions: 2.14
>            Reporter: Vipul Thakur
>            Priority: Critical
>         Attachments: Ignite_server_logs.zip, cache-config-1.xml, client-service.zip,
> digiapi-eventprocessing-app-zone1-6685b8d7f7-ntw27.log,
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1,
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2,
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3,
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1,
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2,
> ignite-server-nohup-1.out, ignite-server-nohup.out, image.png, long_txn_.png,
> nohup_12.out
>
> We recently upgraded from 2.7.6 to 2.14 due to an issue observed in the production environment where the cluster would go into a hang state because of partition map exchange.
> Please find below the ticket I created a while back for Ignite 2.7.6: https://issues.apache.org/jira/browse/IGNITE-13298
> We migrated to Apache Ignite 2.14 and the upgrade went smoothly, but on the third day we saw the cluster traffic dip again.
> We have 5 nodes in the cluster, provisioned with 400 GB of RAM and more than 1 TB of SSD.
> PFB the attached config. [I have added it as an attachment for review.]
> I have also added the server logs from the time when the issue happened.
> We have set a transaction timeout as well as a socket timeout, on both the server and the client end, for our write operations, but it seems the cluster sometimes goes into a hang state: all our get calls are stuck, our JMS listener threads slowly start to freeze, and every thread reaches a choked-up state after some time.
> Because of this, our read services, which do not even use transactions to retrieve data, also start to choke, ultimately leading to a dip in end-user traffic.
> We were hoping the product upgrade would help, but that has not been the case so far.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)