[ 
https://issues.apache.org/jira/browse/IGNITE-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795861#comment-17795861
 ] 

Vipul Thakur edited comment on IGNITE-21059 at 12/12/23 5:32 PM:
-----------------------------------------------------------------

We have a daily requirement of 90-120 million read requests and around 15-20 
million write requests.

Current values:

failureDetectionTimeout=120000

clientFailureDetectionTimeout=120000

What would be the suggested value? Should we bring these closer to the 
socketTimeout, which is around 5 seconds, and should these configurations be 
the same on both the server and client side?
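
For reference, a minimal sketch in Java of where these timeouts are set 
(assuming programmatic configuration rather than the attached Spring XML; the 
10s/30s/5s values are purely illustrative, not recommendations):

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class TimeoutConfigSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Failure detection for server nodes (currently 120000 ms in our setup).
        cfg.setFailureDetectionTimeout(10_000);

        // Failure detection for client nodes; can differ from the server-side value.
        cfg.setClientFailureDetectionTimeout(30_000);

        // Communication-level socket timeouts (what we loosely call "socketTimeout").
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
        commSpi.setConnectTimeout(5_000);
        commSpi.setSocketWriteTimeout(5_000);
        cfg.setCommunicationSpi(commSpi);

        try (Ignite ignite = Ignition.start(cfg)) {
            // Node comes up with the timeouts above applied.
        }
    }
}
{code}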


was (Author: vipul.thakur):
We have a daily requirement of 90-120 million read requests and around 15-20 
million 

Current values:

failureDetectionTimeout=120000

clientFailureDetectionTimeout=120000

What would be the suggested value? Should we bring these closer to the 
socketTimeout, which is around 5 seconds, and should these configurations be 
the same on both the server and client side?

> We have upgraded our ignite instance from 2.7.6 to 2.14. Found long running 
> cache operations
> --------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-21059
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21059
>             Project: Ignite
>          Issue Type: Bug
>          Components: binary, clients
>    Affects Versions: 2.14
>            Reporter: Vipul Thakur
>            Priority: Critical
>         Attachments: cache-config-1.xml, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2, 
> digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1, 
> digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2, 
> ignite-server-nohup.out
>
>
> We have recently upgraded from 2.7.6 to 2.14 due to an issue observed in our 
> production environment where the cluster would go into a hang state during 
> partition map exchange.
> Please find below the ticket I created a while back for Ignite 2.7.6:
> https://issues.apache.org/jira/browse/IGNITE-13298
> We migrated Apache Ignite to 2.14 and the upgrade went smoothly, but on the 
> third day we saw the cluster traffic dip again. 
> We have 5 nodes in the cluster with 400 GB of RAM and more than 1 TB of SSD.
> Please find the attached config [added as an attachment for review].
> I have also attached the server logs from the time the issue happened.
> We have set the transaction timeout as well as the socket timeout at both the 
> server and client end for our write operations, but it seems the cluster 
> sometimes goes into a hang state: all our get calls get stuck, our JMS 
> listener threads slowly start to freeze, and after a while every thread 
> reaches a choked-up state.
> As a result, our read services, which do not even use transactions to 
> retrieve data, also start to choke, ultimately leading to a dip in end-user 
> traffic.
> We were hoping the product upgrade would help, but that has not been the case 
> so far.
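
For context on the transaction timeout mentioned in the description above, a 
minimal sketch in Java of bounding a write with an explicit per-transaction 
timeout (the cache name, timeout value, and concurrency/isolation levels are 
illustrative assumptions, not the actual production settings):

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;

public class TxTimeoutSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Transactions require a TRANSACTIONAL cache.
            CacheConfiguration<String, String> ccfg =
                new CacheConfiguration<>("eventCache");
            ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
            IgniteCache<String, String> cache = ignite.getOrCreateCache(ccfg);

            // Bound the write with an explicit 5s transaction timeout so a stuck
            // cluster fails the transaction instead of blocking the caller thread.
            try (Transaction tx = ignite.transactions().txStart(
                    TransactionConcurrency.PESSIMISTIC,
                    TransactionIsolation.REPEATABLE_READ,
                    5_000, // timeout, ms
                    1)) {  // expected number of entries touched by the transaction
                cache.put("event-1", "payload");
                tx.commit();
            }
        }
    }
}
{code}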



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
