[ https://issues.apache.org/jira/browse/SLIDER-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830275#comment-15830275 ]
Billie Rinaldi commented on SLIDER-1188:
----------------------------------------

Upon further testing, this retry threshold is not doing what I thought it was. It does not cap the total number of retries; it sets the number of retries to perform before rereading AM data from the ZK registry and re-registering with the AM. I think I'll have to do more testing to figure out how to improve things on the agent side. We can go ahead with the AM improvements here.

> Make AM agent heartbeat loss configurable / increase defaults
> -------------------------------------------------------------
>
>                 Key: SLIDER-1188
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1188
>             Project: Slider
>          Issue Type: Bug
>            Reporter: Billie Rinaldi
>            Assignee: Billie Rinaldi
>         Attachments: SLIDER-1188.1.patch, SLIDER-1188.2.patch
>
>
> Currently containers are marked as lost after a couple of minutes, which is
> too sensitive for a busy cluster. We should increase the defaults and make
> the container timeout configurable. We may also want to increase the number
> of times the agent will retry heartbeating to the AM.
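
For illustration only, here is a minimal sketch of the retry-threshold behaviour described in the comment above. It is not the Slider agent's code: the function name, its parameters, and the callbacks (`send_heartbeat`, `lookup_am`, `register`) are all hypothetical, and `max_attempts` exists only so the sketch terminates. The point it tries to show is that the threshold resets after each registry lookup, so it does not bound the total number of retries.

```python
def heartbeat_with_retries(send_heartbeat, lookup_am, register,
                           retry_threshold, max_attempts):
    """Hypothetical sketch: retry heartbeats, re-resolving the AM every
    `retry_threshold` consecutive failures. Returns the total number of
    failed heartbeats before success (or before attempts ran out)."""
    am = lookup_am()          # assumed: read AM data from the registry
    register(am)              # assumed: register with that AM
    failures_since_lookup = 0
    total_failures = 0

    for _ in range(max_attempts):   # cap added only for this sketch
        if send_heartbeat(am):
            return total_failures
        total_failures += 1
        failures_since_lookup += 1
        if failures_since_lookup >= retry_threshold:
            # Threshold reached: reread AM data and re-register,
            # then start counting again from zero.
            am = lookup_am()
            register(am)
            failures_since_lookup = 0

    return total_failures
```

With a threshold of 3 and a heartbeat that fails seven times before succeeding, this sketch performs three lookups in total (the initial one plus two re-lookups) and still retries all seven times, which matches the observation that the threshold is not a cap.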