[ 
https://issues.apache.org/jira/browse/SLIDER-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830275#comment-15830275
 ] 

Billie Rinaldi commented on SLIDER-1188:
----------------------------------------

Upon further testing, this retry threshold is not doing what I thought it was. 
It does not cap the total number of retries; it sets how many retries the agent 
performs before rereading AM data from the ZK registry and re-registering with 
the AM. I think I'll have to do more testing to figure out how to improve 
things on the agent side. We can go ahead with the AM improvements here.
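
To illustrate the behavior described above, here is a minimal Python sketch. 
It is not the actual Slider agent code: the names heartbeat_loop, 
send_heartbeat, reregister, retry_threshold and max_attempts are all 
hypothetical, and max_attempts exists only so the simulation terminates. It 
shows a threshold that triggers a registry reread and re-registration rather 
than ending the retries.

```python
from typing import Callable, List


def heartbeat_loop(
    send_heartbeat: Callable[[], bool],
    reregister: Callable[[], None],
    retry_threshold: int,
    max_attempts: int,
) -> List[str]:
    """Simulate an agent heartbeat loop (hypothetical model, not Slider code).

    retry_threshold does NOT cap the total number of retries. It only decides
    how many consecutive failures occur before the agent rereads the AM
    location and re-registers, after which retrying continues.
    """
    events: List[str] = []
    consecutive_failures = 0
    for _ in range(max_attempts):  # bound added only to end the simulation
        if send_heartbeat():
            events.append("ok")
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        events.append("retry")
        if consecutive_failures >= retry_threshold:
            # Stand-in for "reread AM data from the registry, re-register".
            reregister()
            events.append("reregister")
            consecutive_failures = 0
    return events


if __name__ == "__main__":
    # An AM that never answers: the agent keeps retrying past the threshold.
    print(heartbeat_loop(lambda: False, lambda: None,
                         retry_threshold=2, max_attempts=5))
```

With a threshold of 2 and an AM that never responds, the loop in this sketch 
keeps retrying and simply re-registers after every second failure, which is 
why raising the threshold alone would not limit how long the agent retries.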

> Make AM agent heartbeat loss configurable / increase defaults
> -------------------------------------------------------------
>
>                 Key: SLIDER-1188
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1188
>             Project: Slider
>          Issue Type: Bug
>            Reporter: Billie Rinaldi
>            Assignee: Billie Rinaldi
>         Attachments: SLIDER-1188.1.patch, SLIDER-1188.2.patch
>
>
> Currently containers are marked as lost after a couple of minutes, which is 
> too sensitive for a busy cluster. We should increase the defaults and make 
> the container timeout configurable. We may also want to increase the number 
> of times the agent will retry heartbeating to the AM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
