[ https://issues.apache.org/jira/browse/MESOS-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940074#comment-14940074 ]
haosdent commented on MESOS-3479:
---------------------------------

[~gabriel.hartm...@gmail.com] I want to understand your problem first. Please correct me if my understanding is wrong.

{code}
Then the 4th attempt takes more than 60s to eventually fail/timeout. While it's running a 5th attempt is started (it succeeds).
{code}

This is unexpected behaviour. According to my understanding of the current code, the 4th attempt would time out after 60s and the health check would then exit. Could you provide the task stdout/stderr for this?

{quote}
All this occurs before expiration of the grace period. The 5th attempt is the last attempt. No more health checks are made. Marathon never receives a health check report.
{quote}

This is expected behaviour. "gracePeriodSeconds" means that any failure occurring within this interval after the health check launches is ignored, and no unhealthy message is sent to the framework. And because the 4th attempt timed out, the health check exited, hence "No more health checks are made".

> COMMAND Health Checks are not executed if the timeout is exceeded
> -----------------------------------------------------------------
>
>                 Key: MESOS-3479
>                 URL: https://issues.apache.org/jira/browse/MESOS-3479
>             Project: Mesos
>          Issue Type: Bug
>   Affects Versions: 0.23.0
>            Reporter: Matthias Veit
>            Assignee: haosdent
>            Priority: Critical
>
> The issue first appeared as Marathon Bug: See here for reference:
> https://github.com/mesosphere/marathon/issues/2179.
> A COMMAND health check is defined with a timeout of 20 seconds.
> The command itself takes longer than 20 seconds to execute.
> Current behavior:
> - The mesos health check process gets killed, but the defined command
> process is not (in the example the curl command returns after 21 seconds).
> - The check attempt is considered healthy, if the timeout is exceeded
> - The health check stops and is not executed any longer
> Expected behavior:
> - The defined health check command is killed, when the timeout is exceeded
> - The check attempt is considered Unhealthy, if the timeout is exceeded
> - The health check does not stop

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
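
For illustration only, the expected behaviour listed in the issue description (kill the command when the timeout is exceeded, count that attempt as unhealthy, and keep checking) could be sketched in Python roughly as follows. This is not the Mesos health check code: run_check and health_check_loop are hypothetical names, the sketch assumes a POSIX system, and it ignores gracePeriodSeconds entirely.

```python
import os
import signal
import subprocess
import time


def run_check(command, timeout_seconds):
    """Run one check attempt (hypothetical helper).

    Returns True only if the command exits with status 0 before the timeout.
    """
    # start_new_session=True puts the shell and its children in their own
    # process group, so the whole command can be killed on timeout.
    proc = subprocess.Popen(command, shell=True, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_seconds) == 0
    except subprocess.TimeoutExpired:
        try:
            # Kill the command itself, not just a wrapper process.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process group is already gone
        proc.wait()  # reap the killed process
        return False  # a timed-out attempt counts as unhealthy


def health_check_loop(command, timeout_seconds, interval_seconds, attempts):
    """Run a fixed number of attempts; a timeout does not stop the loop."""
    results = []
    for _ in range(attempts):
        results.append(run_check(command, timeout_seconds))
        time.sleep(interval_seconds)
    return results
```

With this sketch, a command that outlives its timeout yields an unhealthy result for that attempt, and later attempts still run.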