[ https://issues.apache.org/jira/browse/AURORA-409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bill Farner resolved AURORA-409.
--------------------------------
    Resolution: Cannot Reproduce

> Executor exits with unacknowledged updates while the slave is down, resulting in LOST tasks.
> --------------------------------------------------------------------------------------------
>
>                 Key: AURORA-409
>                 URL: https://issues.apache.org/jira/browse/AURORA-409
>             Project: Aurora
>          Issue Type: Bug
>          Components: Executor
>            Reporter: brian wickman
>
> Originally filed by [~bmahler]
> Currently, it appears as though Thermos will attempt to send status updates while the slave is down. This is correct, as the executor driver will re-send unacknowledged updates when the slave reconnects.
> However, since Thermos does not wait for reregistered(), it's possible for Thermos to exit before the slave reconnects and the driver flushes unacknowledged updates.
> To ensure updates are sent to the slave, Thermos must wait for reregistered() before exiting, if disconnected() was called. That is, if reliable status updates are desired, Thermos must not send status updates and then exit in between disconnected() and reregistered().

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
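
For illustration, the fix proposed in the description (do not exit between disconnected() and reregistered()) could be sketched roughly as below. This is only a minimal sketch, not the Thermos implementation: ReconnectGate and wait_until_safe_to_exit are hypothetical names, and the sketch assumes the executor forwards its disconnected() and reregistered() callbacks to the gate. Note that it only waits for reconnection; it does not itself confirm that the driver has flushed its unacknowledged updates.

```python
import threading


class ReconnectGate:
    """Tracks whether the executor driver is connected to the slave.

    Hypothetical helper for illustration; not part of Thermos or Mesos.
    """

    def __init__(self):
        self._connected = threading.Event()
        # Assumption: the gate is created after registered() has fired,
        # so the executor starts out connected.
        self._connected.set()

    def disconnected(self):
        # Forward the executor's disconnected() callback here.
        self._connected.clear()

    def reregistered(self):
        # Forward the executor's reregistered() callback here.
        self._connected.set()

    def wait_until_safe_to_exit(self, timeout=None):
        # Blocks while disconnected. Returns True once connected,
        # or False if the timeout expires first.
        return self._connected.wait(timeout)
```

The executor's shutdown path would call wait_until_safe_to_exit() after sending its final status update and before exiting. Whether to use a timeout, and what to do when it expires, is a policy decision this sketch leaves open.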