Dmitry Lysnichenko created AMBARI-4324:
------------------------------------------
Summary: Server should rely on command reports when considering
tasks timed out
Key: AMBARI-4324
URL: https://issues.apache.org/jira/browse/AMBARI-4324
Project: Ambari
Issue Type: Improvement
Components: agent, controller
Affects Versions: 1.5.0
Reporter: Dmitry Lysnichenko
Assignee: Dmitry Lysnichenko
Fix For: 1.5.0
Currently, the task timeout at the server and the timeout at the agent are two
separate mechanisms that work independently and duplicate each other.
This behaviour leads to a strange scenario:
- cluster installation is started
- execution of some command exceeds the timeout
- server considers this command and *all subsequent* commands in the request
timed out. This state is shown in the UI as well.
- at the same time, agent considers the currently executing command timed out
and kills it. After that, agent starts executing the next command in the queue.
If the next commands do not fail, agent sends COMPLETE status reports.
- server receives the COMPLETE status reports and updates component status.
- if the user clicks "Retry installation", tasks are created only for components
that are not installed.
- as a result, the UI shows fewer tasks than the user expects
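
The mismatch described in the list above can be illustrated with a toy model.
This is only a sketch: the function name, the command names and the data shapes
are hypothetical and are not taken from the Ambari code base. It assumes that
every command after the timed out one succeeds at the agent.

```python
def simulate_independent_timeouts(commands, timed_out_index):
    """Toy model of the scenario; all names here are hypothetical."""
    # Server view: the slow command and every later command in the
    # request are marked as timed out.
    shown_as_timed_out = [
        name for i, name in enumerate(commands) if i >= timed_out_index
    ]

    # Agent view: only the slow command is killed. The following commands
    # run and (by assumption) succeed, so their components get installed.
    installed = {
        name for i, name in enumerate(commands) if i != timed_out_index
    }

    # "Retry installation" creates tasks only for components that are
    # still not installed.
    retry_tasks = [name for name in commands if name not in installed]
    return shown_as_timed_out, retry_tasks


shown, retried = simulate_independent_timeouts(
    ["install_a", "install_b", "install_c", "install_d"], timed_out_index=1
)
print("shown as timed out:", shown)
print("tasks created on retry:", retried)
```

In this model the UI reports three timed out tasks, while the retry creates only
one, which is the discrepancy the user sees.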
Changes in scope of this jira:
Add a TIMEDOUT command status report type at the agent. On the server side, the
HostRoleStatus enum already has this status type. Modify server behaviour:
the server considers a task timed out when it receives the corresponding
command report from the agent. This way, all task time tracking logic is
consolidated at the agent, which will simplify timeout handling for
CustomCommands and CustomActions.
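
A minimal sketch of the proposed flow, under stated assumptions: the report
dictionary shape, the function names and the FAILED status string are
hypothetical and only illustrate the idea; they are not the actual agent or
server code. The agent owns the timer and reports TIMEDOUT, and the server
changes task state only in response to a report.

```python
import subprocess
import sys

# Status strings used for illustration. TIMEDOUT and COMPLETE are the names
# used in this issue; FAILED is an assumption.
TIMEDOUT = "TIMEDOUT"
COMPLETE = "COMPLETE"
FAILED = "FAILED"


def run_command(task_id, argv, timeout_seconds):
    """Agent side: run one command and build a report (hypothetical shape)."""
    try:
        result = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return {"taskId": task_id, "status": TIMEDOUT}
    status = COMPLETE if result.returncode == 0 else FAILED
    return {"taskId": task_id, "status": status}


def apply_report(task_statuses, report):
    """Server side: task state changes only when a report arrives."""
    task_statuses[report["taskId"]] = report["status"]
    return task_statuses


if __name__ == "__main__":
    statuses = {}
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    fast = [sys.executable, "-c", "pass"]
    apply_report(statuses, run_command(1, slow, timeout_seconds=0.2))
    apply_report(statuses, run_command(2, fast, timeout_seconds=30))
    print(statuses)
```

With a single timer at the agent, the server no longer marks the later commands
in the request as timed out on its own.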
Issues may occur when the agent host goes down and therefore does not send any
command reports. The server should have some handling for this case.
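
One possible handling, sketched here purely as an assumption since the issue
does not specify a mechanism: the server tracks when each host was last heard
from and marks in-progress tasks as timed out once a host has been silent for
too long. The function name, the task dictionary shape, the IN_PROGRESS status
string and the max_silence_seconds parameter are all hypothetical.

```python
def expire_tasks_on_silent_hosts(tasks, last_heartbeat, now, max_silence_seconds):
    """Mark in-progress tasks on silent hosts as TIMEDOUT (hypothetical sketch).

    tasks: list of dicts with "id", "host" and "status" keys.
    last_heartbeat: dict mapping host name to the time it was last heard from.
    Returns the ids of the tasks that were expired.
    """
    expired = []
    for task in tasks:
        if task["status"] != "IN_PROGRESS":
            continue
        last_seen = last_heartbeat.get(task["host"])
        # A host that was never heard from is treated as silent.
        if last_seen is None or now - last_seen > max_silence_seconds:
            task["status"] = "TIMEDOUT"
            expired.append(task["id"])
    return expired


tasks = [
    {"id": 1, "host": "host-a", "status": "IN_PROGRESS"},
    {"id": 2, "host": "host-b", "status": "IN_PROGRESS"},
]
heartbeats = {"host-a": 100.0, "host-b": 990.0}
print(expire_tasks_on_silent_hosts(tasks, heartbeats, now=1000.0,
                                   max_silence_seconds=60.0))
```

Passing the current time in as an argument keeps the check easy to test; a real
server would take it from its own clock.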