[
https://issues.apache.org/jira/browse/TEZ-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059303#comment-14059303
]
Hitesh Shah commented on TEZ-1273:
----------------------------------
[~zjffdu] There are a couple of aspects to consider:
- when should the AM unregister with the RM?
- when should the AM do cleanup of its staging data/tmp resources?
- when should the AM clean up DAG data of a completed/killed DAG?
- what is the state flow when the AM receives a SIGTERM/kill signal? Do all
signals translate into shutdowns?
Other comments:
- AM_REBOOT can be received at any point after the RM heartbeat service
comes up.
- Does a failure in recovery count as an internal error?
- Where does a DAG submission fit in? Is it a state transition or just a
state check? How do you plan to handle multiple concurrent DAG submissions if
it is represented as a state transition event?
Also, any thoughts on how we can capture session mode in the state machine
itself so that we do not need isSession checks all over the place?
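To make the last two questions concrete, here is a rough sketch of one possible
shape, not a proposal for the actual patch. It is plain Java and does not use
the existing state machine classes; the class name, the states (NEW, IDLE,
RUNNING, TERMINATING) and every event other than AM_REBOOT are hypothetical
names chosen for illustration. It assumes that session mode is fixed when the
AM starts, that START stands for the point where the RM heartbeat service is
up, and that a second DAG submission while one is running is simply rejected.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative sketch only: all names here are hypothetical, not Tez APIs.
public class AmStateMachineSketch {
    public enum State { NEW, IDLE, RUNNING, TERMINATING }
    public enum Event { START, DAG_SUBMITTED, DAG_FINISHED, SHUTDOWN, AM_REBOOT }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.NEW;

    public AmStateMachineSketch(boolean sessionMode) {
        add(State.NEW, Event.START, State.IDLE);
        add(State.IDLE, Event.DAG_SUBMITTED, State.RUNNING);
        // The only place session mode is consulted: it selects the target
        // state once, when the transition table is built.
        add(State.RUNNING, Event.DAG_FINISHED,
            sessionMode ? State.IDLE : State.TERMINATING);
        // Assumption: SHUTDOWN and AM_REBOOT are accepted in any state
        // reached after START.
        for (State s : new State[] { State.IDLE, State.RUNNING }) {
            add(s, Event.SHUTDOWN, State.TERMINATING);
            add(s, Event.AM_REBOOT, State.TERMINATING);
        }
    }

    private void add(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class))
             .put(on, to);
    }

    // Returns false, leaving the state unchanged, when the event is not
    // valid in the current state. Synchronized so that concurrent callers
    // are serialized and only one DAG_SUBMITTED can win from IDLE.
    public synchronized boolean handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            return false;
        }
        current = next;
        return true;
    }

    public synchronized State getState() {
        return current;
    }

    public static void main(String[] args) {
        AmStateMachineSketch session = new AmStateMachineSketch(true);
        session.handle(Event.START);
        session.handle(Event.DAG_SUBMITTED);
        session.handle(Event.DAG_FINISHED);
        System.out.println("session mode after DAG_FINISHED: " + session.getState());

        AmStateMachineSketch nonSession = new AmStateMachineSketch(false);
        nonSession.handle(Event.START);
        nonSession.handle(Event.DAG_SUBMITTED);
        nonSession.handle(Event.DAG_FINISHED);
        System.out.println("non-session mode after DAG_FINISHED: " + nonSession.getState());
    }
}
```

In this sketch a DAG submission is a state transition, and a concurrent
submission that arrives while the AM is in RUNNING is an invalid event that
the caller sees as a rejected request. Treating submission as a plain state
check instead would move that decision out of the table, so the trade-off is
still open.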
> Refactor DAGAppMaster to state machine based
> --------------------------------------------
>
> Key: TEZ-1273
> URL: https://issues.apache.org/jira/browse/TEZ-1273
> Project: Apache Tez
> Issue Type: Improvement
> Affects Versions: 0.4.0
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: dag_app_master.pdf
>
>
> Almost all our entities (Vertex, Task, etc.) are state machine based and
> written using a formal state machine. But DAGAppMaster is not written on a
> formal state machine even though it has state machine based behavior. This
> jira is for refactoring it to be state machine based.