[ 
https://issues.apache.org/jira/browse/TEZ-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978739#comment-16978739
 ] 

Ahmed Hussein commented on TEZ-4067:
------------------------------------

[~jeagles], I tried to refresh my memory a little bit. There was a check on the 
service state to prevent starting the service more than once.

The workflow of the {{DAGAppMaster}} is as follows; correct me if I am 
wrong:

* {{DAGAppMaster}} is created
* Services get initialized. This is the phase when the services are added to 
the "{{DAGAppMaster.services}}" map.
* All the services are started inside {{serviceStart.startServices()}}. Note 
that the {{DAG}} is not created yet.
* {{startDag()}} and {{startDagExecution}} finally create the DAG 
"{{currentDAG}}" and its vertices.

This workflow requires that speculators are initialized and started separately, 
after the DAG is created. Although we can still add them to the services map, 
we cannot assume that they will start automatically in 
{{DAGAppMaster.serviceStart()}}.
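
To make the ordering concrete, here is a minimal, self-contained sketch of the workflow described above. It does not use the real Tez or Hadoop classes: {{SketchService}}, {{SketchAppMaster}}, and the service names are hypothetical stand-ins, and the state check only approximates the guard mentioned at the top of this comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a service with a guarded lifecycle.
class SketchService {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    private final String name;
    private State state = State.NOTINITED;
    private int startCount = 0;

    SketchService(String name) { this.name = name; }

    synchronized void init() {
        if (state == State.NOTINITED) {
            state = State.INITED;
        }
    }

    // The state check turns a second start() into a no-op.
    synchronized void start() {
        if (state != State.INITED) {
            return;
        }
        state = State.STARTED;
        startCount++;
    }

    synchronized void stop() { state = State.STOPPED; }

    synchronized State getState() { return state; }
    synchronized int getStartCount() { return startCount; }
    String getName() { return name; }
}

// Hypothetical, heavily simplified model of the app master workflow.
class SketchAppMaster {
    final Map<String, SketchService> services = new LinkedHashMap<>();
    final List<SketchService> speculators = new ArrayList<>();

    // Step 2: services are created, registered in the map, and initialized.
    void serviceInit() {
        services.put("dispatcher", new SketchService("dispatcher"));
        services.put("scheduler", new SketchService("scheduler"));
        services.values().forEach(SketchService::init);
    }

    // Step 3: registered services start. No DAG exists yet, so there is
    // no vertex for which a speculator could be started here.
    void serviceStart() {
        services.values().forEach(SketchService::start);
    }

    // Step 4: the DAG and its vertices are created; only now can the
    // per-vertex speculators be initialized and started.
    void startDag(List<String> vertexNames) {
        for (String vertexName : vertexNames) {
            SketchService speculator =
                new SketchService("speculator-" + vertexName);
            speculator.init();
            speculator.start();
            speculators.add(speculator);
        }
    }
}
```

In this sketch, calling {{serviceStart()}} never touches a speculator, because none exists until {{startDag()}} runs.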

The same applies to {{DAGAppMaster.serviceStop()}}, which is called at the end 
of the execution. Therefore, a service in the "{{DAGAppMaster.services}}" map 
will stay around until the whole DAG is completed. Since a vertex can complete 
earlier, the speculator service related to that vertex will hang around until 
the {{DAGAppMaster}} is completed.
If we add the speculators to "{{DAGAppMaster.services}}", we won't be able to 
remove the service when a vertex is completed, since a {{Vertex/DAGImpl}} does 
not have access to "{{DAGAppMaster.services}}".
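
The two lifetime options can be contrasted with a small sketch. Again, this is only an illustration under stated assumptions: {{SpeculatorSketch}}, {{VertexSketch}}, and {{AppMasterSketch}} are hypothetical names, not the classes in the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical speculator with a trivial running flag.
class SpeculatorSketch {
    private boolean running = false;

    void start() { running = true; }
    void stop() { running = false; }
    boolean isRunning() { return running; }
}

// Hypothetical vertex that owns a reference to its speculator but has
// no reference to the app master's services map.
class VertexSketch {
    final String name;
    final SpeculatorSketch speculator = new SpeculatorSketch();

    VertexSketch(String name) { this.name = name; }

    void start() { speculator.start(); }

    // stopOwnSpeculator == true models the "remove the speculator of a
    // completed vertex" option; false models leaving it to the app master.
    void complete(boolean stopOwnSpeculator) {
        if (stopOwnSpeculator) {
            speculator.stop();
        }
    }
}

// Hypothetical app master holding the services map.
class AppMasterSketch {
    final Map<String, SpeculatorSketch> services = new LinkedHashMap<>();

    void register(VertexSketch vertex) {
        services.put("speculator-" + vertex.name, vertex.speculator);
    }

    // Called once, at the end of the whole execution.
    void serviceStop() {
        services.values().forEach(SpeculatorSketch::stop);
        services.clear();
    }
}
```

With the first option, a completed vertex leaves its speculator running until {{serviceStop()}}; with the second, the vertex stops the speculator itself and the map is never involved.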

I am almost done with implementing the code based on your suggestions. If you 
think that having speculators stay alive until the DAG is completed is 
acceptable, then I will go ahead and upload the patch. Otherwise, I will work 
on a few changes to remove the speculator of a completed vertex.

Let me know what you think.


> Tez Speculation decision is calculated on each update by the dispatcher
> -----------------------------------------------------------------------
>
>                 Key: TEZ-4067
>                 URL: https://issues.apache.org/jira/browse/TEZ-4067
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Minor
>         Attachments: TEZ-4067.001.patch, TEZ-4067.002.patch, 
> TEZ-4067.003.patch, TEZ-4067.004.patch, TEZ-4067.005.patch
>
>
> LegacySpeculator is an object field in VertexImpl. Therefore, all events are 
> handled synchronously by the caller (dispatcher). This implies the following:
>  # the dispatcher spends a long time executing updateStatus as it needs to 
> check the runtime estimation of the tezAttempts within the vertex.
>  # the speculator is per stage: launching a speculation may not be the 
> optimal decision. Ideally, based on resources, speculated tasks should be the 
> ones with the slowest progress.
>  # the time between speculation is skewed because there is a big delay for 
> the dispatcher to complete a full cycle. Also, speculation will be more 
> aggressive compared to MR because MR waits for 
> "soonest.retry.after.speculate" whenever a task is speculated. On the other 
> hand, Tez speculates more tasks as it processes stages in parallel.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
