[ 
https://issues.apache.org/jira/browse/TEZ-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037265#comment-14037265
 ] 

Oleg Zhurakousky commented on TEZ-1206:
---------------------------------------

Sid

Let me do it point-by-point.
1. You are correct, but the things you point out already existed there 
before. However, "If a service running a non daemon thread were to fail during 
shutdown" would mean a bug that would need to be addressed. Relying on 
System.exit simply delays the inevitable.
2. No, it's not, but it is good practice (ensuring it always has a chance to 
exit). As you said, the CDL will only terminate when all have started.
3. Not sure about this one since stopServices invokes shutdown which only 
releases the Semaphore and shuts down the executor. Yes, I agree there are too 
many methods to maintain the life-cycle of this class. However, as you can see, 
one of the minor changes I made is to check the service state, so these 
multiple attempts are harmless. 
4. Good catch. It was done to ensure that start() does not block, while 
ensuring that DAGAppMaster stays alive, but the mechanism has changed a bit and 
it no longer needs to be started as a task. 
5. Those were there.
6. Those are the only two that are declared. The rest will propagate, which 
should facilitate shutdown etc. Am I missing some other point?
7. Addressed in the previous answer to Hitesh.
8. Yes, it does, but I was hoping that once we have DAGAppMaster relying on TP, 
it could start sharing it with the services it works with (something you and I 
discussed a few weeks ago).
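
Point 3 above describes guarding shutdown with a state check so that repeated
calls are harmless. A minimal sketch of that idea follows. GuardedService, its
State enum, and all method names are hypothetical; they are not taken from the
TEZ-1206 patch or from DAGAppMaster, and only mirror the described behaviour
(release a Semaphore, shut down an executor, do it at most once).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a service with a lifecycle (not DAGAppMaster code).
class GuardedService {
    enum State { INITED, STARTED, STOPPED }

    private final AtomicReference<State> state =
        new AtomicReference<>(State.INITED);
    // A waiter blocked on this semaphore is what keeps the service "alive".
    private final Semaphore keepAlive = new Semaphore(0);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    void start() {
        state.compareAndSet(State.INITED, State.STARTED);
    }

    // Returns true only for the one call that actually performed the shutdown.
    boolean shutdown() {
        // The state check makes every later (or premature) call a no-op.
        if (!state.compareAndSet(State.STARTED, State.STOPPED)) {
            return false;
        }
        keepAlive.release();   // let the waiting thread proceed
        executor.shutdown();   // stop accepting new tasks
        return true;
    }

    State getState() { return state.get(); }

    int availablePermits() { return keepAlive.availablePermits(); }
}
```

With this guard, several lifecycle methods can all funnel into shutdown()
without releasing the semaphore more than once.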

Yeah, while I still consider the code in TaskScheduler a bug, it doesn't affect 
the System.exit issue, so I am now trying to keep this as light as possible. 
But yes, we should open another JIRA for TaskScheduler.

Yes, we can, and I thought about it. I'll never be convinced that System.exit 
is the right strategy, but I'll go with it if that is the consensus.

The bottom line is that DAGAppMaster needs some major surgery. We should 
probably have a thorough review of it, create a list of Whys and Whats, and 
then address them one at a time. For now, System.exit is the most critical of 
them all IMHO, and we need to figure out a way to address it. 
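
To illustrate the alternative to System.exit discussed above: if every
non-daemon thread is given an explicit stop signal and is joined during
shutdown, the JVM can exit on its own once main returns. The sketch below is
hypothetical. The KeepAliveWorker class and its use of a CountDownLatch as the
stop signal are assumptions made for illustration, not DAGAppMaster code.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical worker that keeps the JVM alive until it is told to stop.
class KeepAliveWorker {
    private final CountDownLatch stopSignal = new CountDownLatch(1);
    private final Thread thread;

    KeepAliveWorker() {
        thread = new Thread(() -> {
            try {
                stopSignal.await();   // block until stop() is called
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "keep-alive");
        // Non-daemon: the JVM will not exit while this thread is running.
        thread.setDaemon(false);
    }

    void start() {
        thread.start();
    }

    // Signals the worker and waits up to timeoutMillis (must be > 0) for it
    // to finish. Returns true if the thread has terminated.
    boolean stop(long timeoutMillis) {
        stopSignal.countDown();
        try {
            thread.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !thread.isAlive();
    }

    boolean isAlive() {
        return thread.isAlive();
    }
}
```

If stop() returns false, some thread did not honour the stop signal, which is
the kind of bug a forced exit would otherwise hide.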

> Lifecycle issues with DAGAppMaster
> ----------------------------------
>
>                 Key: TEZ-1206
>                 URL: https://issues.apache.org/jira/browse/TEZ-1206
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Oleg Zhurakousky
>            Assignee: Oleg Zhurakousky
>         Attachments: TEZ-1206.patch, TEZ-1206.patch.2, TEZ-1206.patch.3.patch
>
>
> This is an umbrella issue to document and address issues with DAGAppMaster 
> lifecycle



--
This message was sent by Atlassian JIRA
(v6.2#6252)
