[
https://issues.apache.org/jira/browse/TEZ-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037265#comment-14037265
]
Oleg Zhurakousky commented on TEZ-1206:
---------------------------------------
Sid
Let me do it point-by-point.
1. You are correct, but the things you point out did indeed exist there
before. However, "If a service running a non daemon thread were to fail during
shutdown" would mean a bug that would need to be addressed. Relying on
System.exit simply delays the inevitable.
2. No, it's not, but it is good practice (ensure it always has a chance to exit).
As you said, the CDL will only terminate when all have started.
3. Not sure about this one since stopServices invokes shutdown which only
releases the Semaphore and shuts down the executor. Yes, I agree there are too
many methods to maintain the life-cycle of this class. However, as you can see,
one of the minor changes I made was to check the service state, so these
multiple attempts are harmless.
4. Good catch. It was done to ensure that start() does not block, while
ensuring that DAGAppMaster stays alive, but the mechanism has changed a bit and
it no longer needs to be started as a task.
5. Those were there.
6. Those are the only two that are declared. The rest will propagate, which
should facilitate shutdown etc. Am I missing some other point?
7. Addressed in the previous answer to Hitesh.
8. Yes it does, but I was hoping that once we have DAGAppMaster relying on TP,
it could start sharing it with services it works with (something you and I
discussed a few weeks ago).
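
To illustrate point 3 only: a minimal sketch of how a state check can make
repeated stop attempts harmless. The class, enum, and field names here are
hypothetical and are not taken from the patch or from DAGAppMaster; the
Semaphore and executor simply mirror what the comment says shutdown does.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical service used only to illustrate an idempotent stop().
class IdempotentService {
    enum State { INITED, STARTED, STOPPED }

    private final AtomicReference<State> state =
        new AtomicReference<>(State.INITED);
    // Assumed stand-ins for the Semaphore and executor mentioned above.
    private final Semaphore keepAlive = new Semaphore(0);
    private final ExecutorService executor =
        Executors.newSingleThreadExecutor();

    void start() {
        state.compareAndSet(State.INITED, State.STARTED);
    }

    // Returns true only for the call that actually performs the shutdown.
    boolean stop() {
        // Only the STARTED -> STOPPED transition does any work, so extra
        // calls (from any thread) fall through without side effects.
        if (!state.compareAndSet(State.STARTED, State.STOPPED)) {
            return false;
        }
        keepAlive.release();
        executor.shutdown();
        return true;
    }

    State getState() {
        return state.get();
    }
}
```

With this shape, it does not matter how many of the life-cycle methods end up
calling stop(): the first call releases the Semaphore and shuts down the
executor, and every later call returns immediately.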
Yeah, while I still consider the code in TaskScheduler a bug, it doesn't affect
the System.exit issue, so I am now trying to keep this as light as possible. But
yes, we should open another JIRA for TaskScheduler.
Yes, we can, and I thought about it. I'll never be convinced that System.exit
is the right strategy, but I'll go with it if that is the consensus.
The bottom line is that DAGAppMaster needs some major surgery. We should
probably have a thorough review of it, create a list of Why's and What's, and
then address them one at a time. For now, System.exit is the most critical of
them all IMHO and we need to figure out a way to address it.
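
As a rough illustration of the alternative to System.exit discussed here: if
every non-daemon thread is shut down cleanly, the JVM exits on its own. The
sketch below uses only standard java.util.concurrent calls; the class and
method names are hypothetical, and the timeout-then-shutdownNow fallback is an
assumption about what "always has a chance to exit" could look like, not the
approach taken in the patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: stop worker threads without calling System.exit.
class GracefulShutdown {
    // Threads from this pool are non-daemon, so they keep the JVM alive
    // until the pool has been shut down.
    private final ExecutorService executor = Executors.newFixedThreadPool(2);
    private final CountDownLatch stopped = new CountDownLatch(1);

    void submit(Runnable task) {
        executor.execute(task);
    }

    // Returns true if all tasks finished within the timeout.
    boolean shutdown(long timeoutMillis) {
        executor.shutdown(); // stop accepting new work
        boolean clean = false;
        try {
            clean = executor.awaitTermination(timeoutMillis,
                                              TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        if (!clean) {
            executor.shutdownNow(); // interrupt tasks that are still running
        }
        stopped.countDown(); // release anyone waiting for shutdown
        return clean;
    }

    // Bounded wait, so a caller is never blocked forever on the latch.
    boolean awaitStopped(long timeoutMillis) {
        try {
            return stopped.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A task that ignores interruption would still keep the JVM alive after
shutdownNow(), which is the kind of bug point 1 argues should be fixed rather
than hidden behind System.exit.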
> Lifecycle issues with DAGAppMaster
> ----------------------------------
>
> Key: TEZ-1206
> URL: https://issues.apache.org/jira/browse/TEZ-1206
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Oleg Zhurakousky
> Assignee: Oleg Zhurakousky
> Attachments: TEZ-1206.patch, TEZ-1206.patch.2, TEZ-1206.patch.3.patch
>
>
> This is an umbrella issue to document and address issues with DAGAppMaster
> lifecycle