[
https://issues.apache.org/jira/browse/TEZ-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033684#comment-14033684
]
Oleg Zhurakousky edited comment on TEZ-1206 at 6/17/14 11:30 AM:
-----------------------------------------------------------------
Of course, all existing test pass ;) and with this patch there are more tests.
As far as moving amRmClient.stop() to the finally block; that was the main
issue with the deadlock that was reported on the mailing list. There was
interrupted exception thrown by:
{code}
AppFinalStatus status = appClientDelegate.getFinalAppStatus();
{code}
and *.close() was never called.
Also, its pretty much a common convention to have close* in 'finally' blocks
specifically to handle the unexpected.
Another thing about TaskScheduler is the fact that it had this line:
{code}
appCallbackExecutor.awaitTermination(1000l, TimeUnit.MILLISECONDS);
{code}
which was meaningless since no one checked for the return value and/or did
something about it.
Anyway, I'll swap the stop/shutdown call to look like this:
{code}
appCallbackExecutor.shutdownNow();
amRmClient.stop();
{code}
to address your question and to ensure that everything is closed properly, but
I am not going to worry about removing the white space edits since I won't have
time to do that. I would suggest to simply create a temporary branch and apply
the patch, look through changed code and tests so you can become comfortable.
Obviously this is a timely patch and may become invalid/incompatible if/when
more commits are made, but it blocks all the interesting developer related work
I am doing, so let's see what we all can do to hopefully have it in before the
end of the day.
was (Author: ozhurakousky):
Of course, all existing test pass ;) and with this patch there are more tests.
As far as moving amRmClient.stop() to the finally block; that was the main
issue with the deadlock that was reported on the mailing list. There was
interrupted exception thrown by:
{code}
AppFinalStatus status = appClientDelegate.getFinalAppStatus();
{code}
and *.close() was never called.
Also, its pretty much a common convention to have close* in 'finally' blocks
specifically to handle the unexpected.
Another thing about TaskScheduler is the fact that it had this line:
{code}
appCallbackExecutor.awaitTermination(1000l, TimeUnit.MILLISECONDS);
{code}
which was meaningless since no one checked for the return value and/or did
something about it.
Anyway, I'll swap the stop/shutdown call to look like this:
{code}
appCallbackExecutor.shutdownNow();
amRmClient.stop();
{code}
to address your question and to ensure that everything is closed properly, but
I am not going to worry about removing the white space edits since I won't have
time to do that. I would suggest to simply create a temporary branch and apply
the patch, look through changed code and tests. Obviously this is a timely
patch and may become invalid/incompatible if/when more commits are made, but it
blocks all the interesting developer related work I am doing, so let's see what
we all can do to hopefully have it in before the end of the day.
> Lifecycle issues with DAGAppMaster
> ----------------------------------
>
> Key: TEZ-1206
> URL: https://issues.apache.org/jira/browse/TEZ-1206
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Oleg Zhurakousky
> Assignee: Oleg Zhurakousky
> Attachments: TEZ-1206.patch
>
>
> This is an umbrella issue to document and address issues with DAGAppMaster
> lifecycle
--
This message was sent by Atlassian JIRA
(v6.2#6252)