[ 
https://issues.apache.org/jira/browse/TEZ-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033684#comment-14033684
 ] 

Oleg Zhurakousky edited comment on TEZ-1206 at 6/17/14 11:30 AM:
-----------------------------------------------------------------

Of course, all existing test pass ;) and with this patch there are more tests.
As far as moving amRmClient.stop() to the finally block; that was the main 
issue with the deadlock that was reported on the mailing list. There was 
interrupted exception thrown by:
{code}
AppFinalStatus status = appClientDelegate.getFinalAppStatus();
{code}
and *.close() was never called. 
Also, its pretty much a common convention to have close* in 'finally' blocks 
specifically to handle the unexpected. 
Another thing about TaskScheduler is the fact that it had this line:
{code}
appCallbackExecutor.awaitTermination(1000l, TimeUnit.MILLISECONDS);
{code}
which was meaningless since no one checked for the return value and/or did 
something about it. 

Anyway, I'll swap the stop/shutdown call to look like this:
{code}
appCallbackExecutor.shutdownNow();
amRmClient.stop();
{code}
to address your question and to ensure that everything is closed properly, but 
I am not going to worry about removing the white space edits since I won't have 
time to do that. I would suggest to simply create a temporary branch and apply 
the patch, look through changed code and tests so you can become comfortable. 
Obviously this is a timely patch and may become invalid/incompatible if/when 
more commits are made, but it blocks all the interesting developer related work 
I am doing, so let's see what we all can do to hopefully have it in before the 
end of the day.



was (Author: ozhurakousky):
Of course, all existing test pass ;) and with this patch there are more tests.
As far as moving amRmClient.stop() to the finally block; that was the main 
issue with the deadlock that was reported on the mailing list. There was 
interrupted exception thrown by:
{code}
AppFinalStatus status = appClientDelegate.getFinalAppStatus();
{code}
and *.close() was never called. 
Also, its pretty much a common convention to have close* in 'finally' blocks 
specifically to handle the unexpected. 
Another thing about TaskScheduler is the fact that it had this line:
{code}
appCallbackExecutor.awaitTermination(1000l, TimeUnit.MILLISECONDS);
{code}
which was meaningless since no one checked for the return value and/or did 
something about it. 

Anyway, I'll swap the stop/shutdown call to look like this:
{code}
appCallbackExecutor.shutdownNow();
amRmClient.stop();
{code}
to address your question and to ensure that everything is closed properly, but 
I am not going to worry about removing the white space edits since I won't have 
time to do that. I would suggest to simply create a temporary branch and apply 
the patch, look through changed code and tests. Obviously this is a timely 
patch and may become invalid/incompatible if/when more commits are made, but it 
blocks all the interesting developer related work I am doing, so let's see what 
we all can do to hopefully have it in before the end of the day.


> Lifecycle issues with DAGAppMaster
> ----------------------------------
>
>                 Key: TEZ-1206
>                 URL: https://issues.apache.org/jira/browse/TEZ-1206
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Oleg Zhurakousky
>            Assignee: Oleg Zhurakousky
>         Attachments: TEZ-1206.patch
>
>
> This is an umbrella issue to document and address issues with DAGAppMaster 
> lifecycle



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to