[jira] [Updated] (MAPREDUCE-3614) finalState UNDEFINED if AM is killed by hand

Ravi Prakash (Updated) (JIRA) Mon, 27 Feb 2012 17:16:15 -0800

     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ravi Prakash updated MAPREDUCE-3614:
------------------------------------

    Attachment: MAPREDUCE-3614.patch

Thanks Daryn and Vinod for your careful reviews. Attaching a patch addressing 
comments from them.

Here are some clarifications
@Daryn:
bq. There's some issues with the filesystem handling. It's creating a 
filesystem to get the fs scheme so it can create the same filesystem again.
I chose not to modify the code which had already been checked in. For myself 
I'm now getting the schema directly from the config.

bq. The code is a bit misleading about only running when a SIGTERM hits. Unless 
something unregisters the shutdown hook, it runs on a clean exit too. You have 
to make sure that none of the changes are going to be detrimental to a normal 
shutdown.
You are right! I have renamed the variables to more accurately say isSignalled 
etc. I reviewed the code and it will run fine in case of a normal shutdown too 
and ran it on my 1 node cluster a few times and checked the logs. For JHEH, all 
jobIds should have been closed in case of a normal shutdown, so nothing will be 
different. In RMCommunicator there is the stopped variable which makes sure it 
isn't run on stopped objects.

bq. Is there a compelling need to maintain a parallel set of JobIds? Why not 
check the fileMap values to see if the writer is open? The event construction 
looks a bit odd, but I don't know much about event injection.
You are right! Again! :) I'm checking for open eventWriters in fileMap now. I 
had to modify EventWriter for that. But I like it better.

bq. I'd suggest reverting all the fs manipulation code, and try this:
I had already tried that approach. And failed :( The problem with using 
"fs.automatic.close" was that a FileSystem object had already been opened with 
that parameter set to true. If you see FileSystem.java:2080, it doesn't 
overwrite that value. As a result on SIGTERM, the FileSystem object was being 
closed.

@Vinod
bq. FindBugs warning is still not fixed as Jenkins reports.
Sorry about that. Should be fixed now.
bq. Not sure why you need all the magic with the FileSystem objects, can you 
explain?
FileSystem registers its own Runtime shutdownhook (FileSystem.java:2076) This 
closes all instances of FileSystem objects without the "magic". :( So when JHEH 
wants to write the JobUnsuccessfulCompletionEvent to HDFS / move the files to 
done_intermediate, it would throw an IOException.
bq. You can make MRAppMaster.MRAppMasterShutdownHook extend 
CompositeServiceShutdownHook
Daryn pointed out I'd made another boo boo here =D I should not have been 
starting off another thread in MRAppMasterShutdownHook for 
CompositeServiceShutdownHook. I just needed to call stop() so there was no need 
for extending anymore
bq. RMCommunicator: It seems unnatural to set killed final state if the job was 
running. I think you should set a flag with RMCommunicator similar to 
JobHistoryEventHandler.
Thanks for the suggestion. Done! :) 

bq. Trivial: isSigTermed == false can be written as !isSigTermed
Done!
bq. Again trivial: The sigterm flag will never go back to false from true. So 
the code snippet "while (isSigTermed && jobIt.hasNext()) .." can be rewritten 
as "if (isSigTermed) { while .. }"
Done
bq. For the same snippet, can you simply add a comment saying, we are doing 
this so as to drain everything and only write finish-events to shutdown as fast 
as possible.
Done!
bq. The test seems very wired into the code. Can we instead do something like 
this:
Done! I've tried being as close as possible to the SIGTERM without the test 
becoming too onerous. Please let me know if it needs some change still.


                
>  finalState UNDEFINED if AM is killed by hand
> ---------------------------------------------
>
>                 Key: MAPREDUCE-3614
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3614
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>             Fix For: 0.23.2
>
>         Attachments: MAPREDUCE-3614.branch-0.23.patch, MAPREDUCE-3614.patch, 
> MAPREDUCE-3614.patch, MAPREDUCE-3614.patch, MAPREDUCE-3614.patch, 
> MAPREDUCE-3614.patch
>
>
> Courtesy [~dcapwell]
> {quote}
> If the AM is running and you kill the process (sudo kill #pid), the State in 
> Yarn would be FINISHED and FinalStatus is UNDEFINED.  The Tracking UI would 
> say "History" and point to the proxy url (which will redirect to the history 
> server).
> The state should be more descriptive that the job failed and the tracker url 
> shouldn't point to the history server.
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-3614) finalState UNDEFINED if AM is killed by hand

Reply via email to