[jira] [Commented] (MAPREDUCE-5330) Killing M/R JVM's leads to metrics not being uploaded

Xi Fang (JIRA) Tue, 18 Jun 2013 16:04:20 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687346#comment-13687346 ]


Xi Fang commented on MAPREDUCE-5330:
------------------------------------

If Signal.TERM is sent to a process, we wait for a delay so that it can shut 
down. On Windows, however, the signal kind is ignored and the process group is 
simply killed (see Shell#getSignalKillProcessGroupCommand()):
{code}
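  public static String[] getSignalKillProcessGroupCommand(int code,
                                                          String groupId) {
    if (WINDOWS) {
      // The signal code is ignored here: the task is killed regardless of it.
      return new String[] { Shell.WINUTILS, "task", "kill", groupId };
    } else {
      // The negative id makes kill target the whole process group.
      return new String[] { "kill", "-" + code, "-" + groupId };
    }
  }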
{code}

Here is a fix. If the OS is Windows and the signal is TERM, we return 
immediately and let the delayed process killer actually kill the process group. 
This gives the process group a grace period to clean up after itself.
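
To illustrate the idea, here is a minimal, self-contained sketch. It is not 
the attached patch: the class GracefulKillSketch, the Signal enum, the 
CommandRunner hook and the literal "winutils" are all hypothetical stand-ins 
so that the logic can run outside Hadoop, and the real change may be 
structured differently.

```java
import java.util.ArrayList;
import java.util.List;

class GracefulKillSketch {
  /** Hypothetical stand-in for the signal kinds (TERM = 15, KILL = 9). */
  enum Signal {
    TERM(15), KILL(9);
    final int code;
    Signal(int code) { this.code = code; }
  }

  /** Hypothetical hook so the sketch does not spawn real processes. */
  interface CommandRunner {
    void run(String[] command);
  }

  /** Mirrors the shape of the command builder quoted above. */
  static String[] killCommand(boolean windows, Signal signal, String groupId) {
    if (windows) {
      // "winutils" is a placeholder for the real Shell.WINUTILS path.
      return new String[] { "winutils", "task", "kill", groupId };
    }
    return new String[] { "kill", "-" + signal.code, "-" + groupId };
  }

  /**
   * Sends the signal to the process group. On Windows a TERM request is
   * skipped, because the only available command is a hard kill; the caller
   * is assumed to schedule a delayed kill afterwards.
   *
   * @return true if a command was issued, false if it was skipped
   */
  static boolean signalProcessGroup(boolean windows, Signal signal,
                                    String groupId, CommandRunner runner) {
    if (windows && signal == Signal.TERM) {
      return false; // leave the actual kill to the delayed process killer
    }
    runner.run(killCommand(windows, signal, groupId));
    return true;
  }

  public static void main(String[] args) {
    List<String> issued = new ArrayList<>();
    CommandRunner recorder = cmd -> issued.add(String.join(" ", cmd));

    // Windows: TERM is skipped, the later KILL does the work.
    signalProcessGroup(true, Signal.TERM, "1234", recorder);
    signalProcessGroup(true, Signal.KILL, "1234", recorder);
    // Other platforms: TERM is delivered as usual.
    signalProcessGroup(false, Signal.TERM, "1234", recorder);

    for (String command : issued) {
      System.out.println(command);
    }
  }
}
```

The point of the sketch is only that the early return for TERM on Windows 
leaves the process group alive until the delayed kill fires, which is the 
window in which a shutdown hook can finish.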
                
> Killing M/R JVM's leads to metrics not being uploaded
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-5330
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5330
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1-win
>         Environment: Windows
>            Reporter: Xi Fang
>            Assignee: Xi Fang
>         Attachments: MAPREDUCE-5330.patch
>
>
> In MapReduce, we sometimes kill a task's JVM before it shuts down naturally 
> so that we can launch other tasks (see 
> JvmManager$JvmManagerForType.reapJvm). As a result, if the map task process 
> is in the middle of cleanup/finalization after the task is done, it may be 
> killed without being given a chance to finish. 
> In Microsoft's Hadoop Service, after a Map/Reduce task is done, a special 
> shutdown hook closes the file systems, and during that step we typically 
> upload storage (ASV in our context) usage metrics to Microsoft Azure Tables. 
> If the kill happens at that point, these metrics are lost. The impact is 
> that for many MR jobs we do not see accurate metrics reported most of the time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
