[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247740#comment-16247740 ]
Peter Bacsko commented on MAPREDUCE-5124:
-----------------------------------------

Just a quick update on the GC usage improvement. I know the POC is not the final version, but I still decided to check how much it improves. I added a 2-second sleep to {{StatusUpdater.transition()}} to cause an event backlog and used mapper code that constantly called {{reporter.progress()}} in a loop. I also decreased the update interval to 100 ms.

GC events with the old code:

{noformat}
[GC (Allocation Failure) 52224K->8221K(200192K), 0.0130368 secs]
[GC (Allocation Failure) 60445K->10200K(252416K), 0.0119459 secs]
[GC (Metadata GC Threshold) 59477K->10902K(252416K), 0.0151800 secs]
[Full GC (Metadata GC Threshold) 10902K->9053K(201216K), 0.0446707 secs]
[GC (Allocation Failure) 113501K->19028K(251904K), 0.0136092 secs]
[GC (Metadata GC Threshold) 78026K->17595K(305664K), 0.0226579 secs]
[Full GC (Metadata GC Threshold) 17595K->12774K(347648K), 0.0501647 secs]
[GC (Allocation Failure) 221670K->24081K(377344K), 0.0199000 secs]
[GC (Allocation Failure) 260113K->29187K(378368K), 0.0277259 secs]
[GC (Allocation Failure) 265219K->39660K(373248K), 0.0384575 secs]
[GC (Allocation Failure) 267500K->48473K(378368K), 0.0370554 secs]
[GC (Allocation Failure) 276313K->55049K(371200K), 0.0417077 secs]
[GC (Allocation Failure) 275721K->61521K(365568K), 0.0270593 secs]
[GC (Allocation Failure) 275025K->67873K(359936K), 0.0417392 secs]
[GC (Allocation Failure) 274721K->74129K(345088K), 0.0531881 secs]
[GC (Allocation Failure) 274833K->80089K(347648K), 0.0270885 secs]
[GC (Allocation Failure) 274649K->85921K(345088K), 0.0313155 secs] <-- I killed the job at this point
{noformat}

With the POC:

{noformat}
[GC (Allocation Failure) 52224K->8183K(200192K), 0.0228069 secs]
[GC (Allocation Failure) 60407K->10370K(252416K), 0.0135163 secs]
[GC (Metadata GC Threshold) 60383K->10958K(252416K), 0.0174618 secs]
[Full GC (Metadata GC Threshold) 10958K->8924K(198144K), 0.0452158 secs]
[GC (Allocation Failure) 113372K->18810K(254976K), 0.0132976 secs]
[GC (Metadata GC Threshold) 80801K->17577K(302592K), 0.0137089 secs]
[Full GC (Metadata GC Threshold) 17577K->12903K(345088K), 0.0579774 secs]
[GC (Allocation Failure) 221799K->24221K(382976K), 0.0188251 secs]
[GC (Allocation Failure) 268445K->24870K(384000K), 0.0164503 secs]
[GC (Allocation Failure) 269094K->19999K(381952K), 0.0155673 secs] <-- final event
{noformat}

I think the difference speaks for itself.

> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-5124-CoalescingPOC-1.patch, MAPREDUCE-5124-CoalescingPOC2.patch, MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
> The AM does not have any flow control to limit the incoming rate of events from tasks. If the AM is unable to keep pace with the rate of incoming events for a sufficient period of time then it will eventually exhaust the heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event processing, but the AM could still get behind if it's starved for CPU and/or handling a very large job with tens of thousands of active tasks.
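To illustrate why coalescing reduces GC pressure: instead of allocating and queuing one event object per status report, only the latest pending status per task attempt is kept, so a flood of {{reporter.progress()}} calls collapses into a single dispatch per update interval. The sketch below is a minimal, self-contained illustration of that idea only; the class and method names ({{StatusCoalescer}}, {{report}}, {{drain}}) are hypothetical and do not correspond to the actual POC patch code.

{noformat}
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of status-update coalescing: many reports from the
// same task attempt overwrite one pending entry instead of queuing N events.
public class StatusCoalescer {
    // Latest progress per attempt id; a new report replaces the pending one,
    // so no per-call event object survives to be drained later.
    private final ConcurrentHashMap<String, Float> pending = new ConcurrentHashMap<>();

    // Called on every status report from a task (hot path, O(1) per call).
    public void report(String attemptId, float progress) {
        pending.put(attemptId, progress);
    }

    // Called by the dispatcher once per update interval; returns how many
    // coalesced updates were dispatched (one per attempt, not per report).
    public int drain() {
        int dispatched = 0;
        for (String id : pending.keySet()) {
            if (pending.remove(id) != null) {
                dispatched++; // the real AM would fire one status-update event here
            }
        }
        return dispatched;
    }

    public static void main(String[] args) {
        StatusCoalescer c = new StatusCoalescer();
        // Simulate a mapper spamming progress: 10,000 reports from one attempt.
        for (int i = 0; i < 10000; i++) {
            c.report("attempt_0", i / 10000f);
        }
        c.report("attempt_1", 0.5f);
        // 10,001 reports collapse into 2 dispatched updates.
        System.out.println(c.drain());
    }
}
{noformat}

Without coalescing, the 10,001 reports above would each allocate an event sitting in the backlog queue, which matches the steadily growing heap in the old-code GC log.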