[
https://issues.apache.org/jira/browse/TEZ-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979069#comment-13979069
]
Siddharth Seth commented on TEZ-1074:
-------------------------------------
This looks good. An alternative would be lazy deserialization of the counters
in the AM, but that would add memory pressure, since string interning of
counter names would no longer be possible.
[~rajesh.balamohan] - could you please add the config parameter that is
currently marked as a TODO in this patch itself? Also, please reverse the logic
for getting the counters: start with TezCounters counters = null, and call
task.getCounters() only if we need to send them.
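
The reversed logic requested above could look roughly like the sketch below. It
is only an illustration: FakeTask, StatusUpdate, buildStatusUpdate and the
sendCounters flag are hypothetical stand-ins, not the actual Tez classes or the
code in the patch. The point is that getCounters() is never called on the path
where counters are not sent.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT the real Tez classes.
final class CounterSendSketch {

    /** Stand-in for a task whose counters are costly to snapshot. */
    static final class FakeTask {
        int getCountersCalls = 0; // tracks how often counters were fetched

        Map<String, Long> getCounters() {
            getCountersCalls++;
            Map<String, Long> counters = new HashMap<>();
            counters.put("RECORDS_PROCESSED", 42L);
            return counters;
        }
    }

    /** Stand-in for the status update carried by a heartbeat. */
    static final class StatusUpdate {
        final float progress;
        final Map<String, Long> counters; // null when counters are not sent

        StatusUpdate(float progress, Map<String, Long> counters) {
            this.progress = progress;
            this.counters = counters;
        }
    }

    /** Fetches counters only when they will actually be sent. */
    static StatusUpdate buildStatusUpdate(FakeTask task, float progress,
                                          boolean sendCounters) {
        Map<String, Long> counters = null;
        if (sendCounters) {
            counters = task.getCounters();
        }
        return new StatusUpdate(progress, counters);
    }

    public static void main(String[] args) {
        FakeTask task = new FakeTask();
        StatusUpdate light = buildStatusUpdate(task, 0.5f, false);
        StatusUpdate full = buildStatusUpdate(task, 0.6f, true);
        System.out.println("light update has counters: " + (light.counters != null));
        System.out.println("full update counters: " + full.counters);
        System.out.println("getCounters() calls: " + task.getCountersCalls);
    }
}
```

With this shape, a heartbeat that skips counters costs neither the snapshot on
the task side nor the counter deserialization in the AM.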
> DAGAppMaster takes lots of CPU when running a reasonably large job in the
> cluster
> ---------------------------------------------------------------------------------
>
> Key: TEZ-1074
> URL: https://issues.apache.org/jira/browse/TEZ-1074
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Attachments: Screen Shot 2014-04-19 at 7.26.36 PM.png,
> TEZ-1074-v1.patch, TEZ-1074-v2.patch, TEZ-1074-v7.patch, TEZ-1074-v8.patch
>
>
> - Ran a job which used 200 containers.
> - DAGAppMaster was running at 70% CPU most of the time during the job.
> - Profiling revealed that lots of time was spent on TezEvent.readFields -->
> ... --> TaskStatusUpdateEvent.readFields().
> - The default is "tez.task.am.heartbeat.interval-ms.max=100" (ms). With 200
> containers, each sending up to 10 heartbeats per second, DAGAppMaster would
> potentially process 2000 events per second, and these events carry
> TezCounters.
> With a large job, the CPU usage of DAGAppMaster can grow significantly.
> One option to reduce CPU usage could be to send only the modified TezCounters
> in TaskStatusUpdateEvent.
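
The event rate quoted in the description follows from simple arithmetic, shown
here as a small sketch. The class and method names are hypothetical, and the
1000 ms interval is only an illustrative comparison value, not a setting taken
from the issue.

```java
// Hypothetical helper for illustration; not part of Tez.
final class HeartbeatLoadSketch {

    /** Upper bound on heartbeat events per second arriving at the AM. */
    static long eventsPerSecond(int containers, long heartbeatIntervalMs) {
        if (heartbeatIntervalMs <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        // Each container sends at most 1000 / interval heartbeats per second.
        return containers * 1000L / heartbeatIntervalMs;
    }

    public static void main(String[] args) {
        // Values from the issue: 200 containers, 100 ms heartbeat interval.
        System.out.println(eventsPerSecond(200, 100));
        // Illustrative comparison: the same job with a 1000 ms interval.
        System.out.println(eventsPerSecond(200, 1000));
    }
}
```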