[ https://issues.apache.org/jira/browse/TEZ-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128094#comment-14128094 ]
Hitesh Shah commented on TEZ-1562: ---------------------------------- [~rajesh.balamohan] In that case, I am not sure we can do much until we start supporting running DAGs in parallel within the same AM. > DAGImpl commitOrAbortOutputs takes long time (300+ seconds) for reducer > vertex with 4000+ tasks > ----------------------------------------------------------------------------------------------- > > Key: TEZ-1562 > URL: https://issues.apache.org/jira/browse/TEZ-1562 > Project: Apache Tez > Issue Type: Bug > Reporter: Rajesh Balamohan > Labels: performance > > M7 --> R1 > M2 --> R1 > I was running a job with 4000 reducers. This is in non-session mode. > At the end of the job, it took 300+ seconds just to commit(). > It appears that its doing some cleanup work. But in session mode, this could > lead to the following issue > 1. Open session > 2. Run small job1 > 3. Run large job1 > 4. Run small job2 > In this case, small job2 wouldn't even start running until large-job1's > commit is over which would be in the order of 300+ seconds. > Please refer to the logs here, > 2014-09-09 17:58:20,422 INFO [AsyncDispatcher event handler] > org.apache.tez.dag.app.dag.impl.DAGImpl: Vertex > vertex_1409722953518_0143_1_02 completed., numCompletedVertices=3, > numSuccessfulVertices=3, numFailedVertices=0, numKilledVertices=0, > numVertices=3 > 2014-09-09 17:58:20,422 INFO [AsyncDispatcher event handler] > org.apache.tez.dag.app.dag.impl.DAGImpl: Calling DAG commit/abort for dag: > dag_1409722953518_0143_1 > 2014-09-09 17:58:20,426 INFO [AsyncDispatcher event handler] > org.apache.tez.dag.app.dag.impl.DAGImpl: Committing output: Reducer_6_sink > for vertex: vertex_1409722953518_0143_1_02 > 2014-09-09 18:03:25,207 INFO [AsyncDispatcher event handler] > org.apache.tez.dag.app.dag.impl.DAGImpl: No output committers for vertex: > Map_7 > 2014-09-09 18:03:25,207 INFO [AsyncDispatcher event handler] > org.apache.tez.dag.app.dag.impl.DAGImpl: No output committers for vertex: > Map_2 > 2014-09-09 18:03:25,207 INFO [AsyncDispatcher event handler] > org.apache.tez.dag.app.dag.impl.DAGImpl: Patch : Time taken to > commitOrAbortOutputs : 304784 <==== In milliseconds > 2014-09-09 18:03:25,361 INFO [AsyncDispatcher event handler] > org.apache.tez.dag.app.dag.impl.DAGImpl: Patch : Time taken to log jobhistory > : 304938 <==== Cumulative number > 2014-09-09 18:03:25,363 INFO [AsyncDispatcher event handler] > org.apache.tez.dag.app.dag.impl.DAGImpl: Patch : Time taken to handle > DAGAppMasterEventDAGFinished : 304940 <==== Cumulative number > 2014-09-09 18:03:25,363 INFO [AsyncDispatcher event handler] > org.apache.tez.dag.app.dag.impl.DAGImpl: DAG: dag_1409722953518_0143_1 > finished with state: SUCCEEDED > 2014-09-09 18:03:25,363 INFO [AsyncDispatcher event handler] > org.apache.tez.dag.app.dag.impl.DAGImpl: dag_1409722953518_0143_1 > transitioned from RUNNING to SUCCEEDED > Should this operation be moved to a separate thread as the DAG is already > marked as SUCCEEDED? -- This message was sent by Atlassian JIRA (v6.3.4#6332)