[ 
https://issues.apache.org/jira/browse/TEZ-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1562:
----------------------------------
    Attachment: TEZ-1562-fileoutputcommitter-time.png
                TEZ-1562-job-runtime.png

- Tried a synthetic job which would read and store 32 GB of lineitem table 
(Mapper --> Reducer).
- Used hadoop 2.8 (Mapreduce-4815 got committed in 2.7 itself)
- Tried with "mapreduce.fileoutputcommitter.algorithm.version=1" and 
"mapreduce.fileoutputcommitter.algorithm.version=2".
- Attaching the graphs with both the configs.
- Basically, "mapreduce.fileoutputcommitter.algorithm.version=2" fixes the 
issue.

> DAGImpl commitOrAbortOutputs takes long time (300+ seconds) for reducer 
> vertex with 4000+ tasks
> -----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-1562
>                 URL: https://issues.apache.org/jira/browse/TEZ-1562
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>              Labels: performance
>         Attachments: TEZ-1562-fileoutputcommitter-time.png, 
> TEZ-1562-job-runtime.png
>
>
> M7 --> R1
> M2 --> R1
> I was running a job with 4000 reducers.  This is in non-session mode.
> At the end of the job, it took 300+ seconds just to commit().  
> It appears that its doing some cleanup work. But in session mode, this could 
> lead to the following issue
> 1. Open session
> 2. Run small job1
> 3. Run large job1
> 4. Run small job2
> In this case, small job2 wouldn't even start running until large-job1's 
> commit is over which would be in the order of 300+ seconds.  
> Please refer to the logs here, 
> 2014-09-09 17:58:20,422 INFO [AsyncDispatcher event handler] 
> org.apache.tez.dag.app.dag.impl.DAGImpl: Vertex 
> vertex_1409722953518_0143_1_02 completed., numCompletedVertices=3, 
> numSuccessfulVertices=3, numFailedVertices=0, numKilledVertices=0, 
> numVertices=3
> 2014-09-09 17:58:20,422 INFO [AsyncDispatcher event handler] 
> org.apache.tez.dag.app.dag.impl.DAGImpl: Calling DAG commit/abort for dag: 
> dag_1409722953518_0143_1
> 2014-09-09 17:58:20,426 INFO [AsyncDispatcher event handler] 
> org.apache.tez.dag.app.dag.impl.DAGImpl: Committing output: Reducer_6_sink 
> for vertex: vertex_1409722953518_0143_1_02
> 2014-09-09 18:03:25,207 INFO [AsyncDispatcher event handler] 
> org.apache.tez.dag.app.dag.impl.DAGImpl: No output committers for vertex: 
> Map_7
> 2014-09-09 18:03:25,207 INFO [AsyncDispatcher event handler] 
> org.apache.tez.dag.app.dag.impl.DAGImpl: No output committers for vertex: 
> Map_2
> 2014-09-09 18:03:25,207 INFO [AsyncDispatcher event handler] 
> org.apache.tez.dag.app.dag.impl.DAGImpl: Patch : Time taken to 
> commitOrAbortOutputs : 304784  <==== In milliseconds
> 2014-09-09 18:03:25,361 INFO [AsyncDispatcher event handler] 
> org.apache.tez.dag.app.dag.impl.DAGImpl: Patch : Time taken to log jobhistory 
> : 304938  <==== Cumulative number
> 2014-09-09 18:03:25,363 INFO [AsyncDispatcher event handler] 
> org.apache.tez.dag.app.dag.impl.DAGImpl: Patch : Time taken to handle 
> DAGAppMasterEventDAGFinished : 304940 <==== Cumulative number
> 2014-09-09 18:03:25,363 INFO [AsyncDispatcher event handler] 
> org.apache.tez.dag.app.dag.impl.DAGImpl: DAG: dag_1409722953518_0143_1 
> finished with state: SUCCEEDED
> 2014-09-09 18:03:25,363 INFO [AsyncDispatcher event handler] 
> org.apache.tez.dag.app.dag.impl.DAGImpl: dag_1409722953518_0143_1 
> transitioned from RUNNING to SUCCEEDED
> Should this operation be moved to a separate thread as the DAG is already 
> marked as SUCCEEDED?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to