[ 
https://issues.apache.org/jira/browse/TEZ-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339709#comment-14339709
 ] 

Rajesh Balamohan edited comment on TEZ-2148 at 2/27/15 4:39 AM:
----------------------------------------------------------------

Thanks for sharing the logs.  I have attached a swimlane graph of the 4 DAGs 
found in your logs. 

1. It appears that the first DAG didn't get any containers for quite some time 
(maybe until 80 seconds or so, out of the 167.1 seconds). 

2. Are all 4 DAGs in this session the same job?  If so, the 2nd and 3rd DAGs ran 
much faster (49-50 seconds), as they were able to get containers.

It looks like "TEZ-2148.svg" does not get rendered automatically in JIRA.  You 
might have to download it and open it in Chrome.


> Slow container grabbing with Capacity Scheduler in comparision to MapReduce
> ---------------------------------------------------------------------------
>
>                 Key: TEZ-2148
>                 URL: https://issues.apache.org/jira/browse/TEZ-2148
>             Project: Apache Tez
>          Issue Type: Task
>    Affects Versions: 0.5.1
>            Reporter: Johannes Zillmann
>         Attachments: TEZ-2148.svg, applicationLogs.zip, 
> capacity-scheduler.xml, client-mapreduce.log, client-tez.log, dag1.pdf, 
> dag2.pdf, dag3.pdf, dag4.pdf
>
>
> A customer experienced the following:
> - Set up a CapacityScheduler for user 'company'
> - The same processing job on the same data is faster with MapReduce than with 
> Tez under "normal" cluster load. Only if nothing else runs on Hadoop does Tez 
> outperform MapReduce. (It's hard to give exact data here since we get all 
> information second hand from the customer, but the timings were pretty stable 
> over a dozen runs: the MapReduce job took about 70 sec and Tez about 170 
> sec.)
> So the question is: is there some difference in how Tez grabs resources 
> from the capacity scheduler compared to MapReduce?
> Looking at the logs, it looks like Tez is always very slow in starting the 
> containers, whereas MapReduce parallelizes very quickly.
> Attached are client and application logs for the Tez and MapReduce runs, as 
> well as the scheduler configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
