Johannes Zillmann created TEZ-2148:
--------------------------------------

             Summary: Slow container grabbing with Capacity Scheduler in 
comparison to MapReduce
                 Key: TEZ-2148
                 URL: https://issues.apache.org/jira/browse/TEZ-2148
             Project: Apache Tez
          Issue Type: Task
    Affects Versions: 0.5.1
            Reporter: Johannes Zillmann


A customer experienced the following:
- Set up a CapacityScheduler for user 'company'.
- The same processing job on the same data is faster with MapReduce than with 
Tez under "normal" cluster load. Only when nothing else runs on Hadoop does Tez 
outperform MapReduce. (It's hard to give exact data here since we get all 
information second-hand from the customer, but the timings were pretty stable 
over a dozen runs: the MapReduce job took about 70 sec and Tez about 170 
sec.)

So the question is: is there some difference in how Tez grabs resources from 
the capacity scheduler compared to MapReduce?
Looking at the logs, it looks like Tez is always very slow in starting the 
containers, whereas MapReduce parallelizes very quickly.

Attached are the client and application logs for the Tez and MapReduce runs, 
as well as the scheduler configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
