Johannes Zillmann created TEZ-2148:
--------------------------------------

             Summary: Slow container grabbing with Capacity Scheduler in comparison to MapReduce
                 Key: TEZ-2148
                 URL: https://issues.apache.org/jira/browse/TEZ-2148
             Project: Apache Tez
          Issue Type: Task
    Affects Versions: 0.5.1
            Reporter: Johannes Zillmann

A customer experienced the following:
- Set up a CapacityScheduler for user 'company'.
- The same processing job on the same data is faster with MapReduce than with Tez under "normal" cluster load. Tez outperforms MapReduce only when nothing else is running on Hadoop.
(It's hard to give exact data here since we get all information second-hand from the customer, but the timings were pretty stable over a dozen runs: the MapReduce job took about 70 sec and the Tez job about 170 sec.)

So the question is: is there some difference in how Tez grabs resources from the capacity scheduler compared to MapReduce?
Looking at the logs, it looks like Tez is always very slow in starting the containers, whereas MapReduce parallelizes very quickly.

Attached are the client and application logs for the Tez and MapReduce runs, as well as the scheduler configuration.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)