[ https://issues.apache.org/jira/browse/TEZ-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338385#comment-14338385 ]
Johannes Zillmann commented on TEZ-2148:
----------------------------------------

Hey [~jeffzhang],

oh yes, I missed that part. It's a series of jobs, 4 to be exact. For MapReduce we submit 4 jobs/applications, and I attached the application logs of the first job only. Since we use session mode for Tez, its application log contains the entries of all 4 DAG submissions, but only the first one is of concern. Also, both client logs cover only the 1st of the 4 jobs.

HTH

> Slow container grabbing with Capacity Scheduler in comparision to MapReduce
> ---------------------------------------------------------------------------
>
>                 Key: TEZ-2148
>                 URL: https://issues.apache.org/jira/browse/TEZ-2148
>             Project: Apache Tez
>          Issue Type: Task
>   Affects Versions: 0.5.1
>            Reporter: Johannes Zillmann
>         Attachments: applicationLogs.zip, capacity-scheduler.xml,
> client-mapreduce.log, client-tez.log, dag1.pdf, dag2.pdf, dag3.pdf, dag4.pdf
>
>
> A customer experienced the following:
> - Set up a CapacityScheduler for user 'company'
> - The same processing job on the same data is faster with MapReduce than with
> Tez under "normal" cluster load. Only if nothing else runs on Hadoop does Tez
> outperform MapReduce. (It's hard to give exact data here since we get all
> information second-hand from the customer, but the timings were pretty stable
> over a dozen runs. The MapReduce job finished in about 70 sec and Tez in
> about 170 sec.)
> So the question is: is there some difference in how Tez grabs resources
> from the capacity scheduler compared to MapReduce?
> Looking at the logs, it looks like Tez is always very slow in starting the
> containers, whereas MapReduce parallelizes very quickly.
> Attached are the client and application logs for the Tez and MapReduce runs,
> as well as the scheduler configuration.