Milind, are you talking about the cache specified by the parameter local.cache.size. I have not actually changed its value and I can see that the default is 10GB. ThanksIman
________________________________ From: "milind.bhandar...@emc.com" <milind.bhandar...@emc.com> To: mapreduce-user@hadoop.apache.org; hadoop_...@yahoo.com Sent: Friday, August 5, 2011 3:47 PM Subject: Re: Performance of mappers Iman, Are you using cache archives ? If yes, what's the size of the cache archive? - milind --- --- Milind Bhandarkar Greenplum Labs, EMC (Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any organization, past or present, the author might be affiliated with.) From: Iman E <hadoop_...@yahoo.com<mailto:hadoop_...@yahoo.com>> Reply-To: "mapreduce-user@hadoop.apache.org<mailto:mapreduce-user@hadoop.apache.org>" <mapreduce-user@hadoop.apache.org<mailto:mapreduce-user@hadoop.apache.org>>, Iman E <hadoop_...@yahoo.com<mailto:hadoop_...@yahoo.com>> Date: Fri, 5 Aug 2011 15:31:51 -0400 To: "mapreduce-user@hadoop.apache.org<mailto:mapreduce-user@hadoop.apache.org>" <mapreduce-user@hadoop.apache.org<mailto:mapreduce-user@hadoop.apache.org>> Subject: Re: Performance of mappers Hi Arun, Thanks for your reply. I am running Hadoop-0.20.1 and I also tried the cloudera hadoop-0.20.1+152. The task tracker logs does not show any problem. These are the log entries for a task attempt that is too slow 2011-08-05 14:28:01,644 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201108041814_0035_m_000000_0 task's state:UNASSIGNED 2011-08-05 14:28:01,644 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201108041814_0035_m_000000_0 2011-08-05 14:28:01,644 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201108041814_0035_m_000000_0 2011-08-05 14:28:03,097 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201108041814_0035_m_1371719584 given task: attempt_201108041814_0035_m_000000_0 2011-08-05 14:32:52,341 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201108041814_0035_m_000000_0 0.07052739% 2011-08-05 14:32:55,398 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201108041814_0035_m_000000_0 0.123025686% 2011-08-05 14:32:58,402 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201108041814_0035_m_000000_0 0.16794641% 2011-08-05 14:33:01,419 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201108041814_0035_m_000000_0 0.41990894% 2011-08-05 14:33:04,804 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201108041814_0035_m_000000_0 0.8607056% 2011-08-05 14:33:06,617 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201108041814_0035_m_000000_0 1.0% 2011-08-05 14:33:06,625 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201108041814_0035_m_000000_0 is done. Thanks Iman From: Arun C Murthy <a...@hortonworks.com<mailto:a...@hortonworks.com>> To: mapreduce-user@hadoop.apache.org<mailto:mapreduce-user@hadoop.apache.org>; Iman E <hadoop_...@yahoo.com<mailto:hadoop_...@yahoo.com>> Sent: Friday, August 5, 2011 2:05 PM Subject: Re: Performance of mappers Which release of Hadoop are you running? What do the logs on the TaskTracker tell you during the time the slow tasks are getting launched? hadoop-0.20.203 has a ton of bug fixes since hadoop-0.20.2 which help fix issues with slow launches - you might want to upgrade. Arun On Aug 5, 2011, at 11:02 AM, Iman E wrote: Hello all, I have a question regarding the mappers. I can see from the logs that the start time of the mapper is different from start time of logging. I am having a problem because that time difference sometimes is few seconds, but other times it is For example, one mapper that is supposed to read 65 MB. Its start time is 12:30:53 whereis the logging start time is 12:33:01 and the end time is 12:33:20. All the laoded data are local to the same rack. In a perfect run, these numbers are as follows: the start time is 18:15:45, logging start time: 18:15:48, and end time: 18:16:02. I am running a job of more than 2400 mapper. Because of the above problem, instead of the job taking 15-20 mins on 14 machine ( it happened in few runs), other times it is taking more than 70 mins. Any suggestions how to fix this problem or what could possibly be causing it. Thanks, Iman