[ https://issues.apache.org/jira/browse/MAPREDUCE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajat Jain updated MAPREDUCE-6321: ---------------------------------- Priority: Critical (was: Major) > Map tasks take a lot of time to start up > ---------------------------------------- > > Key: MAPREDUCE-6321 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6321 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 > Affects Versions: 2.6.0 > Reporter: Rajat Jain > Priority: Critical > Labels: performance > > I have noticed repeatedly that the map tasks take a lot of time to startup on > YARN clusters. This is not the scheduling part, this is after the actual > container is launched containing the Map task. Take for example, the sample > log from a mapper of a Pi job that I launched. The command I used to launch > the Pi job was: > {code} > hadoop jar > /usr/lib/hadoop/share/hadoop/mapreduce/hadoop*mapreduce*examples*jar pi 10 100 > {code} > This is the sample job from one of the mappers which took 14 seconds to > complete. If you notice from the logs, most of the time taken by this job is > during the start up. I notice that the most mappers take anywhere between 7 > to 15 seconds during start up and have seen this behavior consistent across > mapreduce jobs. This really affects the performance of short running mappers. > I run a hadoop2 / yarn cluster on a 4-5 node m1.xlarge cluster, and the > mapper memory is always specified as 2048m and so on. > Log: > {code} > 2015-04-18 06:48:34,081 INFO [main] > org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from > hadoop-metrics2.properties > 2015-04-18 06:48:34,637 INFO [main] > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period > at 10 second(s). > 2015-04-18 06:48:34,637 INFO [main] > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system > started > 2015-04-18 06:48:34,690 INFO [main] org.apache.hadoop.mapred.YarnChild: > Executing with tokens: > 2015-04-18 06:48:34,690 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: > mapreduce.job, Service: job_1429338752209_0059, Ident: > (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@5d48e5d6) > 2015-04-18 06:48:35,391 INFO [main] org.apache.hadoop.mapred.YarnChild: > Sleeping for 0ms before retrying again. Got null now. > 2015-04-18 06:48:36,656 INFO [main] org.apache.hadoop.mapred.YarnChild: > mapreduce.cluster.local.dir for child: > /media/ephemeral3/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral1/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral2/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral0/yarn/local/usercache/rjain/appcache/application_1429338752209_0059 > 2015-04-18 06:48:36,657 INFO [main] org.apache.hadoop.mapred.YarnChild: > mapreduce.cluster.reserved.local.dir for child: > 2015-04-18 06:48:36,706 INFO [main] > org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is > deprecated. Instead, use fs.defaultFS > 2015-04-18 06:48:37,387 INFO [main] > org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is > deprecated. Instead, use fs.defaultFS > 2015-04-18 06:48:39,388 INFO [main] > org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. > Instead, use dfs.metrics.session-id > 2015-04-18 06:48:39,448 INFO [main] > org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is > deprecated. Instead, use fs.defaultFS > 2015-04-18 06:48:41,060 INFO [main] > org.apache.hadoop.fs.s3native.NativeS3FileSystem: setting Progress to > org.apache.hadoop.mapred.Task$TaskReporter@601211d0 comment setting up > progress from Task > 2015-04-18 06:48:41,098 INFO [main] org.apache.hadoop.mapred.Task: Using > ResourceCalculatorProcessTree : [ ] > 2015-04-18 06:48:41,585 INFO [main] org.apache.hadoop.mapred.MapTask: > Processing split: > hdfs://ec2-54-211-109-245.compute-1.amazonaws.com:9000/user/rjain/QuasiMonteCarlo_1429339685772_504558444/in/part4:0+118 > 2015-04-18 06:48:43,926 INFO [main] org.apache.hadoop.mapred.MapTask: > (EQUATOR) 0 kvi 234881020(939524080) > 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: > mapreduce.task.io.sort.mb: 896 > 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: soft > limit at 657666880 > 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: > bufstart = 0; bufvoid = 939524096 > 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart > = 234881020; length = 58720256 > 2015-04-18 06:48:43,946 INFO [main] org.apache.hadoop.mapred.MapTask: Map > output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer > 2015-04-18 06:48:44,022 INFO [main] org.apache.hadoop.mapred.MapTask: > Starting flush of map output > 2015-04-18 06:48:44,022 INFO [main] org.apache.hadoop.mapred.MapTask: > Spilling map output > 2015-04-18 06:48:44,022 INFO [main] org.apache.hadoop.mapred.MapTask: > bufstart = 0; bufend = 18; bufvoid = 939524096 > 2015-04-18 06:48:44,022 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart > = 234881020(939524080); kvend = 234881016(939524064); length = 5/58720256 > 2015-04-18 06:48:44,065 INFO [main] > org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext$DirSelector: > Returning directory: > /media/ephemeral2/yarn/local/usercache/rjain/appcache/application_1429338752209_0059/attempt_1429338752209_0059_m_000004_0_spill_0.out > 2015-04-18 06:48:44,089 INFO [main] org.apache.hadoop.io.compress.CodecPool: > Got brand-new compressor [.snappy] > 2015-04-18 06:48:44,100 INFO [main] org.apache.hadoop.mapred.MapTask: > Finished spill 0 > 2015-04-18 06:48:44,111 INFO [main] org.apache.hadoop.mapred.Task: > Task:attempt_1429338752209_0059_m_000004_0 is done. And is in the process of > committing > 2015-04-18 06:48:44,164 INFO [main] > org.apache.hadoop.fs.s3native.NativeS3FileSystem: setting Progress to null > comment clearing progress now that task is done > 2015-04-18 06:48:44,191 INFO [main] org.apache.hadoop.mapred.Task: Counters > for attempt_1429338752209_0059_m_000004_0 > 2015-04-18 06:48:44,203 INFO [main] org.apache.hadoop.mapred.Task: File > System Counters > 2015-04-18 06:48:44,206 INFO [main] org.apache.hadoop.mapred.Task: FILE: > Number of bytes read = 0 > 2015-04-18 06:48:44,206 INFO [main] org.apache.hadoop.mapred.Task: FILE: > Number of bytes written = 116166 > 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: FILE: > Number of read operations = 0 > 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: FILE: > Number of large read operations = 0 > 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: FILE: > Number of write operations = 0 > 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: HDFS: > Number of bytes read = 297 > 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: HDFS: > Number of bytes written = 0 > 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: HDFS: > Number of read operations = 4 > 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: HDFS: > Number of large read operations = 0 > 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: HDFS: > Number of write operations = 0 > 2015-04-18 06:48:44,207 INFO [main] org.apache.hadoop.mapred.Task: Map-Reduce > Framework > 2015-04-18 06:48:44,208 INFO [main] org.apache.hadoop.mapred.Task: Map input > records = 1 > 2015-04-18 06:48:44,208 INFO [main] org.apache.hadoop.mapred.Task: Map output > records = 2 > 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: Map output > bytes = 18 > 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: Map output > materialized bytes = 34 > 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: Input > split bytes = 179 > 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: Combine > input records = 0 > 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: Spilled > Records = 2 > 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: Failed > Shuffles = 0 > 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: Merged Map > outputs = 0 > 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: GC time > elapsed (ms) = 158 > 2015-04-18 06:48:44,209 INFO [main] org.apache.hadoop.mapred.Task: CPU time > spent (ms) = 2080 > 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: Physical > memory (bytes) snapshot = 1104715776 > 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: Virtual > memory (bytes) snapshot = 2900013056 > 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: Total > committed heap usage (bytes) = 1251475456 > 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: File Input > Format Counters > 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: Bytes Read > = 118 > 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: File > System Counters > 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: FILE: > Number of bytes read = 0 > 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: FILE: > Number of bytes written = 116166 > 2015-04-18 06:48:44,210 INFO [main] org.apache.hadoop.mapred.Task: FILE: > Number of read operations = 0 > 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: FILE: > Number of large read operations = 0 > 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: FILE: > Number of write operations = 0 > 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: HDFS: > Number of bytes read = 297 > 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: HDFS: > Number of bytes written = 0 > 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: HDFS: > Number of read operations = 4 > 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: HDFS: > Number of large read operations = 0 > 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: HDFS: > Number of write operations = 0 > 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: Map-Reduce > Framework > 2015-04-18 06:48:44,211 INFO [main] org.apache.hadoop.mapred.Task: Map input > records = 1 > 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Map output > records = 2 > 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Map output > bytes = 18 > 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Map output > materialized bytes = 34 > 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Input > split bytes = 179 > 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Combine > input records = 0 > 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Spilled > Records = 2 > 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Failed > Shuffles = 0 > 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: Merged Map > outputs = 0 > 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: GC time > elapsed (ms) = 158 > 2015-04-18 06:48:44,212 INFO [main] org.apache.hadoop.mapred.Task: CPU time > spent (ms) = 2080 > 2015-04-18 06:48:44,213 INFO [main] org.apache.hadoop.mapred.Task: Physical > memory (bytes) snapshot = 1104715776 > 2015-04-18 06:48:44,213 INFO [main] org.apache.hadoop.mapred.Task: Virtual > memory (bytes) snapshot = 2900013056 > 2015-04-18 06:48:44,213 INFO [main] org.apache.hadoop.mapred.Task: Total > committed heap usage (bytes) = 1251475456 > 2015-04-18 06:48:44,214 INFO [main] org.apache.hadoop.mapred.Task: File path: > /media/ephemeral2/yarn/local/usercache/rjain/appcache/application_1429338752209_0059/output/attempt_1429338752209_0059_m_000004_0/file.out > 2015-04-18 06:48:44,223 INFO [main] org.apache.hadoop.mapred.Task: Task > 'attempt_1429338752209_0059_m_000004_0' done. > 2015-04-18 06:48:44,225 INFO [main] > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics > system... > 2015-04-18 06:48:44,231 INFO [main] > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system > stopped. > 2015-04-18 06:48:44,233 INFO [main] > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system > shutdown complete. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)