[ https://issues.apache.org/jira/browse/HADOOP-12033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559543#comment-14559543 ]
Ivan Mitic commented on HADOOP-12033:
-------------------------------------

Thanks for responding [~zxu]. The reducer task would succeed on retry, so I assumed it's not an environment problem. Below is the task syslog:

{noformat}
2015-05-21 18:33:10,773 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-05-21 18:33:10,976 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
2015-05-21 18:33:10,976 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system started
2015-05-21 18:33:10,991 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2015-05-21 18:33:10,991 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1432143397187_0004, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@5df3ade7)
2015-05-21 18:33:11,132 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: 100.76.156.98:9010, Ident: (owner=btbig2, renewer=mr token, realUser=hdp, issueDate=1432225097662, maxDate=1432829897662, sequenceNumber=2, masterKeyId=2)
2015-05-21 18:33:11,351 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2015-05-21 18:33:12,335 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 500ms before retrying again. Got null now.
2015-05-21 18:33:13,804 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 1000ms before retrying again. Got null now.
2015-05-21 18:33:16,308 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: c:/apps/temp/hdfs/nm-local-dir/usercache/btbig2/appcache/application_1432143397187_0004
2015-05-21 18:33:17,199 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-05-21 18:33:17,402 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2-azure-file-system.properties
2015-05-21 18:33:17,418 INFO [main] org.apache.hadoop.metrics2.sink.WindowsAzureETWSink: Init starting.
2015-05-21 18:33:17,418 INFO [main] org.apache.hadoop.metrics2.sink.WindowsAzureETWSink: Successfully loaded native library. LibraryName = EtwLogger
2015-05-21 18:33:17,418 INFO [main] org.apache.hadoop.metrics2.sink.WindowsAzureETWSink: Init completed. Native library loaded and ETW handle obtained.
2015-05-21 18:33:17,418 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink azurefs2 started
2015-05-21 18:33:17,433 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
2015-05-21 18:33:17,433 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: azure-file-system metrics system started
2015-05-21 18:33:17,699 INFO [main] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
2015-05-21 18:33:17,714 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@36c76ec3
2015-05-21 18:33:17,746 INFO [main] org.apache.hadoop.mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@5c7b1796
2015-05-21 18:33:17,793 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: MergerManager: memoryLimit=741710208, maxSingleShuffleLimit=185427552, mergeThreshold=489528768, ioSortFactor=100, memToMemMergeOutputsThreshold=100
2015-05-21 18:33:17,793 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1432143397187_0004_r_001735_0 Thread started: EventFetcher for fetching Map Completion Events
2015-05-21 18:33:19,187 INFO [fetcher#30] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning workernode165.btbig2.c2.internal.cloudapp.net:13562 with 1 to fetcher#30
2015-05-21 18:33:19,187 INFO [fetcher#30] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to workernode165.btbig2.c2.internal.cloudapp.net:13562 to fetcher#30
2015-05-21 18:33:19,187 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning workernode279.btbig2.c2.internal.cloudapp.net:13562 with 1 to fetcher#1
2015-05-21 18:33:19,187 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to workernode279.btbig2.c2.internal.cloudapp.net:13562 to fetcher#1
(fetch logs removed)
2015-05-21 19:25:08,983 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning workernode133.btbig2.c2.internal.cloudapp.net:13562 with 88 to fetcher#9
2015-05-21 19:25:08,983 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 20 of 88 to workernode133.btbig2.c2.internal.cloudapp.net:13562 to fetcher#9
2015-05-21 19:25:08,983 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=13562/mapOutput?job=job_1432143397187_0004&reduce=1735&map=attempt_1432143397187_0004_m_006276_0,attempt_1432143397187_0004_m_006308_0,attempt_1432143397187_0004_m_006355_0,attempt_1432143397187_0004_m_006349_0,attempt_1432143397187_0004_m_006360_0,attempt_1432143397187_0004_m_006368_0,attempt_1432143397187_0004_m_006658_0,attempt_1432143397187_0004_m_008329_0,attempt_1432143397187_0004_m_008443_0,attempt_1432143397187_0004_m_008448_0,attempt_1432143397187_0004_m_008423_0,attempt_1432143397187_0004_m_008441_0,attempt_1432143397187_0004_m_008588_0,attempt_1432143397187_0004_m_008393_0,attempt_1432143397187_0004_m_010397_0,attempt_1432143397187_0004_m_010486_0,attempt_1432143397187_0004_m_010459_0,attempt_1432143397187_0004_m_010522_0,attempt_1432143397187_0004_m_010537_0,attempt_1432143397187_0004_m_010548_0 sent hash and received reply
2015-05-21 19:25:08,999 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#9 about to shuffle output of map attempt_1432143397187_0004_m_006276_0 decomp: 307061 len: 69824 to MEMORY
2015-05-21 19:25:08,999 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: workernode133.btbig2.c2.internal.cloudapp.net:13562 freed by fetcher#9 in 23ms
2015-05-21 19:25:08,999 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#9
	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NoClassDefFoundError: Ljava/lang/InternalError
	at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
	at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:239)
	at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
	at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
	at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:534)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
Caused by: java.lang.ClassNotFoundException: Ljava.lang.InternalError
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	... 9 more
2015-05-21 19:25:08,999 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
{noformat}
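One detail that stands out in the trace: the class the loader fails to resolve is named "Ljava/lang/InternalError", which looks like a JNI-style type descriptor (leading "L") rather than the binary name java.lang.InternalError. A name in that form can never be resolved by a class loader, so whatever error the native snappy decompressor was actually trying to raise appears to be masked by the NoClassDefFoundError. Below is a minimal, self-contained sketch of that lookup difference; it is plain Java and purely illustrative (FindClass in native code goes through the same class-loading machinery):

{code}
// Illustrative sketch only: shows why a descriptor-style name such as
// "Ljava.lang.InternalError" cannot be resolved, while the binary name can.
public class ClassNameFormats {
  public static void main(String[] args) {
    tryLoad("java.lang.InternalError");   // binary name: resolves
    tryLoad("Ljava.lang.InternalError");  // descriptor-style name: ClassNotFoundException
  }

  private static void tryLoad(String name) {
    try {
      System.out.println("loaded: " + Class.forName(name).getName());
    } catch (ClassNotFoundException e) {
      System.out.println("not found: " + name);
    }
  }
}
{code}

If that is what is happening here, the interesting question (why the decompressor hit an internal error on the first attempt at all) is hidden behind the class-loading failure.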
> Reducer task failure with java.lang.NoClassDefFoundError: Ljava/lang/InternalError at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-12033
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12033
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Ivan Mitic
>
> We have noticed intermittent reducer task failures with the below exception:
> {code}
> Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#9
> 	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NoClassDefFoundError: Ljava/lang/InternalError
> 	at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
> 	at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:239)
> 	at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
> 	at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
> 	at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
> 	at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
> 	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:534)
> 	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329)
> 	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> Caused by: java.lang.ClassNotFoundException: Ljava.lang.InternalError
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> 	... 9 more
> {code}
> Usually, the reduce task succeeds on retry.
> Some of the symptoms are similar to HADOOP-8423, but this fix is already included (this is on Hadoop 2.6).
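As an aside, and not suggesting an environment problem here (the success on retry argues against one): a cheap way to confirm that every worker node has consistent native hadoop/snappy bindings is a small check along the lines of the sketch below. The class name CheckSnappyNative is made up for illustration; it prints roughly what the hadoop checknative command reports.

{code}
import org.apache.hadoop.util.NativeCodeLoader;

// Hypothetical per-node check (illustrative sketch only).
public class CheckSnappyNative {
  public static void main(String[] args) {
    // True only if the native Hadoop library (hadoop.dll on Windows,
    // libhadoop.so elsewhere) was loaded into this JVM.
    boolean hadoopNative = NativeCodeLoader.isNativeCodeLoaded();
    System.out.println("native hadoop loaded: " + hadoopNative);

    // buildSupportsSnappy() is itself a native method, so only call it when
    // the native library is present; otherwise it throws UnsatisfiedLinkError.
    if (hadoopNative) {
      System.out.println("snappy supported: " + NativeCodeLoader.buildSupportsSnappy());
    }
  }
}
{code}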