[ https://issues.apache.org/jira/browse/HADOOP-12033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559543#comment-14559543 ]

Ivan Mitic commented on HADOOP-12033:
-------------------------------------

Thanks for responding [~zxu]. The reducer task would succeed on retry, so I 
assumed it's not an environment problem. Below is the task syslog:
{noformat}
2015-05-21 18:33:10,773 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-05-21 18:33:10,976 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
2015-05-21 18:33:10,976 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system started
2015-05-21 18:33:10,991 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2015-05-21 18:33:10,991 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1432143397187_0004, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@5df3ade7)
2015-05-21 18:33:11,132 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: RM_DELEGATION_TOKEN, Service: 100.76.156.98:9010, Ident: (owner=btbig2, renewer=mr token, realUser=hdp, issueDate=1432225097662, maxDate=1432829897662, sequenceNumber=2, masterKeyId=2)
2015-05-21 18:33:11,351 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2015-05-21 18:33:12,335 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 500ms before retrying again. Got null now.
2015-05-21 18:33:13,804 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 1000ms before retrying again. Got null now.
2015-05-21 18:33:16,308 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: c:/apps/temp/hdfs/nm-local-dir/usercache/btbig2/appcache/application_1432143397187_0004
2015-05-21 18:33:17,199 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-05-21 18:33:17,402 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2-azure-file-system.properties
2015-05-21 18:33:17,418 INFO [main] org.apache.hadoop.metrics2.sink.WindowsAzureETWSink: Init starting.
2015-05-21 18:33:17,418 INFO [main] org.apache.hadoop.metrics2.sink.WindowsAzureETWSink: Successfully loaded native library. LibraryName = EtwLogger
2015-05-21 18:33:17,418 INFO [main] org.apache.hadoop.metrics2.sink.WindowsAzureETWSink: Init completed. Native library loaded and ETW handle obtained.
2015-05-21 18:33:17,418 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink azurefs2 started
2015-05-21 18:33:17,433 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
2015-05-21 18:33:17,433 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: azure-file-system metrics system started
2015-05-21 18:33:17,699 INFO [main] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
2015-05-21 18:33:17,714 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@36c76ec3
2015-05-21 18:33:17,746 INFO [main] org.apache.hadoop.mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@5c7b1796
2015-05-21 18:33:17,793 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: MergerManager: memoryLimit=741710208, maxSingleShuffleLimit=185427552, mergeThreshold=489528768, ioSortFactor=100, memToMemMergeOutputsThreshold=100
2015-05-21 18:33:17,793 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1432143397187_0004_r_001735_0 Thread started: EventFetcher for fetching Map Completion Events
2015-05-21 18:33:19,187 INFO [fetcher#30] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning workernode165.btbig2.c2.internal.cloudapp.net:13562 with 1 to fetcher#30
2015-05-21 18:33:19,187 INFO [fetcher#30] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to workernode165.btbig2.c2.internal.cloudapp.net:13562 to fetcher#30
2015-05-21 18:33:19,187 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning workernode279.btbig2.c2.internal.cloudapp.net:13562 with 1 to fetcher#1
2015-05-21 18:33:19,187 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to workernode279.btbig2.c2.internal.cloudapp.net:13562 to fetcher#1
(fetch logs removed)
2015-05-21 19:25:08,983 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning workernode133.btbig2.c2.internal.cloudapp.net:13562 with 88 to fetcher#9
2015-05-21 19:25:08,983 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 20 of 88 to workernode133.btbig2.c2.internal.cloudapp.net:13562 to fetcher#9
2015-05-21 19:25:08,983 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=13562/mapOutput?job=job_1432143397187_0004&reduce=1735&map=attempt_1432143397187_0004_m_006276_0,attempt_1432143397187_0004_m_006308_0,attempt_1432143397187_0004_m_006355_0,attempt_1432143397187_0004_m_006349_0,attempt_1432143397187_0004_m_006360_0,attempt_1432143397187_0004_m_006368_0,attempt_1432143397187_0004_m_006658_0,attempt_1432143397187_0004_m_008329_0,attempt_1432143397187_0004_m_008443_0,attempt_1432143397187_0004_m_008448_0,attempt_1432143397187_0004_m_008423_0,attempt_1432143397187_0004_m_008441_0,attempt_1432143397187_0004_m_008588_0,attempt_1432143397187_0004_m_008393_0,attempt_1432143397187_0004_m_010397_0,attempt_1432143397187_0004_m_010486_0,attempt_1432143397187_0004_m_010459_0,attempt_1432143397187_0004_m_010522_0,attempt_1432143397187_0004_m_010537_0,attempt_1432143397187_0004_m_010548_0 sent hash and received reply
2015-05-21 19:25:08,999 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#9 about to shuffle output of map attempt_1432143397187_0004_m_006276_0 decomp: 307061 len: 69824 to MEMORY
2015-05-21 19:25:08,999 INFO [fetcher#9] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: workernode133.btbig2.c2.internal.cloudapp.net:13562 freed by fetcher#9 in 23ms
2015-05-21 19:25:08,999 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#9
        at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NoClassDefFoundError: Ljava/lang/InternalError
        at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
        at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:239)
        at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
        at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:534)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
Caused by: java.lang.ClassNotFoundException: Ljava.lang.InternalError
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 9 more

2015-05-21 19:25:08,999 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task

{noformat}
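One observation (my own reading, not a confirmed root cause): {{Ljava/lang/InternalError}} in the trace looks like a JNI field descriptor rather than a plain class name, as if the native Snappy glue passed the descriptor form when trying to raise {{java.lang.InternalError}}. A class loader asked for that string always fails, which matches the nested ClassNotFoundException. A minimal sketch, outside Hadoop, of how such a lookup fails:

```java
// Hypothetical illustration (not Hadoop code): looking up a class by a
// JNI-descriptor-style name. Loaders expect "java.lang.InternalError";
// the descriptor form "Ljava/lang/InternalError;" (slashes shown as dots
// here, as in the trace) can never resolve.
public class DescriptorLookup {
    public static void main(String[] args) {
        try {
            Class.forName("Ljava.lang.InternalError");
            System.out.println("loaded");
        } catch (ClassNotFoundException e) {
            // The exception message echoes the bad class name,
            // matching the trace above.
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }
    }
}
```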

> Reducer task failure with java.lang.NoClassDefFoundError: 
> Ljava/lang/InternalError at 
> org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-12033
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12033
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Ivan Mitic
>
> We have noticed intermittent reducer task failures with the below exception:
> {code}
> Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#9
>         at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NoClassDefFoundError: Ljava/lang/InternalError
>         at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
>         at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:239)
>         at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
>         at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>         at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:534)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329)
>         at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> Caused by: java.lang.ClassNotFoundException: Ljava.lang.InternalError
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>         ... 9 more
> {code}
> Usually, the reduce task succeeds on retry.
> Some of the symptoms are similar to HADOOP-8423, but that fix is already 
> included (this is Hadoop 2.6).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)