[ https://issues.apache.org/jira/browse/HADOOP-12033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583680#comment-14583680 ]
zhihai xu commented on HADOOP-12033:
------------------------------------

Hi [~ivanmi], I looked at the hadoop-snappy library source code. It looks like the exception {{java.lang.NoClassDefFoundError: Ljava/lang/InternalError}} comes from the following code in [SnappyDecompressor.c|https://github.com/electrum/hadoop-snappy/blob/master/src/main/native/src/org/apache/hadoop/io/compress/snappy/SnappyDecompressor.c#L127]:
{code}
  if (ret == SNAPPY_BUFFER_TOO_SMALL){
    THROW(env, "Ljava/lang/InternalError", "Could not decompress data. Buffer length is too small.");
  } else if (ret == SNAPPY_INVALID_INPUT){
    THROW(env, "Ljava/lang/InternalError", "Could not decompress data. Input is invalid.");
  } else if (ret != SNAPPY_OK){
    THROW(env, "Ljava/lang/InternalError", "Could not decompress data.");
  }
{code}
Also, based on a related HBase issue, HBASE-9644, this failure may be triggered by corrupted map output data being fed to the SnappyDecompressor.

I also found a bug in the code above: the class name is written in JNI field-descriptor syntax instead of the internal class-name syntax. We should change it to:
{code}
  if (ret == SNAPPY_BUFFER_TOO_SMALL){
    THROW(env, "java/lang/InternalError", "Could not decompress data. Buffer length is too small.");
  } else if (ret == SNAPPY_INVALID_INPUT){
    THROW(env, "java/lang/InternalError", "Could not decompress data. Input is invalid.");
  } else if (ret != SNAPPY_OK){
    THROW(env, "java/lang/InternalError", "Could not decompress data.");
  }
{code}
I think SnappyDecompressor really wants to throw {{java.lang.InternalError}}, but due to this bug it throws {{java.lang.NoClassDefFoundError}}/{{ClassNotFoundException}} instead. {{THROW}} is defined in [org_apache_hadoop.h|https://github.com/electrum/hadoop-snappy/blob/master/src/main/native/src/org_apache_hadoop.h#L44]:
{code}
#define THROW(env, exception_name, message) \
  { \
  jclass ecls = (*env)->FindClass(env, exception_name); \
  if (ecls) { \
    (*env)->ThrowNew(env, ecls, message); \
    (*env)->DeleteLocalRef(env, ecls); \
  } \
  }
{code}
Based on the above code, the correct parameter to pass to {{FindClass}} is "java/lang/InternalError" instead of "Ljava/lang/InternalError". With the bad name, {{FindClass}} returns NULL and leaves a pending {{NoClassDefFoundError}} in the JVM, so {{ThrowNew}} is never reached and that error is what propagates back to Java, exactly as in the reported stack trace.

A {{java.lang.InternalError}} will be handled correctly in Fetcher.java by the following code:
{code}
      // The codec for lz0,lz4,snappy,bz2,etc. throw java.lang.InternalError
      // on decompression failures. Catching and re-throwing as IOException
      // to allow fetch failure logic to be processed
      try {
        // Go!
        LOG.info("fetcher#" + id + " about to shuffle output of map "
            + mapOutput.getMapId() + " decomp: " + decompressedLength
            + " len: " + compressedLength + " to " + mapOutput.getDescription());
        mapOutput.shuffle(host, is, compressedLength, decompressedLength,
            metrics, reporter);
      } catch (java.lang.InternalError e) {
        LOG.warn("Failed to shuffle for fetcher#"+id, e);
        throw new IOException(e);
      }
{code}
So if SnappyDecompressor throws {{java.lang.InternalError}}, the reduce task won't fail, and after too many fetch failures the map task may be rerun on another node.
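To illustrate, here is a minimal, self-contained Java sketch (not Hadoop code; the class name is made up for the demo) showing both halves of the failure mode: the descriptor-style name fails class lookup just like {{FindClass}} does in the native code, and the resulting {{NoClassDefFoundError}} does not match Fetcher's {{catch (java.lang.InternalError e)}} clause, so it escapes the fetch-failure handling:
{code}
public class Hadoop12033Demo {
  public static void main(String[] args) throws Exception {
    // 1. Descriptor-style name fails lookup, matching the reported
    //    "Caused by: java.lang.ClassNotFoundException: Ljava.lang.InternalError".
    try {
      Class.forName("Ljava.lang.InternalError");
    } catch (ClassNotFoundException e) {
      System.out.println("lookup failed: " + e);
    }
    // The plain binary name resolves fine, which is what the fix restores:
    System.out.println("lookup succeeded: " + Class.forName("java.lang.InternalError"));

    // 2. NoClassDefFoundError extends LinkageError, not InternalError, so
    //    Fetcher's catch (java.lang.InternalError e) never matches it.
    try {
      throw new NoClassDefFoundError("Ljava/lang/InternalError");
    } catch (InternalError e) {
      // The path Fetcher expects: re-thrown as IOException and processed
      // by the fetch-failure logic. Never taken here.
      System.out.println("handled as a fetch failure");
    } catch (Error e) {
      // The path actually taken with the buggy class name: the error
      // escapes Fetcher's handler and fails the reduce task.
      System.out.println("escapes the handler: " + e);
    }
  }
}
{code}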
> Reducer task failure with java.lang.NoClassDefFoundError: Ljava/lang/InternalError at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-12033
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12033
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Ivan Mitic
>
> We have noticed intermittent reducer task failures with the below exception:
> {code}
> Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#9
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NoClassDefFoundError: Ljava/lang/InternalError
>     at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
>     at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:239)
>     at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
>     at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>     at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:534)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> Caused by: java.lang.ClassNotFoundException: Ljava.lang.InternalError
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     ... 9 more
> {code}
> Usually, the reduce task succeeds on retry.
> Some of the symptoms are similar to HADOOP-8423, but this fix is already included (this is on Hadoop 2.6).