[ https://issues.apache.org/jira/browse/HADOOP-12033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583680#comment-14583680 ]

zhihai xu commented on HADOOP-12033:
------------------------------------

Hi [~ivanmi], I looked at the hadoop-snappy library source code, and it looks like the exception {{java.lang.NoClassDefFoundError: Ljava/lang/InternalError}} comes from the following code in [SnappyDecompressor.c|https://github.com/electrum/hadoop-snappy/blob/master/src/main/native/src/org/apache/hadoop/io/compress/snappy/SnappyDecompressor.c#L127]:
{code}
  if (ret == SNAPPY_BUFFER_TOO_SMALL){
    THROW(env, "Ljava/lang/InternalError", "Could not decompress data. Buffer length is too small.");
  } else if (ret == SNAPPY_INVALID_INPUT){
    THROW(env, "Ljava/lang/InternalError", "Could not decompress data. Input is invalid.");
  } else if (ret != SNAPPY_OK){
    THROW(env, "Ljava/lang/InternalError", "Could not decompress data.");
  }
{code}
Also, based on another HBase issue, HBASE-9644, this issue may be caused by corrupted map output data being fed to the SnappyDecompressor.

I also found a bug in the above code in SnappyDecompressor.c. We should change it to:
{code}
  if (ret == SNAPPY_BUFFER_TOO_SMALL){
    THROW(env, "java/lang/InternalError", "Could not decompress data. Buffer length is too small.");
  } else if (ret == SNAPPY_INVALID_INPUT){
    THROW(env, "java/lang/InternalError", "Could not decompress data. Input is invalid.");
  } else if (ret != SNAPPY_OK){
    THROW(env, "java/lang/InternalError", "Could not decompress data.");
  }
{code}
I think SnappyDecompressor really wants to throw a java.lang.InternalError exception, but due to this bug it throws {{java.lang.NoClassDefFoundError}}/{{ClassNotFoundException}} instead.
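To make the naming issue concrete, here is a small standalone Java sketch (illustrative only, not Hadoop code; the class name {{ClassNameLookupDemo}} is made up). "L...;"-style strings are JNI field descriptors rather than class names, so a class-loader lookup for "Ljava.lang.InternalError" fails with the same {{ClassNotFoundException}} seen as the root cause in the stack trace, and when that lookup fails inside JNI {{FindClass}} it is reported to the Java caller as {{NoClassDefFoundError}}.
{code}
public class ClassNameLookupDemo {
  public static void main(String[] args) throws Exception {
    // "Ljava.lang.InternalError" is the descriptor-style name; no class by that
    // name exists, which mirrors the root cause in the reported stack trace:
    //   java.lang.ClassNotFoundException: Ljava.lang.InternalError
    try {
      Class.forName("Ljava.lang.InternalError");
    } catch (ClassNotFoundException e) {
      System.out.println("descriptor-style name fails: " + e);
    }

    // The binary class name resolves fine. JNI FindClass expects the same name
    // with '/' in place of '.', i.e. "java/lang/InternalError".
    System.out.println("binary name works: " + Class.forName("java.lang.InternalError"));
  }
}
{code}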

{{THROW}} is defined in [org_apache_hadoop.h|https://github.com/electrum/hadoop-snappy/blob/master/src/main/native/src/org_apache_hadoop.h#L44]:
{code}
#define THROW(env, exception_name, message) \
  { \
        jclass ecls = (*env)->FindClass(env, exception_name); \
        if (ecls) { \
          (*env)->ThrowNew(env, ecls, message); \
          (*env)->DeleteLocalRef(env, ecls); \
        } \
  }
{code}
Based on the above code, you can see that the correct parameter to pass to {{FindClass}} is "java/lang/InternalError" rather than "Ljava/lang/InternalError". With the wrong name, {{FindClass}} returns NULL and itself raises {{NoClassDefFoundError}}; the {{if (ecls)}} guard then skips {{ThrowNew}}, so that {{NoClassDefFoundError}} is what propagates back to the Java caller.
Also, a {{java.lang.InternalError}} exception will be handled correctly in Fetcher.java by the following code:
{code}
      // The codec for lz0,lz4,snappy,bz2,etc. throw java.lang.InternalError
      // on decompression failures. Catching and re-throwing as IOException
      // to allow fetch failure logic to be processed
      try {
        // Go!
        LOG.info("fetcher#" + id + " about to shuffle output of map "
            + mapOutput.getMapId() + " decomp: " + decompressedLength
            + " len: " + compressedLength + " to " + 
mapOutput.getDescription());
        mapOutput.shuffle(host, is, compressedLength, decompressedLength,
            metrics, reporter);
      } catch (java.lang.InternalError e) {
        LOG.warn("Failed to shuffle for fetcher#"+id, e);
        throw new IOException(e);
      }
{code}
So if SnappyDecompressor throws a java.lang.InternalError exception, the reduce task won't fail, and the map task may be rerun on another node after too many fetch failures.
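To make the consequence of the bug concrete, here is a minimal standalone sketch (illustrative only; {{CatchDemo}} and its {{decompress}} method are made up, not Hadoop code). {{java.lang.InternalError}} and {{java.lang.NoClassDefFoundError}} are sibling subclasses of {{Error}} ({{VirtualMachineError}} vs. {{LinkageError}}), so the {{catch (java.lang.InternalError e)}} clause in Fetcher.java does not catch the {{NoClassDefFoundError}} produced by the buggy {{THROW}}; the error escapes the fetch-failure handling and fails the shuffle, matching the reported {{Shuffle$ShuffleError}}.
{code}
import java.io.IOException;

public class CatchDemo {
  // Hypothetical stand-in for the native decompress call.
  static void decompress(boolean fixed) {
    if (fixed) {
      // What the corrected THROW would produce.
      throw new InternalError("Could not decompress data. Input is invalid.");
    }
    // What the buggy THROW effectively produces today.
    throw new NoClassDefFoundError("Ljava/lang/InternalError");
  }

  public static void main(String[] args) {
    for (boolean fixed : new boolean[] {true, false}) {
      try {
        try {
          decompress(fixed);
        } catch (java.lang.InternalError e) {
          // Same pattern as Fetcher.java: wrap as IOException so the
          // fetch-failure logic can handle it.
          throw new RuntimeException(new IOException(e));
        }
      } catch (RuntimeException handledAsFetchFailure) {
        System.out.println("handled as fetch failure: " + handledAsFetchFailure.getCause());
      } catch (Error escaped) {
        // NoClassDefFoundError is not an InternalError, so it slips past the
        // catch above and would fail the shuffle.
        System.out.println("escaped the InternalError catch: " + escaped);
      }
    }
  }
}
{code}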

> Reducer task failure with java.lang.NoClassDefFoundError: 
> Ljava/lang/InternalError at 
> org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-12033
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12033
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Ivan Mitic
>
> We have noticed intermittent reducer task failures with the below exception:
> {code}
> Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#9
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NoClassDefFoundError: Ljava/lang/InternalError
>     at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
>     at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompress(SnappyDecompressor.java:239)
>     at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
>     at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>     at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:534)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> Caused by: java.lang.ClassNotFoundException: Ljava.lang.InternalError
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     ... 9 more
> {code}
> Usually, the reduce task succeeds on retry. 
> Some of the symptoms are similar to HADOOP-8423, but this fix is already 
> included (this is on Hadoop 2.6).


