[ 
https://issues.apache.org/jira/browse/HDFS-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16021669#comment-16021669
 ] 

Sailesh Mukil commented on HDFS-11851:
--------------------------------------

[~jzhuge] Apologies for the slow response. It seems non-trivial to add tests 
for this fix.

One way would be to add test-only functions that expose these mutexes to the 
test files, and then have the tests lock and unlock them recursively. What do 
you think?
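
A minimal sketch of what such a test could look like, assuming the fix makes
the mutex recursive. It uses raw pthreads rather than the libhdfs mutex
wrappers, and every name in it ({{testMutex}}, {{getMutexForTesting()}},
{{initRecursiveMutex()}}, {{lockUnlockRecursively()}},
{{testRecursiveLocking()}}) is hypothetical, not an existing libhdfs symbol:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a libhdfs mutex such as jvmMutex. */
static pthread_mutex_t testMutex;

/* Hypothetical test-only accessor that exposes the mutex to test files. */
pthread_mutex_t *getMutexForTesting(void)
{
    return &testMutex;
}

/* Initialize m as a recursive mutex. Returns 0 on success. */
int initRecursiveMutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);
    if (ret) {
        return ret;
    }
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (ret == 0) {
        ret = pthread_mutex_init(m, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/*
 * Lock m `depth` times from the calling thread, then unlock it `depth`
 * times. Returns 0 if every call succeeded, else the first error code
 * (in which case the mutex may still be held; acceptable for a sketch).
 */
int lockUnlockRecursively(pthread_mutex_t *m, int depth)
{
    int i, ret;
    for (i = 0; i < depth; i++) {
        ret = pthread_mutex_lock(m);
        if (ret) {
            return ret;
        }
    }
    for (i = 0; i < depth; i++) {
        ret = pthread_mutex_unlock(m);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* The test itself: nested locking on the same thread must not fail. */
void testRecursiveLocking(void)
{
    pthread_mutex_t *m = getMutexForTesting();
    int ret = initRecursiveMutex(m);
    assert(ret == 0);
    ret = lockUnlockRecursively(m, 3);
    assert(ret == 0);
}
```

A test binary would call {{testRecursiveLocking()}} from its {{main()}}. With
a non-recursive mutex the second lock would hang instead of failing cleanly,
so a real test would probably also want a timeout.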

> getGlobalJNIEnv() may deadlock if exception is thrown
> -----------------------------------------------------
>
>                 Key: HDFS-11851
>                 URL: https://issues.apache.org/jira/browse/HDFS-11851
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: libhdfs
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Henry Robinson
>            Assignee: Sailesh Mukil
>            Priority: Blocker
>         Attachments: HDFS-11851.000.patch, HDFS-11851.001.patch
>
>
> HDFS-11529 introduced a deadlock into {{getGlobalJNIEnv()}} if an exception 
> is thrown. {{getGlobalJNIEnv()}} holds {{jvmMutex}}, but 
> {{printExceptionAndFree()}} will eventually try to acquire that lock in 
> {{setTLSExceptionStrings()}}.
> The exception might get caught from {{loadFileSystems}}:
> {code}
>         jthr = invokeMethod(env, NULL, STATIC, NULL,
>                          "org/apache/hadoop/fs/FileSystem",
>                          "loadFileSystems", "()V");
>         if (jthr) {
>             printExceptionAndFree(env, jthr, PRINT_EXC_ALL, "loadFileSystems");
>         }
>     }
> {code}
> Here are the relevant parts of the stack trace from where I call this API 
> in Impala, which uses {{libhdfs}}:
> {code}
> #0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x00007ffff4a8d657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
> #2  0x00007ffff4a8d480 in __GI___pthread_mutex_lock (mutex=0x47ce960 <jvmMutex>) at ../nptl/pthread_mutex_lock.c:79
> #3  0x0000000002f06056 in mutexLock (m=<optimized out>) at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28
> #4  0x0000000002efe817 in setTLSExceptionStrings (rootCause=0x0, stackTrace=0x0) at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581
> #5  0x0000000002f065d7 in printExceptionAndFreeV (env=0x513c1e8, exc=0x508a8c0, noPrintFlags=<optimized out>, fmt=0x34349cf "loadFileSystems", ap=0x7fffffffb660) at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183
> #6  0x0000000002f0683d in printExceptionAndFree (env=<optimized out>, exc=<optimized out>, noPrintFlags=<optimized out>, fmt=<optimized out>) at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:213
> #7  0x0000000002eff60f in getGlobalJNIEnv () at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:463
> {code}
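
The locking pattern in the description above can be reproduced in isolation.
The following is only an illustrative sketch with hypothetical names
({{demoMutex}}, {{innerNeedsLock()}}, {{outerHoldsLock()}} and the two
helpers); it is not the libhdfs code. It uses a {{PTHREAD_MUTEX_ERRORCHECK}}
mutex so that the second lock attempt from the same thread returns
{{EDEADLK}} instead of blocking, which is what a default mutex typically does
(as in frame #2 of the trace):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for jvmMutex. */
static pthread_mutex_t demoMutex;

/* Initialize demoMutex as an error-checking mutex. Returns 0 on success. */
int initErrorCheckMutex(void)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);
    if (ret) {
        return ret;
    }
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (ret == 0) {
        ret = pthread_mutex_init(&demoMutex, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Plays the role of setTLSExceptionStrings(): takes the lock itself. */
static int innerNeedsLock(void)
{
    int ret = pthread_mutex_lock(&demoMutex);
    if (ret == 0) {
        pthread_mutex_unlock(&demoMutex);
    }
    return ret;
}

/*
 * Plays the role of getGlobalJNIEnv(): holds the lock while calling a
 * function that tries to take the same lock. Returns the inner result.
 */
int outerHoldsLock(void)
{
    int ret;
    pthread_mutex_lock(&demoMutex);
    ret = innerNeedsLock();
    pthread_mutex_unlock(&demoMutex);
    return ret;
}

/* The nested lock attempt is reported as EDEADLK rather than hanging. */
void demoSelfDeadlock(void)
{
    int ret = initErrorCheckMutex();
    assert(ret == 0);
    ret = outerHoldsLock();
    assert(ret == EDEADLK);
}
```

Either releasing the lock before the error path runs or making the mutex
recursive would avoid the nested acquisition; which of these the attached
patches do is not stated here.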



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

