[ https://issues.apache.org/jira/browse/HDFS-16084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874765#comment-17874765 ]
ASF GitHub Bot commented on HDFS-16084:
---------------------------------------

kevincai commented on code in PR #6969:
URL: https://github.com/apache/hadoop/pull/6969#discussion_r1721198372


##########
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/tests/libhdfs_getjni_test.cc:
##########

@@ -0,0 +1,44 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <gmock/gmock.h>
+#include <hdfs/hdfs.h>
+#include <jni.h>
+
+// hook the jvm runtime function. expect always failure
+_JNI_IMPORT_OR_EXPORT_ jint JNICALL JNI_GetDefaultJavaVMInitArgs(void*) {
+  return 1;
+}
+
+// hook the jvm runtime function. expect always failure
+_JNI_IMPORT_OR_EXPORT_ jint JNICALL JNI_CreateJavaVM(JavaVM**, void**, void*) {
+  return 1;
+}
+
+TEST(GetJNITest, TestRepeatedGetJNIFailsButNoCrash) {
+  // connect to nothing, should fail but not crash
+  EXPECT_EQ(NULL, hdfsConnectNewInstance(NULL, 0));
+
+  // try again, should fail but not crash
+  EXPECT_EQ(NULL, hdfsConnectNewInstance(NULL, 0));

Review Comment:
   This does a `getJniEnv` again at line 38 and expects no dirty TLS there, which is what caused the crash. Without the fix, the crash happens on line 38; line 35 merely sets the incomplete TLS and frees it.
> getJNIEnv() returns invalid pointer when called twice after getGlobalJNIEnv() failed
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-16084
>                 URL: https://issues.apache.org/jira/browse/HDFS-16084
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: libhdfs
>    Affects Versions: 3.2.1
>            Reporter: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>
> First reported in ARROW-13011: when a libhdfs API call fails because CLASSPATH isn't set, calling the API a second time leads to a crash.
>
> *Backtrace*
>
> This was obtained from the ARROW-13011 reproducer:
> {code:java}
> #0  globalClassReference (className=className@entry=0x7f75883c13b0 "org/apache/hadoop/conf/Configuration", env=env@entry=0x6c2f2f3a73666468, out=out@entry=0x7fffd86e3020)
>     at /build/source/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:279
> #1  0x00007f75883b9511 in constructNewObjectOfClass (env=env@entry=0x6c2f2f3a73666468, out=out@entry=0x7fffd86e3148, className=className@entry=0x7f75883c13b0 "org/apache/hadoop/conf/Configuration", ctorSignature=ctorSignature@entry=0x7f75883c1180 "()V")
>     at /build/source/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:212
> #2  0x00007f75883bb6d0 in hdfsBuilderConnect (bld=0x5562e4bbb3e0) at /build/source/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c:700
> #3  0x00007f758de31ef3 in arrow::io::internal::LibHdfsShim::BuilderConnect (this=0x7f758e768240 <arrow::io::internal::(anonymous namespace)::libhdfs_shim>, bld=0x5562e4bbb3e0) at /arrow/cpp/src/arrow/io/hdfs_internal.cc:366
> #4  0x00007f758de2d098 in arrow::io::HadoopFileSystem::HadoopFileSystemImpl::Connect (this=0x5562e4a9f750, config=0x5562e46edc30) at /arrow/cpp/src/arrow/io/hdfs.cc:372
> #5  0x00007f758de2e646 in arrow::io::HadoopFileSystem::Connect (config=0x5562e46edc30, fs=0x5562e46edd08) at /arrow/cpp/src/arrow/io/hdfs.cc:590
> #6  0x00007f758d532d2a in arrow::fs::HadoopFileSystem::Impl::Init (this=0x5562e46edc30) at /arrow/cpp/src/arrow/filesystem/hdfs.cc:59
> #7  0x00007f758d536931 in arrow::fs::HadoopFileSystem::Make (options=..., io_context=...) at /arrow/cpp/src/arrow/filesystem/hdfs.cc:409
> #8  0x00007f75885d7445 in __pyx_pf_7pyarrow_5_hdfs_16HadoopFileSystem___init__ (__pyx_v_self=0x7f758871a970, __pyx_v_host=0x7f758871cc00, __pyx_v_port=8020, __pyx_v_user=0x5562e3af6d30 <_Py_NoneStruct>, __pyx_v_replication=3, __pyx_v_buffer_size=0, __pyx_v_default_block_size=0x5562e3af6d30 <_Py_NoneStruct>, __pyx_v_kerb_ticket=0x5562e3af6d30 <_Py_NoneStruct>, __pyx_v_extra_conf=0x5562e3af6d30 <_Py_NoneStruct>) at _hdfs.cpp:4759
> #9  0x00007f75885d4c88 in __pyx_pw_7pyarrow_5_hdfs_16HadoopFileSystem_1__init__ (__pyx_v_self=0x7f758871a970, __pyx_args=0x7f75900bb048, __pyx_kwds=0x7f7590033a68) at _hdfs.cpp:4343
> #10 0x00005562e38ca747 in type_call () at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Objects/typeobject.c:915
> #11 0x00005562e39117a3 in _PyObject_FastCallDict (kwargs=<optimized out>, nargs=<optimized out>, args=<optimized out>, func=0x7f75885f1420 <__pyx_type_7pyarrow_5_hdfs_HadoopFileSystem>) at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Objects/tupleobject.c:76
> #12 _PyObject_FastCallKeywords () at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Objects/abstract.c:2496
> #13 0x00005562e39121d5 in call_function () at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/ceval.c:4875
> #14 0x00005562e3973d68 in _PyEval_EvalFrameDefault () at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/ceval.c:3351
> #15 0x00005562e38b98f5 in PyEval_EvalFrameEx (throwflag=0, f=0x7f74c0664768) at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/ceval.c:4166
> #16 _PyEval_EvalCodeWithName () at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/ceval.c:4166
> #17 0x00005562e38bad79 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/ceval.c:4187
> #18 0x00005562e398b6eb in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/ceval.c:731
> #19 0x00005562e39f30e3 in run_mod () at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/pythonrun.c:1025
> #20 0x00005562e3896dd3 in PyRun_InteractiveOneObjectEx (fp=0x7f758f30aa00 <_IO_2_1_stdin_>, filename=0x7f75900391b8, flags=0x7fffd86e40bc) at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/pythonrun.c:246
> #21 0x00005562e3896f85 in PyRun_InteractiveLoopFlags (fp=0x7f758f30aa00 <_IO_2_1_stdin_>, filename_str=<optimized out>, flags=0x7fffd86e40bc) at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/pythonrun.c:114
> #22 0x00005562e3897024 in PyRun_AnyFileExFlags (fp=0x7f758f30aa00 <_IO_2_1_stdin_>, filename=0x5562e3a32ee6 "<stdin>", closeit=0, flags=0x7fffd86e40bc) at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/pythonrun.c:75
> #23 0x00005562e39f8cc7 in run_file (p_cf=0x7fffd86e40bc, filename=<optimized out>, fp=0x7f758f30aa00 <_IO_2_1_stdin_>) at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Modules/main.c:340
> #24 Py_Main () at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Modules/main.c:810
> #25 0x00005562e389bf77 in main (argc=1, argv=0x7fffd86e42c8) at /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Programs/python.c:69
> {code}
>
> *Analysis*
>
> The first time {{getJNIEnv()}} is called, no thread-local state is registered yet. It therefore starts by doing three initialization steps:
> 1) allocate a new {{ThreadLocalState}} structure on the heap
> 2) point the POSIX thread-local state at the {{ThreadLocalState}} pointer
> 3) point a native ({{__thread}}) shortcut at the {{ThreadLocalState}} pointer
> Then {{getGlobalJNIEnv()}} is called to actually fetch a valid JNI environment pointer. However, this call may fail (e.g. CLASSPATH not set properly). Then the following happens:
> 1) the {{ThreadLocalState}} is deallocated from the heap
> 2) and... that's all!
> Neither the POSIX thread-local state nor the native {{__thread}} shortcut is reset. Both still hold the {{ThreadLocalState}} pointer, but the corresponding memory was freed and returned to the allocator.
> The next time the user calls a libhdfs API, {{getJNIEnv()}} returns successfully... with an invalid pointer (or one pointing to random data). For example:
> {code}
> (gdb) p getJNIEnv()
> $2 = (JNIEnv *) 0x6c2f2f3a73666468
> (gdb) p *getJNIEnv()
> Cannot access memory at address 0x6c2f2f3a73666468
> {code}
> (0x6c2f2f3a73666468 is the little-endian representation of the string "hdfs://l")
>
> *Note*
>
> This analysis was done with Hadoop 3.2.1. However, examination of the 3.3.2 and trunk source code shows that {{getJNIEnv()}} hasn't changed in between.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org