[ 
https://issues.apache.org/jira/browse/IMPALA-10039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172663#comment-17172663
 ] 

Wenzhe Zhou commented on IMPALA-10039:
--------------------------------------

Thanks, Joe, for the script, which reproduces the issue easily.

The recent patch for [IMPALA-5746|http://issues.apache.org/jira/browse/IMPALA-5746] 
registers a callback that runs when cluster membership is updated. The 
callback cancels the queries scheduled by failed coordinators. 
This callback was invoked while expr-test was running. In some cases it 
causes QueryState::Cancel() to be called before the thread-unsafe function 
QueryState::Init() has completed. QueryState::Cancel() then runs with 
instances_prepared_barrier_ still nullptr, which causes the crash.

There is also a possible deadlock. If QueryState::Cancel() is called right after 
QueryState::Init() returns an error with a non-null instances_prepared_barrier_, 
QueryState::Cancel() waits on instances_prepared_barrier_ forever, because the 
fragment instances are never executed and instances_prepared_barrier_ is never 
notified.

To fix this, we should make QueryState::Cancel() wait until QueryState::Init() 
has completed, and reset instances_prepared_barrier_ if Init() fails. We should 
also check whether the process is running as a test, and register the callback 
only if it is not running BE/FE tests.
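
A minimal sketch of the synchronization described above, to illustrate the 
idea only. It is not the actual patch: BarrierSketch, QueryStateSketch, 
init_lock_, init_done_, init_done_cv_, NotifyPrepared() and the 'succeed' 
parameter are hypothetical names, and the barrier is reduced to a single 
notification. Only Init(), Cancel() and instances_prepared_barrier_ correspond 
to names in the real code.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>

// Hypothetical stand-in for impala::CountingBarrier, reduced to one notification.
class BarrierSketch {
 public:
  void Notify() {
    {
      std::lock_guard<std::mutex> l(lock_);
      notified_ = true;
    }
    cv_.notify_all();
  }
  void Wait() {
    std::unique_lock<std::mutex> l(lock_);
    cv_.wait(l, [this] { return notified_; });
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  bool notified_ = false;
};

// Hypothetical, heavily simplified model of QueryState.
class QueryStateSketch {
 public:
  // 'succeed' is an illustrative switch that simulates Init() passing or failing.
  bool Init(bool succeed) {
    {
      std::lock_guard<std::mutex> l(init_lock_);
      instances_prepared_barrier_.reset(new BarrierSketch());
      // ... the rest of initialization would happen here ...
      // On failure, reset the barrier so that Cancel() never waits on it.
      if (!succeed) instances_prepared_barrier_.reset();
      init_done_ = true;
    }
    init_done_cv_.notify_all();
    return succeed;
  }

  // Stands in for the fragment instances reporting that they are prepared.
  void NotifyPrepared() {
    std::lock_guard<std::mutex> l(init_lock_);
    if (instances_prepared_barrier_ != nullptr) instances_prepared_barrier_->Notify();
  }

  // Returns true if there were prepared instances to cancel, false otherwise.
  bool Cancel() {
    BarrierSketch* barrier = nullptr;
    {
      // Block until Init() has finished, whether it succeeded or failed.
      std::unique_lock<std::mutex> l(init_lock_);
      init_done_cv_.wait(l, [this] { return init_done_; });
      assert(init_done_);
      barrier = instances_prepared_barrier_.get();
    }
    if (barrier == nullptr) return false;  // Init() failed: nothing to wait for.
    barrier->Wait();                       // Safe: the barrier is non-null here.
    return true;
  }

 private:
  std::mutex init_lock_;
  std::condition_variable init_done_cv_;
  bool init_done_ = false;
  std::unique_ptr<BarrierSketch> instances_prepared_barrier_;
};
```

In this sketch, Cancel() can no longer see a half-initialized barrier (the 
crash), and a failed Init() leaves the barrier null, so Cancel() returns 
instead of waiting forever (the deadlock).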

 

> Expr-test crash in ExprTest.LiteralExprs during core run
> --------------------------------------------------------
>
>                 Key: IMPALA-10039
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10039
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 4.0
>            Reporter: Laszlo Gaal
>            Assignee: Wenzhe Zhou
>            Priority: Blocker
>              Labels: broken-build
>
> Expr-test crashed with a minidump during a core-mode run.
> The test log:
> {code}
>  22/123 Test  #22: expr-test ........................***Failed    4.42 sec
> Turning perftools heap leak checking off
> seed = 1596358469
> Note: Google Test filter = Instantiations/ExprTest.*
> [==========] Running 192 tests from 1 test case.
> [----------] Global test environment set-up.
> [----------] 192 tests from Instantiations/ExprTest
> 20/08/02 01:54:29 INFO util.JvmPauseMonitor: Starting JVM pause monitor
> Running without optimization passes.
> [ RUN      ] Instantiations/ExprTest.NullLiteral/0
> [       OK ] Instantiations/ExprTest.NullLiteral/0 (1 ms)
> [ RUN      ] Instantiations/ExprTest.NullLiteral/1
> [       OK ] Instantiations/ExprTest.NullLiteral/1 (1 ms)
> [ RUN      ] Instantiations/ExprTest.NullLiteral/2
> [       OK ] Instantiations/ExprTest.NullLiteral/2 (0 ms)
> [ RUN      ] Instantiations/ExprTest.LiteralConstruction/0
> [       OK ] Instantiations/ExprTest.LiteralConstruction/0 (4 ms)
> [ RUN      ] Instantiations/ExprTest.LiteralConstruction/1
> [       OK ] Instantiations/ExprTest.LiteralConstruction/1 (1 ms)
> [ RUN      ] Instantiations/ExprTest.LiteralConstruction/2
> [       OK ] Instantiations/ExprTest.LiteralConstruction/2 (2 ms)
> [ RUN      ] Instantiations/ExprTest.LiteralExprs/0
> Wrote minidump to 
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/be_tests/minidumps/unifiedbetests/3c669d32-0e5a-42d6-ae70e79b-9f91038f.dmp
> Wrote minidump to 
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/be_tests/minidumps/unifiedbetests/3c669d32-0e5a-42d6-ae70e79b-9f91038f.dmp
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007f1e95c21c30, pid=4127, tid=0x00007f1e3d322700
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 
> 1.8.0_144-b01)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [libpthread.so.0+0x9c30]  pthread_mutex_lock+0x0
> #
> # Core dump written. Default location: 
> /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/be/src/exprs/core
>  or core.4127
> #
> # An error report file with more information is saved as:
> # 
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/hs_err_pid4127.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/be/build/debug//exprs/expr-test:
>  line 10:  4127 Aborted                 (core dumped) 
> ${IMPALA_HOME}/bin/run-jvm-binary.sh 
> ${IMPALA_HOME}/be/build/latest/service/unifiedbetests 
> --gtest_filter=${GTEST_FILTER} 
> --gtest_output=xml:${IMPALA_BE_TEST_LOGS_DIR}/${TEST_EXEC_NAME}.xml 
> -log_filename="${TEST_EXEC_NAME}" "$@"
> Traceback (most recent call last):
>   File 
> "/data/jenkins/workspace/impala-asf-master-core/repos/Impala/bin/junitxml_prune_notrun.py",
>  line 71, in <module>
>     if __name__ == "__main__": main()
>   File 
> "/data/jenkins/workspace/impala-asf-master-core/repos/Impala/bin/junitxml_prune_notrun.py",
>  line 68, in main
>     junitxml_prune_notrun(options.filename)
>   File 
> "/data/jenkins/workspace/impala-asf-master-core/repos/Impala/bin/junitxml_prune_notrun.py",
>  line 31, in junitxml_prune_notrun
>     root = tree.parse(junitxml_filename)
>   File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 647, in parse
>     source = open(source, "rb")
> IOError: [Errno 2] No such file or directory: 
> '/data/jenkins/workspace/impala-asf-master-core/repos/Impala/logs/be_tests/expr-test.xml'
> {code}
> Minidump backtrace:
> {code}
> #0  0x00007f1e925f31f7 in raise () from /lib64/libc.so.6
> #1  0x00007f1e925f48e8 in abort () from /lib64/libc.so.6
> #2  0x00007f1e9532d185 in os::abort(bool) () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #3  0x00007f1e954cf593 in VMError::report_and_die() () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #4  0x00007f1e9533268f in JVM_handle_linux_signal () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #5  0x00007f1e95328be3 in signalHandler(int, siginfo*, void*) () from 
> /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
> #6  <signal handler called>
> #7  0x00007f1e95c21c30 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #8  0x0000000002217537 in __gthread_mutex_lock (__mutex=0x38) at 
> /data0/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/include/c++/7.5.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748
> #9  0x0000000002233f7c in std::mutex::lock (this=0x38) at 
> /data0/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/include/c++/7.5.0/bits/std_mutex.h:103
> #10 0x000000000223bb4b in std::unique_lock<std::mutex>::lock 
> (this=0x7f1e3d3212e0) at 
> /data0/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/include/c++/7.5.0/bits/std_mutex.h:267
> #11 0x0000000002238fae in std::unique_lock<std::mutex>::unique_lock 
> (this=0x7f1e3d3212e0, __m=...) at 
> /data0/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/gcc-7.5.0/include/c++/7.5.0/bits/std_mutex.h:197
> #12 0x000000000314f4b4 in impala::Promise<bool, (impala::PromiseMode)0>::Get 
> (this=0x0) at 
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/be/src/util/promise.h:85
> #13 0x0000000003408022 in impala::TypedCountingBarrier<bool>::Wait (this=0x0) 
> at 
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/be/src/util/counting-barrier.h:61
> #14 0x0000000003407a68 in impala::CountingBarrier::Wait (this=0x0) at 
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/be/src/util/counting-barrier.h:97
> #15 0x0000000003402ddc in impala::QueryState::WaitForPrepare (this=0xe9d0900) 
> at 
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/be/src/runtime/query-state.cc:671
> #16 0x0000000003405028 in impala::QueryState::Cancel (this=0xe9d0900) at 
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/be/src/runtime/query-state.cc:850
> #17 0x00000000033f43cd in impala::QueryExecMgr::CancelFromThreadPool 
> (this=0x132b8400, cancellation_task=...) at 
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/be/src/runtime/query-exec-mgr.cc:262
> #18 0x00000000033fb107 in boost::_mfi::mf1<void, impala::QueryExecMgr, 
> impala::QueryExecMgr::QueryCancellationTask const&>::operator() 
> (this=0x11d3ac58, p=0x132b8400, a1=...) at 
> /data/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:165
> #19 0x00000000033fa56f in 
> boost::_bi::list2<boost::_bi::value<impala::QueryExecMgr*>, boost::arg<2> 
> >::operator()<boost::_mfi::mf1<void, impala::QueryExecMgr, 
> impala::QueryExecMgr::QueryCancellationTask const&>, boost::_bi::rrlist2<int, 
> impala::QueryExecMgr::QueryCancellationTask const&> > (this=0x11d3ac68, 
> f=..., a=...) at 
> /data/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:319
> #20 0x00000000033f98a4 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, 
> impala::QueryExecMgr, impala::QueryExecMgr::QueryCancellationTask const&>, 
> boost::_bi::list2<boost::_bi::value<impala::QueryExecMgr*>, boost::arg<2> > 
> >::operator()<int, impala::QueryExecMgr::QueryCancellationTask const&>(int&&, 
> impala::QueryExecMgr::QueryCancellationTask const&) (this=0x11d3ac58, 
> a1=@0x7f1e3d3215f4: 0, a2=...) at 
> /data/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1246
> #21 0x00000000033f892a in 
> boost::detail::function::void_function_obj_invoker2<boost::_bi::bind_t<void, 
> boost::_mfi::mf1<void, impala::QueryExecMgr, 
> impala::QueryExecMgr::QueryCancellationTask const&>, 
> boost::_bi::list2<boost::_bi::value<impala::QueryExecMgr*>, boost::arg<2> > 
> >, void, int, impala::QueryExecMgr::QueryCancellationTask const&>::invoke 
> (function_obj_ptr=..., a0=0, a1=...) at 
> /data/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159
> #22 0x00000000033f79e7 in boost::function2<void, int, 
> impala::QueryExecMgr::QueryCancellationTask const&>::operator() 
> (this=0x11d3ac50, a0=0, a1=...) at 
> /data/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
> #23 0x00000000033f61a1 in 
> impala::ThreadPool<impala::QueryExecMgr::QueryCancellationTask>::WorkerThread 
> (this=0x11d3ac00, thread_id=0) at 
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/be/src/util/thread-pool.h:166
> #24 0x00000000033fbdb8 in boost::_mfi::mf1<void, 
> impala::ThreadPool<impala::QueryExecMgr::QueryCancellationTask>, 
> int>::operator() (this=0x129de000, p=0x11d3ac00, a1=0) at 
> /data/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/mem_fn_template.hpp:165
> #25 0x00000000033fb9f8 in 
> boost::_bi::list2<boost::_bi::value<impala::ThreadPool<impala::QueryExecMgr::QueryCancellationTask>*>,
>  boost::_bi::value<int> >::operator()<boost::_mfi::mf1<void, 
> impala::ThreadPool<impala::QueryExecMgr::QueryCancellationTask>, int>, 
> boost::_bi::list0> (this=0x129de010, f=..., a=...) at 
> /data/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:319
> #26 0x00000000033fb1cd in boost::_bi::bind_t<void, boost::_mfi::mf1<void, 
> impala::ThreadPool<impala::QueryExecMgr::QueryCancellationTask>, int>, 
> boost::_bi::list2<boost::_bi::value<impala::ThreadPool<impala::QueryExecMgr::QueryCancellationTask>*>,
>  boost::_bi::value<int> > >::operator() (this=0x129de000) at 
> /data/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222
> #27 0x00000000033fa961 in 
> boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, 
> boost::_mfi::mf1<void, 
> impala::ThreadPool<impala::QueryExecMgr::QueryCancellationTask>, int>, 
> boost::_bi::list2<boost::_bi::value<impala::ThreadPool<impala::QueryExecMgr::QueryCancellationTask>*>,
>  boost::_bi::value<int> > >, void>::invoke (function_obj_ptr=...) at 
> /data/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159
> #28 0x0000000003237382 in boost::function0<void>::operator() 
> (this=0x7f1e3d321ba0) at 
> /data/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
> #29 0x000000000446aac6 in 
> impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*) (name=..., category=..., 
> functor=..., parent_thread_info=0x0, thread_started=0x7fffae137320) at 
> /data/jenkins/workspace/impala-asf-master-core/repos/Impala/be/src/util/thread.cc:360
> #30 0x00000000044731c9 in 
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo*>, 
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> 
> >::operator()<void (*)(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*), 
> boost::_bi::list0>(boost::_bi::type<void>, void 
> (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > const&, std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void 
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long, 
> (impala::PromiseMode)0>*), boost::_bi::list0&, int) (this=0x13d4c380, 
> f=@0x13d4c378: 0x446a780 
> <impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, 
> impala::Promise<long, (impala::PromiseMode)0>*)>, a=...) at 
> /data/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:531
> #31 0x00000000044730ed in boost::_bi::bind_t<void, void 
> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > const&, std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void 
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long, 
> (impala::PromiseMode)0>*), 
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo*>, 
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > 
> >::operator()() (this=0x13d4c378) at 
> /data/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222
> #32 0x00000000044730ae in boost::detail::thread_data<boost::_bi::bind_t<void, 
> void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > const&, std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > const&, boost::function<void 
> ()>, impala::ThreadDebugInfo const*, impala::Promise<long, 
> (impala::PromiseMode)0>*), 
> boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::ThreadDebugInfo*>, 
> boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > 
> >::run() (this=0x13d4c1c0) at 
> /data/jenkins/workspace/impala-asf-master-core/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116
> #33 0x0000000004714232 in thread_proxy ()
> #34 0x00007f1e95c1fe25 in start_thread () from /lib64/libpthread.so.0
> #35 0x00007f1e926b634d in clone () from /lib64/libc.so.6
> {code}



