Crashes in tcmalloc::CentralFreeList::FetchFromSpans(), or in TCMalloc functions in general, usually indicate a heap corruption bug (e.g. a double-free or a free of an invalid pointer). The stack trace often isn't useful because the corruption typically happened earlier, often on a different thread; the crashing thread is just the first one to trip over the damaged allocator state.
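To make the two failure modes above concrete, here is a minimal sketch of my own (it is not TCMalloc, Address Sanitizer, or Impala code). TrackingAllocator, FreeResult, DemoDoubleFree and DemoInvalidFree are hypothetical names invented for this illustration. The wrapper remembers which pointers are live and reports a bad free at the call that made it, which is roughly the kind of early diagnosis a sanitizer build gives you, instead of a crash much later in an unrelated allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Hypothetical illustration only: a wrapper around malloc/free that
// classifies each free instead of handing a bad pointer to the allocator.
enum class FreeResult { kOk, kDoubleFree, kInvalidPointer };

class TrackingAllocator {
 public:
  void* Allocate(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) {
      live_.insert(p);
      freed_.erase(p);  // the address may be reused after an earlier free
    }
    return p;
  }

  FreeResult Free(void* p) {
    if (live_.erase(p) == 1) {
      std::free(p);
      freed_.insert(p);
      return FreeResult::kOk;
    }
    // Not live: never pass it to std::free, just report what went wrong.
    return freed_.count(p) != 0 ? FreeResult::kDoubleFree
                                : FreeResult::kInvalidPointer;
  }

  std::size_t live_count() const { return live_.size(); }

 private:
  std::unordered_set<void*> live_;   // allocated and not yet freed
  std::unordered_set<void*> freed_;  // freed and not handed out again
};

// Frees the same block twice; returns the result of the second free.
FreeResult DemoDoubleFree() {
  TrackingAllocator alloc;
  void* p = alloc.Allocate(64);
  assert(p != nullptr);
  alloc.Free(p);
  return alloc.Free(p);
}

// Frees an address that never came from the allocator.
FreeResult DemoInvalidFree() {
  TrackingAllocator alloc;
  int on_stack = 0;
  return alloc.Free(&on_stack);
}
```

With a real allocator neither mistake is caught at the call site: the bad free silently damages the free lists, and some later, unrelated allocation (such as the tc_new frame in the trace quoted below) is where the process finally falls over.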
These are among the hardest bugs to diagnose. The most reliable way I know of is to find a way to reproduce the bug on demand, then trigger it on a dev setup with an Address Sanitizer build, which can detect most such bugs and provide diagnostics.

One thing worth checking is whether there's another thread that's also crashing in the core dump. We've seen cases in the past where one thread crashed and, while the breakpad minidump was being written and the process was being torn down, other threads hit random errors like the one above.

If you have any information about what query triggered it, what kind of files it was scanning, whether the system was under memory pressure, etc., that can help narrow things down.

On Wed, Nov 22, 2017 at 3:00 AM, Jeszy <jes...@gmail.com> wrote:

> Searching for 'impala FetchFromSpans' on
> https://issues.apache.org/jira shows me IMPALA-2693, which has pretty
> much the same stack trace, however the root cause is not clear
> upstream. Do you know if it's a specific single query that causes this
> crash, or does it happen randomly, under stress, etc? What release are
> you using?
> Maybe a new bug, it would be good to figure it out if so.
>
> Thanks!
>
> On 22 November 2017 at 07:51, sky <x_h...@163.com> wrote:
> > Hi all,
> > Sometimes, when I do SQL queries, the impalad process crashes.
> > What's the reason for that?
> > Here are the logs (core dump):
> >
> > (gdb) bt
> > #0  0x00007f9f27f795e5 in raise () from /lib64/libc.so.6
> > #1  0x00007f9f27f7adc5 in abort () from /lib64/libc.so.6
> > #2  0x00007f9f29d1e9c5 in os::abort(bool) () from /home/impala/impala_deploy/thirdparty/jdk/jre/lib/amd64/server/libjvm.so
> > #3  0x00007f9f29e9f607 in VMError::report_and_die() () from /home/impala/impala_deploy/thirdparty/jdk/jre/lib/amd64/server/libjvm.so
> > #4  0x00007f9f29d238af in JVM_handle_linux_signal () from /home/impala/impala_deploy/thirdparty/jdk/jre/lib/amd64/server/libjvm.so
> > #5  <signal handler called>
> > #6  0x00000000016b8b29 in tcmalloc::CentralFreeList::FetchFromSpans() ()
> > #7  0x00000000016b8ec7 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) ()
> > #8  0x00000000016bb573 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) ()
> > #9  0x00000000016d53d8 in tc_new ()
> > #10 0x000000000159710b in llvm::User::operator new(unsigned long, unsigned int) ()
> > #11 0x0000000000d79b72 in llvm::BitcodeReader::ParseFunctionBody(llvm::Function*) ()
> > #12 0x0000000000d7b39a in llvm::BitcodeReader::Materialize(llvm::GlobalValue*, std::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()
> > ---Type <return> to continue, or q <return> to quit---8745664d2f0.debug
> > #13 0x0000000000d76add in llvm::BitcodeReader::MaterializeModule(llvm::Module*, std::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()
> > #14 0x000000000157d93d in llvm::Module::MaterializeAllPermanently(std::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()
> > #15 0x0000000000d76629 in llvm::ParseBitcodeFile(llvm::MemoryBuffer*, llvm::LLVMContext&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()
> > ging symbols found)...done.
> > Loaded symbols for /home/impala/impala_deploy/thirdparty/jdk/jre/lib/amd64/libnio.so
> > Reading symbols from /home/impala/impala_deploy/thirdparty/jdk/jre/lib/amd64/libnet.so...(no debugging symbols found)...done.
> > Loaded symbols for /home/impala/impala_deploy/thirdparty/jdk/jre/lib/amd64/libnet.so
> > Reading symbols from /home/impala/impala_deploy/thirdparty/jdk/jre/lib/amd64/libmanagement.so...Missing separate debuginfo for /home/impala/impala_deploy/thirdparty/jdk/jre/lib/amd64/libmanagement.so
> > Try: yum --enablerepo='*-debug*' install /usr/lib/debug/.build-id/14/7fd68f515d743bfef3686ec77339d2bcd85cfc.debug
> > (no debugging symbols found)...done.
> > Loaded symbols for /home/impala/impala_deploy/thirdparty/jdk/jre/lib/amd64/libmanagement.so
> > Reading symbols from /home/impala/impala_deploy/thirdparty/hadoop-2.6.0-cdh5.7.0/lib/native/libhadoop.so...done.
> > Loaded symbols for /home/impala/impala_deploy/thirdparty/hadoop-2.6.0-cdh5.7.0/lib/native/libhadoop.so
> > Reading symbols from /home/impala/impala_deploy/thirdparty/jdk/jre/lib/amd64/libjaas_unix.so...(no debugging symbols found)...done.
> > Loaded symbols for /home/impala/impala_deploy/thirdparty/jdk/jre/lib/amd64/libjaas_unix.so
> > Core was generated by `/home/impala/impala_deploy/sbin/impalad -log_filename=impalad_node1'.
> > Program terminated with signal 6, Aborted.
> > #0  0x00007f9f27f795e5 in raise () from /lib64/libc.so.6
> > Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64 glibc-2.12-1.192.el6.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-57.el6.x86_64 libcom_err-1.41.12-22.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 nss-softokn-freebl-3.14.3-23.el6_7.x86_64 openssl-1.0.1e-48.el6.x86_64 zlib-1.2.3-29.el6.x86_64
> > (gdb) bt