[ 
https://issues.apache.org/jira/browse/IMPALA-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoram Thanga resolved IMPALA-6764.
----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0

> Codegen'd UnionNode::MaterializeBatch() causes memory corruption crash of 
> Impalad
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-6764
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6764
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.11.0
>            Reporter: Zoram Thanga
>            Assignee: Zoram Thanga
>            Priority: Critical
>             Fix For: Impala 2.12.0
>
>         Attachments: bad-materializebatch-disasm.txt, 
> good-materializebatch-disasm.txt
>
>
> A CTAS statement involving UNION ALL with LEFT JOIN children is reliably 
> crashing with a stack trace similar to the following:
> {noformat}
> (gdb) bt
> #0  0x00007fb85fdf11f7 in raise () from ./debug-stuff/lib64/libc.so.6
> #1  0x00007fb85fdf28e8 in abort () from ./debug-stuff/lib64/libc.so.6
> #2  0x00007fb862106f35 in os::abort(bool) () from 
> ./debug-stuff/usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
> #3  0x00007fb8622aaf33 in VMError::report_and_die() () from 
> ./debug-stuff/usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
> #4  0x00007fb86210d22f in JVM_handle_linux_signal () from 
> ./debug-stuff/usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
> #5  0x00007fb862103253 in signalHandler(int, siginfo*, void*) () from 
> ./debug-stuff/usr/java/jdk1.8.0_162/jre/lib/amd64/server/libjvm.so
> #6  <signal handler called>
> #7  0x00007fb85ff08706 in __memcpy_ssse3_back () from 
> ./debug-stuff/lib64/libc.so.6
> #8  0x00007fb840700d73 in 
> impala::UnionNode::MaterializeBatch(impala::RowBatch*, unsigned char**) 
> [clone .588] ()
> #9  0x0000000001001806 in impala::UnionNode::GetNextMaterialized 
> (this=this@entry=0x8280000, state=state@entry=0x848ed00, 
> row_batch=row_batch@entry=0xcef9950)
>     at /usr/src/debug/impala-2.11.0-cdh5.14.0/be/src/exec/union-node.cc:228
> #10 0x0000000001001b5c in impala::UnionNode::GetNext (this=0x8280000, 
> state=0x848ed00, row_batch=0xcef9950, eos=0x7fb7fe9a987e)
>     at /usr/src/debug/impala-2.11.0-cdh5.14.0/be/src/exec/union-node.cc:294
> #11 0x0000000000b724d2 in impala::FragmentInstanceState::ExecInternal 
> (this=this@entry=0x4c030c0)
>     at 
> /usr/src/debug/impala-2.11.0-cdh5.14.0/be/src/runtime/fragment-instance-state.cc:270
> #12 0x0000000000b74e42 in impala::FragmentInstanceState::Exec 
> (this=this@entry=0x4c030c0) at 
> /usr/src/debug/impala-2.11.0-cdh5.14.0/be/src/runtime/fragment-instance-state.cc:89
> #13 0x0000000000b64488 in impala::QueryState::ExecFInstance (this=0x8559200, 
> fis=0x4c030c0) at 
> /usr/src/debug/impala-2.11.0-cdh5.14.0/be/src/runtime/query-state.cc:382
> #14 0x0000000000d13613 in boost::function0<void>::operator() 
> (this=0x7fb7fe9a9c60)
>     at 
> /usr/src/debug/impala-2.11.0-cdh5.14.0/toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:767
> #15 impala::Thread::SuperviseThread(std::string const&, std::string const&, 
> boost::function<void ()>, impala::Promise<long>*) (name=..., category=..., 
> functor=..., 
>     thread_started=0x7fb7f999f0f0) at 
> /usr/src/debug/impala-2.11.0-cdh5.14.0/be/src/util/thread.cc:352
> #16 0x0000000000d13d54 in 
> boost::_bi::list4<boost::_bi::value<std::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, 
> boost::_bi::value<impala::Promise<long int>*> >::operator()<void (*)(const 
> std::basic_string<char>&, const std::basic_string<char>&, 
> boost::function<void()>, impala::Promise<long int>*), boost::_bi::list0> (
>     f=@0x808bfb8: 0xd13460 <impala::Thread::SuperviseThread(std::string 
> const&, std::string const&, boost::function<void ()>, 
> impala::Promise<long>*)>, a=<synthetic pointer>, 
>     this=0x808bfc0) at 
> /usr/src/debug/impala-2.11.0-cdh5.14.0/toolchain/boost-1.57.0-p3/include/boost/bind/bind.hpp:457
> #17 boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, 
> boost::function<void ()>, impala::Promise<long>*), 
> boost::_bi::list4<boost::_bi::value<std::string>, 
> boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::Promise<long>*> > >::operator()() (this=0x808bfb8)
>     at 
> /usr/src/debug/impala-2.11.0-cdh5.14.0/toolchain/boost-1.57.0-p3/include/boost/bind/bind_template.hpp:20
> #18 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string 
> const&, std::string const&, boost::function<void ()>, 
> impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, 
> boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::Promise<long>*> > > >::run() (this=0x808be00)
>     at 
> /usr/src/debug/impala-2.11.0-cdh5.14.0/toolchain/boost-1.57.0-p3/include/boost/thread/detail/thread.hpp:116
> #19 0x000000000128e8ea in thread_proxy ()
> #20 0x00007fb860186e25 in start_thread () from 
> ./debug-stuff/lib64/libpthread.so.0
> #21 0x00007fb85feb434d in clone () from ./debug-stuff/lib64/libc.so.6
> {noformat}
> The exact location and cause of the crash vary: sometimes we crash while 
> accessing the source address of memcpy, and other times while accessing the 
> destination address. In this particular instance, we see:
> {noformat}
>    0x00007fb85ff086e4 <+6676>:  add    %rdx,%rsi
>    0x00007fb85ff086e7 <+6679>:  add    %rdx,%rdi
>    0x00007fb85ff086ea <+6682>:  lea    0x375df(%rip),%r11        # 
> 0x7fb85ff3fcd0
>    0x00007fb85ff086f1 <+6689>:  movslq (%r11,%rdx,4),%rdx
>    0x00007fb85ff086f5 <+6693>:  lea    (%r11,%rdx,1),%rdx
>    0x00007fb85ff086f9 <+6697>:  jmpq   *%rdx
>    0x00007fb85ff086fb <+6699>:  ud2    
>    0x00007fb85ff086fd <+6701>:  nopl   (%rax)
>    0x00007fb85ff08700 <+6704>:  add    %rdx,%rsi
>    0x00007fb85ff08703 <+6707>:  add    %rdx,%rdi
> => 0x00007fb85ff08706 <+6710>:  movdqu -0x10(%rsi),%xmm0
>    0x00007fb85ff0870b <+6715>:  lea    -0x10(%rdi),%r8
>    0x00007fb85ff0870f <+6719>:  mov    %rdi,%r9
>    0x00007fb85ff08712 <+6722>:  and    $0xfffffffffffffff0,%rdi
>    0x00007fb85ff08716 <+6726>:  sub    %rdi,%r9
>    0x00007fb85ff08719 <+6729>:  sub    %r9,%rsi
>    0x00007fb85ff0871c <+6732>:  sub    %r9,%rdx
>    0x00007fb85ff0871f <+6735>:  mov    0x26fb0a(%rip),%rcx        # 
> 0x7fb860178230 <__x86_64_shared_cache_size_half>
>    0x00007fb85ff08726 <+6742>:  cmp    %rcx,%rdx
> {noformat}
> which suggests that the source address (%rsi) is invalid.
> Setting DISABLE_CODEGEN=TRUE for the statement avoids the crash, which 
> suggests that the generated code is using invalid pointers.
> The crash has been reproduced on RHEL/CentOS 6 and 7.
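> As a rough sketch of the DISABLE_CODEGEN workaround, the query option can be 
> set for the session before running the statement. The table and column names 
> below are hypothetical; only the option itself and the general statement 
> shape (CTAS over UNION ALL with LEFT JOIN children) come from this report:
> {noformat}
> -- Hypothetical tables and columns; this only works around the crash.
> SET DISABLE_CODEGEN=TRUE;
> CREATE TABLE result_tbl AS
> SELECT a.id, b.val FROM t1 a LEFT JOIN t2 b ON a.id = b.id
> UNION ALL
> SELECT c.id, d.val FROM t3 c LEFT JOIN t4 d ON c.id = d.id;
> -- Restore codegen for later statements in the same session.
> SET DISABLE_CODEGEN=FALSE;
> {noformat}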



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
