[ https://issues.apache.org/jira/browse/IMPALA-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772744#comment-16772744 ]

Michael Ho commented on IMPALA-8212:
------------------------------------

The stack trace of the crash suggests that Kudu called into the Kerberos 
library, which inadvertently modified {{g_krb5_ctx}}. As far as I understand, 
{{g_krb5_ctx}} is a global context shared by all threads, and the assumption is 
that it is not modified after initialization.

However, the default initialization code {{krb5_init_context(&g_krb5_ctx)}}, 
called by {{kudu::security::InitKrb5Ctx()}}, only sets 
{{g_krb5_ctx->default_realm}} to 0 (NULL), so the default realm is filled in 
lazily. Because the SASL client we created doesn't take the Kerberos realm as 
an argument, the first call to {{krb5_parse_name()}} makes the Kerberos library 
call {{krb5_get_default_realm()}} to look up the default realm.

Apparently, {{krb5_get_default_realm()}} may modify {{g_krb5_ctx}}, and it is 
not thread-safe. As the stack trace and the code below show, 
{{context->default_realm}} is most likely {{NULL}} at this point, so the call 
takes the branch that writes to the shared context. If multiple negotiation 
threads enter this code path at the same time, they may step on each other and 
corrupt {{g_krb5_ctx}}, leading to the crash reported in this issue or to 
error messages like the following:

{noformat}
0216 14:26:07.459600 (+   296us) negotiation.cc:304] Negotiation complete: Runtime error: Server connection negotiation failed: server connection from X.X.X.X:37070: could not canonicalize krb5 principal: could not parse principal: Configuration file does not specify default realm
{noformat}

[~tlipcon] kindly pointed out that a similar issue had been reported to 
upstream Kerberos in the past (http://krbdev.mit.edu/rt/Ticket/Display.html?id=2855).

{noformat}
krb5_error_code KRB5_CALLCONV
krb5_get_default_realm(krb5_context context, char **realm_out)
{
    krb5_error_code ret;

    *realm_out = NULL;

    if (context == NULL || context->magic != KV5M_CONTEXT)
        return KV5M_CONTEXT;

    if (context->default_realm == NULL) {
        ret = get_default_realm(context, &context->default_realm); // <<<----- not thread-safe
        if (ret)
            return ret;
    }
    *realm_out = strdup(context->default_realm);
    return (*realm_out == NULL) ? ENOMEM : 0;
}
{noformat}

The relevant part of the stack trace, showing the path from Kudu's SASL negotiation into {{krb5_get_default_realm()}}:
{noformat}
#30 <signal handler called>
#31 0x00000000048d0a53 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) ()
#32 0x00000000048d0aec in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) ()
#33 0x0000000004a0b4c0 in tc_free ()
#34 0x00007fb03f051720 in profile_iterator_free () from sysroot/lib64/libkrb5.so.3
#35 0x00007fb03f0519a4 in profile_get_value () from sysroot/lib64/libkrb5.so.3
#36 0x00007fb03f051a18 in profile_get_string () from sysroot/lib64/libkrb5.so.3
#37 0x00007fb03f044dde in profile_default_realm () from sysroot/lib64/libkrb5.so.3
#38 0x00007fb03f044509 in krb5_get_default_realm () from sysroot/lib64/libkrb5.so.3
#39 0x00007fb03f0245e8 in krb5_parse_name_flags () from sysroot/lib64/libkrb5.so.3
#40 0x0000000001ff7bbf in kudu::security::CanonicalizeKrb5Principal(std::string*) ()
#41 0x00000000026ee4df in kudu::rpc::ServerNegotiation::AuthenticateBySasl(kudu::faststring*) ()
#42 0x00000000026ea929 in kudu::rpc::ServerNegotiation::Negotiate() ()
#43 0x000000000271035b in kudu::rpc::DoServerNegotiation(kudu::rpc::Connection*, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime const&) ()
#44 0x000000000271070d in kudu::rpc::Negotiation::RunNegotiation(scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime) ()
{noformat}


> Crash during startup in kudu::security::CanonicalizeKrb5Principal()
> -------------------------------------------------------------------
>
>                 Key: IMPALA-8212
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8212
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>         Environment: CentOS Linux release 7.4.1708 (Core)
> Linux vc0512.halxg.cloudera.com 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Tim Armstrong
>            Assignee: Michael Ho
>            Priority: Blocker
>              Labels: crash
>         Attachments: gdb-core-60055.txt, gdb.txt, hs_err_pid60055.log, 
> hs_err_pid65365.log, 
> impalad.vc0512.halxg.cloudera.com.impala.log.INFO.20190218-140034.65365, 
> impalad.vc0513.halxg.cloudera.com.impala.log.INFO.20190216-142536.60055
>
>
> I saw this crash twice while working on the stress test. It *seems* to happen 
> when the stress infrastructure switches the service to a debug build, 
> restarts the service, then starts running queries. I haven't seen it happen 
> once the service is up and running for a while.
> {noformat}
> #0  0x00007fb03e1fa1f7 in raise () from sysroot/lib64/libc.so.6
> #1  0x00007fb03e1fb8e8 in abort () from sysroot/lib64/libc.so.6
> #2  0x00007fb041159185 in os::abort(bool) () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #3  0x00007fb0412fb593 in VMError::report_and_die() () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #4  0x00007fb04115e68f in JVM_handle_linux_signal () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #5  0x00007fb041154be3 in signalHandler(int, siginfo*, void*) () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #6  <signal handler called>
> #7  0x00000000048d0a53 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) ()
> #8  0x00000000048d0aec in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) ()
> #9  0x0000000004a0b4c0 in tc_free ()
> #10 0x00007fb040d32933 in ElfDecoder::demangle(char const*, char*, int) () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #11 0x00007fb040d3222a in Decoder::demangle(char const*, char*, int) () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #12 0x00007fb04115695d in os::dll_address_to_function_name(unsigned char*, char*, int, int*) () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #13 0x00007fb040dc0222 in frame::print_C_frame(outputStream*, char*, int, unsigned char*) () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #14 0x00007fb040d2e925 in print_native_stack(outputStream*, frame, Thread*, char*, int) () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #15 0x00007fb0412f9cc8 in VMError::report(outputStream*) () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #16 0x00007fb0412fb18a in VMError::report_and_die() () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #17 0x00007fb04115e68f in JVM_handle_linux_signal () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #18 0x00007fb041154be3 in signalHandler(int, siginfo*, void*) () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #19 <signal handler called>
> #20 0x00000000048d0a53 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) ()
> #21 0x00000000048d0aec in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) ()
> #22 0x0000000004a0b4c0 in tc_free ()
> #23 0x00007fb03e5915dd in pthread_attr_destroy () from sysroot/lib64/libpthread.so.0
> #24 0x00007fb04115e49f in current_stack_region(unsigned char**, unsigned long*) () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #25 0x00007fb04115e535 in os::current_stack_base() () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #26 0x00007fb0412faeb4 in VMError::report(outputStream*) () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #27 0x00007fb0412fb18a in VMError::report_and_die() () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #28 0x00007fb04115e68f in JVM_handle_linux_signal () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #29 0x00007fb041154be3 in signalHandler(int, siginfo*, void*) () from sysroot/usr/java/jdk1.8.0_141-cloudera/jre/lib/amd64/server/libjvm.so
> #30 <signal handler called>
> #31 0x00000000048d0a53 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) ()
> #32 0x00000000048d0aec in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) ()
> #33 0x0000000004a0b4c0 in tc_free ()
> #34 0x00007fb03f051720 in profile_iterator_free () from sysroot/lib64/libkrb5.so.3
> #35 0x00007fb03f0519a4 in profile_get_value () from sysroot/lib64/libkrb5.so.3
> #36 0x00007fb03f051a18 in profile_get_string () from sysroot/lib64/libkrb5.so.3
> #37 0x00007fb03f044dde in profile_default_realm () from sysroot/lib64/libkrb5.so.3
> #38 0x00007fb03f044509 in krb5_get_default_realm () from sysroot/lib64/libkrb5.so.3
> #39 0x00007fb03f0245e8 in krb5_parse_name_flags () from sysroot/lib64/libkrb5.so.3
> #40 0x0000000001ff7bbf in kudu::security::CanonicalizeKrb5Principal(std::string*) ()
> #41 0x00000000026ee4df in kudu::rpc::ServerNegotiation::AuthenticateBySasl(kudu::faststring*) ()
> #42 0x00000000026ea929 in kudu::rpc::ServerNegotiation::Negotiate() ()
> #43 0x000000000271035b in kudu::rpc::DoServerNegotiation(kudu::rpc::Connection*, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime const&) ()
> #44 0x000000000271070d in kudu::rpc::Negotiation::RunNegotiation(scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime) ()
> #45 0x00000000026ca8ab in kudu::internal::RunnableAdapter<void (*)(scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime)>::Run(scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag const&, kudu::TriStateFlag const&, kudu::MonoTime const&) ()
> #46 0x00000000026c9bf4 in kudu::internal::InvokeHelper<false, void, kudu::internal::RunnableAdapter<void (*)(scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime)>, void (kudu::rpc::Connection*, kudu::TriStateFlag const&, kudu::TriStateFlag const&, kudu::MonoTime const&)>::MakeItSo(kudu::internal::RunnableAdapter<void (*)(scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime)>, kudu::rpc::Connection*, kudu::TriStateFlag const&, kudu::TriStateFlag const&, kudu::MonoTime const&) ()
> #47 0x00000000026c8ad3 in kudu::internal::Invoker<4, kudu::internal::BindState<kudu::internal::RunnableAdapter<void (*)(scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime)>, void (scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime), void (scoped_refptr<kudu::rpc::Connection>, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime)>, void (scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime)>::Run(kudu::internal::BindStateBase*) ()
> #48 0x0000000001dae84c in kudu::Callback<void ()>::Run() const ()
> #49 0x000000000295a66a in kudu::ClosureRunnable::Run() ()
> #50 0x00000000029595fd in kudu::ThreadPool::DispatchThread() ()
> #51 0x00000000029650d5 in boost::_mfi::mf0<void, kudu::ThreadPool>::operator()(kudu::ThreadPool*) const ()
> #52 0x0000000002964602 in void boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> >::operator()<boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::ThreadPool>&, boost::_bi::list0&, int) ()
> #53 0x0000000002963a05 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > >::operator()() ()
> #54 0x0000000002962b61 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > >, void>::invoke(boost::detail::function::function_buffer&) ()
> #55 0x0000000001d76514 in boost::function0<void>::operator()() const ()
> #56 0x0000000001d72da2 in kudu::Thread::SuperviseThread(void*) ()
> #57 0x00007fb03e58fe25 in start_thread () from sysroot/lib64/libpthread.so.0
> #58 0x00007fb03e2bd34d in clone () from sysroot/lib64/libc.so.6
> {noformat}
> This was a downstream Cloudera build, but the code is the same as this 
> upstream commit:
> {noformat}
> Author: Andrew Sherman <asher...@cloudera.com>
> Date:   Tue Feb 12 16:17:13 2019 -0800
>     IMPALA-8194: wait longer to detect JVM pause in TestPauseMonitor.
> {noformat}
> cc [~twm378]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
