[ https://issues.apache.org/jira/browse/IMPALA-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825675#comment-17825675 ]
Evgeniy commented on IMPALA-12439:
----------------------------------

It seems that the Impala daemon hangs on an RPC. We have seen a lot of in-flight "CancelQueryFInstances" RPC calls on inbound connections. Something like the below:

{
  "remote_ip": "ip.ip.ip.ip:44876",
  "num_calls_in_flight": 58,
  "socket_status": {
    "rtt": 2442,
    "rttvar": 4604,
    "snd_cwnd": 10,
    "total_retrans": 0,
    "pacing_rate": 11957411,
    "max_pacing_rate": 18446744073709551615,
    "bytes_acked": 12566319,
    "bytes_received": 10993660785,
    "segs_out": 901401,
    "segs_in": 7990372,
    "send_queue_bytes": 0,
    "receive_queue_bytes": 0,
    "send_bytes_per_sec": 5978705
  },
  "calls_in_flight": [
    {
      "header": {
        "call_id": 169977,
        "remote_method": {
          "service_name": "impala.ControlService",
          "method_name": "CancelQueryFInstances"
        },
        "timeout_millis": 10000
      },
      "micros_elapsed": 3175972030
    },
    {
      "header": {
        "call_id": 169975,
        "remote_method": {
          "service_name": "impala.ControlService",
          "method_name": "CancelQueryFInstances"
        },
        "timeout_millis": 10000
      },
      "micros_elapsed": 3185965057
    },....

> Impala Daemon stucks on random executors
> ----------------------------------------
>
> Key: IMPALA-12439
> URL: https://issues.apache.org/jira/browse/IMPALA-12439
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 3.4.0
> Reporter: Evgeniy
> Priority: Critical
> Attachments: resolved_420a96bf.txt, resolved_d7750c55.txt
>
>
> Hi!
> In our cluster we face the following problem periodically:
> 1. The query fails with an error like this: "Exec() rpc failed: Timed out: ExecQueryFInstances RPC to <node_ip>:27000 timed out after 300.000s". Each time the problem appears, the affected node may be different.
> 2. We have analyzed minidumps of the Impala daemon from two different cases (the resolved minidumps are attached).
> It seems that the Impala daemon is stuck while cancelling a query fragment:
>
> Thread 244
>  0  libpthread-2.17.so + 0xba35
>     rax = 0xfffffffffffffe00   rdx = 0x0000000000000002
>     rcx = 0xffffffffffffffff   rbx = 0x000000007cd81b10
>     rsi = 0x0000000000000080   rdi = 0x000000007cd81b14
>     rbp = 0x00007f7ba5ae8580   rsp = 0x00007f7ba5ae8520
>      r8 = 0x000000007cd81b00    r9 = 0x0000000000000000
>     r10 = 0x0000000000000000   r11 = 0x0000000000000246
>     r12 = 0x00000000eafe6400   r13 = 0x00007f7ba5ae85c0
>     r14 = 0x00007f845b7287d0   r15 = 0x00007f7ba5ae8660
>     rip = 0x00007f845b727a35
>     Found by: given as instruction pointer in context
>  1  impalad!impala::QueryState::Cancel() + 0xdb
>     rbp = 0x00007f7ba5ae8600   rsp = 0x00007f7ba5ae8590
>     rip = 0x00000000011791bb
>     Found by: previous frame's frame pointer
>  2  impalad!impala::ControlService::CancelQueryFInstances(impala::CancelQueryFInstancesRequestPB const*, impala::CancelQueryFInstancesResponsePB*, kudu::rpc::RpcContext*) + 0x177
>     rbx = 0x00007f8458e136a0   rbp = 0x00007f7ba5ae8780
>     rsp = 0x00007f7ba5ae8610   r12 = 0x00007f7ba5ae8720
>     r13 = 0x00007f7ba5ae86a0   rip = 0x0000000001218f77
>     Found by: call frame info
>  3  impalad!kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) + 0x17c
>     rbx = 0x0000000015e4e460   rbp = 0x00007f7ba5ae87e0
>     rsp = 0x00007f7ba5ae8790   r12 = 0x00000007a6bf8ee0
>     r13 = 0x0000000014f86740   r14 = 0x0000000014f86f00
>     r15 = 0x0000000014f87480   rip = 0x0000000001788ffc
>     Found by: call frame info
>  4  impalad!impala::ImpalaServicePool::RunThread() + 0x1be
>     rbx = 0x00007f840000000d   rbp = 0x00007f7ba5ae88a0
>     rsp = 0x00007f7ba5ae87f0   r12 = 0x0000000018b30f80
>     r13 = 0x0000000000000000   r14 = 0x0000000000000051
>     r15 = 0x00007f840000000d   rip = 0x00000000010dbdee
>     Found by: call frame info
>  5  impalad!impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) + 0x30b
>     rbx = 0x00007f7ba5ae8970   rbp = 0x00007f7ba5ae8be0
>     rsp = 0x00007f7ba5ae88b0   r12 = 0x00007ffed2cdb298
>     r13 = 0x000000000592ee20   r14 = 0x00007f7ba5ae8910
>     r15 = 0x00007f8458e136a0   rip = 0x0000000001435f8b
>     Found by: call frame info
>  6  impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list5<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run() + 0x7a
>     rbx = 0x0000000015e34e00   rbp = 0x00007f7ba5ae8c40
>     rsp = 0x00007f7ba5ae8bf0   r12 = 0x00007f7ba5ae8c00
>     r13 = 0x0000000001435c80   r14 = 0x0000000000000000
>     r15 = 0x00007f7ba5ae9700   rip = 0x0000000001436e5a
>     Found by: call frame info
>  7  impalad!thread_proxy + 0xea
>     rbx = 0x0000000015e34e00   rbp = 0x0000000000000000
>     rsp = 0x00007f7ba5ae8c50   r12 = 0x00007f7ba5ae8c50
>     r13 = 0x0000000000801000   r14 = 0x0000000000000000
>     r15 = 0x00007f7ba5ae9700   rip = 0x0000000001c18e1a
>     Found by: call frame info
>  8  libpthread-2.17.so + 0x7ea5
>     rbx = 0x0000000000000000   rbp = 0x0000000000000000
>     rsp = 0x00007f7ba5ae8ca0   r12 = 0x0000000000000000
>     r13 = 0x0000000000801000   r14 = 0x0000000000000000
>     r15 = 0x00007f7ba5ae9700   rip = 0x00007f845b723ea5
>     Found by: call frame info
>  9  libc-2.17.so + 0xfeb0d
>     rsp = 0x00007f7ba5ae8d40   rip = 0x00007f8458321b0d
>     Found by: stack scanning
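
A note on the in-flight call listing in the comment at the top: each entry carries both a "timeout_millis" and a "micros_elapsed" value, so calls that have outlived their timeout can be picked out mechanically. Below is a minimal Python sketch of that check. The field names are taken from the snippet in the comment; the function name overdue_calls and the trimmed sample_connection dict are hypothetical and only for illustration, and the sketch assumes one connection object has already been parsed into a dict.

```python
def overdue_calls(connection):
    """Return (call_id, method_name, elapsed_seconds, timeout_seconds) for
    every in-flight call whose elapsed time exceeds its own timeout."""
    overdue = []
    for call in connection.get("calls_in_flight", []):
        header = call["header"]
        timeout_us = header["timeout_millis"] * 1000  # milliseconds -> microseconds
        elapsed_us = call["micros_elapsed"]
        if elapsed_us > timeout_us:
            overdue.append((
                header["call_id"],
                header["remote_method"]["method_name"],
                elapsed_us / 1e6,
                timeout_us / 1e6,
            ))
    return overdue


# Trimmed copy of the connection entry quoted in the comment
# (socket_status omitted; only the two calls shown there are included).
sample_connection = {
    "remote_ip": "ip.ip.ip.ip:44876",
    "num_calls_in_flight": 58,
    "calls_in_flight": [
        {
            "header": {
                "call_id": 169977,
                "remote_method": {
                    "service_name": "impala.ControlService",
                    "method_name": "CancelQueryFInstances",
                },
                "timeout_millis": 10000,
            },
            "micros_elapsed": 3175972030,
        },
        {
            "header": {
                "call_id": 169975,
                "remote_method": {
                    "service_name": "impala.ControlService",
                    "method_name": "CancelQueryFInstances",
                },
                "timeout_millis": 10000,
            },
            "micros_elapsed": 3185965057,
        },
    ],
}

for call_id, method, elapsed_s, timeout_s in overdue_calls(sample_connection):
    print(f"call {call_id} {method}: {elapsed_s:.0f}s elapsed, timeout {timeout_s:.0f}s")
```

For the two calls quoted in the comment, both are flagged: each has been in flight for roughly 53 minutes against a 10-second timeout, which fits the handler thread in the stack trace being blocked inside QueryState::Cancel().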