[ https://issues.apache.org/jira/browse/IMPALA-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe McDonnell resolved IMPALA-10047. ------------------------------------ Fix Version/s: Impala 4.0 Resolution: Fixed The core part of IMPALA-6684 was reverted, and subsequent testing found this regression eliminated. > Performance regression on short queries due to IMPALA-6984 fix > -------------------------------------------------------------- > > Key: IMPALA-10047 > URL: https://issues.apache.org/jira/browse/IMPALA-10047 > Project: IMPALA > Issue Type: Bug > Components: Backend > Affects Versions: Impala 4.0 > Reporter: Joe McDonnell > Assignee: Joe McDonnell > Priority: Blocker > Fix For: Impala 4.0 > > > When doing some TPC-DS benchmarking with mt_dop, we encountered intermittent > performance regressions on short queries. Some query executions seem to be > taking an extra 10 seconds in exec status reports due to delays in sending a > cancel RPC. From the coordinator logs: > > {noformat} > W0804 02:52:33.922088 108 rpcz_store.cc:253] Call > impala.ControlService.CancelQueryFInstances from 127.0.0.1:46738 (request > call id 3134) took 10007 ms (10 s). Client timeout 10000 ms (10 s) > W0804 02:52:33.922143 108 rpcz_store.cc:259] Trace: > 0804 02:52:23.914291 (+ 0us) impala-service-pool.cc:170] Inserting onto > call queue > 0804 02:52:33.922079 (+10007788us) impala-service-pool.cc:255] Skipping call > since client already timed out > 0804 02:52:33.922086 (+ 7us) inbound_call.cc:162] Queueing failure > response > Metrics: {} > I0804 02:52:33.922214 101 connection.cc:730] Got response to call id 3134 > after client already timed out or cancelled > I0804 02:52:33.923286 20276 coordinator-backend-state.cc:889] > query_id=f442e73a0d35c136:c9993d7700000000 target backend=xx.xx.xx.xx:27000: > Sending CancelQueryFInstances rpc{noformat} > The rpcz page also shows that some ReportExecStatus RPCs are taking 10 > seconds: > > > {noformat} > "incoming_queue_time": "Count: 671901, min / max: 1000.000ns / 10s347ms, 25th > %-ile: 12.000us, 50th %-ile: 18.000us, 75th %-ile: 28.000us, 90th %-ile: > 67.000us, 95th %-ile: 456.000us, 99.9th %-ile: 10s133ms", > { > "method_name": "ReportExecStatus", > "handler_latency": "Count: 169653, min / max: 38.000us / > 10s173ms, 25th %-ile: 9.024ms, 50th %-ile: 20.352ms, 75th %-ile: 35.840ms, > 90th %-ile: 94.720ms, 95th %-ile: 177.152ms, 99.9th %-ile: 10s027ms", > "payload_size": "Count: 169653, min / max: 5.81 KB / 3.81 MB, > 25th %-ile: 425.00 KB, 50th %-ile: 760.00 KB, 75th %-ile: 1.47 MB, 90th > %-ile: 1.96 MB, 95th %-ile: 2.31 MB, 99.9th %-ile: 3.73 MB" > }]{noformat} > > IMPALA-6984 introduced a Coordinator::CancelBackends() call to > Coordinator::HandleExecStateTransition() for the ExecState::RETURNED_RESULTS > case: > {noformat} > if (new_state == ExecState::RETURNED_RESULTS) { > // Cancel all backends, but wait for the final status reports to be > received so that > // we have a complete profile for this successful query. > CancelBackends(/*fire_and_forget=*/ false); > WaitForBackends(); > } else { > CancelBackends(/*fire_and_forget=*/ true); > }{noformat} > Removing this call eliminates the performance regression, so it will need > more investigation. > -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org