Hello Michael Ho, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11339

to look at the new patch set (#3).

Change subject: IMPALA-7464: fix race when ExecFInstance() RPC times out
......................................................................

IMPALA-7464: fix race when ExecFInstance() RPC times out

The "exec resources" reference count on the QueryState assumes that
it transitions from 0 -> non-zero -> 0 at most once. The reference
count is taken both on the coordinator side (the sender of the
ExecFInstance() RPC) and on the backend (the receiver). Usually the
lifetimes of those references overlap (the coordinator won't give up
its reference until the backend execution is complete or has
failed), so the assumption holds. However, when the RPC times out,
the receiver may run after the sender has given up its reference,
because the sender doesn't know the receiver is still executing. The
count then drops to 0 and later becomes non-zero again, violating
the assumption.
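To make the invariant and the timeout interleaving concrete, here is a minimal C++ sketch. ExecResourceRefCount, OverlappingLifetimesOk() and LateReceiverAfterTimeoutOk() are hypothetical names for illustration, not Impala's QueryState API. Instead of crashing, the model refuses a second 0 -> non-zero transition.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical model of the QueryState "exec resources" reference count; not
// Impala's actual implementation. It enforces that the count goes
// 0 -> non-zero -> 0 at most once.
class ExecResourceRefCount {
 public:
  // Returns false (instead of reviving the count) if the count has already
  // dropped back to 0, i.e. the resources were already released.
  bool Acquire() {
    std::lock_guard<std::mutex> l(lock_);
    if (released_) return false;
    ++count_;
    return true;
  }

  // Drops one reference; the last one "releases" the resources exactly once.
  void Release() {
    std::lock_guard<std::mutex> l(lock_);
    assert(count_ > 0);
    if (--count_ == 0) released_ = true;
  }

 private:
  std::mutex lock_;
  int count_ = 0;
  bool released_ = false;
};

// Usual case: the coordinator (sender) holds its reference until the backend
// (receiver) has finished, so the two lifetimes overlap.
bool OverlappingLifetimesOk() {
  ExecResourceRefCount refs;
  bool sender_ok = refs.Acquire();    // coordinator takes its reference
  bool receiver_ok = refs.Acquire();  // backend takes its reference
  refs.Release();                     // backend execution completes
  refs.Release();                     // coordinator gives up its reference
  return sender_ok && receiver_ok;
}

// RPC timeout: the sender gives up its reference (count 1 -> 0) before the
// receiver runs, so the receiver's acquire would be a second 0 -> 1 transition.
bool LateReceiverAfterTimeoutOk() {
  ExecResourceRefCount refs;
  refs.Acquire();         // coordinator takes its reference and sends the RPC
  refs.Release();         // RPC times out; coordinator drops its reference
  return refs.Acquire();  // receiver runs late: rejected by the model
}
```

In this model OverlappingLifetimesOk() returns true, while LateReceiverAfterTimeoutOk() returns false because the receiver's acquire arrives after the count has already returned to 0.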

As it turns out, given the current code the coordinator doesn't
actually need to take a reference (verified via code inspection),
since these resources are backend-only. So, stop taking the reference
on the coordinator side, and add some DCHECKs to document that. (The
DCHECKs aren't particularly good at verifying it, however, since the
lifetimes will generally overlap.)
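A minimal sketch of the fixed arrangement, again with hypothetical names: QueryStateModel, reserved_bytes() and the helper functions are illustrative, not Impala's code. Only the backend acquires and releases the reference, and the coordinator's timeout path never touches it. The assert in reserved_bytes() stands in for a DCHECK (and, like a DCHECK, is compiled out under NDEBUG). It only checks that some reference is held, which is why it cannot by itself prove that the coordinator never uses backend-only resources.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical model of the fix; names are illustrative, not Impala's API.
class QueryStateModel {
 public:
  // Returns false if the resources were already released (count hit 0 once).
  bool AcquireExecResourceRef() {
    std::lock_guard<std::mutex> l(lock_);
    if (released_) return false;
    ++refcnt_;
    return true;
  }

  void ReleaseExecResourceRef() {
    std::lock_guard<std::mutex> l(lock_);
    assert(refcnt_ > 0);
    if (--refcnt_ == 0) released_ = true;
  }

  // A backend-only resource. The assert plays the role of a DCHECK: it only
  // verifies that *some* reference is held, so it cannot tell whether the
  // caller is the backend or (incorrectly) the coordinator.
  long reserved_bytes() const {
    std::lock_guard<std::mutex> l(lock_);
    assert(refcnt_ > 0 && "backend-only resource used without a reference");
    return reserved_bytes_;
  }

  bool released() const {
    std::lock_guard<std::mutex> l(lock_);
    return released_;
  }

 private:
  mutable std::mutex lock_;
  int refcnt_ = 0;
  bool released_ = false;
  long reserved_bytes_ = 1024;  // arbitrary illustrative value
};

// Coordinator side after the fix: it sends the RPC but never touches the
// exec-resources reference count, so a timeout cannot drop it to 0 early.
void CoordinatorSendsRpcThatTimesOut(QueryStateModel& qs) {
  (void)qs;  // deliberately no Acquire/ReleaseExecResourceRef() here
}

// Backend side: the only holder of the reference.
bool BackendExecutesFragments(QueryStateModel& qs) {
  if (!qs.AcquireExecResourceRef()) return false;
  long bytes = qs.reserved_bytes();  // safe: a reference is held
  (void)bytes;
  qs.ReleaseExecResourceRef();
  return true;
}

// The receiver runs after the coordinator's RPC has already timed out.
bool TimeoutRaceIsHarmless() {
  QueryStateModel qs;
  CoordinatorSendsRpcThatTimesOut(qs);
  bool ok = BackendExecutesFragments(qs);
  return ok && qs.released();
}
```

TimeoutRaceIsHarmless() returns true: with no coordinator-side reference, the late receiver performs the only 0 -> 1 -> 0 transition.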

Note that this patch can't be easily backported to older versions
without careful inspection since older versions of the code may have
relied on the reference count protecting things used by the
coordinator.

Testing:
- New test_rpc_timeout case that reproduced the problem 100% of the time
- exhaustive build

Change-Id: If60d983e0e68b00e6557185db1f86757ab8b3f2d
---
M be/src/benchmarks/process-wide-locks-benchmark.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/test-env.cc
M tests/custom_cluster/test_rpc_timeout.py
8 files changed, 76 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/11339/3
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If60d983e0e68b00e6557185db1f86757ab8b3f2d
Gerrit-Change-Number: 11339
Gerrit-PatchSet: 3
Gerrit-Owner: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
