Hello Thomas Marshall, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13057 to look at the new patch set (#3). Change subject: IMPALA-8270: fix MemTracker teardown in FeSupport ...................................................................... IMPALA-8270: fix MemTracker teardown in FeSupport This patch tries to simplify and standardise the order in which control structures are torn down. As a consequence the bug is fixed. I've described the bug below. The changes are: * Make more control structures owned directly by QueryState::obj_pool_, so that they are all destroyed at the same time via ~QueryState. * Tear down local_query_state_ explicitly before other destructors run. Either change is sufficient to fix the bug, but I preferred to do both to reduce the chances of similar bugs in future. Description of bug: =================== In the normal query execution flow: - RuntimeState is in QueryState::obj_pool_ - RuntimeState owns RuntimeState::instance_mem_tracker_ via scoped_ptr - QueryState::query_mem_tracker_ is in QueryState::obj_pool_ - QueryState::query_mem_tracker_ has a reference to RuntimeState::instance_mem_tracker_ The tear-down works because ~QueryState unregisters query_mem_tracker_ from its parent, making the whole subtree unreachable before destroying QueryState::obj_pool_. It is thus safe to destroy instance_mem_tracker_ along with the rest of obj_pool_. FeSupport messes this up by having RuntimeState own the QueryState RuntimeState::local_query_state_ via a scoped_ptr, and the implied destructor order means that RuntimeState::instance_mem_tracker_ is destroyed before RuntimeState::local_query_state_, which breaks the above flow and the destroyed instance_mem_tracker_ is reachable from the process MemTracker via QueryState::query_mem_tracker_ for a small window until it is unregistered. Testing: Added a backend test that reproduced the ASAN use-after-free failure when run against unmodified RuntimeState code. I did not make it a unified backend test so that it would be easier to backport this fix to older versions that don't have unified tests. Change-Id: If815130cd4db00917746f10b28514f779ee254f0 --- M be/src/runtime/CMakeLists.txt A be/src/runtime/runtime-state-test.cc M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h 4 files changed, 92 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/13057/3 -- To view, visit http://gerrit.cloudera.org:8080/13057 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If815130cd4db00917746f10b28514f779ee254f0 Gerrit-Change-Number: 13057 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>