[ https://issues.apache.org/jira/browse/IMPALA-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dan Hecht resolved IMPALA-7046.
-------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.1.0

commit 11aaa6caa0818a30db662a4b6f07147faf1f52b2
Author: Dan Hecht <dhe...@cloudera.com>
Date:   Mon Jun 11 16:30:06 2018 -0700

    IMPALA-7046: introduce "global" debug_actions

    The motivation is to add jitter to backend startup in test_failpoints.
    The race in IMPALA-7033 can be reproduced by adding jitter to the exec
    rpcs when some backends fail. Let's add jitter to test_failpoints to
    get better coverage of exec startup races.

    This builds on top of the debug action extensions added in the async
    admission control patch by allowing the new "global" debug actions
    (i.e. actions that can be used at points outside of the ExecNodes).
    See the code comments for details.

    For now, we're only using the SLEEP and JITTER commands, but I've
    included a FAIL command as well since I'll want to use that to write
    a test for IMPALA-6788 to simulate exec rpc failure.

    Note that I don't bother resolving the actions ahead of time (like we
    do for ExecNode actions). It doesn't seem worth it since the
    resolution only needs to occur after we've matched the label, and I
    don't expect the same label to be hit many times within a single
    thread. We can always optimize this later if needed.

    Testing:
    - Verified that test_failpoints can reproduce the race in IMPALA-7033
      by reverting that fix and testing.
    - Ran the modified tests and grepped the impalad log to see that the
      sleeps are still occurring.
    - Manually verified the global FAIL command (in a build with another
      patch).
    - Manually verified invalid debug_actions (both ExecNode and global).

    Change-Id: I77663a539be18711a4f12c470ffd7474e3d69388
    Reviewed-on: http://gerrit.cloudera.org:8080/10690
    Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
    Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> Add targeted regression test for race in IMPALA-7033
> ----------------------------------------------------
>
>                 Key: IMPALA-7046
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7046
>             Project: IMPALA
>          Issue Type: Task
>          Components: Backend
>   Affects Versions: Impala 3.1.0
>            Reporter: Dan Hecht
>            Assignee: Dan Hecht
>            Priority: Major
>             Fix For: Impala 3.1.0
>
>
> I'd like to add a regression test to trigger the race in IMPALA-7033 more
> reliably, but it will involve doing some sleeps at specific places, so I'd
> like to add it after [~bikramjeet.vig] commits a change that provides some
> infrastructure for that.
> The race was:
> 1) Coordinator::Exec() takes the QueryState ExecResources reference count.
> 2) Coordinator sends out exec rpc to non-coordinator backend.
> 3) Some non-coordinator backend sends a failure report which invokes
> HandleExecStateTransition, which drops the coordinator's reference to the
> exec resources.
> 4) Coordinator sends out exec rpc to coordinator backend, which takes the
> exec resources reference and releases it. We don't expect the reference count
> to become non-zero after it has already gone through a cycle.
> The fix for this race is included in [https://gerrit.cloudera.org/#/c/10440]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
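The four steps of the race can be replayed deterministically with a toy model. The sketch below is not Impala code: the `ExecResources` class and the `replay` function are hypothetical names invented for illustration, and the model only assumes what the issue description states, namely that the reference count must not become non-zero again once it has dropped to zero.

```python
class ExecResources:
    """Toy stand-in for the QueryState exec resources reference count.

    Hypothetical model for illustration only; not Impala's implementation.
    """

    def __init__(self):
        self.refcnt = 0
        self.released = False  # True once the count has dropped back to zero

    def acquire(self):
        # The invariant from the issue: no new reference after a full cycle.
        if self.released:
            raise RuntimeError("exec resources re-acquired after release")
        self.refcnt += 1

    def release(self):
        assert self.refcnt > 0, "release without a matching acquire"
        self.refcnt -= 1
        if self.refcnt == 0:
            self.released = True


def replay(failure_report_first):
    """Replay the steps from the issue in one of two orders.

    Returns the error message if the invariant is violated, else None.
    """
    res = ExecResources()
    res.acquire()  # 1) Coordinator::Exec() takes the reference
    # 2) exec rpc is sent to a non-coordinator backend (no refcount change)
    try:
        if failure_report_first:
            res.release()  # 3) failure report drops the coordinator's reference
            res.acquire()  # 4) coordinator backend's exec rpc takes a reference
            res.release()
        else:
            res.acquire()  # 4) coordinator backend's exec rpc takes a reference
            res.release()  #    ... and releases it
            res.release()  # 3) failure report drops the coordinator's reference
    except RuntimeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print("failure report first:", replay(failure_report_first=True))
    print("exec rpc first:", replay(failure_report_first=False))
```

In this model only the ordering where step 3 runs before step 4 trips the check, which is why adding jitter around the exec rpcs makes the bad interleaving more likely to show up in a test run.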