Dan Hecht has posted comments on this change.

Change subject: IMPALA-1972/IMPALA-3882: Fix client_request_state_map_lock_ contention
......................................................................
Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6707/7/be/src/service/impala-beeswax-server.cc
File be/src/service/impala-beeswax-server.cc:

PS7, Line 296: NULL
nit: change that one too, to at least keep the functions consistent.

http://gerrit.cloudera.org:8080/#/c/6707/7/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

PS7, Line 721: just return
That's not what the code does (it also sets plan_metadata_unavailable); please rephrase. You could rephrase the whole comment as: If the query plan isn't generated, avoid waiting for the lock, which could take a while if catalog metadata is being loaded.

PS7, Line 730: adopt_lock_t
Shouldn't that be deleted?

http://gerrit.cloudera.org:8080/#/c/6707/7/tests/custom_cluster/test_query_concurrency.py
File tests/custom_cluster/test_query_concurrency.py:

PS7, Line 32: The intention here is to check contention on the query_exec_state_map_lock_
This describes how the old code worked, which won't make sense to people reading the current code (after this change). It should say something like: The intention is to check that the webserver does not hold any global locks or otherwise prevent impalad from servicing new requests.

PS7, Line 54: This creates lock contention on the coordinator by
: calling QuerySummaryHandler() method
This is no longer true with your fix. How about saying: This is to verify that QuerySummaryHandler() does not hold any global locks that would, for example, prevent another query from starting.

PS7, Line 74: time.sleep(2)
I'm worried that this will be flaky, especially with ASAN. Instead of this delay, couldn't we just wait for in_flight_queries to become 1? You could use the parameter to get_in_flight_queries() to do that by passing some largish value. That has the advantage that we'll wait only as long as necessary for the value to change to 1, so we can have a relatively long timeout (rather than a delay).
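The difference between a fixed delay and polling with a timeout can be sketched in plain Python. This is a minimal sketch under stated assumptions: `wait_for_value` and `FakeService` are hypothetical names used only for illustration. In the real test the polling would come from passing a timeout to `impalad.service.get_in_flight_queries()`, and `FakeService` merely stands in for the impalad web UI.

```python
import time


def wait_for_value(read_value, expected, timeout_s, interval_s=1.0):
    """Poll read_value() until it returns `expected` or `timeout_s` elapses.

    Returns the last value observed. Unlike a fixed time.sleep(), this returns
    as soon as the expected value shows up, so a generous timeout costs
    nothing on a fast run and still tolerates a slow (e.g. ASAN) build.
    """
    deadline = time.monotonic() + timeout_s
    value = read_value()
    while value != expected and time.monotonic() < deadline:
        time.sleep(interval_s)
        value = read_value()
    return value


class FakeService:
    """Hypothetical stand-in for the impalad web UI: reports 0 in-flight
    queries for the first two polls, then 1."""

    def __init__(self):
        self.polls = 0

    def num_in_flight_queries(self):
        self.polls += 1
        return 0 if self.polls < 3 else 1


service = FakeService()
result = wait_for_value(service.num_in_flight_queries, 1,
                        timeout_s=30, interval_s=0.01)
print(result, service.polls)  # prints "1 3"
```

The test then asserts on the returned value instead of assuming a two-second sleep was long enough; if the value never arrives, it fails after the timeout rather than racing against it.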
PS7, Line 83: time.sleep(2)
this delay is a bit harder to eliminate. How about we increase --stress_metadata_loading_pause_injection_ms to something really large, say 1000 seconds (which doesn't matter -- we don't actually need the queries to finish planning to end the test, right?). And then we can use a larger timeout here, but we don't need to delay for it. We can just do:

  inflight_query_ids = impalad.service.get_in_flight_queries(30)

which will poll the webui once per second and give up after 30 seconds.

--
To view, visit http://gerrit.cloudera.org:8080/6707

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie44daa93e3ae4d04d091261f3ec4891caffe8026
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-HasComments: Yes