[ https://issues.apache.org/jira/browse/IMPALA-9054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953170#comment-16953170 ]
ASF subversion and git services commented on IMPALA-9054:
---------------------------------------------------------

Commit b0c6740faec6b0a00dcfee126ab39324026c0ca9 in impala's branch refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b0c6740 ]

IMPALA-8998: admission control accounting for mt_dop

This integrates mt_dop with the "slots" mechanism that's used for
non-default executor groups. The idea is simple - the degree of
parallelism on a backend determines the number of slots consumed.
The effective degree of parallelism is used, not the raw mt_dop
setting. E.g. if the query only has a single input split and executes
only a single fragment instance on a host, we don't want to count the
full mt_dop value for admission control.

--admission_control_slots is added as a new flag that replaces
--max_concurrent_queries, since the name better reflects the concept.
--max_concurrent_queries is kept for backwards compatibility and has
the same meaning as --admission_control_slots.

The admission control logic is extended to take this into account.
We also add an immediate rejection code path since it is now possible
for queries to not be admittable based on the # of available slots.

We only factor in the "width" of the plan - i.e. the number of
instances of fragments. We don't account for the number of distinct
fragments, since they may not actually execute in parallel with each
other because of dependencies.

This number is added to the per-host profile as the "AdmissionSlots"
counter.

Testing:
Added unit tests for rejection and queue/admit checks. Also includes
a fix for IMPALA-9054 where we increase the timeout.

Added end-to-end tests:
* test_admission_slots in test_mt_dop.py that checks the admission
  slot calculation via the profile.
* End-to-end admission test that exercises the admit immediately and
  queueing code paths.

Added checks to test_verify_metrics (which runs after end-to-end
tests) to ensure that the per-backend slots in use goes to 0 when the
cluster is quiesced.

Change-Id: I7b6b6262ef238df26b491352656a26e4163e46e5
Reviewed-on: http://gerrit.cloudera.org:8080/14357
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>


> Flaky test: test_misformatted_profile_text in query_test/test_cancellation.py
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-9054
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9054
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Tim Armstrong
>            Priority: Major
>
> Saw this in several builds in ubuntu-16.04-dockerised-tests:
> {code}
> FAIL query_test/test_cancellation.py::TestCancellationParallel::()::test_misformatted_profile_text
> =================================== FAILURES ===================================
> ___________ TestCancellationParallel.test_misformatted_profile_text ____________
> [gw8] linux2 -- Python 2.7.12 /home/ubuntu/Impala/bin/../infra/python/env/bin/python
> query_test/test_cancellation.py:171: in test_misformatted_profile_text
>     assert any(client.get_state(handle) == 'RUNNING_STATE' or sleep(1)
> E   AssertionError: Query failed to start
> E   assert any(<generator object <genexpr> at 0x7f99c462acd0>)
> ---------------------------- Captured stderr setup -----------------------------
> SET client_identifier=query_test/test_cancellation.py::TestCancellationParallel::()::test_misformatted_profile_text;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> -- 2019-10-16 03:20:40,776 INFO MainThread: Closing active operation
> ----------------------------- Captured stderr call -----------------------------
> -- executing against Impala at localhost:21050
> select count(*) from functional_parquet.alltypes where bool_col = sleep(100);
> -- getting state for operation: <tests.common.impala_connection.OperationHandle object at 0x7f99c4637e50>
> -- getting state for operation: <tests.common.impala_connection.OperationHandle object at 0x7f99c4637e50>
> -- getting state for operation: <tests.common.impala_connection.OperationHandle object at 0x7f99c4637e50>
> -- getting state for operation: <tests.common.impala_connection.OperationHandle object at 0x7f99c4637e50>
> -- getting state for operation: <tests.common.impala_connection.OperationHandle object at 0x7f99c4637e50>
> ====== 1 failed, 2607 passed, 151 skipped, 54 xfailed in 3706.37 seconds =======
> {code}
> The test waits 5 seconds for the query to start running and then tests
> cancelling it. But somehow the query failed to start within 5 seconds.
> Maybe 5 seconds is too short for a dockerised env.
> Test failures can be found in:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1427/
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1424/
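
The wait described in the issue (poll the query state about once per second and give up after 5 seconds) could be written with an explicit, adjustable deadline, roughly as below. This is only a sketch, not the code in test_cancellation.py and not the fix in the commit: wait_for_state and all of its parameters are hypothetical names, and get_state stands in for a call such as client.get_state(handle).

```python
import time


def wait_for_state(get_state, expected, timeout_s=5.0, interval_s=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns `expected` or the deadline passes.

    Hypothetical helper for illustration only. `sleep` and `clock` are
    injectable so the behaviour can be checked without real waiting.
    Returns True if the expected state was seen, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        # Check the state first so a query that is already running
        # returns immediately without sleeping.
        if get_state() == expected:
            return True
        # Give up once the deadline has been reached.
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

With a helper like this, a slower environment only needs a larger timeout_s; the assertion message ("Query failed to start") can stay the same.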