[ 
https://issues.apache.org/jira/browse/IMPALA-9054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953170#comment-16953170
 ] 

ASF subversion and git services commented on IMPALA-9054:
---------------------------------------------------------

Commit b0c6740faec6b0a00dcfee126ab39324026c0ca9 in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b0c6740 ]

IMPALA-8998: admission control accounting for mt_dop

This integrates mt_dop with the "slots" mechanism that's used
for non-default executor groups.

The idea is simple - the degree of parallelism on a backend
determines the number of slots consumed. The effective
degree of parallelism is used, not the raw mt_dop setting.
E.g. if the query only has a single input split and executes
only a single fragment instance on a host, we don't want
to count the full mt_dop value for admission control.

--admission_control_slots is added as a new flag that
replaces --max_concurrent_queries, since the name better
reflects the concept. --max_concurrent_queries is kept
for backwards compatibility and has the same meaning
as --admission_control_slots.

The admission control logic is extended to take slots into
account. We also add an immediate rejection code path,
since it is now possible that a query cannot be admitted
given the number of available slots.
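
A minimal sketch of the three outcomes described above (reject, queue,
admit). It is not Impala's implementation: the function name, the
dict-based inputs and the exact rejection condition are assumptions made
for illustration. In particular, the sketch assumes a query is rejected
when it needs more slots on some host than that host has in total, since
such a query could never be admitted by waiting.

```python
def admission_outcome(slots_needed, slots_in_use, total_slots):
    """Return 'reject', 'queue' or 'admit' for one query.

    All three arguments are hypothetical {host: slot_count} dicts.
    """
    # Assumed rejection rule: the query can never fit on this host.
    for host, needed in slots_needed.items():
        if needed > total_slots[host]:
            return "reject"
    # The query fits in principle, but not right now: queue it.
    for host, needed in slots_needed.items():
        if slots_in_use.get(host, 0) + needed > total_slots[host]:
            return "queue"
    return "admit"


total = {"a": 4, "b": 4}
print(admission_outcome({"a": 2, "b": 1}, {"a": 1}, total))  # admit
print(admission_outcome({"a": 2, "b": 1}, {"a": 3}, total))  # queue
print(admission_outcome({"a": 8, "b": 1}, {}, total))        # reject
```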

We only factor in the "width" of the plan - i.e. the number
of instances of fragments. We don't account for the number
of distinct fragments, since they may not actually execute
in parallel with each other because of dependencies.
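
One way to read the "width" rule is sketched below: per host, take the
largest number of instances of any single fragment, and ignore how many
distinct fragments run there. The function name and the
(fragment_id, host) input shape are hypothetical, and taking the
per-fragment maximum is an interpretation of the description above, not
a quote of the scheduler code.

```python
from collections import Counter, defaultdict


def slots_per_host(instances):
    """Map each host to the slots a query would consume there.

    `instances` is an iterable of (fragment_id, host) pairs, one pair
    per fragment instance (a hypothetical input shape).
    """
    per_host = defaultdict(Counter)
    for fragment_id, host in instances:
        per_host[host][fragment_id] += 1
    # Width only: the widest fragment on the host, not the sum over fragments.
    return {host: max(counts.values()) for host, counts in per_host.items()}


# mt_dop=4, but host "b" has only one input split, so it runs one
# instance of each of F0 and F1 and counts as a single slot.
plan = (
    [("F0", "a")] * 4 + [("F0", "b")]
    + [("F1", "a")] * 4 + [("F1", "b")]
    + [("F2", "a")]
)
print(slots_per_host(plan))  # {'a': 4, 'b': 1}
```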

This number is added to the per-host profile as the
"AdmissionSlots" counter.

Testing:
Added unit tests for rejection and queue/admit checks.

Also includes a fix for IMPALA-9054 that increases
the test's timeout.

Added end-to-end tests:
* test_admission_slots in test_mt_dop.py that checks the
  admission slot calculation via the profile.
* End-to-end admission test that exercises the
  admit-immediately and queueing code paths.

Added checks to test_verify_metrics (which runs after
end-to-end tests) to ensure that the per-backend count of
slots in use goes to 0 when the cluster is quiesced.

Change-Id: I7b6b6262ef238df26b491352656a26e4163e46e5
Reviewed-on: http://gerrit.cloudera.org:8080/14357
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>


> Flaky test: test_misformatted_profile_text in query_test/test_cancellation.py
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-9054
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9054
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Tim Armstrong
>            Priority: Major
>
> Saw this in several builds in ubuntu-16.04-dockerised-tests:
> {code}
> FAIL query_test/test_cancellation.py::TestCancellationParallel::()::test_misformatted_profile_text
> =================================== FAILURES ===================================
> ___________ TestCancellationParallel.test_misformatted_profile_text ____________
> [gw8] linux2 -- Python 2.7.12 /home/ubuntu/Impala/bin/../infra/python/env/bin/python
> query_test/test_cancellation.py:171: in test_misformatted_profile_text
>     assert any(client.get_state(handle) == 'RUNNING_STATE' or sleep(1)
> E   AssertionError: Query failed to start
> E   assert any(<generator object <genexpr> at 0x7f99c462acd0>)
> ---------------------------- Captured stderr setup -----------------------------
> SET client_identifier=query_test/test_cancellation.py::TestCancellationParallel::()::test_misformatted_profile_text;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> -- 2019-10-16 03:20:40,776 INFO     MainThread: Closing active operation
> ----------------------------- Captured stderr call -----------------------------
> -- executing against Impala at localhost:21050
> select count(*) from functional_parquet.alltypes where bool_col = sleep(100);
> -- getting state for operation: <tests.common.impala_connection.OperationHandle object at 0x7f99c4637e50>
> -- getting state for operation: <tests.common.impala_connection.OperationHandle object at 0x7f99c4637e50>
> -- getting state for operation: <tests.common.impala_connection.OperationHandle object at 0x7f99c4637e50>
> -- getting state for operation: <tests.common.impala_connection.OperationHandle object at 0x7f99c4637e50>
> -- getting state for operation: <tests.common.impala_connection.OperationHandle object at 0x7f99c4637e50>
> ====== 1 failed, 2607 passed, 151 skipped, 54 xfailed in 3706.37 seconds =======
> {code}
> The test waits 5 seconds for the query to run and then tests cancelling it. But
> somehow the query failed to start within 5 seconds. Maybe 5 seconds is too short
> for a dockerised env.
> Test failures can be found in:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1427/
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1424/
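
The failing assertion polls the query state once per second for a fixed
number of attempts. A hedged sketch of the same wait with an explicit,
adjustable timeout follows. The helper wait_for_state and its parameters
are hypothetical and are not part of the Impala test framework; the
commit above only says that the timeout was increased, not by how much.

```python
import time


def wait_for_state(get_state, expected, timeout_s, interval_s=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns `expected` or timeout_s elapses.

    Returns True if the expected state was seen, False on timeout.
    `sleep` and `clock` are injectable so tests can avoid real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Stand-in for client.get_state(handle): the query reaches the running
# state on the third poll.
states = iter(["INITIALIZED_STATE", "PENDING_STATE", "RUNNING_STATE"])
started = wait_for_state(lambda: next(states), "RUNNING_STATE",
                         timeout_s=30, interval_s=0)
assert started, "Query failed to start"
```

Making the timeout a parameter lets a slower environment, such as the
dockerised builds mentioned above, use a larger value without changing
the polling logic.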



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
