[ 
https://issues.apache.org/jira/browse/IMPALA-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217245#comment-17217245
 ] 

Tim Armstrong commented on IMPALA-9884:
---------------------------------------

This assertion is meant to check that, when the first wave of queries is 
submitted, none of them will be queued and then dequeued (because none of the 
admitted queries will finish running before the initial wave of admission 
decisions is made). 

This could be because a query was dequeued when it shouldn't have been. But it 
could also be explained if the test is incorrect in detecting when the 
admission decisions have been made for the first wave. One thing that's

Piecing together what happened, the test was stuck for a long time waiting for 
the initial admission decisions. I see that the last query rejected (out of 34 
rejected) in the initial wave was at 01:11:31:
{noformat}
$ grep 'Rejected' 
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-011116.23933
...
I1017 01:11:31.068819 25916 admission-controller.cc:1169] 
8448348c952b1d85:ac9a6bb100000000] Rejected query from pool default-pool: queue 
full, limit=10, num_queued=10.
{noformat}

There are 5 queries admitted right away, then a 6th admitted from the queue, 
which is likely the problematic one:
{noformat}
$grep 'Admitting.*que' 
impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-011116.23933
I1017 01:11:24.310699 25150 admission-controller.cc:1185] 
cf498fd1ece032b6:b4f673d100000000] Admitting query 
id=cf498fd1ece032b6:b4f673d100000000
I1017 01:11:24.317937 25152 admission-controller.cc:1185] 
1b4e1ee5d51fc461:1221932500000000] Admitting query 
id=1b4e1ee5d51fc461:1221932500000000
I1017 01:11:24.321123 25154 admission-controller.cc:1185] 
4a4d18e5caa85310:022e022900000000] Admitting query 
id=4a4d18e5caa85310:022e022900000000
I1017 01:11:24.327414 25157 admission-controller.cc:1185] 
8f462fa2ce60d289:e063147100000000] Admitting query 
id=8f462fa2ce60d289:e063147100000000
I1017 01:11:24.334887 25161 admission-controller.cc:1185] 
d5466702e1e5c14e:43f31d3000000000] Admitting query 
id=d5466702e1e5c14e:43f31d3000000000
I1017 01:11:29.677701 24428 admission-controller.cc:1786] Admitting from queue: 
query=3144178b629c699c:dde994b700000000
I1017 01:21:24.471750 24428 admission-controller.cc:1786] Admitting from queue: 
query=bd408b203d1cc15e:141cd95400000000
I1017 01:21:24.471935 24428 admission-controller.cc:1786] Admitting from queue: 
query=7e47cf24a70c6e29:fe48a51700000000
I1017 01:21:24.472086 24428 admission-controller.cc:1786] Admitting from queue: 
query=9041a95ec5b2de44:536b85bc00000000
I1017 01:21:24.472236 24428 admission-controller.cc:1786] Admitting from queue: 
query=9b4b9f3a12265bad:3c8fbdab00000000
I1017 01:21:24.472388 24428 admission-controller.cc:1786] Admitting from queue: 
query=254d8911a574bbda:4ddbf95c00000000
I1017 01:21:29.131278 24428 admission-controller.cc:1786] Admitting from queue: 
query=5744d3d253edafeb:f0bbc04800000000
{noformat}

I think the 10 minute gap might be from the test getting stuck at 
wait_for_admitted_threads here, but I'm not sure:
{noformat}
    LOG.info("Wait for initial admission decisions")
    (metric_deltas, curr_metrics) = self.wait_for_metric_changes(
        ['admitted', 'queued', 'rejected'], initial_metrics, num_queries)
    # Also wait for the test threads that submitted the queries to start 
executing.
    self.wait_for_admitted_threads(metric_deltas['admitted'])
{noformat}

Next is to figure out what happened with 3144178b629c699c:dde994b700000000

> TestAdmissionControllerStress.test_mem_limit failing occasionally
> -----------------------------------------------------------------
>
>                 Key: IMPALA-9884
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9884
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 4.0
>            Reporter: Vihang Karajgaonkar
>            Assignee: Tim Armstrong
>            Priority: Blocker
>              Labels: broken-build, flaky
>         Attachments: 
> impalad.impala-ec2-centos74-m5-4xlarge-ondemand-1925.vpc.cloudera.com.jenkins.log.INFO.20201017-011116.23933.gz
>
>
> Recently, I saw this test failing with the exception trace below. 
> {noformat}
> custom_cluster/test_admission_controller.py:1782: in test_mem_limit
>     {'request_pool': self.pool_name, 'mem_limit': query_mem_limit})
> custom_cluster/test_admission_controller.py:1638: in run_admission_test
>     assert metric_deltas['dequeued'] == 0,\
> E   AssertionError: Queued queries should not run until others are made to 
> finish
> E   assert 1 == 0
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to