[ 
https://issues.apache.org/jira/browse/PROTON-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254516#comment-17254516
 ] 

Roddie Kieley edited comment on PROTON-2231 at 12/24/20, 12:11 PM:
-------------------------------------------------------------------

I've now tested in the same way on both an 8-core Atom Fedora 32 box and the 
6-core macOS box. The behaviour with libuv is broken on both, while the epoll 
proactor implementation on the Fedora 32 box is well behaved on every run. 
Looking more closely at libuv, the problem behaviours (stopping new listening, 
hanging on exit, or asserting as per the description) appear to be caused by 
uv_run not returning at some point in the test execution when called in the 
chain pn_proactor_wait -> leader_lead_lh -> uv_run. The current leader then 
never relinquishes the lead, and the other threads can neither do more work nor 
exit. NOTE that we see this more clearly when we change the uv_cond_wait call 
in pn_proactor_wait to uv_cond_timedwait with a 1 second timeout: the 
non-leader threads dutifully continue to check once per second rather than 
blocking, while uv_run stays blocked and the leader doesn't change.
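
For anyone wanting to reproduce the timed-wait diagnostic outside of proton, 
here is a minimal sketch of the idea. It uses plain POSIX pthreads as a 
stand-in for uv_cond_timedwait so that it does not depend on libuv, and the 
names fake_proactor_t, deadline_after_ms and follower_wait are hypothetical, 
not taken from libuv.c. The follower wakes at a fixed interval and re-checks 
the shared state instead of blocking forever behind a stuck leader.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the proactor state shared between threads. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool leader_done; /* would be set once the leader gives up the lead */
} fake_proactor_t;

/* Fill *ts with an absolute CLOCK_REALTIME deadline ms milliseconds from now. */
static void deadline_after_ms(struct timespec *ts, long ms) {
    clock_gettime(CLOCK_REALTIME, ts);
    ts->tv_sec += ms / 1000;
    ts->tv_nsec += (ms % 1000) * 1000000L;
    if (ts->tv_nsec >= 1000000000L) {
        ts->tv_sec += 1;
        ts->tv_nsec -= 1000000000L;
    }
}

/* Wait for the leader, waking every interval_ms to re-check the state.
 * Gives up after max_checks wakeups and returns how many wakeups happened
 * while the leader was still holding the lead. */
static int follower_wait(fake_proactor_t *p, long interval_ms, int max_checks) {
    int checks = 0;
    pthread_mutex_lock(&p->lock);
    while (!p->leader_done && checks < max_checks) {
        struct timespec deadline;
        deadline_after_ms(&deadline, interval_ms);
        int rc = pthread_cond_timedwait(&p->cond, &p->lock, &deadline);
        assert(rc == 0 || rc == ETIMEDOUT);
        checks++; /* a real follower would log or inspect state here */
    }
    pthread_mutex_unlock(&p->lock);
    return checks;
}
```

Calling follower_wait(&p, 1000, n) on a state whose leader_done flag is never 
set returns n after roughly n seconds, which mirrors the once-per-second 
checking described above; with leader_done already set it returns 0 at once.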

Temporarily switching the test to pn_proactor_get instead of pn_proactor_wait 
shows that execution runs successfully until the end, when leader_lead_lh again 
blocks in uv_run. The thread does not exit and cannot be joined, so the test 
still hangs although the main test execution behaviour appears correct. Adding 
a uv_loop_alive check before the uv_run call does not help: the check passes, 
i.e. returns != 0, and uv_run still blocks.

The libuv documentation says of 
[UV_RUN_ONCE|http://docs.libuv.org/en/v1.x/loop.html]:
{quote}Note that this function blocks if there are no pending callbacks.
{quote}
Further investigation will be along those lines; suggestions are welcome, 
[~astitcher] or [~cliffjansen].


> Assertion fail on macOS with libuv in c-threaderciser test
> ----------------------------------------------------------
>
>                 Key: PROTON-2231
>                 URL: https://issues.apache.org/jira/browse/PROTON-2231
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.32.0
>            Reporter: Jiri Daněk
>            Assignee: Roddie Kieley
>            Priority: Major
>              Labels: freebsd, macOS
>
> As described on PROTON-2225, the test fails with assertion error. It is 
> currently disabled on macOS for this reason (in .travis.yml).
> {noformat}
> 6: Test command: /usr/local/opt/python/libexec/bin/python 
> "/Users/travis/build/jiridanek/qpid-proton/scripts/env.py" "--" 
> "/Users/travis/build/jiridanek/qpid-proton/build/c/tests/c-threaderciser"
> 6: Test timeout computed to be: 1500
> 6: threaderciser start: threads=8, time=1, actions=[listen, close-listen, 
> connect, close-connect, wake, timeout, cancel-timeout]
> 6: Assertion failed: (p->active > 0), function remove_active_lh, file 
> /Users/travis/build/jiridanek/qpid-proton/c/src/proactor/libuv.c, line 392.
>  6/31 Test  #6: c-threaderciser ..................***Failed    0.18 sec
> {noformat}
> If the test is meant to stay disabled for a longer time, it will have to be 
> disabled in CMakeLists.txt, so that users compiling the project do not run it 
> accidentally.


