[ https://issues.apache.org/jira/browse/PROTON-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Justin Ross updated PROTON-639:
-------------------------------
    Labels: messenger  (was: )

> pn_messenger_recv hangs / spins on connection refused
> -----------------------------------------------------
>
>                 Key: PROTON-639
>                 URL: https://issues.apache.org/jira/browse/PROTON-639
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: 0.7, 0.8
>         Environment: Red Hat Enterprise Linux 6.5
> kernel: 2.6.32-431.1.2.el6.x86_64
> qpid-proton 0.7 and 9939b8a990cd53c1b5e099c083bdcf61ad22232b git-svn-id: https://svn.apache.org/repos/asf/qpid/proton/trunk@1613151 13f79535-47bb-0310-9956-ffa450edef68
>            Reporter: Rohan McGovern
>              Labels: messenger
>
> If I try to connect to a closed port with a messenger, pn_messenger_recv
> outputs messages to stderr and then spins at high CPU usage, rather than
> returning with an error as expected.
> This seems to be impacted by kernel version. I have a RHEL 6.5 machine which
> demonstrates this problem reliably when using kernel 2.6.32-431.1.2.el6.x86_64
> and not when using 3.10.28-1.el6.elrepo.x86_64.
> This can be easily reproduced using the "recv" example in the qpid-proton
> sources.
> {noformat:title=kernel 2.6.32 - broken}
> $ build/examples/messenger/c/recv amqp://127.0.0.1:1
> recv: Connection refused
> [0x63d8e0]:ERROR amqp:connection:framing-error SASL header mismatch: ''
> CONNECTION ERROR connection aborted (remote)
> # hangs at this point with high CPU usage
> {noformat}
> Compare with the behavior on a later kernel version, which seems right:
> {noformat:title=kernel 3.10.28 - expected behavior}
> $ build/examples/messenger/c/recv amqp://127.0.0.1:1
> recv: Connection refused
> [0x15af8e0]:ERROR amqp:connection:framing-error SASL header mismatch: ''
> CONNECTION ERROR connection aborted (remote)
> send: Broken pipe
> /home/rmcgover/src/qpid-proton/examples/messenger/c/recv.c:132: no valid sources
> # exits with exit code 1
> {noformat}
> Here's a sample backtrace when the hang is occurring:
> {noformat}
> (gdb) bt
> #0  0x00007ffff7ffea11 in clock_gettime ()
> #1  0x0000003a51e03e46 in clock_gettime () from /lib64/librt.so.1
> #2  0x00007ffff7de6b5e in pn_i_now () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
> #3  0x00007ffff7de4c06 in pn_selector_select () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
> #4  0x00007ffff7ddf736 in pni_wait () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
> #5  0x00007ffff7ddf869 in pn_messenger_tsync () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
> #6  0x00007ffff7ddf8df in pn_messenger_sync () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
> #7  0x00007ffff7de1676 in pn_messenger_recv () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
> #8  0x00000000004014b2 in main ()
> {noformat}
> There's a while(true) loop in pn_messenger_tsync which seems like it never
> escapes. strace also shows that the process is repeatedly doing a poll.
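
For comparison, here is a minimal POSIX sketch of the behavior the report expects: a wait on a refused connection that ends with an error instead of polling forever. This is not Proton's code and makes no claim about where the Proton bug lies. try_connect is a hypothetical helper written for this illustration, and it assumes an IPv4 address literal and a caller-chosen timeout in milliseconds.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of Proton): attempt a non-blocking TCP
 * connect. Returns 0 on success or an errno value on failure. */
int try_connect(const char *ip, uint16_t port, int timeout_ms)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    /* Switch the socket to non-blocking mode so connect() returns at once. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        int saved = errno;
        close(fd);
        return saved;
    }

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        close(fd);
        return EINVAL; /* not a valid IPv4 literal */
    }

    int err = 0;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        err = errno; /* may already be ECONNREFUSED on loopback */
        if (err == EINPROGRESS) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int n;
            /* Retry only when interrupted by a signal; the timeout restarts
             * on each retry, which is acceptable for a sketch. */
            do {
                n = poll(&pfd, 1, timeout_ms);
            } while (n < 0 && errno == EINTR);

            if (n < 0) {
                err = errno;
            } else if (n == 0) {
                err = ETIMEDOUT; /* bounded wait: give up, do not spin */
            } else {
                /* The socket is writable or in error; SO_ERROR holds the
                 * outcome of the connect (0 means connected). */
                socklen_t len = sizeof err;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
                    err = errno;
            }
        }
    }

    close(fd);
    return err;
}
```

Calling try_connect("127.0.0.1", 1, 1000) against a closed port should normally return ECONNREFUSED, either straight from connect() or via SO_ERROR after poll() wakes up. The point of the sketch is that every path out of the wait is terminal: success, a reported socket error, or a timeout.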