[jira] [Reopened] (QPIDJMS-376) notify the ExceptionListner when a consumer with a MessageListener remotely closes
[ https://issues.apache.org/jira/browse/QPIDJMS-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robbie Gemmell reopened QPIDJMS-376: > notify the ExceptionListner when a consumer with a MessageListener remotely > closes > -- > > Key: QPIDJMS-376 > URL: https://issues.apache.org/jira/browse/QPIDJMS-376 > Project: Qpid JMS > Issue Type: Bug > Components: qpid-jms-client >Affects Versions: 0.31.0 > Environment: AMQP Server: Enmasse 0.17.1 > Enmasse Address Type: anycast >Reporter: Daniel Maier >Priority: Major > Fix For: 0.32.0 > > Attachments: clientlogs.txt > > > When I create a consumer to an address that just does not exist, I expected > to get some exception or that the client retries the operation. But there > seems not even to be a log message which indicates a failure. > Is this intended behavior or is this a bug? A more general description is: If > AMQP server closes the receiver link, qpid jms client does not notify the > user anyhow or does not re-establish the link. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Closed] (QPIDJMS-376) notify the ExceptionListener when a consumer with a MessageListener remotely closes
[ https://issues.apache.org/jira/browse/QPIDJMS-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robbie Gemmell closed QPIDJMS-376. -- Resolution: Fixed > notify the ExceptionListener when a consumer with a MessageListener remotely > closes > --- > > Key: QPIDJMS-376 > URL: https://issues.apache.org/jira/browse/QPIDJMS-376 > Project: Qpid JMS > Issue Type: Bug > Components: qpid-jms-client >Affects Versions: 0.31.0 > Environment: AMQP Server: Enmasse 0.17.1 > Enmasse Address Type: anycast >Reporter: Daniel Maier >Priority: Major > Fix For: 0.32.0 > > Attachments: clientlogs.txt > > > When I create a consumer to an address that just does not exist, I expected > to get some exception or that the client retries the operation. But there > seems not even to be a log message which indicates a failure. > Is this intended behavior or is this a bug? A more general description is: If > AMQP server closes the receiver link, qpid jms client does not notify the > user anyhow or does not re-establish the link. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (QPIDJMS-376) notify the ExceptionListener when a consumer with a MessageListener remotely closes
[ https://issues.apache.org/jira/browse/QPIDJMS-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robbie Gemmell updated QPIDJMS-376: --- Summary: notify the ExceptionListener when a consumer with a MessageListener remotely closes (was: notify the ExceptionListner when a consumer with a MessageListener remotely closes) > notify the ExceptionListener when a consumer with a MessageListener remotely > closes > --- > > Key: QPIDJMS-376 > URL: https://issues.apache.org/jira/browse/QPIDJMS-376 > Project: Qpid JMS > Issue Type: Bug > Components: qpid-jms-client >Affects Versions: 0.31.0 > Environment: AMQP Server: Enmasse 0.17.1 > Enmasse Address Type: anycast >Reporter: Daniel Maier >Priority: Major > Fix For: 0.32.0 > > Attachments: clientlogs.txt > > > When I create a consumer to an address that just does not exist, I expected > to get some exception or that the client retries the operation. But there > seems not even to be a log message which indicates a failure. > Is this intended behavior or is this a bug? A more general description is: If > AMQP server closes the receiver link, qpid jms client does not notify the > user anyhow or does not re-establish the link. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Closed] (QPIDJMS-379) Reduce garbage created on input from transport
[ https://issues.apache.org/jira/browse/QPIDJMS-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robbie Gemmell closed QPIDJMS-379.
Resolution: Fixed

> Reduce garbage created on input from transport
>
> Key: QPIDJMS-379
> URL: https://issues.apache.org/jira/browse/QPIDJMS-379
> Project: Qpid JMS
> Issue Type: Improvement
> Components: qpid-jms-client
> Affects Versions: 0.31.0
> Reporter: Timothy Bish
> Assignee: Timothy Bish
> Priority: Major
> Fix For: 0.32.0
>
> The input processor can be simplified to reduce the temporary objects created
> while processing incoming bytes.
[jira] [Reopened] (QPIDJMS-379) Reduce garbage create on input from transport
[ https://issues.apache.org/jira/browse/QPIDJMS-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robbie Gemmell reopened QPIDJMS-379: > Reduce garbage create on input from transport > - > > Key: QPIDJMS-379 > URL: https://issues.apache.org/jira/browse/QPIDJMS-379 > Project: Qpid JMS > Issue Type: Improvement > Components: qpid-jms-client >Affects Versions: 0.31.0 >Reporter: Timothy Bish >Assignee: Timothy Bish >Priority: Major > Fix For: 0.32.0 > > > The input processor can be simplified to reduce temporary objects create on > incoming bytes processing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (QPIDJMS-379) Reduce garbage created on input from transport
[ https://issues.apache.org/jira/browse/QPIDJMS-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robbie Gemmell updated QPIDJMS-379: --- Summary: Reduce garbage created on input from transport (was: Reduce garbage create on input from transport) > Reduce garbage created on input from transport > -- > > Key: QPIDJMS-379 > URL: https://issues.apache.org/jira/browse/QPIDJMS-379 > Project: Qpid JMS > Issue Type: Improvement > Components: qpid-jms-client >Affects Versions: 0.31.0 >Reporter: Timothy Bish >Assignee: Timothy Bish >Priority: Major > Fix For: 0.32.0 > > > The input processor can be simplified to reduce temporary objects create on > incoming bytes processing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[ANNOUNCE] Apache Qpid JMS 0.32.0 released
The Apache Qpid (http://qpid.apache.org) community is pleased to announce the immediate availability of Apache Qpid JMS 0.32.0.

This is the latest release of our newer JMS client supporting the Advanced Message Queuing Protocol 1.0 (AMQP 1.0, ISO/IEC 19464, http://www.amqp.org), based around the Apache Qpid Proton protocol engine and implementing the AMQP JMS Mapping as it evolves at OASIS.

The release is available now from our website:
http://qpid.apache.org/download.html

Binaries are also available via Maven Central:
http://qpid.apache.org/maven.html

Release notes can be found at:
http://qpid.apache.org/releases/qpid-jms-0.32.0/release-notes.html

Thanks to all involved,
Robbie
[jira] [Created] (DISPATCH-989) symlinks in tree to non existent files, possibly stale and could be removed?
Robbie Gemmell created DISPATCH-989:

Summary: symlinks in tree to non existent files, possibly stale and could be removed?
Key: DISPATCH-989
URL: https://issues.apache.org/jira/browse/DISPATCH-989
Project: Qpid Dispatch
Issue Type: Task
Components: Console
Affects Versions: 1.1.0
Reporter: Robbie Gemmell
Assignee: Ernest Allen
Fix For: 1.2.0

There are a number of symlinks in the console tree which point to files that no longer exist. It's not clear that these are actually required anymore; they may be stale and could be removed. The targets were seemingly removed by DISPATCH-917.

Dir console/test/css/:
brokers.ttf -> ../../stand-alone/plugin/css/brokers.ttf
dispatch.css -> ../../stand-alone/plugin/css/dispatch.css
plugin.css -> ../../stand-alone/plugin/css/plugin.css
site-base.css -> ../../stand-alone/plugin/css/site-base.css

Dir console/test/html/:
qdrConnect.html -> ../../stand-alone/plugin/html/qdrConnect.html
qdrLayout.html -> ../../stand-alone/plugin/html/qdrLayout.html

Dir console/test/js/:
qdrService.js -> ../../stand-alone/plugin/js/qdrService.js

Dir console/test/lib/:
rhea-min.js -> ../../stand-alone/plugin/lib/rhea-min.js
[jira] [Resolved] (DISPATCH-988) Documentation of policy default vhost is wrong
[ https://issues.apache.org/jira/browse/DISPATCH-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuck Rolke resolved DISPATCH-988.
Resolution: Fixed
Fix Version/s: 1.2.0

Fixed at Commit 945cac6fd

> Documentation of policy default vhost is wrong
>
> Key: DISPATCH-988
> URL: https://issues.apache.org/jira/browse/DISPATCH-988
> Project: Qpid Dispatch
> Issue Type: Bug
> Affects Versions: 1.0.1
> Reporter: Chuck Rolke
> Assignee: Chuck Rolke
> Priority: Major
> Fix For: 1.2.0
>
> The policy defaultVhost property is described incorrectly. It is enabled by
> default and set to the vhost name _$default_. Default vhost processing is
> disabled when 1) the defaultVhost property is set to blank or 2) there
> is no vhost whose hostname matches the defaultVhost setting.
[GitHub] qpid-dispatch pull request #301: DISPATCH-927 - System test for fix. Makes s...
Github user asfgit closed the pull request at: https://github.com/apache/qpid-dispatch/pull/301
[jira] [Commented] (DISPATCH-927) detach not echoed back on multi-hop link route
[ https://issues.apache.org/jira/browse/DISPATCH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467416#comment-16467416 ] ASF subversion and git services commented on DISPATCH-927: -- Commit 7e4dfd7334ea994719e178cba78998c1933f60dc in qpid-dispatch's branch refs/heads/master from [~fgiorget] [ https://git-wip-us.apache.org/repos/asf?p=qpid-dispatch.git;h=7e4dfd7 ] DISPATCH-927 - System test for fix. Makes sure both detaches are echoed back > detach not echoed back on multi-hop link route > -- > > Key: DISPATCH-927 > URL: https://issues.apache.org/jira/browse/DISPATCH-927 > Project: Qpid Dispatch > Issue Type: Bug > Components: Container >Affects Versions: 1.0.0 >Reporter: Gordon Sim >Assignee: Ganesh Murthy >Priority: Major > Fix For: 1.1.0 > > Attachments: DISPATCH-927.patch, broker.xml, simple-topic-a.conf, > simple-topic-b.conf, simple_recv_modified.py > > > In a two router network, router-a and router-b, a link route is defined in > both directions on both routers. There is also an associated connector to a > broker on router-b. The address is configured to be a topic on the broker. > If two receivers attach on this address to router-a, and then detach at the > same time having received the defined number of messages, frequently (but not > always), one of the receivers will not get a detach echoed back to it. > On inspection of protocol traces, it appears that router-b, though it gets > the detach echoed back from the broker, does not forward this back to the > client (via router-a). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-927) detach not echoed back on multi-hop link route
[ https://issues.apache.org/jira/browse/DISPATCH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467418#comment-16467418 ] ASF GitHub Bot commented on DISPATCH-927: - Github user asfgit closed the pull request at: https://github.com/apache/qpid-dispatch/pull/301 > detach not echoed back on multi-hop link route > -- > > Key: DISPATCH-927 > URL: https://issues.apache.org/jira/browse/DISPATCH-927 > Project: Qpid Dispatch > Issue Type: Bug > Components: Container >Affects Versions: 1.0.0 >Reporter: Gordon Sim >Assignee: Ganesh Murthy >Priority: Major > Fix For: 1.1.0 > > Attachments: DISPATCH-927.patch, broker.xml, simple-topic-a.conf, > simple-topic-b.conf, simple_recv_modified.py > > > In a two router network, router-a and router-b, a link route is defined in > both directions on both routers. There is also an associated connector to a > broker on router-b. The address is configured to be a topic on the broker. > If two receivers attach on this address to router-a, and then detach at the > same time having received the defined number of messages, frequently (but not > always), one of the receivers will not get a detach echoed back to it. > On inspection of protocol traces, it appears that router-b, though it gets > the detach echoed back from the broker, does not forward this back to the > client (via router-a). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-990) Use patterns for policy vhost hostnames
Chuck Rolke created DISPATCH-990:

Summary: Use patterns for policy vhost hostnames
Key: DISPATCH-990
URL: https://issues.apache.org/jira/browse/DISPATCH-990
Project: Qpid Dispatch
Issue Type: Bug
Reporter: Chuck Rolke

Currently policy vhost hostnames identify a single host. Vhost policy would be much more flexible if the hostnames could be specified with pattern matching wildcards:

{{ #.corporate.example.com}}
{{ #.labs.example.com}}
{{ *.users.example.com}}
{{ #.example.com}}
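The wildcard hostnames proposed in DISPATCH-990 could be matched along the lines of the sketch below. The ticket does not define the wildcard semantics, so this assumes the usual AMQP topic-style convention: hostnames are split into tokens on '.', '*' matches exactly one token, and '#' matches zero or more tokens. The function names match_tokens and vhost_pattern_matches are hypothetical and are not taken from the Dispatch source; comparison is case-sensitive to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return the token after the one starting at s, or NULL if s is the last
 * token. Tokens are separated by '.'. */
static const char *next_token(const char *s)
{
    s += strcspn(s, ".");
    return (*s == '.') ? s + 1 : NULL;
}

/* Recursive matcher. A NULL pointer means "no tokens left".
 * Assumed semantics (not stated in the ticket):
 *   '*' matches exactly one token, '#' matches zero or more tokens. */
static bool match_tokens(const char *pat, const char *host)
{
    if (pat == NULL)
        return host == NULL;          /* pattern used up: host must be too */

    size_t plen = strcspn(pat, ".");

    if (plen == 1 && pat[0] == '#') {
        const char *rest = next_token(pat);
        /* Let '#' absorb 0, 1, 2, ... host tokens and try the remainder. */
        for (const char *h = host; ; h = next_token(h)) {
            if (match_tokens(rest, h))
                return true;
            if (h == NULL)
                return false;         /* nothing left to absorb */
        }
    }

    if (host == NULL)
        return false;                 /* pattern token left, host used up */

    size_t hlen = strcspn(host, ".");
    bool token_ok = (plen == 1 && pat[0] == '*') ||
                    (plen == hlen && strncmp(pat, host, plen) == 0);

    return token_ok && match_tokens(next_token(pat), next_token(host));
}

/* Hypothetical entry point: does hostname match the vhost pattern? */
bool vhost_pattern_matches(const char *pattern, const char *hostname)
{
    assert(pattern != NULL && hostname != NULL);
    return match_tokens(pattern, hostname);
}
```

Under these assumed semantics, {{#.example.com}} matches both example.com and a.b.example.com, while {{*.users.example.com}} matches alice.users.example.com but not users.example.com. The simple backtracking is exponential for patterns with many '#' tokens, which is acceptable for a sketch but not for a production policy engine.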
[jira] [Commented] (PROTON-1842) [c] Dispatch/Proton crashes when opening/closing connections
[ https://issues.apache.org/jira/browse/PROTON-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467486#comment-16467486 ]

Cliff Jansen commented on PROTON-1842:

This test case is quite devilish. Thank-you.

The above fix is necessary but the test case exposes more problems.

The epoll callbacks for socket IO are "mushy". The epoll proactor handles this quite well (except at tear down as this JIRA has pointed out). The proactor regularly flips between "wake me when there is socket data to read" and "wake me when I can read OR write". On any transition, even with EPOLLONESHOT, it is not possible to know if one or two threads might be awoken, and if two are, which will get the context lock first.

I am seeing the following calls to pconnection process where the first two are sequential and the latter two obviously overlap:

event = RW .. rearm(RW) .. wake self
inbound_wake .. rearm(R)
event = R .. segfault on NULL or assert fail on closed fd
event = RW .. begin close .. cleanup .. self delete

> [c] Dispatch/Proton crashes when opening/closing connections
>
> Key: PROTON-1842
> URL: https://issues.apache.org/jira/browse/PROTON-1842
> Project: Qpid Proton
> Issue Type: Bug
> Components: proton-c
> Affects Versions: proton-c-0.22.0
> Reporter: Chuck Rolke
> Priority: Major
> Attachments: helloworld.cpp
>
> Using proton cpp example code that is modified to open and close connections
> by the thousands in the main loop and having the event loop short circuit any
> messaging with:
> {{ void on_connection_open(proton::connection& c) {}}
> {{ c.close();}}
> {{ }}}
> and then directing this client example to a dispatch router 1.1.0. Eventually
> (after 100,000 to 1,000,000 connection open/closes) the router crashes with:
> {{qdrouterd: /home/chug/git/qpid-proton/c/src/proactor/epoll.c:466: wake_pop_front: Assertion `p->wakes_in_progress' failed.}}
> and with:
> {{qdrouterd: /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2014: proactor_do_epoll: Assertion `ee->type == PCONNECTION_TIMER' failed.}}
> This issue seems to happen only with qpid-dispatch accepting the open/close
> event stream. Proton cpp example _server_direct_ and c example _direct_ work
> properly with the same open/close event stream mounting into the 10s of
> millions of connections.
> A core dump backtrace with the PCONNECTION_TIMER failure reads as:
> {{(gdb) bt}}
> {{#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51}}
> {{#1 0x7f795c712c41 in __GI_abort () at abort.c:79}}
> {{#2 0x7f795c709f7a in __assert_fail_base (fmt=0x7f795c85a260 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f795d72e15a "ee->type == PCONNECTION_TIMER", }}
> {{ file=file@entry=0x7f795d72de98 "/home/chug/git/qpid-proton/c/src/proactor/epoll.c", line=line@entry=2014, }}
> {{ function=function@entry=0x7f795d72e320 <__PRETTY_FUNCTION__.6307> "proactor_do_epoll") at assert.c:92}}
> {{#3 0x7f795c709ff2 in __GI___assert_fail (assertion=0x7f795d72e15a "ee->type == PCONNECTION_TIMER", file=0x7f795d72de98 "/home/chug/git/qpid-proton/c/src/proactor/epoll.c", line=2014, }}
> {{ function=0x7f795d72e320 <__PRETTY_FUNCTION__.6307> "proactor_do_epoll") at assert.c:101}}
> {{#4 0x7f795d72d29f in proactor_do_epoll (p=0x26b7310, can_block=true) at /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2014}}
> {{#5 0x7f795d72d30e in pn_proactor_wait (p=0x26b7310) at /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2030}}
> {{#6 0x7f795dbe89ad in thread_run (arg=0x26be750) at /home/chug/git/qpid-dispatch/src/server.c:946}}
> {{#7 0x7f795d50e50b in start_thread (arg=0x7f794f486700) at pthread_create.c:465}}
> {{#8 0x7f795c7d216f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95}}
[jira] [Created] (QPID-8185) [JMS AMQP 0-x][AMQP 0-8..0-91] Make sure that client closes TCP connection on failure with sending connection.close
Alex Rudyy created QPID-8185:

Summary: [JMS AMQP 0-x][AMQP 0-8..0-91] Make sure that client closes TCP connection on failure with sending connection.close
Key: QPID-8185
URL: https://issues.apache.org/jira/browse/QPID-8185
Project: Qpid
Issue Type: Improvement
Components: JMS AMQP 0-x
Affects Versions: qpid-java-client-0-x-6.3.0, qpid-java-6.0.8, 0.32, 0.30, 0.28, 0.26, 0.24, 0.22, 0.20, 0.18, qpid-java-6.1.6
Reporter: Alex Rudyy
Fix For: qpid-java-client-0-x-6.3.1

Sending connection.close as part of {{Connection#close}} can end-up in timeout exception. The underlying TCP connection remains open and Broker can continue sending data to the client when session close ends up in timeout as well. The incoming frames cannot be associated with the sessions, as the client removes session information on connection close, which results in a number of confusing exceptions.
[jira] [Updated] (QPID-8185) [JMS AMQP 0-x][AMQP 0-8..0-91] Make sure that client closes TCP connection on failure with sending connection.close
[ https://issues.apache.org/jira/browse/QPID-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rudyy updated QPID-8185: - Description: Sending connection.close as part of {{Connection#close}} can end-up in timeout exception. The underlying TCP connection remains open and Broker can continue sending data to the client when session close ends up in timeout as well. The incoming frames cannot be associated with the sessions, as the client removes session information on connection close. As result in a number of confusing exceptions is reported (was: Sending connection.close as part of {{Connection#close}} can end-up in timeout exception. The underlying TCP connection remains open and Broker can continue sending data to the client when session close ends up in timeout as well. The incoming frames cannot be associated with the sessions, as the client removes session information on connection close, which results in a number of confusing exceptions.) > [JMS AMQP 0-x][AMQP 0-8..0-91] Make sure that client closes TCP connection on > failure with sending connection.close > --- > > Key: QPID-8185 > URL: https://issues.apache.org/jira/browse/QPID-8185 > Project: Qpid > Issue Type: Improvement > Components: JMS AMQP 0-x >Affects Versions: qpid-java-6.1.6, 0.18, 0.20, 0.22, 0.24, 0.26, 0.28, > 0.30, 0.32, qpid-java-6.0.8, qpid-java-client-0-x-6.3.0 >Reporter: Alex Rudyy >Priority: Major > Fix For: qpid-java-client-0-x-6.3.1 > > > Sending connection.close as part of {{Connection#close}} can end-up in > timeout exception. The underlying TCP connection remains open and Broker can > continue sending data to the client when session close ends up in timeout as > well. The incoming frames cannot be associated with the sessions, as the > client removes session information on connection close. 
As result in a number > of confusing exceptions is reported -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (QPID-8185) [JMS AMQP 0-x][AMQP 0-8..0-91] Make sure that client closes TCP connection on failure with sending connection.close
[ https://issues.apache.org/jira/browse/QPID-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rudyy updated QPID-8185: - Description: Sending connection.close as part of {{Connection#close}} can end-up in timeout exception. The underlying TCP connection remains open and Broker can continue sending data to the client when session close ends up in timeout as well. The incoming frames cannot be associated with the sessions, as the client removes session information on connection close. As result, a number of confusing exceptions is reported (was: Sending connection.close as part of {{Connection#close}} can end-up in timeout exception. The underlying TCP connection remains open and Broker can continue sending data to the client when session close ends up in timeout as well. The incoming frames cannot be associated with the sessions, as the client removes session information on connection close. As result in a number of confusing exceptions is reported) > [JMS AMQP 0-x][AMQP 0-8..0-91] Make sure that client closes TCP connection on > failure with sending connection.close > --- > > Key: QPID-8185 > URL: https://issues.apache.org/jira/browse/QPID-8185 > Project: Qpid > Issue Type: Improvement > Components: JMS AMQP 0-x >Affects Versions: qpid-java-6.1.6, 0.18, 0.20, 0.22, 0.24, 0.26, 0.28, > 0.30, 0.32, qpid-java-6.0.8, qpid-java-client-0-x-6.3.0 >Reporter: Alex Rudyy >Priority: Major > Fix For: qpid-java-client-0-x-6.3.1 > > > Sending connection.close as part of {{Connection#close}} can end-up in > timeout exception. The underlying TCP connection remains open and Broker can > continue sending data to the client when session close ends up in timeout as > well. The incoming frames cannot be associated with the sessions, as the > client removes session information on connection close. 
As result, a number > of confusing exceptions is reported -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (QPID-8185) [JMS AMQP 0-x][AMQP 0-8..0-91] Make sure that client closes TCP connection on failure with sending connection.close
[ https://issues.apache.org/jira/browse/QPID-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rudyy updated QPID-8185: - Attachment: 0001-JMS-AMQP-0-x-AMQP-0-8.0-91-Make-sure-that-client-clo.patch > [JMS AMQP 0-x][AMQP 0-8..0-91] Make sure that client closes TCP connection on > failure with sending connection.close > --- > > Key: QPID-8185 > URL: https://issues.apache.org/jira/browse/QPID-8185 > Project: Qpid > Issue Type: Improvement > Components: JMS AMQP 0-x >Affects Versions: qpid-java-6.1.6, 0.18, 0.20, 0.22, 0.24, 0.26, 0.28, > 0.30, 0.32, qpid-java-6.0.8, qpid-java-client-0-x-6.3.0 >Reporter: Alex Rudyy >Priority: Major > Fix For: qpid-java-client-0-x-6.3.1 > > Attachments: > 0001-JMS-AMQP-0-x-AMQP-0-8.0-91-Make-sure-that-client-clo.patch > > > Sending connection.close as part of {{Connection#close}} can end-up in > timeout exception. The underlying TCP connection remains open and Broker can > continue sending data to the client when session close ends up in timeout as > well. The incoming frames cannot be associated with the sessions, as the > client removes session information on connection close. As result, a number > of confusing exceptions is reported -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Comment Edited] (QPID-8184) [linearstore] Recovery intermittently produces JERR_EFP_BADEFPDIRNAME error followed by core
[ https://issues.apache.org/jira/browse/QPID-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466468#comment-16466468 ] Kim van der Riet edited comment on QPID-8184 at 5/8/18 3:41 PM: Pavel Moravec has discovered the root cause of this issue, see [https://bugzilla.redhat.com/show_bug.cgi?id=1561819#c18]. It appears that when using ::readlink(), the string containing the link destination is copied into the supplied buffer, but without being terminated with a '\0'. In some cases, there is remaining data in the buffer which when searched from the rear of the string yields odd results. The issue appears to be solved by simply terminating the string in the buffer with a '\0'. was (Author: kpvdr): Pavel Moravec has discovered the root cause of this issue, see [https://bugzilla.redhat.com/show_bug.cgi?id=1561819#c18.] It appears that when using ::readlink(), the string containing the link destination is copied into the supplied buffer, but without being terminated with a '\0'. In some cases, there is remaining data in the buffer which when searched from the rear of the string yields odd results. The issue appears to be solved by simply terminating the string in the buffer with a '\0'. > [linearstore] Recovery intermittently produces JERR_EFP_BADEFPDIRNAME error > followed by core > > > Key: QPID-8184 > URL: https://issues.apache.org/jira/browse/QPID-8184 > Project: Qpid > Issue Type: Bug > Components: C++ Broker >Reporter: Kim van der Riet >Assignee: Kim van der Riet >Priority: Major > > Some users are experiencing difficulty recovering the store, especially when > there are a large number of queues (several thousand). 
The log files show > the following pattern: > {{JERR_EFP_BADEFPDIRNAME}} in which some arbitrary number which is not > divisible by 4 is being used as the EFP file size (called EFP directory in > the log), followed by a segfault: > {noformat} > May 4 18:55:00 prodrhs1l qpidd[6240]: 2018-05-04 18:55:00 [Store] warning > Linear Store: EmptyFilePool create failed: jexception 0x0d03 > EmptyFilePool::fileSizeKbFromDirName() threw JERR_EFP_BADEFPDIRNAME: Bad > Empty File Pool directory name (must be 'NNNk', where NNN is a number which > is a multiple of 4) (Partition: 1; EFP directory: '9k') > May 4 18:55:00 prodrhs1l kernel: qpidd[6240]: segfault at 10 ip > 7f4219af8e19 sp 7ffc227a6350 error 4 in > linearstore.so[7f4219ac4000+bd000]{noformat} > In the event that the random number _is_ divisible by 4, a randomly sized > directory containing no files may appear in the partition EFP. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
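The root cause described in the QPID-8184 comment above (readlink() copies the link target into the buffer without a terminating '\0') can be illustrated with a small sketch. This is not the linearstore code: the helper name read_link_target and its signature are hypothetical. The only library behaviour relied on is that POSIX readlink() returns the number of bytes it placed in the buffer and never appends a terminator, so the caller has to reserve one byte and terminate the string itself.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read the target of the symlink at 'path' into 'buf'
 * and always leave 'buf' NUL-terminated.
 * Returns the target length, or -1 on error (buf is then an empty string).
 * A return value of bufsize - 1 may mean the target was truncated. */
ssize_t read_link_target(const char *path, char *buf, size_t bufsize)
{
    assert(path != NULL && buf != NULL);
    if (bufsize == 0)
        return -1;

    /* Reserve one byte: readlink() does not write a trailing '\0'. */
    ssize_t len = readlink(path, buf, bufsize - 1);
    if (len < 0) {
        buf[0] = '\0';
        return -1;
    }

    /* Without this line, stale bytes already in buf would follow the
     * target and confuse any later search from the end of the string. */
    buf[len] = '\0';
    return len;
}
```

With a buffer that still holds old data, a link to efp/2048k then reads back as exactly "efp/2048k", so code that parses the trailing "NNNk" directory name sees only the real target rather than leftover bytes.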
[jira] [Updated] (QPID-8185) [JMS AMQP 0-x][AMQP 0-8..0-91] Make sure that client closes TCP connection on failure with sending connection.close
[ https://issues.apache.org/jira/browse/QPID-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Rudyy updated QPID-8185:

Description:
Sending connection.close as part of {{Connection#close}} can end-up in timeout exception. The underlying TCP connection remains open and Broker can continue sending data to the client when session close ends up in timeout as well. The incoming frames cannot be associated with the sessions, as the client removes session information on connection close. As result, a number of confusing exceptions is reported.

Here are the examples of exception stack-traces reported for the issue
{noformat}
INFO Unsuspending channel threw an exception: [Thread-227][AMQSession.java:2374]
org.apache.qpid.AMQTimeoutException: Server did not respond in a timely fashion
    at org.apache.qpid.client.util.BlockingWaiter.block(BlockingWaiter.java:170) ~[qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.protocol.BlockingMethodFrameListener.blockForFrame(BlockingMethodFrameListener.java:115) ~[qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.protocol.AMQProtocolHandler.writeCommandFrameAndWaitForReply(AMQProtocolHandler.java:715) ~[qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.protocol.AMQProtocolHandler.syncWrite(AMQProtocolHandler.java:736) ~[qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.protocol.AMQProtocolHandler.syncWrite(AMQProtocolHandler.java:730) ~[qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.AMQSession_0_8.sendSuspendChannel(AMQSession_0_8.java:728) ~[qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.AMQSession.suspendChannel(AMQSession.java:3156) [qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.AMQSession.startDispatcherIfNecessary(AMQSession.java:2370) [qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.AMQSession.syncDispatchQueue(AMQSession.java:2223) [qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.AMQSession.rollback(AMQSession.java:1881) [qpid-client-0.32.jar:0.32]
ERROR Error closing session: javax.jms.JMSException: Error closing session: org.apache.qpid.AMQTimeoutException: Server did not respond in a timely fashion [error code 408: Request Timeout][DefaultMessageListenerContainer-2][AMQConnection.java:1039]
ERROR Error closing connection [DefaultMessageListenerContainer-2][AMQConnection.java:971]
javax.jms.JMSException: Error closing session: org.apache.qpid.AMQTimeoutException: Server did not respond in a timely fashion [error code 408: Request Timeout]
    at org.apache.qpid.client.AMQSession.close(AMQSession.java:764) ~[qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.AMQSession.close(AMQSession.java:730) ~[qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.AMQConnection.closeAllSessions(AMQConnection.java:1035) [qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.AMQConnection.doClose(AMQConnection.java:962) [qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.AMQConnection.doClose(AMQConnection.java:951) [qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.AMQConnection.doClose(AMQConnection.java:951) [qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.AMQConnection.close(AMQConnection.java:935) [qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.AMQConnection.close(AMQConnection.java:916) [qpid-client-0.32.jar:0.32]
    at org.springframework.jms.connection.ConnectionFactoryUtils.releaseConnection(ConnectionFactoryUtils.java:80) [spring-jms-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.jms.listener.AbstractJmsListeningContainer.refreshSharedConnection(AbstractJmsListeningContainer.java:395) [spring-jms-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.jms.listener.DefaultMessageListenerContainer.refreshConnectionUntilSuccessful(DefaultMessageListenerContainer.java:915) [spring-jms-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.jms.listener.DefaultMessageListenerContainer.recoverAfterListenerSetupFailure(DefaultMessageListenerContainer.java:890) [spring-jms-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1061) [spring-jms-4.2.3.RELEASE.jar:4.2.3.RELEASE]
    at java.lang.Thread.run(Thread.java:724) [na:1.7.0_40]
Caused by: org.apache.qpid.AMQTimeoutException: Server did not respond in a timely fashion
    at org.apache.qpid.client.util.BlockingWaiter.block(BlockingWaiter.java:170) ~[qpid-client-0.32.jar:0.32]
    at org.apache.qpid.client.protocol.BlockingMethodFrameListener.blockForFrame(BlockingMethodFrameListener.java:115) ~[qpid-client-0.32.ja
[jira] [Commented] (PROTON-1841) [cpp] add missing ostream<< and to_string for proton::message
[ https://issues.apache.org/jira/browse/PROTON-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467603#comment-16467603 ]

ASF subversion and git services commented on PROTON-1841:
---------------------------------------------------------

Commit 3d46b4f0220e7e56ce5167ab87d4df15f3ca1583 in qpid-proton's branch refs/heads/master from [~aconway]
[ https://git-wip-us.apache.org/repos/asf?p=qpid-proton.git;h=3d46b4f ]

PROTON-1841: [cpp] add missing ostream<< and to_string for proton::message

> [cpp] add missing ostream<< and to_string for proton::message
> -------------------------------------------------------------
>
>                 Key: PROTON-1841
>                 URL: https://issues.apache.org/jira/browse/PROTON-1841
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: cpp-binding
>    Affects Versions: proton-c-0.22.0
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>            Priority: Major
>             Fix For: proton-c-0.23.0
>
>
> proton::message lacks an ostream operator<< and to_string function, which are
> provided for proton::value and most other types in the library. It can be
> implemented using C pn_inspect.
[jira] [Updated] (PROTON-1816) [c] deprecate old netaddr function names
[ https://issues.apache.org/jira/browse/PROTON-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Conway updated PROTON-1816:
--------------------------------
    Priority: Minor  (was: Major)

> [c] deprecate old netaddr function names
> ----------------------------------------
>
>                 Key: PROTON-1816
>                 URL: https://issues.apache.org/jira/browse/PROTON-1816
>             Project: Qpid Proton
>          Issue Type: Improvement
>          Components: proton-c
>    Affects Versions: proton-j-0.22.0
>            Reporter: Alan Conway
>            Assignee: Alan Conway
>            Priority: Minor
>             Fix For: proton-c-0.23.0
>
>
> See PROTON-1781 - the functions were re-named but the deprecation macros were
> commented out to give people a release cycle to adjust to the new names.
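
The approach described for PROTON-1816 (rename first, switch the deprecation warnings on a release later) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Proton's actual header: DEMO_DEPRECATED, DEMO_ENABLE_DEPRECATION, demo_addr_str and demo_netaddr_str are hypothetical names, and the attribute syntax assumes GCC, Clang, or MSVC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical switch: leave it undefined for one release cycle so callers of
 * the old names still compile without warnings, then define it. */
/* #define DEMO_ENABLE_DEPRECATION 1 */

#if defined(DEMO_ENABLE_DEPRECATION) && (defined(__GNUC__) || defined(__clang__))
#  define DEMO_DEPRECATED(msg) __attribute__((deprecated(msg)))
#elif defined(DEMO_ENABLE_DEPRECATION) && defined(_MSC_VER)
#  define DEMO_DEPRECATED(msg) __declspec(deprecated(msg))
#else
#  define DEMO_DEPRECATED(msg) /* expands to nothing: no warning yet */
#endif

/* New name: formats "host:port" into buf and returns the snprintf result. */
int demo_netaddr_str(const char *host, int port, char *buf, size_t len)
{
    assert(host != NULL && buf != NULL);
    return snprintf(buf, len, "%s:%d", host, port);
}

/* Old name, kept as a thin forwarding wrapper so existing callers still link. */
DEMO_DEPRECATED("use demo_netaddr_str")
int demo_addr_str(const char *host, int port, char *buf, size_t len)
{
    return demo_netaddr_str(host, port, buf, len);
}
```

With the switch left undefined, both names behave identically; defining it makes the compiler warn at each call site of the old name without breaking the build.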
[jira] [Resolved] (PROTON-1841) [cpp] add missing ostream<< and to_string for proton::message
[ https://issues.apache.org/jira/browse/PROTON-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Conway resolved PROTON-1841.
---------------------------------
    Resolution: Fixed
[jira] [Updated] (PROTON-1842) [c] Dispatch/Proton crashes when opening/closing connections
[ https://issues.apache.org/jira/browse/PROTON-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Conway updated PROTON-1842:
--------------------------------
    Attachment: race.vg
                race.tsan

> [c] Dispatch/Proton crashes when opening/closing connections
> ------------------------------------------------------------
>
>                 Key: PROTON-1842
>                 URL: https://issues.apache.org/jira/browse/PROTON-1842
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.22.0
>            Reporter: Chuck Rolke
>            Priority: Major
>         Attachments: helloworld.cpp, race.tsan, race.vg
>
>
> Using proton cpp example code that is modified to open and close connections
> by the thousands in the main loop and having the event loop short circuit any
> messaging with:
> {{ void on_connection_open(proton::connection& c) {}}
> {{ c.close();}}
> {{ }}}
> and then directing this client example to a dispatch router 1.1.0. Eventually
> (after 100,000 to 1,000,000 connection open/closes) the router crashes with:
> {{qdrouterd: /home/chug/git/qpid-proton/c/src/proactor/epoll.c:466:
> wake_pop_front: Assertion `p->wakes_in_progress' failed.}}
> and with:
> {{qdrouterd: /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2014:
> proactor_do_epoll: Assertion `ee->type == PCONNECTION_TIMER' failed.}}
> This issue seems to happen only with qpid-dispatch accepting the open/close
> event stream. Proton cpp example _server_direct_ and c example _direct_ work
> properly with the same open/close event stream mounting into the 10s of
> millions of connections.
> A core dump backtrace with the PCONNECTION_TIMER failure reads as:
> {{(gdb) bt}}
> {{#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51}}
> {{#1 0x7f795c712c41 in __GI_abort () at abort.c:79}}
> {{#2 0x7f795c709f7a in __assert_fail_base (fmt=0x7f795c85a260
> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
> assertion=assertion@entry=0x7f795d72e15a "ee->type == PCONNECTION_TIMER", }}
> {{ file=file@entry=0x7f795d72de98
> "/home/chug/git/qpid-proton/c/src/proactor/epoll.c", line=line@entry=2014, }}
> {{ function=function@entry=0x7f795d72e320 <__PRETTY_FUNCTION__.6307>
> "proactor_do_epoll") at assert.c:92}}
> {{#3 0x7f795c709ff2 in __GI___assert_fail (assertion=0x7f795d72e15a
> "ee->type == PCONNECTION_TIMER", file=0x7f795d72de98
> "/home/chug/git/qpid-proton/c/src/proactor/epoll.c", line=2014, }}
> {{ function=0x7f795d72e320 <__PRETTY_FUNCTION__.6307> "proactor_do_epoll")
> at assert.c:101}}
> {{#4 0x7f795d72d29f in proactor_do_epoll (p=0x26b7310, can_block=true)
> at /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2014}}
> {{#5 0x7f795d72d30e in pn_proactor_wait (p=0x26b7310) at
> /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2030}}
> {{#6 0x7f795dbe89ad in thread_run (arg=0x26be750) at
> /home/chug/git/qpid-dispatch/src/server.c:946}}
> {{#7 0x7f795d50e50b in start_thread (arg=0x7f794f486700) at
> pthread_create.c:465}}
> {{#8 0x7f795c7d216f in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95}}
[jira] [Commented] (PROTON-1842) [c] Dispatch/Proton crashes when opening/closing connections
[ https://issues.apache.org/jira/browse/PROTON-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467645#comment-16467645 ]

Alan Conway commented on PROTON-1842:
-------------------------------------

The threaderciser is showing races in connection close; I'm not sure if they are the same issue we are looking at here. Attached output race.vg and race.tsan from helgrind and the thread sanitizer. Valgrind detects a *lot* more races, probably because it is slowing things down so much, but the tsan stack traces are consistent with valgrind.
[jira] [Comment Edited] (PROTON-1842) [c] Dispatch/Proton crashes when opening/closing connections
[ https://issues.apache.org/jira/browse/PROTON-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467645#comment-16467645 ]

Alan Conway edited comment on PROTON-1842 at 5/8/18 4:36 PM:
-------------------------------------------------------------

The threaderciser is showing races in connection close; I'm not sure if they are the same issue we are looking at here. Attached output race.vg and race.tsan from helgrind and the thread sanitizer. Valgrind detects a *lot* more races, probably because it is slowing things down so much, but the tsan stack traces are consistent with valgrind.

This looks consistent with your theory, in particular a mutex being destroyed concurrently with being unlocked during shutdown. One thread locks, sees everything is ready to finalize and destroys the connection state while the second thread is blocked on the mutex. The second thread gets released when the first thread unlocks before pthread_mutex_destroy, but explodes when it tries to unlock after the destroy.
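
The race described in this comment (one thread destroys the connection state while another is still blocked on its mutex) is commonly avoided by tying the mutex's lifetime to a reference count, so that destruction happens only once no other thread can still reach the lock. The following is a minimal sketch of that general pattern under stated assumptions, not the Proton epoll proactor code or its eventual fix; conn_state_t and the conn_state_* functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-connection state shared between worker threads. */
typedef struct conn_state {
    pthread_mutex_t lock;
    int refs;   /* owners that may still touch this object; guarded by lock */
} conn_state_t;

/* Allocate the state with a single reference owned by the caller. */
conn_state_t *conn_state_new(void)
{
    conn_state_t *cs = malloc(sizeof *cs);
    if (cs == NULL)
        return NULL;
    if (pthread_mutex_init(&cs->lock, NULL) != 0) {
        free(cs);
        return NULL;
    }
    cs->refs = 1;
    return cs;
}

/* Take an extra reference before handing the pointer to another thread.
 * The caller must already hold a reference of its own. */
void conn_state_ref(conn_state_t *cs)
{
    assert(cs != NULL);
    pthread_mutex_lock(&cs->lock);
    cs->refs++;
    pthread_mutex_unlock(&cs->lock);
}

/* Drop a reference. Returns true if this call freed the object. */
bool conn_state_unref(conn_state_t *cs)
{
    assert(cs != NULL);
    pthread_mutex_lock(&cs->lock);
    bool last = (--cs->refs == 0);
    pthread_mutex_unlock(&cs->lock);
    if (last) {
        /* Nobody else holds a reference, so no thread can be blocked on,
         * or about to take, this lock. */
        pthread_mutex_destroy(&cs->lock);
        free(cs);
    }
    return last;
}
```

The sketch relies on one invariant: a thread only locks the state while it holds a reference. A thread waiting on the mutex therefore keeps the count above zero, and the destroy cannot run underneath it.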
[jira] [Comment Edited] (PROTON-1842) [c] Dispatch/Proton crashes when opening/closing connections
[ https://issues.apache.org/jira/browse/PROTON-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467645#comment-16467645 ]

Alan Conway edited comment on PROTON-1842 at 5/8/18 4:52 PM:
-------------------------------------------------------------

The threaderciser is showing races in connection close; I'm not sure if they are the same issue we are looking at here. Attached output race.vg and race.tsan from helgrind and the thread sanitizer. Valgrind detects a *lot* more races, probably because it is slowing things down so much, but the tsan stack traces are consistent with valgrind.

This looks consistent with your theory, in particular a mutex being destroyed concurrently with being unlocked during shutdown. One thread locks, sees everything is ready to finalize and destroys the connection state while the second thread is blocked on the mutex. The second thread gets released when the first thread unlocks before pthread_mutex_destroy, but explodes when it tries to unlock after the destroy.

To run:
{code:java}
cmake -DTHREADERCISER=ON .. && make && valgrind --tool=helgrind c/tests/c-threaderciser -time 60
cmake -DENABLE_TSAN=ON -DTHREADERCISER=ON .. && make && c/tests/c-threaderciser -time 60
{code}
[jira] [Commented] (DISPATCH-989) symlinks in tree to non existent files, possibly stale and could be removed?
[ https://issues.apache.org/jira/browse/DISPATCH-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467720#comment-16467720 ]

ASF subversion and git services commented on DISPATCH-989:
----------------------------------------------------------

Commit 7b2d8225e28c295328d3926b9dc5b26d44795540 in qpid-dispatch's branch refs/heads/master from [~eallen]
[ https://git-wip-us.apache.org/repos/asf?p=qpid-dispatch.git;h=7b2d822 ]

DISPATCH-989 Replace broken symlink with original files. Add npm dependency on rhea.

> symlinks in tree to non existent files, possibly stale and could be removed?
> ----------------------------------------------------------------------------
>
>                 Key: DISPATCH-989
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-989
>             Project: Qpid Dispatch
>          Issue Type: Task
>          Components: Console
>    Affects Versions: 1.1.0
>            Reporter: Robbie Gemmell
>            Assignee: Ernest Allen
>            Priority: Minor
>             Fix For: 1.2.0
>
>
> There are a number of symlinks in the console tree which point to files that
> no longer exist. It's not clear these are actually required anymore, and may
> be stale and could be removed? The targets were seemingly removed by
> DISPATCH-917
> Dir console/test/css/:
> brokers.ttf -> ../../stand-alone/plugin/css/brokers.ttf
> dispatch.css -> ../../stand-alone/plugin/css/dispatch.css
> plugin.css -> ../../stand-alone/plugin/css/plugin.css
> site-base.css -> ../../stand-alone/plugin/css/site-base.css
> Dir console/test/html/:
> qdrConnect.html -> ../../stand-alone/plugin/html/qdrConnect.html
> qdrLayout.html -> ../../stand-alone/plugin/html/qdrLayout.html
> Dir console/test/js/:
> qdrService.js -> ../../stand-alone/plugin/js/qdrService.js
> Dir console/test/lib/:
> rhea-min.js -> ../../stand-alone/plugin/lib/rhea-min.js
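
A dangling symlink of the kind listed in this issue can be detected by comparing lstat, which inspects the link itself, with stat, which follows it. The sketch below assumes a POSIX system; dangling_symlink_target is a hypothetical helper for illustration and is not part of the Dispatch build or its tests.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* If path is a symlink whose target does not exist, copy the target text
 * into buf (NUL-terminated, truncated to fit) and return true. */
bool dangling_symlink_target(const char *path, char *buf, size_t len)
{
    struct stat link_info, target_info;
    assert(path != NULL && buf != NULL && len > 0);

    if (lstat(path, &link_info) != 0 || !S_ISLNK(link_info.st_mode))
        return false;               /* missing, or not a symlink at all */

    /* stat() follows the link; ENOENT means the target is gone. */
    if (stat(path, &target_info) == 0 || errno != ENOENT)
        return false;

    ssize_t n = readlink(path, buf, len - 1);
    if (n < 0)
        return false;
    buf[n] = '\0';                  /* readlink does not terminate the string */
    return true;
}
```

Calling the helper on each entry of a directory walk (for example one driven by nftw) would produce a list like the one in the issue description.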
[jira] [Resolved] (DISPATCH-989) symlinks in tree to non existent files, possibly stale and could be removed?
[ https://issues.apache.org/jira/browse/DISPATCH-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ernest Allen resolved DISPATCH-989.
-----------------------------------
    Resolution: Fixed

This tool should be deprecated and then removed from the source tree in a later release.
[jira] [Commented] (PROTON-1842) [c] Dispatch/Proton crashes when opening/closing connections
[ https://issues.apache.org/jira/browse/PROTON-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467772#comment-16467772 ]

Cliff Jansen commented on PROTON-1842:
--------------------------------------

Thank you for this additional info. Yes, these look like the same types of stack traces after a dust up. If they truly are, for me the pconnection_done thread (T2) got the R socket event, the last event batch, begin_close and free. The competing thread (T4) got the RW socket event, then failed in numerous ways depending on what was in the torn down or reused freed memory.

Catastrophic stuff happens for about 1 in 1 connections for a debug build on a middle-aged 4c/8t desktop machine.
[jira] [Commented] (PROTON-1842) [c] Dispatch/Proton crashes when opening/closing connections
[ https://issues.apache.org/jira/browse/PROTON-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467806#comment-16467806 ]

Alan Conway commented on PROTON-1842:
-------------------------------------

Note that the only way connections get closed in this version of the threaderciser is because a listener was closed and the connection failed (nobody listening, bad port, socket closed unexpectedly), so at least in the threaderciser version this is happening as a result of an early error while the connection is possibly not completely set up.

I'm adding socket kills to the connection socket now, will let you know how that goes.
[jira] [Created] (DISPATCH-991) Master qdstat throws keyError when running against 1.0.1 router
Ganesh Murthy created DISPATCH-991:

Summary: Master qdstat throws keyError when running against 1.0.1 router
Key: DISPATCH-991
URL: https://issues.apache.org/jira/browse/DISPATCH-991
Project: Qpid Dispatch
Issue Type: Bug
Components: Management Agent
Affects Versions: 1.1.0
Reporter: Ganesh Murthy
Assignee: Ganesh Murthy
Fix For: 1.1.0

When running the master qdstat against a previously released 1.0.1 version of the router, the following error is output:

KeyError: 'presettledDeliveries'
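The error reported above is the usual symptom of a newer client indexing an attribute that an older server does not report. A minimal sketch of a tolerant lookup, assuming the statistics arrive as a plain dictionary. The function name `format_router_stats`, the sample attribute values, and the "-" placeholder are hypothetical and are not taken from the qdstat source:

```python
def format_router_stats(stats, fields, missing="-"):
    """Return (field, value) rows, substituting a placeholder for absent keys.

    `stats` is a hypothetical dict of attributes reported by a router.
    Using dict.get() instead of stats[field] avoids a KeyError when an
    older router does not report a newer attribute.
    """
    return [(field, stats.get(field, missing)) for field in fields]


# Hypothetical response from an older router that lacks 'presettledDeliveries'.
old_router_stats = {"area": "0", "acceptedDeliveries": 12}

rows = format_router_stats(
    old_router_stats,
    ["area", "acceptedDeliveries", "presettledDeliveries"],
)
for name, value in rows:
    print(f"{name}: {value}")
```

Whether to show a placeholder or omit the row entirely is a presentation choice; the point is only that the lookup does not assume every router version reports every attribute.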
[jira] [Commented] (DISPATCH-991) Master qdstat throws keyError when running against 1.0.1 router
[ https://issues.apache.org/jira/browse/DISPATCH-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467982#comment-16467982 ]

Ganesh Murthy commented on DISPATCH-991:

Also put back the area field that was accidentally removed from the new version.

> Master qdstat throws keyError when running against 1.0.1 router
>
> Key: DISPATCH-991
> URL: https://issues.apache.org/jira/browse/DISPATCH-991
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Management Agent
> Affects Versions: 1.1.0
> Reporter: Ganesh Murthy
> Assignee: Ganesh Murthy
> Priority: Major
> Fix For: 1.1.0
>
> When running the master qdstat against a previously released 1.0.1 version of
> the router, the following error is output:
> KeyError: 'presettledDeliveries'
[jira] [Commented] (DISPATCH-991) Master qdstat throws keyError when running against 1.0.1 router
[ https://issues.apache.org/jira/browse/DISPATCH-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467997#comment-16467997 ]

ASF subversion and git services commented on DISPATCH-991:

Commit 448605e2a7d4cd724ab5d0659e060b11f4841994 in qpid-dispatch's branch refs/heads/master from [~ganeshmurthy]
[ https://git-wip-us.apache.org/repos/asf?p=qpid-dispatch.git;h=448605e ]

DISPATCH-991 - Added back area attribute and fixed the keyError. Now qdstat will be backward compatible

(cherry picked from commit 90b04701f01ea53fb00efc8b5d44c321bb78dc79)
[jira] [Created] (DISPATCH-992) System test is failing in some scenarios - system_tests_delivery_abort.py
Fernando Giorgetti created DISPATCH-992:

Summary: System test is failing in some scenarios - system_tests_delivery_abort.py
Key: DISPATCH-992
URL: https://issues.apache.org/jira/browse/DISPATCH-992
Project: Qpid Dispatch
Issue Type: Bug
Components: Tests
Reporter: Fernando Giorgetti

On some machines, the system_tests_delivery_abort.py test fails (only the truncate tests) because the on_aborted() method is not invoked.

After debugging the test along with the router code, this turned out to be a timing issue on some machines. When the sender's close() method is called (for example at line 218), the headers have not yet been sent from the router (with aborted=true), so on_aborted is never invoked in the test.

Streaming a larger amount of data, such as 100 instead of 10 (or even sleeping for 1 second before closing the sender), gives the headers enough time to be sent, and the test then passes.
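The workarounds described above both give the router more time before the sender is closed. As an illustration only (this is not the fix that was committed), a test can also poll for a condition with a deadline instead of sleeping for a fixed interval. A minimal sketch, in which `wait_until` and the `headers_sent` event are hypothetical names that do not come from the test suite:

```python
import threading
import time


def wait_until(predicate, timeout=1.0, interval=0.01):
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check so a condition met right at the deadline is not missed.
    return predicate()


# Hypothetical stand-in for "the router has sent the headers".
headers_sent = threading.Event()

# Simulate the headers going out shortly after the test starts.
timer = threading.Timer(0.05, headers_sent.set)
timer.start()

# Only close the sender once the condition holds (or the deadline passes).
ready = wait_until(headers_sent.is_set, timeout=2.0)
timer.join()
print("safe to close sender:", ready)
```

Polling returns as soon as the condition holds, so it avoids both the fixed one-second cost on fast machines and the failure on slow ones, provided the test has some observable signal to poll on.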
[jira] [Resolved] (DISPATCH-991) Master qdstat throws keyError when running against 1.0.1 router
[ https://issues.apache.org/jira/browse/DISPATCH-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ganesh Murthy resolved DISPATCH-991.

Resolution: Fixed
[GitHub] qpid-dispatch pull request #302: DISPATCH-992: Fix for system_tests_delivery...
GitHub user fgiorgetti opened a pull request:

    https://github.com/apache/qpid-dispatch/pull/302

    DISPATCH-992: Fix for system_tests_delivery_abort.py

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fgiorgetti/qpid-dispatch fgiorgetti-DISPATCH-992

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/qpid-dispatch/pull/302.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #302

commit 95d1ba5e1d8882dd624b451072837396f201baa5
Author: Fernando Giorgetti
Date: 2018-05-08T21:43:21Z

    DISPATCH-992: Fix for system_tests_delivery_abort.py
[jira] [Commented] (DISPATCH-992) System test is failing in some scenarios - system_tests_delivery_abort.py
[ https://issues.apache.org/jira/browse/DISPATCH-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468000#comment-16468000 ]

ASF GitHub Bot commented on DISPATCH-992:

GitHub user fgiorgetti opened a pull request:

    https://github.com/apache/qpid-dispatch/pull/302

    DISPATCH-992: Fix for system_tests_delivery_abort.py

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fgiorgetti/qpid-dispatch fgiorgetti-DISPATCH-992

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/qpid-dispatch/pull/302.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #302

commit 95d1ba5e1d8882dd624b451072837396f201baa5
Author: Fernando Giorgetti
Date: 2018-05-08T21:43:21Z

    DISPATCH-992: Fix for system_tests_delivery_abort.py

> System test is failing in some scenarios - system_tests_delivery_abort.py
>
> Key: DISPATCH-992
> URL: https://issues.apache.org/jira/browse/DISPATCH-992
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Tests
> Reporter: Fernando Giorgetti
> Priority: Major
>
> On some machines, the system_tests_delivery_abort.py test fails (only the
> truncate tests) because the on_aborted() method is not invoked.
> After debugging the test along with the router code, this turned out to be a
> timing issue on some machines. When the sender's close() method is called
> (for example at line 218), the headers have not yet been sent from the router
> (with aborted=true), so on_aborted is never invoked in the test.
> Streaming a larger amount of data, such as 100 instead of 10 (or even
> sleeping for 1 second before closing the sender), gives the headers enough
> time to be sent, and the test then passes.
[GitHub] qpid-dispatch pull request #302: DISPATCH-992: Fix for system_tests_delivery...
Github user asfgit closed the pull request at:

    https://github.com/apache/qpid-dispatch/pull/302
[jira] [Commented] (DISPATCH-992) System test is failing in some scenarios - system_tests_delivery_abort.py
[ https://issues.apache.org/jira/browse/DISPATCH-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468138#comment-16468138 ]

ASF GitHub Bot commented on DISPATCH-992:

Github user asfgit closed the pull request at:

    https://github.com/apache/qpid-dispatch/pull/302
[jira] [Commented] (DISPATCH-992) System test is failing in some scenarios - system_tests_delivery_abort.py
[ https://issues.apache.org/jira/browse/DISPATCH-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468137#comment-16468137 ]

ASF subversion and git services commented on DISPATCH-992:

Commit 95d1ba5e1d8882dd624b451072837396f201baa5 in qpid-dispatch's branch refs/heads/master from [~fgiorget]
[ https://git-wip-us.apache.org/repos/asf?p=qpid-dispatch.git;h=95d1ba5 ]

DISPATCH-992: Fix for system_tests_delivery_abort.py
[jira] [Resolved] (DISPATCH-992) System test is failing in some scenarios - system_tests_delivery_abort.py
[ https://issues.apache.org/jira/browse/DISPATCH-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ganesh Murthy resolved DISPATCH-992.

Resolution: Fixed
Fix Version/s: 1.2.0
[jira] [Commented] (PROTON-1771) [c-proactor] multi-thread race test for proactor
[ https://issues.apache.org/jira/browse/PROTON-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468169#comment-16468169 ]

ASF subversion and git services commented on PROTON-1771:

Commit 94dfe1bf033f7d4b9183bbad75b1801d688a300d in qpid-proton's branch refs/heads/master from [~aconway]
[ https://git-wip-us.apache.org/repos/asf?p=qpid-proton.git;h=94dfe1b ]

PROTON-1771: [c] add -close-connnect, -cancel-timeout to threaderciser

Also added -no-xxx flags to disable selected actions

> [c-proactor] multi-thread race test for proactor
>
> Key: PROTON-1771
> URL: https://issues.apache.org/jira/browse/PROTON-1771
> Project: Qpid Proton
> Issue Type: Test
> Components: proton-c
> Affects Versions: proton-c-0.20.0
> Reporter: Alan Conway
> Assignee: Alan Conway
> Priority: Major
> Fix For: proton-c-0.23.0
>
> Create a new test exe that runs for a (configurable, default short) period of
> time, with a single proactor acted on by multiple proactor and user threads.
> Run with helgrind or tsan to detect races.
> Exercise potentially racy APIs concurrently:
> - making, accepting and closing (from both ends) a connection.
> - pn_connection_wake
> - pn_proactor_release_connection
> - re-use of released pn_connection_t on a new connection
> - timeout
> - concurrent with some normal use: sending/receiving messages.
[jira] [Commented] (PROTON-1842) [c] Dispatch/Proton crashes when opening/closing connections
[ https://issues.apache.org/jira/browse/PROTON-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468176#comment-16468176 ]

Alan Conway commented on PROTON-1842:

Another note, the latest threaderciser shows the race with flags "-listen -connect -close-listen" so the only things that are racing here are IO events from connection errors and proactor-generated wakes - there are no user wakes involved.
[jira] [Comment Edited] (PROTON-1842) [c] Dispatch/Proton crashes when opening/closing connections
[ https://issues.apache.org/jira/browse/PROTON-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468176#comment-16468176 ]

Alan Conway edited comment on PROTON-1842 at 5/9/18 12:55 AM:

Another note, the latest threaderciser shows the race with flags "-listen -connect -close-listen" so the only things that are racing here are IO events from connection errors and proactor-generated wakes - there are no user wakes involved.

I am seeing a race between pn_proactor_done() (user thread) deciding to finalize a connection, and an epoll thread waking up to process it. The epoll thread is racing to lock the context mutex while the user thread is deleting it - I'm not seeing a crash but it's clear that it could be a crash with the right timing.

was (Author: aconway):

Another note, the latest threaderciser shows the race with flags "-listen -connect -close-listen" so the only things that are racing here are IO events from connection errors and proactor-generated wakes - there are no user wakes involved.
[jira] [Comment Edited] (PROTON-1842) [c] Dispatch/Proton crashes when opening/closing connections
[ https://issues.apache.org/jira/browse/PROTON-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468176#comment-16468176 ]

Alan Conway edited comment on PROTON-1842 at 5/9/18 1:07 AM:

Another note, the latest threaderciser shows the race with flags "-listen -connect -close-listen" so the only things that are racing here are IO events from connection errors and proactor-generated wakes - there are no user wakes involved.

I am seeing a race between pn_proactor_done() (user thread) deciding to finalize a connection, and an epoll thread waking up to process it. The epoll thread is racing to lock the context mutex while the user thread is deleting it - I'm not seeing a crash but it's clear that it could be a crash with the right timing.

Speculating: we need to bring back something like the ee->mutex to sync around epoll mods and waits. The variables in

pconnection_is_final(pconnection_t *pc) {
  return !pc->current_arm && !pc->timer_armed && !pc->context.wake_ops;
}

need to be synchronized around epoll events, because right now it seems that is_final can return true concurrently with epoll_wait returning the same pc, so it seems like current_arm is not properly synced.

was (Author: aconway):

Another note, the latest threaderciser shows the race with flags "-listen -connect -close-listen" so the only things that are racing here are IO events from connection errors and proactor-generated wakes - there are no user wakes involved.

I am seeing a race between pn_proactor_done() (user thread) deciding to finalize a connection, and an epoll thread waking up to process it. The epoll thread is racing to lock the context mutex while the user thread is deleting it - I'm not seeing a crash but it's clear that it could be a crash with the right timing.
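The synchronization idea speculated about in the comment above can be modeled outside of C. The following is a minimal sketch in Python, assuming a single per-connection lock guards the three flags read by the "is final" check. The class `ConnectionContext` and its methods are hypothetical; they model the idea only and do not mirror the code in proton's epoll.c:

```python
import threading


class ConnectionContext:
    """Hypothetical model of a connection whose 'final' check is lock-protected."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current_arm = True   # an epoll registration is outstanding
        self.timer_armed = False
        self.wake_ops = 0
        self.freed = False

    def _is_final(self):
        # Caller must hold self._lock.
        return not self.current_arm and not self.timer_armed and not self.wake_ops

    def on_epoll_event(self):
        """Called by an I/O thread; returns False if the context is already freed."""
        with self._lock:
            if self.freed:
                return False
            self.current_arm = False  # the armed event has now been delivered
            return True

    def try_finalize(self):
        """Called by a user thread; frees the context only when nothing is pending."""
        with self._lock:
            if self.freed or not self._is_final():
                return False
            self.freed = True
            return True


ctx = ConnectionContext()
print(ctx.try_finalize())    # an event is still armed, so finalization is refused
print(ctx.on_epoll_event())  # the pending event is processed
print(ctx.try_finalize())    # nothing is pending, so the context is finalized
print(ctx.on_epoll_event())  # a late event is ignored rather than touching freed state
```

Because both paths read and write the flags under the same lock, the finalize decision cannot interleave with delivery of an armed event. In C the lock would also have to outlive the memory it protects, which this model does not address.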