[jira] [Created] (PROTON-1936) Support cross compiling to Windows from Linux
Marcel Meulemans created PROTON-1936: Summary: Support cross compiling to Windows from Linux Key: PROTON-1936 URL: https://issues.apache.org/jira/browse/PROTON-1936 Project: Qpid Proton Issue Type: Improvement Components: proton-c Affects Versions: proton-c-0.25.0 Reporter: Marcel Meulemans I am cross compiling proton for Windows via docker (multiarch/crossbuild) and running into a few minor issues that make it not work out of the box (mainly include file casing). Pull request will follow, more details there ... it would be nice if this made it upstream. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-1892) Deliveries on different links use the same delivery-id
Marcel Meulemans created PROTON-1892: Summary: Deliveries on different links use the same delivery-id Key: PROTON-1892 URL: https://issues.apache.org/jira/browse/PROTON-1892 Project: Qpid Proton Issue Type: Bug Components: proton-j Affects Versions: proton-j-0.27.1 Reporter: Marcel Meulemans Attachments: proton-j-delivery-id-fix.patch, proton-trace.log Given a session with two outgoing links, the situation can occur that two deliveries on separate links share the same delivery-id. This happens when a multi-frame transfer is being sent on link A and a new (single-frame) transfer is sent (multiplexed) on link B before the delivery on link A completes. The cause is that the increment of the delivery-id counter (maintained per session) is delayed until the entire (multi-frame) delivery is complete ([here|https://github.com/apache/qpid-proton-j/blob/e5a7dcade2996b2b68967949ddf1377f954bf579/proton-j/src/main/java/org/apache/qpid/proton/engine/impl/TransportImpl.java#L619]), allowing the second delivery to get the same delivery-id when calling getOutgoingDeliveryId [here|https://github.com/apache/qpid-proton-j/blob/e5a7dcade2996b2b68967949ddf1377f954bf579/proton-j/src/main/java/org/apache/qpid/proton/engine/impl/TransportImpl.java#L559]. My 100% reproduction scenario is as follows: * Run artemis (2.6.2, which uses proton-j 0.27.1) with an AMQP connector * Send a large message (10MB) to queue A * Send a couple of small messages to queue B * Connect a proton-c based client with a small maxFrameSize (8K) and limited credit to artemis and simultaneously subscribe to both queues (I think a flow frame triggers artemis to initiate a transfer, hence the limited credit). With proton-c trace logging enabled you will get something like this: [^proton-trace.log] The attached patch fixes the issue.
[^proton-j-delivery-id-fix.patch]
[jira] [Commented] (PROTON-1846) [proton-c] Message decode fails with PN_OUT_OF_MEMORY if there are large lists in the message
[ https://issues.apache.org/jira/browse/PROTON-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504462#comment-16504462 ] Marcel Meulemans commented on PROTON-1846: -- I tried the diff above and the {{uint32_t}} seems to introduce some unwanted side effects due to the signed/unsigned "magic" in {{pn_data_point/pn_data_restore}} (I didn't look into the details, just that this branch, [https://github.com/apache/qpid-proton/blob/master/c/src/core/codec.c#L1177], isn't hit when it should be). Using {{typedef int32_t pni_nid_t;}} and fixing PNI_NID_MAX accordingly worked for me. > [proton-c] Message decode fails with PN_OUT_OF_MEMORY if there are large > lists in the message > -- > > Key: PROTON-1846 > URL: https://issues.apache.org/jira/browse/PROTON-1846 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: proton-c-0.22.0 >Reporter: Ganesh Murthy >Priority: Major > Attachments: send_large_structured_body.js > > > Steps to reproduce - > > # Start the Qpid Dispatch router > # Run the following script that creates a bunch of addresses > # for i in `seq 1 6546`; do echo > "\{\"prefix\":\"address-$i\",\"distribution\":\"balanced\"}" | qdmanage > CREATE --type=org.apache.qpid.dispatch.router.config.address --name > address-$i --stdin; done > # Now run qdmanage QUERY --type=address > # You will receive a Data error (-10) > The following diff seems to fix the issue > diff --git a/c/src/core/data.h b/c/src/core/data.h > index 94dc7d67..f4320e2a 100644 > --- a/c/src/core/data.h > +++ b/c/src/core/data.h > @@ -27,7 +27,7 @@ > #include "decoder.h" > #include "encoder.h" > > -typedef uint16_t pni_nid_t; > +typedef uint32_t pni_nid_t; > #define PNI_NID_MAX ((pni_nid_t)-1)
[jira] [Updated] (DISPATCH-1019) Messaging instability in networks with many clients / addresses.
[ https://issues.apache.org/jira/browse/DISPATCH-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcel Meulemans updated DISPATCH-1019: --- Description: After DISPATCH-966 has been fixed I am still experiencing problems in a network with many clients / addresses. I am running a three node fully connected mesh of dispatch routers with 1 attached clients, all with two-address messaging at around 100 msg/sec. In the logs I am seeing the following errors: {{2018-05-29 14:31:05.145732 + ERROR (error) Invalid message: Insufficient Data to Determine Tag}} {{2018-05-29 14:31:05.145748 + ERROR (error) Invalid message: Can't convert message field body}} Which, in turn, lead to Python errors like: {{2018-05-29 14:31:05.145971 + ROUTER_MA (trace) RCVD: MAU(id=None pv=1 area=0 mobile_seq=0)}} {{2018-05-29 14:31:05.146130 + ROUTER (error) Exception in control message processing}} {{ Traceback (most recent call last):}} {{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/engine.py", line 157, in handleControlMessage}} {{ self.mobile_address_engine.handle_mau(msg, now)}} {{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/mobile.py", line 97, in handle_mau}} {{ node = self.node_tracker.router_node(msg.id)}} {{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/node.py", line 363, in router_node}} {{ return self.nodes[node_id]}} {{ KeyError: None}} {{2018-05-29 14:31:05.146175 + ROUTER (error) Control message error: opcode=MAU body=None}} I have tracked down the cause of the "Insufficient Data to Determine Tag" message to the following: During the call to {{qd_parse}} of the {{MAU}} message the {{qd_iterator_t}} reaches the end of the buffer list before it should. Specifically, the call to {{qd_iterator_advance}} here ([https://github.com/apache/qpid-dispatch/blob/master/src/parse.c#L151]) "fails" to move forward a certain number of bytes (e.g. 31) even though the {{iterator->view_pointer->remaining}} value has plenty of bytes left (e.g. 84802). The "fail" is because we reach the end of the buffer list before we should (here: [https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323]). What I have not been able to figure out yet is why this happens, because it is not consistent: many large MAU messages are parsed correctly, only sometimes not. I am able to reproduce these errors every time I run my tests. There may be a time component involved because the more logging I add to the router code, the less often the errors seem to occur.
[jira] [Created] (DISPATCH-1019) Messaging instability in networks with many clients / addresses.
Marcel Meulemans created DISPATCH-1019: -- Summary: Messaging instability in networks with many clients / addresses. Key: DISPATCH-1019 URL: https://issues.apache.org/jira/browse/DISPATCH-1019 Project: Qpid Dispatch Issue Type: Bug Components: Routing Engine Affects Versions: 1.1.0 Reporter: Marcel Meulemans After DISPATCH-966 has been fixed I am still experiencing problems in a network with many clients / addresses. I am running a three node fully connected mesh of dispatch routers with 1 attached clients all with two address messaging at around 100 msg/sec. In the logs I am seeing the following errors: {{2018-05-29 14:31:05.145732 + ERROR (error) Invalid message: Insufficient Data to Determine Tag}} {{2018-05-29 14:31:05.145748 + ERROR (error) Invalid message: Can't convert message field body}} Which, in turn, lead to python errors like: {{2018-05-29 14:31:05.145971 + ROUTER_MA (trace) RCVD: MAU(id=None pv=1 area=0 mobile_seq=0)}} {{2018-05-29 14:31:05.146130 + ROUTER (error) Exception in control message processing}} {{Traceback (most recent call last):}} {{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/engine.py", line 157, in handleControlMessage}} {{ self.mobile_address_engine.handle_mau(msg, now)}} {{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/mobile.py", line 97, in handle_mau}} {{ node = self.node_tracker.router_node(msg.id)}} {{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/node.py", line 363, in router_node}} {{ return self.nodes[node_id]}} {{KeyError: None}} {{2018-05-29 14:31:05.146175 + ROUTER (error) Control message error: opcode=MAU body=None}} I have tracked down the cause of the "Insufficient Data to Determine Tag" message to the following: During the call to {{qd_parse}} of the {{MAU}} message the {{qd_iterator_t}} reaches the end of the buffer list before it should. 
Specifically, the call to {{qd_iterator_advance}} here ([https://github.com/apache/qpid-dispatch/blob/master/src/parse.c#L151]) "fails" to move forward a certain number of bytes (e.g. 31) even though the {{iterator->view_pointer->remaining}} value has plenty of bytes left (e.g. 84802). The "fail" is because we reach the end of the buffer list before we should (here: [https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323]). What I have not been able to figure out yet is why this happens, because it is not consistent: many large MAU messages are parsed correctly, only sometimes not. I am able to reproduce these errors every time I run my tests. There may be a time component involved because the more logging I add to the router code, the less often the errors seem to occur.
[jira] [Commented] (DISPATCH-966) Qpid dispatch unstable inter-router connections
[ https://issues.apache.org/jira/browse/DISPATCH-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479173#comment-16479173 ] Marcel Meulemans commented on DISPATCH-966: --- [~ganeshmurthy], you are correct about PROTON-1514 fixing the initial symptoms. I just reran my test setup with a current master branch of proton and dispatch without the allowUnsettledMulticast:true setting and I no longer see the inter-router connections dropping. > Qpid dispatch unstable inter-router connections > --- > > Key: DISPATCH-966 > URL: https://issues.apache.org/jira/browse/DISPATCH-966 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 1.0.1 >Reporter: Marcel Meulemans >Assignee: Ted Ross >Priority: Blocker > Fix For: 1.1.0 > > Attachments: inconsistent-settlement.log, > qdrouterd-unsettled-true.log, qdrouterd.conf, qdrouterd.log, > router-unsettled-true.dump, router.dump > > > I am running a three node fully connected mesh of dispatch routers with 1 > attached clients and I am seeing some unstable inter-router connections (I am > sending around 1000 small, less than 1K, messages per second through the > network). The inter-router connections fail every so many seconds with the > message: > {{Connection to router-2:55672 failed: amqp:session:invalid-field sequencing > error, expected delivery-id 7, got 6}} > (the numbers 7 and 6 differ per connection loss) > In wireshark, using the attached tcpdump capture, I can see that every time > before the inter router connection is dropped, there is a rejected > disposition with the message: > {{Condition: qd:forbidden}} > {{Description: Deliveries to a multicast address must be pre-settled}} > The routers are connected as follows: > * router-0 -> router-1 > * router-0 -> router-2 > * router-1 -> router-2 > The routers are running as a docker container (debian stretch) on google > compute engine machines (every router on a separate node). 
> Attached are: > * my qdrouter.conf (from one of the routers) > * a log snippet from router-0 at debug level from connection drop to > connection re-established to connection drop again. > * a tcpdump capture of the inter-router connection between router-0 and > router-1 during which several of the failures occur > Versions: > * qpid-dispatch@1.0.1-rc1 > * qpid-proton@0.20.0 > > [^qdrouterd.log] > [^qdrouterd.conf] > [^router.dump]
[jira] [Commented] (DISPATCH-966) Qpid dispatch unstable inter-router connections
[ https://issues.apache.org/jira/browse/DISPATCH-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479167#comment-16479167 ] Marcel Meulemans commented on DISPATCH-966: --- [~tedross], that's great! Is the other problem you are referring to related to the further handling of such multi-transfer deliveries by the router (I think I see the large MAU message arriving, but not being processed after the receive is complete)? If so I can stop looking into this :P
[jira] [Commented] (DISPATCH-966) Qpid dispatch unstable inter-router connections
[ https://issues.apache.org/jira/browse/DISPATCH-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478735#comment-16478735 ] Marcel Meulemans commented on DISPATCH-966: --- Sorry for the slow response, but now I finally got around to doing some follow up on this. Turns out the python exceptions are a side effect, not the actual problem (however I'll try to reproduce the stack trace later, if only to improve the python response to the situation). The actual problem is caused by this code (as far as I can see): [https://github.com/apache/qpid-dispatch/blob/master/src/message.c#L1168] ... In my situation with 1 clients (each with two unique addresses), the MAU messages exchanged between routers can become quite large, so large that the limit set on the number of msg->content->buffers (qd_message_Q2_holdoff_should_block) is hit. This holdoff is unblocked when buffers are freed up by sending them out, but as the MAU message is not being sent out the holdoff is never unblocked. As a consequence all communication on this link comes to a halt (some messages still arrive on the link until the credit is used up, but are never processed by the router code) and eventually the network breaks down. It seems to me that this blocking should not occur for messages that are not going to be sent out. I verified my theory by increasing QD_QLIMIT_Q2_UPPER and observing that the problem goes away, but that is of course not a correct solution. I don't know enough about the router internals to propose a solution, other than that the qd_message_Q2_holdoff_should_block implementation ([https://github.com/apache/qpid-dispatch/blob/master/src/message.c#L1950]) should probably also take into account that not all messages are sent out to other destinations. Btw, I have not been able to figure out how this leads to the initial error "Deliveries to a multicast address must be pre-settled". What I did notice is that proton trace logging is showing an inconsistent settlement flag for messages that are split over multiple transfer frames (see [^inconsistent-settlement.log]).
[jira] [Updated] (DISPATCH-966) Qpid dispatch unstable inter-router connections
[ https://issues.apache.org/jira/browse/DISPATCH-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcel Meulemans updated DISPATCH-966: -- Attachment: inconsistent-settlement.log
[jira] [Commented] (DISPATCH-974) Getting connections via the router management protocol causes AMQP framing errors
[ https://issues.apache.org/jira/browse/DISPATCH-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449877#comment-16449877 ] Marcel Meulemans commented on DISPATCH-974: --- Dug a little deeper and it seems that the problem is somehow related to response size. When I use qdmanage to get only the container names of every connection (qdmanage -t 30 query --type=org.apache.qpid.dispatch.connection container) I do get the full 5000 connections. I even tested up to 25000 connections and that still works. > Getting connections via the router management protocol causes AMQP framing > errors > - > > Key: DISPATCH-974 > URL: https://issues.apache.org/jira/browse/DISPATCH-974 > Project: Qpid Dispatch > Issue Type: Bug > Components: Management Agent >Affects Versions: 1.0.1 >Reporter: Marcel Meulemans >Priority: Major > Attachments: qdrouter-frame-errors.pcapng.gz > > > I am running a standalone router with 5000 clients connected. When I try to > get all connections via qdstat (qdstat --limit 5000 -c) something goes wrong > (seems to be a framing error). The output from qdstat is: > {{ MessageException: [-10]: data error: (null)}} > The problems seems to somehow be related to result size because when I set > the limit to less I get the list of connections as expected. In my situation > the critical limit is 3447 (i.e. 3447 result in the expected list of > connections, 3448 result in the error above). It does not seem to be frame > size related because getting 3447 connection is already spread over transfer > frames (256182, 256512 and 159399 bytes). > The error is not qdstat related because using some plain proton code to > create a management query results in the same problem. Ultimately the call to > pn_message_decode with data receive from the router fails (also wireshark can > not decode the final frame). > I have attached a wireshark dump to the qdstat session with the router > ([^qdrouter-frame-errors.pcapng.gz]). 
The logs of the router (at info level) > contain no further information. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
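The query above (what qdstat and qdmanage send to the router's $management address) can be sketched as a plain request shape. This is an illustrative reconstruction based on the Qpid Dispatch management protocol, not code from the ticket; the helper name and the offset/count paging fields are assumptions. Since the error correlates with result size, bounding each page via count/offset is a plausible workaround to test:

```python
# Hypothetical sketch of the application-properties and body maps a management
# QUERY request carries. Field names follow the Qpid Dispatch management
# protocol as I understand it; treat them as assumptions, not ticket content.

def build_query(entity_type, attribute_names=None, offset=None, count=None):
    """Build the (application-properties, body) pair for a QUERY request."""
    properties = {
        "operation": "QUERY",
        "entityType": entity_type,
    }
    # Paging bounds the result size, which may keep the response under
    # whatever size threshold triggers the framing error (~3448 connections).
    if offset is not None:
        properties["offset"] = offset
    if count is not None:
        properties["count"] = count
    # An empty attributeNames list asks for all attributes; requesting only
    # "container" mirrors the qdmanage call quoted in the comment above.
    body = {"attributeNames": attribute_names or []}
    return properties, body

# The full-connection-list query from the ticket, limited to 5000 results:
props, body = build_query("org.apache.qpid.dispatch.connection",
                          attribute_names=["container"], count=5000)
```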
[jira] [Created] (DISPATCH-974) Getting connections via the router management protocol causes AMQP framing errors
Marcel Meulemans created DISPATCH-974: - Summary: Getting connections via the router management protocol causes AMQP framing errors Key: DISPATCH-974 URL: https://issues.apache.org/jira/browse/DISPATCH-974 Project: Qpid Dispatch Issue Type: Bug Components: Management Agent Affects Versions: 1.0.1 Reporter: Marcel Meulemans Attachments: qdrouter-frame-errors.pcapng.gz I am running a standalone router with 5000 clients connected. When I try to get all connections via qdstat (qdstat --limit 5000 -c) something goes wrong (it seems to be a framing error). The output from qdstat is: {{ MessageException: [-10]: data error: (null)}} The problem seems to somehow be related to result size, because when I set the limit lower I get the list of connections as expected. In my situation the critical limit is 3447 (i.e. 3447 results in the expected list of connections, 3448 results in the error above). It does not seem to be frame size related, because getting 3447 connections is already spread over transfer frames (256182, 256512 and 159399 bytes). The error is not qdstat related, because using some plain proton code to create a management query results in the same problem. Ultimately the call to pn_message_decode with data received from the router fails (Wireshark also cannot decode the final frame). I have attached a Wireshark dump of the qdstat session with the router ([^qdrouter-frame-errors.pcapng.gz]). The logs of the router (at info level) contain no further information.
[jira] [Comment Edited] (DISPATCH-966) Qpid dispatch unstable inter-router connections
[ https://issues.apache.org/jira/browse/DISPATCH-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437039#comment-16437039 ] Marcel Meulemans edited comment on DISPATCH-966 at 4/13/18 11:18 AM: - Added to the configuration and ran the test again several times. However, now I see some things I did not expect; the network seems to come up correctly, but after a while it seems to fail in a weird way. The inter-router connections do not seem to drop anymore, but routing via the network does not seem to work (i.e. ROUTER_LS (info) Computed next hops: {} and qdstat -n shows only a single router). Maybe this is the issue unmasked by allowing unsettled multicasts? I attached two more files: * the logs of router-0 (from router start until slightly after the network fails) at info level * a tcpdump of the inter-router communication to and from router-0 (tcpdump -i eth0 tcp port 55672 -s 65535), also from router start until slightly after the network fails I hope this helps (the dump is fairly large, so I hope you can find any hidden needles). -- Marcel was (Author: mmeulemans): Added to the configuration and ran the test again several times. However, now I see some things I did not expect; the network seems to come up correctly, but after a while it seems to fail in a weird way. The inter-router connections do not seem to drop anymore, but routing via the network does not seem to work (i.e. ROUTER_LS (info) Computed next hops: {} and qdstat -n shows only a single router). Maybe this is the issue unmasked by allowing unsettled multicasts? I attached two more files: * the logs of router-0 (from router start until slightly after the network fails) at info level * a tcpdump of the inter-router communication to and from router-0 (tcpdump -i eth0 tcp port 55672 -s 65535) I hope this helps (the dump is fairly large, so I hope you can find any hidden needles). 
-- Marcel > Qpid dispatch unstable inter-router connections > --- > > Key: DISPATCH-966 > URL: https://issues.apache.org/jira/browse/DISPATCH-966 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 1.0.1 >Reporter: Marcel Meulemans >Assignee: Ted Ross >Priority: Major > Attachments: qdrouterd-unsettled-true.log, qdrouterd.conf, > qdrouterd.log, router-unsettled-true.dump, router.dump > > > I am running a three node fully connected mesh of dispatch routers with 1 > attached client and I am seeing some unstable inter-router connections (I am > sending around 1000 small, less than 1K, messages per second through the > network). The inter-router connections fail every so many seconds with the > message: > {{Connection to router-2:55672 failed: amqp:session:invalid-field sequencing > error, expected delivery-id 7, got 6}} > (the numbers 7 and 6 differ per connection loss) > In Wireshark, using the attached tcpdump capture, I can see that every time > before the inter-router connection is dropped, there is a rejected > disposition with the message: > {{Condition: qd:forbidden}} > {{Description: Deliveries to a multicast address must be pre-settled}} > The routers are connected as follows: > * router-0 -> router-1 > * router-0 -> router-2 > * router-1 -> router-2 > The routers are running as docker containers (debian stretch) on google > compute engine machines (every router on a separate node). > Attached are: > * my qdrouter.conf (from one of the routers) > * a log snippet from router-0 at debug level from connection drop to > connection re-established to connection drop again. 
> * a tcpdump capture of the inter-router connection between router-0 and > router-1 during which several of the failures occur > Versions: > * qpid-dispatch@1.0.1-rc1 > * qpid-proton@0.20.0 > > [^qdrouterd.log] > [^qdrouterd.conf] > [^router.dump]
[jira] [Updated] (DISPATCH-966) Qpid dispatch unstable inter-router connections
[ https://issues.apache.org/jira/browse/DISPATCH-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcel Meulemans updated DISPATCH-966: -- Attachment: qdrouterd-unsettled-true.log router-unsettled-true.dump
[jira] [Commented] (DISPATCH-966) Qpid dispatch unstable inter-router connections
[ https://issues.apache.org/jira/browse/DISPATCH-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437039#comment-16437039 ] Marcel Meulemans commented on DISPATCH-966: --- Added to the configuration and ran the test again several times. However, now I see some things I did not expect; the network seems to come up correctly, but after a while it seems to fail in a weird way. The inter-router connections do not seem to drop anymore, but routing via the network does not seem to work (i.e. ROUTER_LS (info) Computed next hops: {} and qdstat -n shows only a single router). Maybe this is the issue unmasked by allowing unsettled multicasts? I attached two more files: * the logs of router-0 (from router start until slightly after the network fails) at info level * a tcpdump of the inter-router communication to and from router-0 (tcpdump -i eth0 tcp port 55672 -s 65535) I hope this helps (the dump is fairly large, so I hope you can find any hidden needles). -- Marcel
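The {{qd:forbidden}} rejection seen in the capture comes down to a single settlement rule: the router refuses unsettled deliveries to multicast addresses. A minimal sketch of that rule (illustrative only, with assumed names, not Dispatch source code):

```python
# Sketch of the disposition rule behind "Deliveries to a multicast address
# must be pre-settled". Function and tuple shape are hypothetical; only the
# condition/description strings come from the capture quoted in the ticket.

def disposition_for(address_is_multicast, presettled):
    """Return (outcome, condition, description) for an incoming delivery."""
    if address_is_multicast and not presettled:
        # Unsettled multicast deliveries are rejected; repeated rejections
        # precede the inter-router connection drops seen in the tcpdump.
        return ("rejected", "qd:forbidden",
                "Deliveries to a multicast address must be pre-settled")
    return ("accepted", None, None)
```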
[jira] [Created] (DISPATCH-966) Qpid dispatch unstable inter-router connections
Marcel Meulemans created DISPATCH-966: - Summary: Qpid dispatch unstable inter-router connections Key: DISPATCH-966 URL: https://issues.apache.org/jira/browse/DISPATCH-966 Project: Qpid Dispatch Issue Type: Bug Components: Routing Engine Affects Versions: 1.0.1 Reporter: Marcel Meulemans Attachments: qdrouterd.conf, qdrouterd.log, router.dump I am running a three node fully connected mesh of dispatch routers with 1 attached client and I am seeing some unstable inter-router connections (I am sending around 1000 small, less than 1K, messages per second through the network). The inter-router connections fail every so many seconds with the message: {{Connection to router-2:55672 failed: amqp:session:invalid-field sequencing error, expected delivery-id 7, got 6}} (the numbers 7 and 6 differ per connection loss) In Wireshark, using the attached tcpdump capture, I can see that every time before the inter-router connection is dropped, there is a rejected disposition with the message: {{Condition: qd:forbidden}} {{Description: Deliveries to a multicast address must be pre-settled}} The routers are connected as follows: * router-0 -> router-1 * router-0 -> router-2 * router-1 -> router-2 The routers are running as docker containers (debian stretch) on google compute engine machines (every router on a separate node). Attached are: * my qdrouter.conf (from one of the routers) * a log snippet from router-0 at debug level from connection drop to connection re-established to connection drop again. * a tcpdump capture of the inter-router connection between router-0 and router-1 during which several of the failures occur Versions: * qpid-dispatch@1.0.1-rc1 * qpid-proton@0.20.0 [^qdrouterd.log] [^qdrouterd.conf] [^router.dump]
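The sequencing error ("expected delivery-id 7, got 6") means the two ends of the session disagree about the next session-scoped delivery-id: per the AMQP 1.0 transfer rules, each new delivery on a session must carry the next id, regardless of which link it travels on. A minimal sketch of that invariant (assumed names, not Proton or Dispatch code):

```python
# Illustrative sketch of session-scoped delivery-id assignment. The id must
# be assigned and advanced when the FIRST frame of a delivery is emitted, so
# a delivery multiplexed on another link cannot reuse the id of a
# still-incomplete multi-frame delivery (the failure mode this error suggests).

class Session:
    def __init__(self):
        self.next_delivery_id = 0

    def begin_delivery(self):
        # Assign and advance immediately; deferring the increment until a
        # multi-frame delivery completes would let a concurrent link grab
        # the same id, producing exactly this kind of sequencing error.
        delivery_id = self.next_delivery_id
        self.next_delivery_id += 1
        return delivery_id

session = Session()
id_link_a = session.begin_delivery()  # multi-frame delivery starts on link A
id_link_b = session.begin_delivery()  # link B multiplexes before A completes
```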