Re: Testing failover on dispatcher/java-broker cluster
Hello Ted,

I confirm that all my tests are GREEN at the head of the 0.6.x branch.

For reference:
Qpid Java Broker: 6.0.4
Qpid Proton: 0.12.2
Compiler: gcc 4.9.1
OS: Linux Red Hat

Regards,
Adel
Re: Testing failover on dispatcher/java-broker cluster
From: Adel Boutros
Sent: Friday, September 30, 2016 3:07:56 PM
To: users@qpid.apache.org

Great! I have synced your changes and we will run our tests. I will get back to you with the results as soon as possible.

Regards,
Adel
Re: Testing failover on dispatcher/java-broker cluster
From: Ted Ross
Sent: Friday, September 30, 2016 2:39:51 PM
To: users@qpid.apache.org

Done. I've pushed the four cherry-picked commits to the 0.6.x branch if you'd like to give it a go.

-Ted
Re: Testing failover on dispatcher/java-broker cluster
From: Adel Boutros
Date: 09/30/2016 05:47 AM
To: users@qpid.apache.org

Hello Ted,

Following the discussion here (http://qpid.2158936.n2.nabble.com/Dispatch-router-0-6-1-Configuration-bugs-td7651334.html), can DISPATCH-500 be included in the minor release?

PS: It still hasn't solved my issue below, but I will continue the analysis on the other thread.

Regards,
Adel
Re: Testing failover on dispatcher/java-broker cluster
From: Adel Boutros
Sent: Thursday, September 29, 2016 5:01:45 PM
To: users@qpid.apache.org

I would expect what you have described; however, that doesn't seem to be the case.

Delete/recreate the mobile address:

qdmanage -b amqp://localhost:10501 delete --type=address --name haProxy.queue.addr
qdmanage -b amqp://localhost:10501 create --type=address prefix=haProxy.queue waypoint=true name=haProxy.queue.addr

The stats remain at a positive value (in = 10, out = 10). If I restart the dispatchers without the inter-router connection, I don't have the issue.

Router Addresses
  class   addr           phs  distrib   in-proc  local  remote  cntnr  in  out  thru  to-proc  from-proc
  ======================================================================================================
  mobile  haProxy.queue  1    balanced  0        0      0       0      0   0    0     0        0
  mobile  haProxy.queue  0    balanced  0        1      0       0      10  10   0     0        0

Adel
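The check described in this message (do the "in" and "out" counters of haProxy.queue return to 0 after the address is deleted and recreated?) can be sketched as a small parser over the "qdstat -a" text. This is only an illustration, not part of the thread: the column names are copied from the output quoted above, the helper name parse_address_rows is made up, and the sketch assumes that address names contain no whitespace.

```python
# Column names as they appear in the "Router Addresses" output quoted above.
COLUMNS = ["class", "addr", "phs", "distrib", "in-proc", "local", "remote",
           "cntnr", "in", "out", "thru", "to-proc", "from-proc"]

# Sample text reproducing the two rows reported in the message.
SAMPLE = """\
Router Addresses
  class   addr           phs  distrib   in-proc  local  remote  cntnr  in  out  thru  to-proc  from-proc
  ======================================================================================================
  mobile  haProxy.queue  1    balanced  0        0      0       0      0   0    0     0        0
  mobile  haProxy.queue  0    balanced  0        1      0       0      10  10   0     0        0
"""


def parse_address_rows(text):
    """Turn the data rows of the address table into dictionaries.

    Hypothetical helper: it keeps only lines that have exactly one
    whitespace-separated field per column, which skips the title,
    the header line and the '====' rule.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) or fields == COLUMNS:
            continue
        row = dict(zip(COLUMNS, fields))
        for name in COLUMNS[4:]:  # everything after "distrib" is a counter
            row[name] = int(row[name])
        rows.append(row)
    return rows


rows = parse_address_rows(SAMPLE)
# Rows whose "in" or "out" counter is still non-zero after the recreate.
stale = [r for r in rows if r["in"] != 0 or r["out"] != 0]
for r in stale:
    print(f'{r["addr"]} phase {r["phs"]}: in={r["in"]} out={r["out"]}')
```

Fed with the output captured before and after the delete/recreate, a non-empty `stale` list on only one of the two routers would match the behavior reported here.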
Re: Testing failover on dispatcher/java-broker cluster
From: Ted Ross
Sent: Thursday, September 29, 2016 4:55 PM
To: users@qpid.apache.org

On 09/29/2016 10:47 AM, Adel Boutros wrote:
> They seem fair enough and quite related.
>
> As a side note, I have a bug with the dispatch router 0.6.1, but I haven't submitted it yet because I haven't reduced the test case.
>
> In summary: I connect 2 dispatchers (inter-router) and then delete the "inter-router" connector/listener. If I then delete and recreate a mobile address which has received a message on one of the dispatchers, the "in" and "out" stats do not reset to 0 in "qdstat -a"; they remain at the old values. However, they reset correctly on the other router.

What exactly do you mean by "delete and recreate a mobile address"?

If an address is removed from the table, the next time it appears, a new record will be created for that address. The new record will have zeroed statistics. What behavior are you expecting?

> Have you encountered something similar? Once I have a reduced test case, I will post it in a different thread, of course.
>
> Regards,
> Adel
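Ted's description in this message (a removed address gets a brand-new record with zeroed statistics the next time it appears) can be illustrated with a toy address table. This is a sketch of the expected semantics only; AddressTable and AddressRecord are made-up names and do not reflect how the router is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class AddressRecord:
    """Per-address counters, zeroed when the record is created."""
    deliveries_in: int = 0
    deliveries_out: int = 0


class AddressTable:
    """Toy model: one record per address, created on first appearance."""

    def __init__(self):
        self._records = {}

    def record(self, addr):
        # A missing address gets a fresh record with zeroed statistics.
        return self._records.setdefault(addr, AddressRecord())

    def remove(self, addr):
        # Removing the address discards its record and its counters.
        self._records.pop(addr, None)


table = AddressTable()
old = table.record("haProxy.queue")
old.deliveries_in += 10
old.deliveries_out += 10

table.remove("haProxy.queue")          # "delete" the address
fresh = table.record("haProxy.queue")  # "recreate" it

print(fresh.deliveries_in, fresh.deliveries_out)
```

Under this model the counters always restart from zero, which is the behavior Ted expects; the report in this thread is that one router keeps showing the old values instead.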
Re: Testing failover on dispatcher/java-broker cluster
From: Adel Boutros
Date: 09/29/2016 10:47 AM
To: users@qpid.apache.org

They seem fair enough and quite related.

As a side note, I have a bug with the dispatch router 0.6.1, but I haven't submitted it yet because I haven't reduced the test case.

In summary: I connect 2 dispatchers (inter-router) and then delete the "inter-router" connector/listener. If I then delete and recreate a mobile address which has received a message on one of the dispatchers, the "in" and "out" stats do not reset to 0 in "qdstat -a"; they remain at the old values. However, they reset correctly on the other router.

Have you encountered something similar? Once I have a reduced test case, I will post it in a different thread, of course.

Regards,
Adel
Re: Testing failover on dispatcher/java-broker cluster
From: Ted Ross
Sent: Thursday, September 29, 2016 4:38:26 PM
To: users@qpid.apache.org

Sorry, those Jira numbers and descriptions are mismatched. Here's the correct list:

- DISPATCH-496 - Activation of an autolink does not result in issuing credit to a blocked sender
- DISPATCH-505 - Eventual loss of credit on inter-router control links when the topology changes
- DISPATCH-523 - Topology changes can cause in-flight deliveries to be stuck in the ingress router

On 09/29/2016 10:35 AM, Ted Ross wrote:
>
> On 09/24/2016 05:32 AM, Adel Boutros wrote:
>> We are indeed in favor of a minor release as long as the latest version is still 0.6.x, and we are willing to re-launch our tests and give feedback on the release candidate once provided (it shouldn't take us more than a day to compile and test).
>> Do you have a list of fixes in mind?
>
> I've identified three fixes that look like good candidates for 0.6.2:
>
> - DISPATCH-496 - Topology changes can cause in-flight deliveries to be stuck in the ingress router
> - DISPATCH-505 - Eventual loss of credit on inter-router control links when the topology changes
> - DISPATCH-523 - Activation of an autolink does not result in issuing credit to a blocked sender
>
> These are all stability-related issues.
>
> Thoughts?
>
> -Ted
>
>> Regards,
>> Adel
>>
>>> Subject: Re: Testing failover on dispatcher/java-broker cluster
>>> To: users@qpid.apache.org
>>> From: tr...@redhat.com
>>> Date: Fri, 23 Sep 2016 17:23:57 -0400
>>>
>>> Hi Adel,
>>>
>>> A minor release is always possible. It's up to us, the community, to decide whether and when to produce one. I'm in favor of releasing an 0.6.2 with some small backports to fix bugs for users that want to stay on Proton 0.12.
>>>
>>> -Ted
>>>
>>> On 09/23/2016 09:44 AM, Adel Boutros wrote:
>>>> Hello Ted,
>>>> Did you happen to have the time to check if a minor release is possible?
>>>> Regards,
>>>> Adel
>>>>
>>>>> From: adelbout...@live.com
>>>>> To: users@qpid.apache.org
>>>>> Subject: RE: Testing failover on dispatcher/java-broker cluster
>>>>> Date: Tue, 20 Sep 2016 15:13:03 +0200
>>>>>
>>>>> Hello Ted,
>>>>>
>>>>> I confirm the fix solved the issue.
>>>>>
>>>>> Would it be possible to do a 0.6.2 release? We cannot compile newer versions of Proton (we currently use 0.12.2) due to lack of resources on our side, and we really need this fix for our tests.
>>>>>
>>>>> Regards,
>>>>> Adel
>>>>>
>>>>>> Subject: Re: Testing failover on dispatcher/java-broker cluster
>>>>>> To: users@qpid.apache.org
>>>>>> From: tr...@redhat.com
>>>>>> Date: Mon, 19 Sep 2016 12:18:23 -0400
>>>>>>
>>>>>> Hi Adel,
>>>>>>
>>>>>> It's a one-liner and it applies cleanly to the 0.6.x branch.
>>>>>>
>>>>>> https://git-wip-us.apache.org/repos/asf?p=qpid-dispatch.git;h=41b7407
>>>>>>
>>>>>> -Ted
>>>>>>
>>>>>> On 09/19/2016 11:41 AM, Adel Boutros wrote:
>>>>>>> Hello Ted,
>>>>>>>
>>>>>>> Antoine is on vacation so I will be taking over this task.
>>>>>>>
>>>>>>> Does this fix have any dependencies? We would like to apply it on 0.6.1 without other fixes because it seems the master branch requires Proton 0.13.0 at minimum, whereas we currently have 0.12.2 and cannot upgrade for the time being.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Adel
>>>>>>>
>>>>>>>> Subject: Re: Testing failover on dispatcher/java-broker cluster
>>>>>>>> To: users@qpid.apache.org
>>>>>>>> From: tr...@redhat.com
>>>>>>>> Date: Fri, 16 Sep 2016 16:53:05 -0400
>>>>>>>>
>>>>>>>> Antoine,
>>>>>>>>
>>>>>>>> I think I know what that problem is. I believe you've stumbled upon this issue:
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/DISPATCH-496
>>>>>>>>
>>>>>>>> Your second delivery, the one resulting in a timeout, is causing the inbound link to be blocked (i.e. it has undelivered messages). When the broker reattaches, the blocked links are supposed to become unblocked but they don't in the case of auto-links.
>>>>>>>>
>>>>>>>> This has been fixed on the master branch if you'd like to try applying the patch.
>>>>>>>>
>>>>>>>> -Ted
>>>>>>>>
>>>>>>>> On 09/15/2016 04:56 AM, Antoine Chevin wrote:
>>>>>>>>> Hi Ted,
>>>>>>>>>
>>>>>>>>> You’re right, the connection close looked strange before the stopping of the broker. I manually added the annotation (# stopping the broker) and was wrong about its position. I replayed the test and the connection close happens *after* the broker stop. I assume it is the broker that initiates it.
>>>>>>>>>
>>>>>>>>> I found something interesting. In my test, I always sent a message when the broker is down, expecting to get a JmsSendTimedOutException (waiting for the disposition frame). I assumed this was harmless, but it turns out it is not. When I don’t do that, I can send a message after the broker restart.
>>>>>>>>>
>>>>>>>>> So to sum up the experiment I did:
>>>>>>>>>
>>>>>>>>> *I use Wireshark between the JMS client and the dispatcher.*
>>>>>>>>>
>>>>>>>>> 1) Using JMS I establish a connection to the dispatcher and create a message producer (Wireshark: connection open -> attach)
>>>>>>>>> 2) I’m able to send a message to the broker through the dispatcher (Wireshark: transfer -> disposition)
>>>>>>>>> 3) I stop the broker
>>>>>>>>> 4) With the same link, I send a message and I get a JmsSendTimedOutException (waiting for the disposition frame) (Wireshark: transfer)
>>>>>>>>> 5) I restart the broker
>>>>>>>>> 6) With the same link, I try to send a message and I get a JmsSendTimedOutException for the same reason (waiting for the disposition frame) (Wireshark
Re: Testing failover on dispatcher/java-broker cluster
On 09/24/2016 05:32 AM, Adel Boutros wrote:
> We are indeed in favor of a minor release as long as the latest version is still 0.6.x and we are willing to re-launch our tests and give feedback on the release candidate once provided (It shouldn't take us more than a day to compile and test).
>
> Do you have a list of fixes in mind?

I've identified three fixes that look like good candidates for 0.6.2:

- DISPATCH-496 - Topology changes can cause in-flight deliveries to be stuck in the ingress router
- DISPATCH-505 - Eventual loss of credit on inter-router control links when the topology changes
- DISPATCH-523 - Activation of an autolink does not result in issuing credit to a blocked sender

These are all stability-related issues. Thoughts?

-Ted
RE: Testing failover on dispatcher/java-broker cluster
We are indeed in favor of a minor release as long as the latest version is still 0.6.x and we are willing to re-launch our tests and give feedback on the release candidate once provided (It shouldn't take us more than a day to compile and test).

Do you have a list of fixes in mind?

Regards,
Adel

> Subject: Re: Testing failover on dispatcher/java-broker cluster
> To: users@qpid.apache.org
> From: tr...@redhat.com
> Date: Fri, 23 Sep 2016 17:23:57 -0400
>
> Hi Adel,
>
> A minor release is always possible. It's up to us, the community, to
> decide whether and when to produce one. I'm in favor of releasing an
> 0.6.2 with some small backports to fix bugs for users that want to stay
> on Proton 0.12.
>
> -Ted
Re: Testing failover on dispatcher/java-broker cluster
Hi Adel,

A minor release is always possible. It's up to us, the community, to decide whether and when to produce one. I'm in favor of releasing an 0.6.2 with some small backports to fix bugs for users that want to stay on Proton 0.12.

-Ted

On 09/23/2016 09:44 AM, Adel Boutros wrote:
> Hello Ted,
>
> Did you happen to have the time to check if a minor release is possible?
>
> Regards,
> Adel

To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
For additional commands, e-mail: users-h...@qpid.apache.org
RE: Testing failover on dispatcher/java-broker cluster
Hello Ted,

Did you happen to have the time to check if a minor release is possible?

Regards,
Adel

> From: adelbout...@live.com
> To: users@qpid.apache.org
> Subject: RE: Testing failover on dispatcher/java-broker cluster
> Date: Tue, 20 Sep 2016 15:13:03 +0200
>
> Hello Ted,
>
> I confirm the fix solved the issue.
>
> Would it be possible to do a 0.6.2 release? We cannot compile newer versions
> of Proton (We currently use 0.12.2) due to lack of resources from our side
> and we really need this fix for our tests.
>
> Regards,
> Adel
RE: Testing failover on dispatcher/java-broker cluster
Hello Ted,

I confirm the fix solved the issue.

Would it be possible to do a 0.6.2 release? We cannot compile newer versions of Proton (We currently use 0.12.2) due to lack of resources from our side and we really need this fix for our tests.

Regards,
Adel

> Subject: Re: Testing failover on dispatcher/java-broker cluster
> To: users@qpid.apache.org
> From: tr...@redhat.com
> Date: Mon, 19 Sep 2016 12:18:23 -0400
>
> Hi Adel,
>
> It's a one-liner and it applies cleanly to the 0.6.x branch.
>
> https://git-wip-us.apache.org/repos/asf?p=qpid-dispatch.git;h=41b7407
>
> -Ted
Re: Testing failover on dispatcher/java-broker cluster
Hi Adel,

It's a one-liner and it applies cleanly to the 0.6.x branch.

https://git-wip-us.apache.org/repos/asf?p=qpid-dispatch.git;h=41b7407

-Ted

On 09/19/2016 11:41 AM, Adel Boutros wrote:
> Hello Ted,
>
> Antoine is on vacation so I will be taking over this task.
>
> Does this fix have any dependencies? We would like to apply it on 0.6.1
> without other fixes because it seems the master branch requires proton
> 0.13.0 minimum whereas we currently have 0.12.2 and we cannot upgrade for
> the time being.
>
> Regards,
> Adel
RE: Testing failover on dispatcher/java-broker cluster
Hello Ted,

Antoine is on vacation so I will be taking over this task.

Does this fix have any dependencies? We would like to apply it on 0.6.1 without other fixes because it seems the master branch requires proton 0.13.0 minimum whereas we currently have 0.12.2 and we cannot upgrade for the time being.

Regards,
Adel

> Subject: Re: Testing failover on dispatcher/java-broker cluster
> To: users@qpid.apache.org
> From: tr...@redhat.com
> Date: Fri, 16 Sep 2016 16:53:05 -0400
>
> Antoine,
>
> I think I know what that problem is. I believe you've stumbled upon
> this issue:
>
> https://issues.apache.org/jira/browse/DISPATCH-496
>
> Your second delivery, the one resulting in a timeout, is causing the
> inbound link to be blocked (i.e. it has undelivered messages). When the
> broker reattaches, the blocked links are supposed to become unblocked
> but they don't in the case of auto-links.
>
> This has been fixed on the master branch if you'd like to try applying
> the patch.
>
> -Ted
Re: Testing failover on dispatcher/java-broker cluster
Antoine,

I think I know what that problem is. I believe you've stumbled upon this issue:

https://issues.apache.org/jira/browse/DISPATCH-496

Your second delivery, the one resulting in a timeout, is causing the inbound link to be blocked (i.e. it has undelivered messages). When the broker reattaches, the blocked links are supposed to become unblocked but they don't in the case of auto-links.

This has been fixed on the master branch if you'd like to try applying the patch.

-Ted

On 09/15/2016 04:56 AM, Antoine Chevin wrote:
> Hi Ted,
>
> You’re right, the connection close looked strange before stopping of the broker. I manually added the annotation (# stopping the broker) and was wrong about the position of this one. I replayed the test and the connection close happens *after* the broker stop. I assume it is the broker that initiates it.
>
> I found something interesting. In my test, I always sent a message while the broker was down, expecting to get a JmsSendTimedOutException (waiting for the disposition frame). I assumed this was harmless. But it turns out it is not. When I don’t do that, I can send a message after the broker restart. So to sum up the experiment I did:
>
> *I use Wireshark between the JMS client and the dispatcher.*
>
> 1) Using JMS I establish a connection to the dispatcher and create a message producer (Wireshark: connection open -> attach)
> 2) I’m able to send a message to the broker through the dispatcher (Wireshark: transfer -> disposition)
> 3) I stop the broker
> 4) With the same link, I send a message and I get a JmsSendTimedOutException (waiting for the disposition frame) (Wireshark: transfer)
> 5) I restart the broker
> 6) With the same link, I try to send a message and I get a JmsSendTimedOutException for the same reason (waiting for the disposition frame) (Wireshark: transfer)
>
> If I skip step (4), I cannot reproduce step (6) and my messages arrive (Wireshark: transfer -> disposition) to the restarted broker.
>
> I hope it makes it clearer for you. Sorry for my rookie mistakes :-).
>
> Note: My colleague and I ran a small experiment to identify if the problem comes from JMS or the AMQP protocol. He changed the code of the java broker to not send the disposition frame one time out of two.
>
> We got these results:
>
> *I use Wireshark between the JMS client and the patched broker.*
>
> 1) Using JMS I establish a connection to the patched broker and create a message producer (Wireshark: connection open -> attach)
> 2) I send a message to the broker and it replies with the disposition frame (Wireshark: transfer -> disposition)
> 3) I send a message to the broker which drops the disposition frame. I get a send timeout in JMS (Wireshark: transfer)
> 4) I send a message to the broker and it replies with the disposition frame (Wireshark: transfer -> disposition). It works fine.
>
> We assume that there is something going on in the dispatcher.
>
> Thanks,
> Antoine
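For readers following along, the behaviour Ted describes in this message can be pictured with a toy model. This is only a sketch under loud assumptions: `ToyRouterLink`, `run_scenario` and the `unblock_on_reattach` flag are hypothetical names invented for illustration, and nothing here is taken from the qpid-dispatch source. It only mirrors the six steps of the experiment: a send while the broker is down blocks the inbound link, and the link stays blocked after the broker returns unless the reattach unblocks it.

```python
from collections import deque


class ToyRouterLink:
    """Toy model of a client -> router inbound link (hypothetical, not router code)."""

    def __init__(self, unblock_on_reattach):
        # Assumption: this flag stands in for "the fix is applied".
        self.unblock_on_reattach = unblock_on_reattach
        self.broker_attached = True
        self.blocked = False
        self.undelivered = deque()

    def send(self, msg):
        """Return True if a disposition comes back, False if the sender would time out."""
        if self.blocked:
            # Deliveries on a blocked link are parked, so no disposition is returned.
            self.undelivered.append(msg)
            return False
        if not self.broker_attached:
            # No path to the broker: park the delivery and mark the link blocked.
            self.undelivered.append(msg)
            self.blocked = True
            return False
        return True

    def broker_down(self):
        self.broker_attached = False

    def broker_reattach(self):
        self.broker_attached = True
        if self.unblock_on_reattach:
            # Intended behaviour: flush parked deliveries and unblock the link.
            self.undelivered.clear()
            self.blocked = False


def run_scenario(unblock_on_reattach, send_while_down=True):
    """Replay steps 2 to 6 of the experiment and collect the send outcomes."""
    link = ToyRouterLink(unblock_on_reattach)
    results = [link.send("m1")]            # step 2: broker up
    link.broker_down()                     # step 3
    if send_while_down:
        results.append(link.send("m2"))    # step 4: times out
    link.broker_reattach()                 # step 5
    results.append(link.send("m3"))        # step 6
    return results
```

In this model, `run_scenario(False)` reproduces the reported failure (the send after the restart still times out), `run_scenario(True)` shows the expected recovery, and `run_scenario(False, send_while_down=False)` matches the observation that skipping step (4) avoids the problem.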
Re: Testing failover on dispatcher/java-broker cluster
Hi Ted,

Do you have any insights into that problem?

Thanks,
Antoine
Re: Testing failover on dispatcher/java-broker cluster
Hi Ted,

You’re right, the connection close looked strange before the stopping of the broker. I manually added the annotation (# stopping the broker) and was wrong about its position. I replayed the test and the connection close happens *after* the broker stop. I assume it is the broker that initiates it.

I found something interesting. In my test, I always sent a message while the broker was down, expecting to get a JmsSendTimedOutException (waiting for the disposition frame). I assumed this was harmless, but it turns out it is not. When I don’t do that, I can send a message after the broker restart.

So to sum up the experiment I did:

*I use Wireshark between the JMS client and the dispatcher.*

1) Using JMS I establish a connection to the dispatcher and create a message producer (Wireshark: connection open -> attach)
2) I’m able to send a message to the broker through the dispatcher (Wireshark: transfer -> disposition)
3) I stop the broker
4) With the same link, I send a message and I get a JmsSendTimedOutException (waiting for the disposition frame) (Wireshark: transfer)
5) I restart the broker
6) With the same link, I try to send a message and I get a JmsSendTimedOutException for the same reason (waiting for the disposition frame) (Wireshark: transfer)

If I skip step (4), I cannot reproduce step (6) and my messages arrive (Wireshark: transfer -> disposition) at the restarted broker.

I hope this makes it clearer for you. Sorry for my rookie mistakes :-).

Note: My colleague and I ran a small experiment to identify whether the problem comes from JMS or the AMQP protocol. He changed the code of the Java broker so that it does not send the disposition frame one time out of two.

We got these results:

*I use Wireshark between the JMS client and the patched broker.*

1) Using JMS I establish a connection to the patched broker and create a message producer (Wireshark: connection open -> attach)
2) I send a message to the broker and it replies with the disposition frame (Wireshark: transfer -> disposition)
3) I send a message to the broker, which drops the disposition frame. I get a send timeout in JMS (Wireshark: transfer)
4) I send a message to the broker and it replies with the disposition frame (Wireshark: transfer -> disposition). It works fine.

We assume that there is something going on in the dispatcher.

Thanks,
Antoine
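The contrast between the two experiments in the message above can be written down as a small check. This is only an illustrative sketch: the frame lists are transcribed from the "(Wireshark: ...)" annotations of the send steps, and the names `VIA_DISPATCHER`, `DIRECT_TO_PATCHED_BROKER`, `unsettled` and `link_recovers` are made up for this example.

```python
# Frames seen by Wireshark for each send attempt, transcribed from the step
# annotations in the message above (names are illustrative, not from any tool).
VIA_DISPATCHER = [
    ("step 2, broker up", ["transfer", "disposition"]),
    ("step 4, broker down", ["transfer"]),
    ("step 6, broker restarted", ["transfer"]),
]
DIRECT_TO_PATCHED_BROKER = [
    ("step 2", ["transfer", "disposition"]),
    ("step 3, disposition dropped", ["transfer"]),
    ("step 4", ["transfer", "disposition"]),
]


def unsettled(attempts):
    """Return the labels of send attempts that never received a disposition."""
    return [label for label, frames in attempts if "disposition" not in frames]


def link_recovers(attempts):
    """True if the last send was settled although an earlier one was not."""
    earlier, (_, last_frames) = attempts[:-1], attempts[-1]
    had_timeout = any("disposition" not in frames for _, frames in earlier)
    return had_timeout and "disposition" in last_frames


if __name__ == "__main__":
    print("via dispatcher, unsettled:", unsettled(VIA_DISPATCHER))
    print("via dispatcher, recovers:", link_recovers(VIA_DISPATCHER))
    print("patched broker, recovers:", link_recovers(DIRECT_TO_PATCHED_BROKER))
```

Read this way, the same link recovers after a missing disposition when the client talks to the broker directly, but not when the dispatcher sits in between, which is what points at the dispatcher.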
Re: Testing failover on dispatcher/java-broker cluster
Hi Antoine,

In the broker traces, I see a connection shutdown after the transfer but before you shut down the broker. Do you know what is happening there? What was the disposition of the delivery?

-Ted

On 09/14/2016 09:12 AM, Antoine Chevin wrote:
> Hello Qpid community,
>
> I’m testing the resilience of a dispatcher/broker infrastructure and I noticed the following behavior:
>
> I run a test with one JMS client connected to a dispatcher, which is connected to a broker.
>
> 1) Using JMS I establish a connection to the dispatcher and create a message producer
> 2) I’m able to send a message to the broker through the dispatcher
> 3) I stop and restart the broker
> 4) I cannot send any messages using the message producer I created before.
> 5) If I recreate a MessageProducer (new AMQP link), the message arrives at the broker
>
> In the failing scenario 4, I noticed using Wireshark that the dispatcher does not send any messages to the broker. So I deduced that the broker is not responsible for this behavior.
>
> *Is this expected behavior? What can I change in the dispatcher/JMS configuration to avoid the failure?*
>
> You can find attached the Wireshark logs I produced from this experiment:
>
> - JMS – dispatcher – reuse sender: logs between JMS and the dispatcher when I reuse the message producer after the restart
> - JMS – dispatcher – new sender: logs between JMS and the dispatcher when I create a new message producer after the restart
> - dispatcher – broker – reuse sender: logs between the dispatcher and the broker when I reuse the message producer
> - dispatcher – broker – new sender: logs between the dispatcher and the broker when I create a new message producer
>
> I’m using qpid-dispatch 0.6.0, JMS 0.9.0 and qpid-java-broker 6.0.1.
>
> Thanks,
> Best regards,
> Antoine
Re: Testing failover on dispatcher/java-broker cluster
Two details I did not mention in my previous mail:

- I use the same JMS connection/session to recreate a MessageProducer
- I noticed that the dispatcher reconnects correctly to the broker and the autolinks are 'active'.

This is the dispatcher config that I used for this test:

    router {
        id: router.5672
        mode: interior
        worker-threads: 4
    }

    listener {
        host: 0.0.0.0
        port: 5672
        role: normal
        saslMechanisms: ANONYMOUS
        requireSsl: no
        authenticatePeer: no
    }

    address {
        name: perf.topic.addr
        prefix: perf.topic
        waypoint: true
    }

    connector {
        role: route-container
        addr: localhost
        port: 10101
        name: localhost.broker.10101.connector
    }

    autoLink {
        addr: perf.topic
        dir: out
        connection: localhost.broker.10101.connector
        name: localhost.broker.10101.perf.topic.out
    }

Thanks,
Antoine
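Until a router with the DISPATCH-496 fix is deployed, the observation that a freshly created MessageProducer on the same connection/session gets through suggests a client-side workaround: on a send timeout, drop the producer and retry on a new one. The sketch below only illustrates that pattern in Python. `SendTimedOut`, `send_with_recreate` and the `create_producer` factory are hypothetical names, not part of any Qpid API; a real JMS client would catch JmsSendTimedOutException and call `session.createProducer(...)` instead.

```python
class SendTimedOut(Exception):
    """Hypothetical stand-in for a send that never received its disposition."""


def send_with_recreate(producer, create_producer, body, max_recreates=1):
    """Send `body`; after a timeout, replace the producer and try again.

    `producer` must offer send(body) and close(). `create_producer` is an
    assumed factory that opens a new producer (a new AMQP link) on the
    existing connection/session. Returns the producer that finally worked,
    so the caller can keep using it for later sends.
    """
    for attempt in range(max_recreates + 1):
        try:
            producer.send(body)
            return producer
        except SendTimedOut:
            producer.close()              # give up on the possibly blocked link
            if attempt == max_recreates:
                raise                     # no recreate budget left
            producer = create_producer()  # fresh link for the next attempt
```

One caveat with this pattern: a transfer that timed out may still be delivered later, so the retry can produce a duplicate and consumers should tolerate that.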
Testing failover on dispatcher/java-broker cluster
Hello Qpid community,

I’m testing the resilience of a dispatcher/broker infrastructure and I noticed the following behavior:

I run a test with one JMS client connected to a dispatcher, which is connected to a broker.

1) Using JMS I establish a connection to the dispatcher and create a message producer
2) I’m able to send a message to the broker through the dispatcher
3) I stop and restart the broker
4) I cannot send any messages using the message producer I created before.
5) If I recreate a MessageProducer (new AMQP link), the message arrives at the broker

In the failing scenario 4, I noticed using Wireshark that the dispatcher does not send any messages to the broker. So I deduced that the broker is not responsible for this behavior.

*Is this expected behavior? What can I change in the dispatcher/JMS configuration to avoid the failure?*

You can find attached the Wireshark logs I produced from this experiment:

- JMS – dispatcher – reuse sender: logs between JMS and the dispatcher when I reuse the message producer after the restart
- JMS – dispatcher – new sender: logs between JMS and the dispatcher when I create a new message producer after the restart
- dispatcher – broker – reuse sender: logs between the dispatcher and the broker when I reuse the message producer
- dispatcher – broker – new sender: logs between the dispatcher and the broker when I create a new message producer

I’m using qpid-dispatch 0.6.0, JMS 0.9.0 and qpid-java-broker 6.0.1.
Thanks,
Best regards,
Antoine

src.ip = JMS ip
dst.ip = dispatcher ip

# Client connection
Source  Destination  Protocol  Length  Info
src.ip  dst.ip       TCP       66      53505 → 5672 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=2 SACK_PERM=1
dst.ip  src.ip       TCP       66      5672 → 53505 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM=1 WS=128
src.ip  dst.ip       TCP       54      53505 → 5672 [ACK] Seq=1 Ack=1 Win=65536 Len=0
src.ip  dst.ip       AMQP      62      Protocol-Header 1-0-0
dst.ip  src.ip       TCP       60      5672 → 53505 [ACK] Seq=1 Ack=9 Win=29312 Len=0
dst.ip  src.ip       AMQP      105     Protocol-Header 1-0-0 sasl.mechanisms
src.ip  dst.ip       AMQP      91      sasl.init
dst.ip  src.ip       TCP       60      5672 → 53505 [ACK] Seq=52 Ack=46 Win=29312 Len=0
dst.ip  src.ip       AMQP      76      sasl.outcome
src.ip  dst.ip       AMQP      300     Protocol-Header 1-0-0 open
dst.ip  src.ip       AMQP      185     Protocol-Header 1-0-0 open

# Creating the Session
src.ip  dst.ip       AMQP      86      begin
dst.ip  src.ip       AMQP      86      begin
src.ip  dst.ip       AMQP      86      begin
dst.ip  src.ip       AMQP      86      begin

# Creating MessageProducer
src.ip  dst.ip       AMQP      313     attach
dst.ip  src.ip       AMQP      374     attach flow

# Sending a message (success)
src.ip  dst.ip       AMQP      405     transfer
dst.ip  src.ip       AMQP      131     flow disposition
src.ip  dst.ip       TCP       54      53505 → 5672 [ACK] Seq=966 Ack=666 Win=64870 Len=0
src.ip  dst.ip       AMQP      62      (empty)
dst.ip  src.ip       TCP       60      5672 → 53505 [ACK] Seq=666 Ack=974 Win=32512 Len=0
src.ip  dst.ip       AMQP      62      (empty)
dst.ip  src.ip       TCP       60      5672 → 53505 [ACK] Seq=666 Ack=982 Win=32512 Len=0
dst.ip  src.ip       AMQP      62      (empty)

# Stopping broker
src.ip  dst.ip       TCP       54      53505 → 5672 [ACK] Seq=982 Ack=674 Win=64862 Len=0
src.ip  dst.ip       AMQP      62      (empty)
dst.ip  src.ip       TCP       60      5672 → 53505 [ACK] Seq=674 Ack=990 Win=32512 Len=0

# Trying to send a message (timeout)
src.ip  dst.ip       AMQP      406     transfer
dst.ip  src.ip       TCP       60      5672 → 53505 [ACK] Seq=674 Ack=1342 Win=33536 Len=0
src.ip  dst.ip       AMQP      62      (empty)

# Restarting broker
dst.ip  src.ip       TCP       60      5672 → 53505 [ACK] Seq=674 Ack=1350 Win=33536 Len=0

# Trying to send a message (timeout)
src.ip  dst.ip       AMQP      405     transfer
dst.ip  src.ip       TCP       60      5672 → 53505 [ACK] Seq=674 Ack=1701 Win=34560 Len=0
dst.ip  src.ip       AMQP      62      (empty)
src.ip  dst.ip       TCP       54      53505 → 5672 [ACK] Seq=1701 Ack=682 Win=64854 Len=0
src.ip  dst.ip       TCP       54      53505 → 5672 [RST, ACK] Seq=1701 Ack=682 Win=0 Len=0

dst.ip = dispatcher ip
src.ip = broker ip

# Connecting t