Re: [tipc-discussion] tipc nametable update problem
Yes. And I issued a bug report and followed it up, so it is now also fixed in kernel 4.4, which will be in mainline 16.04 very soon.

BR ///jon

> -----Original Message-----
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Tuesday, 26 April, 2016 13:53
> To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: GUNA (gbala...@gmail.com); Ying Xue
> Subject: RE: [tipc-discussion] tipc nametable update problem
>
> Testing the latest Ubuntu ppa kernel (4.6.0rc5), it looks like the issues have been fixed!
> Thanks!
>
> (Hopefully that hits the mainline 16.04 kernel too.)
>
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Wednesday, April 06, 2016 1:16 PM
> To: Jon Maloy; Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: GUNA (gbala...@gmail.com); Ying Xue
> Subject: RE: [tipc-discussion] tipc nametable update problem
>
> > -----Original Message-----
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: Wednesday, 06 April, 2016 14:07
> > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: GUNA (gbala...@gmail.com); Ying Xue
> > Subject: Re: [tipc-discussion] tipc nametable update problem
> >
> > Hi Rune,
> > I finally found what it is - a missing buffer linearization - and it turns out to be a
> > problem that has already been solved. I checked it in last November, and it is
> > present in kernel 4.5, but not in 4.4.
> > The reason I didn't realize this right away was that I found and solved this as a
> > UDP-bearer-specific problem, and posted the patch as such. Since UDP support is
> > relatively new in TIPC, I didn't realize the need to have this correction applied
> > further back.
> The solution as such was in generic code, so it really is solved even for your
> Ethernet-based system.
> Sorry for not being clear enough about this.
>
> ///j
>
> > I will create a new patch and try to get it applied on the "stable" branch.
> >
> > Regards
> > ///jon
> >
> > > -----Original Message-----
> > > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > > Sent: Tuesday, 05 April, 2016 18:10
> > > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> > > (ying.x...@gmail.com); Ying Xue
> > > Subject: RE: [tipc-discussion] tipc nametable update problem
> > >
> > > So after trying various things to get it to fail again, I finally got it. I even got a
> > > corrupted update...
> > >
> > > So I rebooted 1.1.1, and started TIPC when it was up and running.
> > > Only 1.1.1 ports show in tipc-config -nt (see log_1_1_1.txt).
> > > Then I rebooted 1.1.2 (ended up doing that twice, I think).
> > >
> > > Normally I could not talk from 1.1.1 to 1.1.2, but this time that worked fine, but
> > > 1.1.2 got bad updates after the reboot, and 1.1.2 does NOT see the ports open on
> > > 1.1.1.
> > > (See log_1_1_2.txt and 1_1_1.txt. The last tipc-config -nt outputs are taken within
> > > 30 seconds of each other.)
> > >
> > > The ports I was testing were 104,65537 and 104,131073.
> > >
> > > This was done using the Ubuntu 4.4.0-15 kernel (4.4.0-15-generic).
> > >
> > > -----Original Message-----
> > > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > > Sent: Tuesday, April 05, 2016 11:27 AM
> > > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> > > (ying.x...@gmail.com); Ying Xue
> > > Subject: RE: [tipc-discussion] tipc nametable update problem
> > >
> > > > -----Original Message-----
> > > > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > > > Sent: Tuesday, 05 April, 2016 12:12
> > > > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> > > > (ying.x...@gmail.com); Ying Xue
> > > > Subject: RE: [tipc-discussion] tipc nametable update problem
> > > >
> > > > I should be able to do more testing.
> > > >
> > > > I do not know for sure whether the mappings were missing before the reboot.
> > > > If I had restarted applications, then the
Re: [tipc-discussion] tipc nametable update problem
Testing the latest Ubuntu ppa kernel (4.6.0rc5), it looks like the issues have been fixed!
Thanks!

(Hopefully that hits the mainline 16.04 kernel too.)

-----Original Message-----
From: Jon Maloy [mailto:jon.ma...@ericsson.com]
Sent: Wednesday, April 06, 2016 1:16 PM
To: Jon Maloy; Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
Cc: GUNA (gbala...@gmail.com); Ying Xue
Subject: RE: [tipc-discussion] tipc nametable update problem

> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Wednesday, 06 April, 2016 14:07
> To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: GUNA (gbala...@gmail.com); Ying Xue
> Subject: Re: [tipc-discussion] tipc nametable update problem
>
> Hi Rune,
> I finally found what it is - a missing buffer linearization - and it turns out to be a
> problem that has already been solved. I checked it in last November, and it is
> present in kernel 4.5, but not in 4.4.
> The reason I didn't realize this right away was that I found and solved this as a
> UDP-bearer-specific problem, and posted the patch as such. Since UDP support is
> relatively new in TIPC, I didn't realize the need to have this correction applied
> further back.

The solution as such was in generic code, so it really is solved even for your Ethernet-based system. Sorry for not being clear enough about this.

///j

> I will create a new patch and try to get it applied on the "stable" branch.
>
> Regards
> ///jon
>
> > -----Original Message-----
> > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > Sent: Tuesday, 05 April, 2016 18:10
> > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> > (ying.x...@gmail.com); Ying Xue
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > So after trying various things to get it to fail again, I finally got it. I even got a
> > corrupted update...
> >
> > So I rebooted 1.1.1, and started TIPC when it was up and running.
> > Only 1.1.1 ports show in tipc-config -nt (see log_1_1_1.txt).
> > Then I rebooted 1.1.2 (ended up doing that twice, I think).
> >
> > Normally I could not talk from 1.1.1 to 1.1.2, but this time that worked fine, but
> > 1.1.2 got bad updates after the reboot, and 1.1.2 does NOT see the ports open on
> > 1.1.1.
> > (See log_1_1_2.txt and 1_1_1.txt. The last tipc-config -nt outputs are taken within
> > 30 seconds of each other.)
> >
> > The ports I was testing were 104,65537 and 104,131073.
> >
> > This was done using the Ubuntu 4.4.0-15 kernel (4.4.0-15-generic).
> >
> > -----Original Message-----
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: Tuesday, April 05, 2016 11:27 AM
> > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> > (ying.x...@gmail.com); Ying Xue
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > > -----Original Message-----
> > > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > > Sent: Tuesday, 05 April, 2016 12:12
> > > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> > > (ying.x...@gmail.com); Ying Xue
> > > Subject: RE: [tipc-discussion] tipc nametable update problem
> > >
> > > I should be able to do more testing.
> > >
> > > I do not know for sure whether the mappings were missing before the reboot.
> > > If I had restarted applications, then the mappings would be there before the reboot.
> > >
> > > I do know that they are definitely missing after the reboot. That is how I first
> > > discovered it, namely by not seeing the application registration from 1.1.2 after
> > > the reboot.
> > >
> > > Looked to me like most mappings were not present, but I'll recheck.
> > >
> > > I'll reboot both with a kernel that I know has the problem;
> > > then start wireshark on 1, restart applications on 1.1.2, and make sure they can talk,
> > > print out the nametable on both,
> > > then reboot 1.1.2 and see.
> > >
> > > Anything else you'd want to see (short of running diag code)?
> >
> > That sounds like a plan. What I am most interested in right now is if it is only the
> > "bulk
Re: [tipc-discussion] tipc nametable update problem
> -----Original Message-----
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Tuesday, 05 April, 2016 12:12
> To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> (ying.x...@gmail.com); Ying Xue
> Subject: RE: [tipc-discussion] tipc nametable update problem
>
> I should be able to do more testing.
>
> I do not know for sure whether the mappings were missing before the reboot.
> If I had restarted applications, then the mappings would be there before the reboot.
>
> I do know that they are definitely missing after the reboot. That is how I first
> discovered it, namely by not seeing the application registration from 1.1.2 after
> the reboot.
>
> Looked to me like most mappings were not present, but I'll recheck.
>
> I'll reboot both with a kernel that I know has the problem;
> then start wireshark on 1, restart applications on 1.1.2, and make sure they can talk,
> print out the nametable on both,
> then reboot 1.1.2 and see.
>
> Anything else you'd want to see (short of running diag code)?

That sounds like a plan. What I am most interested in right now is if it is only the "bulk" (pre-establishment) bindings that are lost, or if it is all of them. If we can confirm that this is the case we will have a very specific packet (#2) to trace on, and it should be possible to find out what happens to it and its contents.

///jon

> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Monday, April 04, 2016 3:47 PM
> To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> (ying.x...@gmail.com); Ying Xue
> Subject: RE: [tipc-discussion] tipc nametable update problem
>
> Hi Rune,
> I don't get any further with this without more input.
>
> First a question: were *all* bindings from 1.1.2 missing after the reboot, or only
> the first 6 (the ones in the "bulk" publication, message #12646 in the big pcap file)?
> If the latter is the case, then we know that it is the content of this particular
> message that is not being applied. This message is special, because it contains all
> bindings that were made on 1.1.2 prior to the link being established. This
> message is always sent with sequence #2, and we can see from the dump that it
> was received (after a couple of retransmissions) and acknowledged by 1.1.1,
> which means it was delivered (?) up to the binding table.
>
> If the bindings were missing in 1.1.1 before the reboot, but not after (which
> seems to be contrary to what you state), my theory may still be valid. The
> Wireshark dump does not go far enough back to see what happened to the
> original publications; only that they were missing when you tried to remove
> them. I wonder if you (or anybody else who is able to reproduce the problem)
> could still make the effort to apply our patches and see what happens. But of
> course, if you are 100% sure that the bindings were missing even after the reboot
> run you sent me, then the problem must be something else, and I don't see how
> I can get further without instrumenting the code.
>
> Regards
> ///jon
>
> > -----Original Message-----
> > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > Sent: Monday, 04 April, 2016 13:23
> > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > They were not in there after the reboot, might not have been there before either.
> > The only way to actually get it working was to restart whichever application has the
> > missing registration on 1.1.2.
> >
> > -----Original Message-----
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: Monday, April 04, 2016 11:44 AM
> > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > Thank you Rune,
> > I think my theory was wrong. I can now see that the dropped items actually were
> > withdrawals, not publications, that were sent out just before 1.1.2 rebooted,
> > of course because the server application was being killed at that moment.
> > They were probably queued because the corresponding publications could not
> > be found in the table. Were those entries visible in the table of 1.1.1
Re: [tipc-discussion] tipc nametable update problem
I should be able to do more testing.

I do not know for sure whether the mappings were missing before the reboot. If I had restarted applications, then the mappings would be there before the reboot.

I do know that they are definitely missing after the reboot. That is how I first discovered it, namely by not seeing the application registration from 1.1.2 after the reboot.

Looked to me like most mappings were not present, but I'll recheck.

I'll reboot both with a kernel that I know has the problem; then start wireshark on 1, restart applications on 1.1.2, and make sure they can talk, print out the nametable on both, then reboot 1.1.2 and see.

Anything else you'd want to see (short of running diag code)?

-----Original Message-----
From: Jon Maloy [mailto:jon.ma...@ericsson.com]
Sent: Monday, April 04, 2016 3:47 PM
To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
(ying.x...@gmail.com); Ying Xue
Subject: RE: [tipc-discussion] tipc nametable update problem

Hi Rune,
I don't get any further with this without more input.

First a question: were *all* bindings from 1.1.2 missing after the reboot, or only the first 6 (the ones in the "bulk" publication, message #12646 in the big pcap file)?
If the latter is the case, then we know that it is the content of this particular message that is not being applied. This message is special, because it contains all bindings that were made on 1.1.2 prior to the link being established. This message is always sent with sequence #2, and we can see from the dump that it was received (after a couple of retransmissions) and acknowledged by 1.1.1, which means it was delivered (?) up to the binding table.

If the bindings were missing in 1.1.1 before the reboot, but not after (which seems to be contrary to what you state), my theory may still be valid.
The Wireshark dump does not go far enough back to see what happened to the original publications; only that they were missing when you tried to remove them. I wonder if you (or anybody else who is able to reproduce the problem) could still make the effort to apply our patches and see what happens. But of course, if you are 100% sure that the bindings were missing even after the reboot run you sent me, then the problem must be something else, and I don't see how I can get further without instrumenting the code.

Regards
///jon

> -----Original Message-----
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Monday, 04 April, 2016 13:23
> To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
>
> They were not in there after the reboot, might not have been there before either.
> The only way to actually get it working was to restart whichever application has the
> missing registration on 1.1.2.
>
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Monday, April 04, 2016 11:44 AM
> To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
>
> Thank you Rune,
> I think my theory was wrong. I can now see that the dropped items actually were
> withdrawals, not publications, that were sent out just before 1.1.2 rebooted,
> of course because the server application was being killed at that moment.
> They were probably queued because the corresponding publications could not
> be found in the table. Were those entries visible in the table of 1.1.1 before you
> rebooted? My guess is not...
>
> ///jon
>
> > -----Original Message-----
> > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > Sent: Monday, 04 April, 2016 11:11
> > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > Here is the full capture.
> > (If this is too big, I'll make it available on a dropbox share.)
> >
> > Reboot happened approx 21:31:48, 2016-03-30 UTC.
> >
> > -----Original Message-----
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: Monday, April 04, 2016 9:57 AM
> > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > > -----Original Message-----
> > > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > > Sent: Monda
Re: [tipc-discussion] tipc nametable update problem
They were not in there after the reboot, might not have been there before either.
The only way to actually get it working was to restart whichever application has the missing registration on 1.1.2.

-----Original Message-----
From: Jon Maloy [mailto:jon.ma...@ericsson.com]
Sent: Monday, April 04, 2016 11:44 AM
To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
Subject: RE: [tipc-discussion] tipc nametable update problem

Thank you Rune,
I think my theory was wrong. I can now see that the dropped items actually were withdrawals, not publications, that were sent out just before 1.1.2 rebooted, of course because the server application was being killed at that moment.
They were probably queued because the corresponding publications could not be found in the table. Were those entries visible in the table of 1.1.1 before you rebooted? My guess is not...

///jon

> -----Original Message-----
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Monday, 04 April, 2016 11:11
> To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
>
> Here is the full capture.
> (If this is too big, I'll make it available on a dropbox share.)
>
> Reboot happened approx 21:31:48, 2016-03-30 UTC.
>
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Monday, April 04, 2016 9:57 AM
> To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
>
> > -----Original Message-----
> > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > Sent: Monday, 04 April, 2016 09:53
> > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > The test setup I have is two servers with SuperMicro X10DRL-i motherboards,
> > each having two Xeon E5-2630V3 8-core CPUs, and 64GB of memory.
> > I am running Ubuntu 16.04 (beta). Each server also has 10 1G Ethernet
> > interfaces, but only one was active in this case, and I only use one as a bearer.
> >
> > There are other server pairs running on the same subnet with different netids.
> >
> > This particular issue happens when I reboot one of the two servers. The reboot
> > (full cold reboot) takes almost 5 minutes because of POST with 10 NICs trying to
> > do PXE boot.
>
> My guess is that in this particular run you rebooted node 1.1.2?
>
> If so, it doesn't contradict my theory. The dropped entries may quite well have
> been lingering in 1.1.1's backlog during the five minutes it took to reboot the
> peer, if there otherwise was no activity (bind/unbind) on 1.1.1 during that period.
> It is only when we try to make an additional insertion (probably the one for the
> link going up) that the expired backlog items are discovered and purged.
> So, I am still very interested in what happened before the reboot, since I believe
> that the dropped entries are just a late symptom of a problem that manifested
> itself much earlier.
>
> ///jon
>
> > So far, I had to go back to Ubuntu's 3.14.1 kernel (from their PPA,
> > http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14.1-trusty/) to find one
> > that does not have the problem.
> > All others I have tried (4.2, 4.4 and 4.5) have shown this problem.
> >
> > I should be able to compile a kernel and try.
> >
> > -----Original Message-----
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: Friday, April 01, 2016 7:07 PM
> > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > Hi Rune,
> > I am totally unable to reproduce your scenario, and the Wireshark dump does not
> > in itself show anything wrong.
> >
> > But by comparing the dropped publications with the ones arriving in the messages,
> > I could quickly confirm that those are not the same, although they partially contain
> > the same port names.
> > Those of type 101 that are dropped have instance numbers that are not the same
> > as in the arriving messages (you are probably generating new ones for each se
Re: [tipc-discussion] tipc nametable update problem
They might not have been.

-----Original Message-----
From: Jon Maloy [mailto:jon.ma...@ericsson.com]
Sent: Monday, April 04, 2016 11:44 AM
To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
Subject: RE: [tipc-discussion] tipc nametable update problem

Thank you Rune,
I think my theory was wrong. I can now see that the dropped items actually were withdrawals, not publications, that were sent out just before 1.1.2 rebooted, of course because the server application was being killed at that moment.
They were probably queued because the corresponding publications could not be found in the table. Were those entries visible in the table of 1.1.1 before you rebooted? My guess is not...

///jon

> -----Original Message-----
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Monday, 04 April, 2016 11:11
> To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
>
> Here is the full capture.
> (If this is too big, I'll make it available on a dropbox share.)
>
> Reboot happened approx 21:31:48, 2016-03-30 UTC.
>
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Monday, April 04, 2016 9:57 AM
> To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
>
> > -----Original Message-----
> > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > Sent: Monday, 04 April, 2016 09:53
> > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > The test setup I have is two servers with SuperMicro X10DRL-i motherboards,
> > each having two Xeon E5-2630V3 8-core CPUs, and 64GB of memory.
> > I am running Ubuntu 16.04 (beta). Each server also has 10 1G Ethernet
> > interfaces, but only one was active in this case, and I only use one as a bearer.
> >
> > There are other server pairs running on the same subnet with different netids.
> >
> > This particular issue happens when I reboot one of the two servers. The reboot
> > (full cold reboot) takes almost 5 minutes because of POST with 10 NICs trying to
> > do PXE boot.
>
> My guess is that in this particular run you rebooted node 1.1.2?
>
> If so, it doesn't contradict my theory. The dropped entries may quite well have
> been lingering in 1.1.1's backlog during the five minutes it took to reboot the
> peer, if there otherwise was no activity (bind/unbind) on 1.1.1 during that period.
> It is only when we try to make an additional insertion (probably the one for the
> link going up) that the expired backlog items are discovered and purged.
> So, I am still very interested in what happened before the reboot, since I believe
> that the dropped entries are just a late symptom of a problem that manifested
> itself much earlier.
>
> ///jon
>
> > So far, I had to go back to Ubuntu's 3.14.1 kernel (from their PPA,
> > http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14.1-trusty/) to find one
> > that does not have the problem.
> > All others I have tried (4.2, 4.4 and 4.5) have shown this problem.
> >
> > I should be able to compile a kernel and try.
> >
> > -----Original Message-----
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: Friday, April 01, 2016 7:07 PM
> > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > Hi Rune,
> > I am totally unable to reproduce your scenario, and the Wireshark dump does not
> > in itself show anything wrong.
> >
> > But by comparing the dropped publications with the ones arriving in the messages,
> > I could quickly confirm that those are not the same, although they partially contain
> > the same port names.
> > Those of type 101 that are dropped have instance numbers that are not the same
> > as in the arriving messages (you are probably generating new ones for each
> > session), while those which are the same (103, 105..) have different publication
> > key numbers.
> >
> > The conclusion can only be that those are l
Re: [tipc-discussion] tipc nametable update problem
Thank you Rune,
I think my theory was wrong. I can now see that the dropped items actually were withdrawals, not publications, that were sent out just before 1.1.2 rebooted, of course because the server application was being killed at that moment.
They were probably queued because the corresponding publications could not be found in the table. Were those entries visible in the table of 1.1.1 before you rebooted? My guess is not...

///jon

> -----Original Message-----
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Monday, 04 April, 2016 11:11
> To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
>
> Here is the full capture.
> (If this is too big, I'll make it available on a dropbox share.)
>
> Reboot happened approx 21:31:48, 2016-03-30 UTC.
>
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Monday, April 04, 2016 9:57 AM
> To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
>
> > -----Original Message-----
> > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > Sent: Monday, 04 April, 2016 09:53
> > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > The test setup I have is two servers with SuperMicro X10DRL-i motherboards,
> > each having two Xeon E5-2630V3 8-core CPUs, and 64GB of memory.
> > I am running Ubuntu 16.04 (beta). Each server also has 10 1G Ethernet
> > interfaces, but only one was active in this case, and I only use one as a bearer.
> >
> > There are other server pairs running on the same subnet with different netids.
> >
> > This particular issue happens when I reboot one of the two servers. The reboot
> > (full cold reboot) takes almost 5 minutes because of POST with 10 NICs trying to
> > do PXE boot.
>
> My guess is that in this particular run you rebooted node 1.1.2?
>
> If so, it doesn't contradict my theory. The dropped entries may quite well have
> been lingering in 1.1.1's backlog during the five minutes it took to reboot the
> peer, if there otherwise was no activity (bind/unbind) on 1.1.1 during that period.
> It is only when we try to make an additional insertion (probably the one for the
> link going up) that the expired backlog items are discovered and purged.
> So, I am still very interested in what happened before the reboot, since I believe
> that the dropped entries are just a late symptom of a problem that manifested
> itself much earlier.
>
> ///jon
>
> > So far, I had to go back to Ubuntu's 3.14.1 kernel (from their PPA,
> > http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14.1-trusty/) to find one
> > that does not have the problem.
> > All others I have tried (4.2, 4.4 and 4.5) have shown this problem.
> >
> > I should be able to compile a kernel and try.
> >
> > -----Original Message-----
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: Friday, April 01, 2016 7:07 PM
> > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > Hi Rune,
> > I am totally unable to reproduce your scenario, and the Wireshark dump does not
> > in itself show anything wrong.
> >
> > But by comparing the dropped publications with the ones arriving in the messages,
> > I could quickly confirm that those are not the same, although they partially contain
> > the same port names.
> > Those of type 101 that are dropped have instance numbers that are not the same
> > as in the arriving messages (you are probably generating new ones for each
> > session), while those which are the same (103, 105..) have different publication
> > key numbers.
> >
> > The conclusion can only be that those are leftovers from a previous session which
> > have not been purged when the contact with node 1.1.2 was lost. A quick check
> > of the code confirms this; entries in the name table backlog are only purged
> > based on expiration time, and not on loss of contact with their originating node,
Re: [tipc-discussion] tipc nametable update problem
> -Original Message- > From: Rune Torgersen [mailto:ru...@innovsys.com] > Sent: Monday, 04 April, 2016 09:53 > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan > Subject: RE: [tipc-discussion] tipc nametable update problem > > The test set up I have are two servers with SuperMicro X10DRL-i motherboards, > each having two Xeon E5-2630V3 8 core CPU's, and 64GB of memory. > I am running Ubuntu 16.04 (beta). Each server also have 10 1G ethernet > interfaces, but only one was active in this case, and I only use one as a > bearer. > > There are other server pairs running on the same subnet with different netids. > > This particular issue happens when I reboot one of the two servers. The reboot > (full cold reboot) takes almost 5 minutes because of POST with 10 NICS trying > to > do PXE boot. My guess is that in this particular run you rebooted node 1.1.2? If so, it doesn't contradict my theory. The dropped entries may quite well have been lingering in 1.1.1's backlog during the five minutes it took to reboot the peer, it there otherwise was no activity (bind/unbind) on 1.1.1 during that period. It is first when we try to make an additional insertion (probably the one of the link going up) that the expired backlog items are discovered and purged. So, I am still very interested in what happened before the reboot, since I believe that the dropped entries is just a late symptom of a problem that manifested itself much earlier. ///jon > > So far, I had to go back to Ubuntu's 3.14.1 kernel (from their PPA, > http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14.1-trusty/) to find one > that does not have a problem. > All other I have tried (4.2, 4.4 and 4.5) have shown this problem. > > I should be able to compile a kernel and try. 
> > -Original Message- > From: Jon Maloy [mailto:jon.ma...@ericsson.com] > Sent: Friday, April 01, 2016 7:07 PM > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan > Subject: RE: [tipc-discussion] tipc nametable update problem > > Hi Rune, > I am totally unable to reproduce your scenario, and the Wireshark dump does > not > in itself show anything wrong. > > But by comparing the dropped publications with ones arriving in the messages, > I > could quickly confirm that those are not the same, although they partially > contain > the same port names. > Those of type 101 that are dropped have instance numbers that are not the same > as in the arriving messages (you are probably generating new ones for each > session), while those which are the same (103, 105..) have different > publication > key numbers. > > The conclusion can only be that those are leftovers from a previous session > which > have not been purged when the contact with node 1.1.2 was lost. A quick check > of the code confirms this; entries in the name table backlog are only purged > based on expiration time, and not on loss of contact with their originating > node, > as they should be. This is clearly a bug that somebody has to fix (Partha, > Richard?). > > It remains to understand how those entries got into the backlog in the first > place. > Something must have happened in the previous session that prevented them > from being applied. Since you never use instance sequences, overlapping > sequences cannot be the problem. If it were a memory allocation problem this > would be visible in the log. One possibility I see is that we have a race > condition > between the purging of the binding table from the pre-previous session and the > previous one. 
The call to the purging function tipc_publ_notify() is done > outside > any lock protection, so it is fully possible that a link that quickly goes > down and > comes back may be able to deliver a new batch of publications before the > purging > action is finished. This becomes particularly likely if the number of > publications is > large, and we are running in a multi-VM or multi-namespace environment on the > same host. (Can you confirm, Rune?) > If only the interface or link is cycled, while the same application server > continues > running on 1.1.2, and 1.1.1 still is intact, this is a possible scenario. > The newly delivered publications will find a set of exactly equal > publications from > the previous session in the name table, and hence be put in the backlog. > > How do we resolve this? My first idea was to just run a process_backlog() on > the > flank of tipc_publ_notify(). But unfortunately we first need to run a purge > on the > backlog, according to the above, and this purge would be unab
Re: [tipc-discussion] tipc nametable update problem
The test setup I have is two servers with SuperMicro X10DRL-i motherboards, each having two Xeon E5-2630V3 8-core CPUs, and 64GB of memory. I am running Ubuntu 16.04 (beta). Each server also has 10 1G Ethernet interfaces, but only one was active in this case, and I only use one as a bearer. There are other server pairs running on the same subnet with different netids. This particular issue happens when I reboot one of the two servers. The reboot (full cold reboot) takes almost 5 minutes because of POST with 10 NICs trying to do PXE boot. So far, I had to go back to Ubuntu's 3.14.1 kernel (from their PPA, http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14.1-trusty/) to find one that does not have a problem. All others I have tried (4.2, 4.4 and 4.5) have shown this problem. I should be able to compile a kernel and try. -Original Message- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: Friday, April 01, 2016 7:07 PM To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan Subject: RE: [tipc-discussion] tipc nametable update problem Hi Rune, I am totally unable to reproduce your scenario, and the Wireshark dump does not in itself show anything wrong. But by comparing the dropped publications with ones arriving in the messages, I could quickly confirm that those are not the same, although they partially contain the same port names. Those of type 101 that are dropped have instance numbers that are not the same as in the arriving messages (you are probably generating new ones for each session), while those which are the same (103, 105..) have different publication key numbers. The conclusion can only be that those are leftovers from a previous session which have not been purged when the contact with node 1.1.2 was lost. 
A quick check of the code confirms this; entries in the name table backlog are only purged based on expiration time, and not on loss of contact with their originating node, as they should be. This is clearly a bug that somebody has to fix (Partha, Richard?). It remains to understand how those entries got into the backlog in the first place. Something must have happened in the previous session that prevented them from being applied. Since you never use instance sequences, overlapping sequences cannot be the problem. If it were a memory allocation problem this would be visible in the log. One possibility I see is that we have a race condition between the purging of the binding table from the pre-previous session and the previous one. The call to the purging function tipc_publ_notify() is done outside any lock protection, so it is fully possible that a link that quickly goes down and comes back may be able to deliver a new batch of publications before the purging action is finished. This becomes particularly likely if the number of publications is large, and we are running in a multi-VM or multi-namespace environment on the same host. (Can you confirm, Rune?) If only the interface or link is cycled, while the same application server continues running on 1.1.2, and 1.1.1 still is intact, this is a possible scenario. The newly delivered publications will find a set of exactly equal publications from the previous session in the name table, and hence be put in the backlog. How do we resolve this? My first idea was to just run a process_backlog() on the flank of tipc_publ_notify(). But unfortunately we first need to run a purge on the backlog, according to the above, and this purge would be unable to distinguish between "old" and "new" backlog items, and would have to purge them all. A better, but maybe not so neat solution would be to use a similar solution as we do with socket wakeup. 
We create a pseudo message with a new message type PURGER, and append that to the tail of the node's namedq when we lose contact with a node, but this time *before* we release the node write lock. We could then test for this type, in addition to the PUBLICATION and WITHDRAWAL types, inside tipc_update_nametbl(), and call tipc_publ_notify(), still inside the name table lock, whenever this message type is encountered. This would guarantee that things happen in sequential order, since any new publications would end up behind the PURGER message in the node's namedq. Who has time to implement this? Also, do you, Rune, build your own kernel, so you could try out a patch from us and confirm my theory before we deliver such a solution upstream? Regards ///jon > -Original Message- > From: Rune Torgersen [mailto:ru...@innovsys.com] > Sent: Thursday, 31 March, 2016 14:56 > To: 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc nametable update problem > > I have not been able to capture a corrupted update yet, but did manage to get > one where it dropped the updates. > > Here is the
Re: [tipc-discussion] tipc nametable update problem
Hi Rune, I am totally unable to reproduce your scenario, and the Wireshark dump does not in itself show anything wrong. But by comparing the dropped publications with ones arriving in the messages, I could quickly confirm that those are not the same, although they partially contain the same port names. Those of type 101 that are dropped have instance numbers that are not the same as in the arriving messages (you are probably generating new ones for each session), while those which are the same (103, 105..) have different publication key numbers. The conclusion can only be that those are leftovers from a previous session which have not been purged when the contact with node 1.1.2 was lost. A quick check of the code confirms this; entries in the name table backlog are only purged based on expiration time, and not on loss of contact with their originating node, as they should be. This is clearly a bug that somebody has to fix (Partha, Richard?). It remains to understand how those entries got into the backlog in the first place. Something must have happened in the previous session that prevented them from being applied. Since you never use instance sequences, overlapping sequences cannot be the problem. If it were a memory allocation problem this would be visible in the log. One possibility I see is that we have a race condition between the purging of the binding table from the pre-previous session and the previous one. The call to the purging function tipc_publ_notify() is done outside any lock protection, so it is fully possible that a link that quickly goes down and comes back may be able to deliver a new batch of publications before the purging action is finished. This becomes particularly likely if the number of publications is large, and we are running in a multi-VM or multi-namespace environment on the same host. (Can you confirm, Rune?) 
If only the interface or link is cycled, while the same application server continues running on 1.1.2, and 1.1.1 still is intact, this is a possible scenario. The newly delivered publications will find a set of exactly equal publications from the previous session in the name table, and hence be put in the backlog. How do we resolve this? My first idea was to just run a process_backlog() on the flank of tipc_publ_notify(). But unfortunately we first need to run a purge on the backlog, according to the above, and this purge would be unable to distinguish between "old" and "new" backlog items, and would have to purge them all. A better, but maybe not so neat solution would be to use a similar solution as we do with socket wakeup. We create a pseudo message with a new message type PURGER, and append that to the tail of the node's namedq when we lose contact with a node, but this time *before* we release the node write lock. We could then test for this type, in addition to the PUBLICATION and WITHDRAWAL types, inside tipc_update_nametbl(), and call tipc_publ_notify(), still inside the name table lock, whenever this message type is encountered. This would guarantee that things happen in sequential order, since any new publications would end up behind the PURGER message in the node's namedq. Who has time to implement this? Also, do you, Rune, build your own kernel, so you could try out a patch from us and confirm my theory before we deliver such a solution upstream? Regards ///jon > -Original Message- > From: Rune Torgersen [mailto:ru...@innovsys.com] > Sent: Thursday, 31 March, 2016 14:56 > To: 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc nametable update problem > > I have not been able to capture a corrupted update yet, but did manage to get > one where it dropped the updates. > > Here is the dmesg output (times are in CST). 
> Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {103, > 1003, 1003} from <1.1.2> key=4271114002 > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {103, > 3, 3} > from <1.1.2> key=3675117576 > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {101, > 10, 10} from <1.1.2> key=2005280282 > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of > {13762562, 0, 0} from <1.1.2> key=3568185108 > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {206, > 9, 9} > from <1.1.2> key=3641103006 > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {101, > 133398, 133398} from <1.1.2> key=2675546830 > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {101, > 133138, 133138} from <1.1.2> key=2939408752 > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {105, > 104, > 104} from <1.1.2> key=140803529 > Mar 30 16:31:48 testserv218 kernel: Dropping name t
Re: [tipc-discussion] tipc nametable update problem
I have not been able to capture a corrupted update yet, but did manage to get one where it dropped the updates. Here is the dmesg output (times are in CST). Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {103, 1003, 1003} from <1.1.2> key=4271114002 Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {103, 3, 3} from <1.1.2> key=3675117576 Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {101, 10, 10} from <1.1.2> key=2005280282 Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {13762562, 0, 0} from <1.1.2> key=3568185108 Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {206, 9, 9} from <1.1.2> key=3641103006 Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {101, 133398, 133398} from <1.1.2> key=2675546830 Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {101, 133138, 133138} from <1.1.2> key=2939408752 Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {105, 104, 104} from <1.1.2> key=140803529 Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {105, 4, 4} from <1.1.2> key=3695579549 Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {101, 133386, 133386} from <1.1.2> key=808970575 Attached are the TIPC packets received on 1.1.1 (where the log is from) during the same time period. -Original Message- From: Jon Maloy [mailto:ma...@donjonn.com] Sent: Saturday, March 26, 2016 8:51 AM To: tipc-discussion@lists.sourceforge.net Subject: Re: [tipc-discussion] tipc nametable update problem Hi Rune, I assume you are still using Ethernet/L2 as a bearer, and not UDP? UDP is relatively new as a supported bearer, and may incur new problems. Anyway, name table updates are never fragmented. 
As far as I can see you have five bindings on each node, and that amounts to either one "bulk" update message with all bindings in one message (100 bytes in your case) or five individual update messages of 20 bytes each. All depending on whether your application was started and the bindings were made before or after the link between the nodes is established. To me it looks like the dropped bindings are severely corrupted, and that may be a starting point for our troubleshooting. Could you start Wireshark and have a look at the messages being exchanged when this happens? If you only look for NAME_DISTRIBUTOR messages, the number of messages to analyze should be very limited, and we can at least see if our bug is on the sending or the receiving side. Happy Easter ///jon On 03/25/2016 12:05 PM, Rune Torgersen wrote: > Is it possible for the update messages to be greater than 1 MTU? > > Because we're doing a lot of video multicast, we're turning on UDP RSS hashing > to get messages to different receive queues (via ethtool -N ethN rx-flow-hash > udp4 sdfn) > Because of that, there is a kernel warning per interface, and I am curious if > that is what is causing this: > > igb :07:00.0: enabling UDP RSS: fragmented packets may arrive out of > order to the stack above > > > > > From: Erik Hugne [mailto:erik.hu...@gmail.com] > Sent: Wednesday, March 23, 2016 12:07 PM > To: Rune Torgersen > Cc: tipc-discussion@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc nametable update problem > > > When an update is received but cannot immediately be applied to the local > nametable, we retain it for a few seconds in a backlog queue. > Then for each subsequent update received (that may have cleared up the > conflict) we try to apply any update stored in the backlog. > The timeout can be set with sysctl -w tipc.named_timeout=xxx > Default is 2000ms. > > So clock drift does not matter. > > I'm guessing that the nametable updates are dropped on the sending side. 
> Is there any interface renaming going on after tipc is enabled? > > //E > On Mar 23, 2016 17:04, "Rune Torgersen" > <ru...@innovsys.com> wrote: > How much clock drift between units does the nametable update allow? > > On one of the test units, the clock was off by about a second between them. > > -Original Message- > From: Rune Torgersen [mailto:ru...@innovsys.com] > Sent: Tuesday, March 22, 2016 10:58 AM > To: > tipc-discussion@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc nametable update problem > > Still having nametable update problems (Using TIPC_CLUSTER_SCOPE) > > Here is an excerpt of tipc-config -nt on both systems: > address 1.1.1: > > 104 1025 1025 <1.1.1:3540751351> 3540751351 > cluster > 10465537
Re: [tipc-discussion] tipc nametable update problem
Hi Rune, I assume you are still using Ethernet/L2 as a bearer, and not UDP? UDP is relatively new as a supported bearer, and may incur new problems. Anyway, name table updates are never fragmented. As far as I can see you have five bindings on each node, and that amounts to either one "bulk" update message with all bindings in one message (100 bytes in your case) or five individual update messages of 20 bytes each. All depending on whether your application was started and the bindings were made before or after the link between the nodes is established. To me it looks like the dropped bindings are severely corrupted, and that may be a starting point for our troubleshooting. Could you start Wireshark and have a look at the messages being exchanged when this happens? If you only look for NAME_DISTRIBUTOR messages, the number of messages to analyze should be very limited, and we can at least see if our bug is on the sending or the receiving side. Happy Easter ///jon On 03/25/2016 12:05 PM, Rune Torgersen wrote: > Is it possible for the update messages to be greater than 1 MTU? > > Because we're doing a lot of video multicast, we're turning on UDP RSS hashing > to get messages to different receive queues (via ethtool -N ethN rx-flow-hash > udp4 sdfn) > Because of that, there is a kernel warning per interface, and I am curious if > that is what is causing this: > > igb :07:00.0: enabling UDP RSS: fragmented packets may arrive out of > order to the stack above > > > > > From: Erik Hugne [mailto:erik.hu...@gmail.com] > Sent: Wednesday, March 23, 2016 12:07 PM > To: Rune Torgersen > Cc: tipc-discussion@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc nametable update problem > > > When an update is received but cannot immediately be applied to the local > nametable, we retain it for a few seconds in a backlog queue. > Then for each subsequent update received (that may have cleared up the > conflict) we try to apply any update stored in the backlog. 
> The timeout can be set with sysctl -w tipc.named_timeout=xxx > Default is 2000ms. > > So clock drift does not matter. > > I'm guessing that the nametable updates are dropped on the sending side. > Is there any interface renaming going on after tipc is enabled? > > //E > On Mar 23, 2016 17:04, "Rune Torgersen" > <ru...@innovsys.com> wrote: > How much clock drift between units does the nametable update allow? > > On one of the test units, the clock was off by about a second between them. > > -Original Message- > From: Rune Torgersen [mailto:ru...@innovsys.com] > Sent: Tuesday, March 22, 2016 10:58 AM > To: > tipc-discussion@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc nametable update problem > > Still having nametable update problems (Using TIPC_CLUSTER_SCOPE) > > Here is an excerpt of tipc-config -nt on both systems: > address 1.1.1: > > 104 1025 1025 <1.1.1:3540751351> 3540751351 > cluster > 104 65537 65537 <1.1.1:4046699456> 4046699456 > cluster > 104 131073 131073 <1.1.2:59828181> 59828181 > cluster > 104 16777984 16777984 <1.1.1:3135589675> 3135589675 > cluster > 104 33555200 33555200 <1.1.2:2193437365> 2193437365 > cluster > > Address 1.1.2: > 104 131073 131073 <1.1.2:59828181> 59828181 > cluster > 104 33555200 33555200 <1.1.2:2193437365> 2193437365 > cluster > > So in this case 1 sees all addresses 2 has published, while 2 is not seeing the > addresses from 1. > 2 was rebooted to make this happen. > > Is there a possibility I'm calling tipc-config too early, and the interface is > not yet up, or is this still the same problem I saw before. 
> > There are some dropped nametable update messages in the kernel log: > > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of > {0, 0, 0} from <1.1.1> key=0 > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of > {0, 0, 0} from <1.1.1> key=0 > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of > {0, 0, 0} from <1.1.1> key=0 > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of > {0, 0, 0} from <1.1.1> key=0 > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of > {0, 0, 0} from <1.1.1> key=0 > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of > {0, 0, 0} from <1.1.1> key=0 > Ma
Re: [tipc-discussion] tipc nametable update problem
It looks to me from my dmesg dump that I'm actually crashing on line 285 in subscr.c. From: Jon Maloy [jon.ma...@ericsson.com] Sent: Thursday, March 24, 2016 1:56 PM To: Rune Torgersen; tipc-discussion@lists.sourceforge.net Subject: RE: [tipc-discussion] tipc nametable update problem Hi Rune, As far as I can see the fix is present in the 4.5.0 code (subscr.c, line 299), so it may be that there still is a problem. I suspect you will have to wait until Partha is back from Easter leave to get a better answer to this. Regards ///jon > -Original Message- > From: Rune Torgersen [mailto:ru...@innovsys.com] > Sent: Thursday, 24 March, 2016 10:08 > To: tipc-discussion@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc nametable update problem > > 4.5.0 kernel still gets a NULL ptr. I have kernel core dumps if anyone knows > what > to look for. > I am now trying 4.2 kernel. > > -Original Message- > From: Rune Torgersen [mailto:ru...@innovsys.com] > Sent: Wednesday, March 23, 2016 12:48 PM > To: 'Erik Hugne' > Cc: tipc-discussion@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc nametable update problem > > Interfaces are renamed, yes, but that should all have been done before TIPC is > loaded and configured. > > So I have been testing different kernels, and the nametable update problems > only seem to occur with the Ubuntu 4.4.0 kernel, which is based on the 4.4.5 > mainline. > Older kernels (3.14.1 and 4.2.0) do not seem to have problems (at least > after > reboot, no nametable issues, and all entries seem to be on both sides). > > Also the 4.5.0 kernel seems to work. I’ll give it a try and see if it is stable > enough (looks > like it also might have a fix for the NULL ptr crash I saw). > > Another interesting thing I saw on one reboot was that one unit got > completely > invalid entries from the other. 
> > [ 29.916004] Dropping name table update (0) of {0, 0, 0} from <1.1.2> > key=3210283144 > [ 29.916010] Dropping name table update (0) of {442042504, 4294901760, 0} > from > <1.1.2> key=0 > [ 29.916013] Dropping name table update (0) of {3260614792, 4294955122, > 3243837576} from <1.1.2> key=2743205887 > [ 29.916016] Dropping name table update (0) of {0, 0, 0} from <1.1.2> key=0 > [ 29.916019] Dropping name table update (0) of {0, 0, 0} from <1.1.2> > key=2184 > [ 29.916022] Dropping name table update (0) of {4294911232, 0, 29952} from > <1.1.2> key=57427 > [ 29.916025] Dropping name table update (0) of {3831105672, 4294959155, > 442042504} from <1.1.2> key=89786504 > > > on 1.1.1: > 294907136 0 49389 <1.1.2:3260614792> 3260614792 cluster > 2560 0 <1.1.2:0> 0 > cluster > 4294911232 0 8418048<1.1.2:268959744> 268959744 > cluster > 4294957547 3260614792 4294934691 <1.1.2:1312882922> 1312882922 > cluster > 4294928877 3260614792 4294901760 <1.1.2:0> 0 > cluster > 4294904576 0 38 <1.1.2:1062668424> 1062668424 > cluster > 2816 0 63724 <1.1.2:3260614792> 3260614792 > cluster > 1097732991 18669 3260614792 <1.1.2:4294949099> 4294949099 > cluster > > while on 1.1.2 those addresses do not exist. > > > From: Erik Hugne [mailto:erik.hu...@gmail.com] > Sent: Wednesday, March 23, 2016 12:07 PM > To: Rune Torgersen > Cc: tipc-discussion@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc nametable update problem > > > When an update is received but cannot immediately be applied to the local > nametable, we retain it for a few seconds in a backlog queue. > Then for each subsequent update received (that may have cleared up the > conflict) we try to apply any update stored in the backlog. > The timeout can be set with sysctl -w tipc.named_timeout=xxx > Default is 2000ms. > > So clock drift does not matter. > > I'm guessing that the nametable updates are dropped on the sending side. > Is there any interface renaming going on after tipc is enabled? 
> > //E > On Mar 23, 2016 17:04, "Rune Torgersen" > <ru...@innovsys.com> wrote: > How much clock drift between units does the nametable update allow? > > On one of the test units, the clock was off by about a second between them. > > -Original Message- > From: Rune Torgersen > [mailto:ru...@innovsys.com] > Sent: Tuesday, March 22, 2016 10:58 AM > To: tipc-discussion@lists.sourceforge.net
Re: [tipc-discussion] tipc nametable update problem
Hi Rune, As far as I can see the fix is present in the 4.5.0 code (subscr.c, line 299), so it may be that there still is a problem. I suspect you will have to wait until Partha is back from Easter leave to get a better answer to this. Regards ///jon > -Original Message- > From: Rune Torgersen [mailto:ru...@innovsys.com] > Sent: Thursday, 24 March, 2016 10:08 > To: tipc-discussion@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc nametable update problem > > 4.5.0 kernel still gets a NULL ptr. I have kernel core dumps if anyone knows > what > to look for. > I am now trying 4.2 kernel. > > -Original Message- > From: Rune Torgersen [mailto:ru...@innovsys.com] > Sent: Wednesday, March 23, 2016 12:48 PM > To: 'Erik Hugne' > Cc: tipc-discussion@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc nametable update problem > > Interfaces are renamed, yes, but that should all have been done before TIPC is > loaded and configured. > > So I have been testing different kernels, and the nametable update problems > only seem to occur with the Ubuntu 4.4.0 kernel, which is based on the 4.4.5 > mainline. > Older kernels (3.14.1 and 4.2.0) do not seem to have problems (at least > after > reboot, no nametable issues, and all entries seem to be on both sides). > > Also the 4.5.0 kernel seems to work. I’ll give it a try and see if it is stable > enough (looks > like it also might have a fix for the NULL ptr crash I saw). > > Another interesting thing I saw on one reboot was that one unit got > completely > invalid entries from the other. 
> > [ 29.916004] Dropping name table update (0) of {0, 0, 0} from <1.1.2> > key=3210283144 > [ 29.916010] Dropping name table update (0) of {442042504, 4294901760, 0} > from > <1.1.2> key=0 > [ 29.916013] Dropping name table update (0) of {3260614792, 4294955122, > 3243837576} from <1.1.2> key=2743205887 > [ 29.916016] Dropping name table update (0) of {0, 0, 0} from <1.1.2> key=0 > [ 29.916019] Dropping name table update (0) of {0, 0, 0} from <1.1.2> > key=2184 > [ 29.916022] Dropping name table update (0) of {4294911232, 0, 29952} from > <1.1.2> key=57427 > [ 29.916025] Dropping name table update (0) of {3831105672, 4294959155, > 442042504} from <1.1.2> key=89786504 > > > on 1.1.1: > 294907136 0 49389 <1.1.2:3260614792> 3260614792 cluster > 2560 0 <1.1.2:0> 0 > cluster > 4294911232 0 8418048<1.1.2:268959744> 268959744 > cluster > 4294957547 3260614792 4294934691 <1.1.2:1312882922> 1312882922 > cluster > 4294928877 3260614792 4294901760 <1.1.2:0> 0 > cluster > 4294904576 0 38 <1.1.2:1062668424> 1062668424 > cluster > 2816 0 63724 <1.1.2:3260614792> 3260614792 > cluster > 1097732991 18669 3260614792 <1.1.2:4294949099> 4294949099 > cluster > > while on 1.1.2 those addresses do not exist. > > > From: Erik Hugne [mailto:erik.hu...@gmail.com] > Sent: Wednesday, March 23, 2016 12:07 PM > To: Rune Torgersen > Cc: tipc-discussion@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc nametable update problem > > > When an update is received but cannot immediately be applied to the local > nametable, we retain it for a few seconds in a backlog queue. > Then for each subsequent update received (that may have cleared up the > conflict) we try to apply any update stored in the backlog. > The timeout can be set with sysctl -w tipc.named_timeout=xxx > Default is 2000ms. > > So clock drift does not matter. > > I'm guessing that the nametable updates are dropped on the sending side. > Is there any interface renaming going on after tipc is enabled? 
> > //E > On Mar 23, 2016 17:04, "Rune Torgersen" > <ru...@innovsys.com> wrote: > How much clock drift between units does the nametable update allow? > > On one of the test units, the clock was off by about a second between them. > > -Original Message- > From: Rune Torgersen > [mailto:ru...@innovsys.com] > Sent: Tuesday, March 22, 2016 10:58 AM > To: tipc-discussion@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc nametable update problem > > Still having nametable update problems (Using TIPC_CLUSTER_SCOPE) > > Here is an excerpt of tipc-config -nt on both systems: > address 1.1.1: > > 10410
Re: [tipc-discussion] tipc nametable update problem
4.5.0 kernel still gets a NULL ptr. I have kernel core dumps if anyone knows what to look for. I am now trying 4.2 kernel. -Original Message- From: Rune Torgersen [mailto:ru...@innovsys.com] Sent: Wednesday, March 23, 2016 12:48 PM To: 'Erik Hugne' Cc: tipc-discussion@lists.sourceforge.net Subject: Re: [tipc-discussion] tipc nametable update problem Interfaces are renamed, yes, but that should all have been done before TIPC is loaded and configured. So I have been testing different kernels, and the nametable update problems only seem to occur with the Ubuntu 4.4.0 kernel, which is based on the 4.4.5 mainline. Older kernels (3.14.1 and 4.2.0) do not seem to have problems (at least after reboot, no nametable issues, and all entries seem to be on both sides). Also the 4.5.0 kernel seems to work. I’ll give it a try and see if it is stable enough (looks like it also might have a fix for the NULL ptr crash I saw). Another interesting thing I saw on one reboot was that one unit got completely invalid entries from the other. 
[ 29.916004] Dropping name table update (0) of {0, 0, 0} from <1.1.2> key=3210283144 [ 29.916010] Dropping name table update (0) of {442042504, 4294901760, 0} from <1.1.2> key=0 [ 29.916013] Dropping name table update (0) of {3260614792, 4294955122, 3243837576} from <1.1.2> key=2743205887 [ 29.916016] Dropping name table update (0) of {0, 0, 0} from <1.1.2> key=0 [ 29.916019] Dropping name table update (0) of {0, 0, 0} from <1.1.2> key=2184 [ 29.916022] Dropping name table update (0) of {4294911232, 0, 29952} from <1.1.2> key=57427 [ 29.916025] Dropping name table update (0) of {3831105672, 4294959155, 442042504} from <1.1.2> key=89786504 on 1.1.1: 294907136 0 49389 <1.1.2:3260614792> 3260614792 cluster 2560 0 <1.1.2:0> 0 cluster 4294911232 0 8418048<1.1.2:268959744> 268959744 cluster 4294957547 3260614792 4294934691 <1.1.2:1312882922> 1312882922 cluster 4294928877 3260614792 4294901760 <1.1.2:0> 0 cluster 4294904576 0 38 <1.1.2:1062668424> 1062668424 cluster 2816 0 63724 <1.1.2:3260614792> 3260614792 cluster 1097732991 18669 3260614792 <1.1.2:4294949099> 4294949099 cluster while on 1.1.2 those addresses do not exist. From: Erik Hugne [mailto:erik.hu...@gmail.com] Sent: Wednesday, March 23, 2016 12:07 PM To: Rune Torgersen Cc: tipc-discussion@lists.sourceforge.net Subject: Re: [tipc-discussion] tipc nametable update problem When an update is received but cannot immediately be applied to the local nametable, we retain it for a few seconds in a backlog queue. Then for each subsequent update received (that may have cleared up the conflict) we try to apply any update stored in the backlog. The timeout can be set with sysctl -w tipc.named_timeout=xxx Default is 2000ms. So clock drift does not matter. I'm guessing that the nametable updates are dropped on the sending side. Is there any interface renaming going on after tipc is enabled? 
//E On Mar 23, 2016 17:04, "Rune Torgersen" <ru...@innovsys.com<mailto:ru...@innovsys.com>> wrote: How much clock drift between units does the nametable update allow? On one of the test units, the clock was off by about a second between them. -Original Message- From: Rune Torgersen [mailto:ru...@innovsys.com<mailto:ru...@innovsys.com>] Sent: Tuesday, March 22, 2016 10:58 AM To: tipc-discussion@lists.sourceforge.net<mailto:tipc-discussion@lists.sourceforge.net> Subject: Re: [tipc-discussion] tipc nametable update problem Still having nametable update problems (Using TIPC_CLUSTER_SCOPE) Here is an except of tipc-config -nt on both systems: address 1.1.1: 1041025 1025 <1.1.1:3540751351> 3540751351 cluster 10465537 65537 <1.1.1:4046699456> 4046699456 cluster 104131073 131073 <1.1.2:59828181> 59828181cluster 10416777984 16777984 <1.1.1:3135589675> 3135589675 cluster 10433555200 33555200 <1.1.2:2193437365> 2193437365 cluster Address 1.1.2: 104131073 131073 <1.1.2:59828181> 59828181cluster 10433555200 33555200 <1.1.2:2193437365> 2193437365 cluster So in this case 1 sees all address 2 has published, while 2 is not seeing the addesses from 1. 2 was rebooted to make this happen. Is tere a possibility I'm calling tipc-config too early, and the interface is not yet up, or is this still the same roblem I saw before. There is nome dropped nametable update messages in kernel: Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0,
Re: [tipc-discussion] tipc nametable update problem
Logs indicate all renames were done about 15 seconds before TIPC starts:

Mar 23 12:37:58 testserv218 kernel: igb :04:00.1 rename3: renamed from eth1
Mar 23 12:37:58 testserv218 kernel: igb :04:00.0 rename2: renamed from eth0
Mar 23 12:37:58 testserv218 kernel: igb :04:00.2 rename4: renamed from eth2
Mar 23 12:37:58 testserv218 kernel: igb :04:00.3 rename5: renamed from eth3
Mar 23 12:37:58 testserv218 kernel: igb :08:00.0 eth1: renamed from eth5
Mar 23 12:37:58 testserv218 kernel: igb :07:00.0 eth0: renamed from eth4
Mar 23 12:37:58 testserv218 kernel: igb :04:00.1 eth3: renamed from rename3
Mar 23 12:37:58 testserv218 kernel: igb :04:00.0 eth2: renamed from rename2
Mar 23 12:37:58 testserv218 kernel: igb :04:00.2 eth4: renamed from rename4
Mar 23 12:37:58 testserv218 kernel: igb :04:00.3 eth5: renamed from rename5
Mar 23 12:38:13 testserv218 kernel: tipc: Activated (version 2.0.0)
Mar 23 12:38:13 testserv218 kernel: NET: Registered protocol family 30
Mar 23 12:38:13 testserv218 kernel: tipc: Started in single node mode
Mar 23 12:38:13 testserv218 kernel: Started in network mode
Mar 23 12:38:13 testserv218 kernel: Own node address <1.1.1>, network identity 3013
Mar 23 12:38:13 testserv218 kernel: Enabled bearer , discovery domain <1.1.0>, priority 10
Re: [tipc-discussion] tipc nametable update problem
Interfaces are renamed, yes, but that should all have been done before TIPC is loaded and configured.

So I have been testing different kernels, and the nametable update problems only seem to occur with the Ubuntu 4.4.0 kernel, which is based on the 4.4.5 mainline. Older kernels (3.14.1 and 4.2.0) do not seem to have problems (at least after reboot there are no nametable issues, and all entries seem to be present on both sides). The 4.5.0 kernel also seems to work. I'll give it a try and see if it is stable enough (it looks like it might also have a fix for the NULL ptr crash I saw).

Another interesting thing I saw on one reboot was that one unit got completely invalid entries from the other:

[ 29.916004] Dropping name table update (0) of {0, 0, 0} from <1.1.2> key=3210283144
[ 29.916010] Dropping name table update (0) of {442042504, 4294901760, 0} from <1.1.2> key=0
[ 29.916013] Dropping name table update (0) of {3260614792, 4294955122, 3243837576} from <1.1.2> key=2743205887
[ 29.916016] Dropping name table update (0) of {0, 0, 0} from <1.1.2> key=0
[ 29.916019] Dropping name table update (0) of {0, 0, 0} from <1.1.2> key=2184
[ 29.916022] Dropping name table update (0) of {4294911232, 0, 29952} from <1.1.2> key=57427
[ 29.916025] Dropping name table update (0) of {3831105672, 4294959155, 442042504} from <1.1.2> key=89786504

on 1.1.1:
294907136  0           49389       <1.1.2:3260614792>  3260614792  cluster
2560       0                       <1.1.2:0>           0           cluster
4294911232 0           8418048     <1.1.2:268959744>   268959744   cluster
4294957547 3260614792  4294934691  <1.1.2:1312882922>  1312882922  cluster
4294928877 3260614792  4294901760  <1.1.2:0>           0           cluster
4294904576 0           38          <1.1.2:1062668424>  1062668424  cluster
2816       0           63724      <1.1.2:3260614792>  3260614792  cluster
1097732991 18669       3260614792  <1.1.2:4294949099>  4294949099  cluster

while on 1.1.2 those addresses do not exist.
Re: [tipc-discussion] tipc nametable update problem
When an update is received but cannot immediately be applied to the local nametable, we retain it for a few seconds in a backlog queue. Then, for each subsequent update received (which may have cleared up the conflict), we try to apply any updates stored in the backlog.

The timeout can be set with:
sysctl -w tipc.named_timeout=xxx
Default is 2000 ms. So clock drift does not matter.

I'm guessing that the nametable updates are dropped on the sending side. Is there any interface renaming going on after tipc is enabled?

//E
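The backlog timeout Erik mentions can be inspected and tuned via sysctl. A minimal sketch, assuming a recent kernel where the TIPC sysctls are registered under the `net.tipc` prefix (i.e. `/proc/sys/net/tipc/named_timeout`):

```shell
# Read the current backlog timeout for name-table updates (default 2000 ms).
# Assumption: on recent kernels the knob appears under net.tipc.
sysctl net.tipc.named_timeout

# Keep backlogged name-table updates around longer (e.g. 5 seconds) before
# they are dropped with a "Dropping name table update" trace.
sysctl -w net.tipc.named_timeout=5000
```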
Re: [tipc-discussion] tipc nametable update problem
How much clock drift between units does the nametable update allow?

On one of the test setups, the clock was off by about a second between the units.
Re: [tipc-discussion] tipc nametable update problem
Still having nametable update problems (using TIPC_CLUSTER_SCOPE).

Here is an excerpt of tipc-config -nt on both systems:

address 1.1.1:
104  1025      1025      <1.1.1:3540751351>  3540751351  cluster
104  65537     65537     <1.1.1:4046699456>  4046699456  cluster
104  131073    131073    <1.1.2:59828181>    59828181    cluster
104  16777984  16777984  <1.1.1:3135589675>  3135589675  cluster
104  33555200  33555200  <1.1.2:2193437365>  2193437365  cluster

address 1.1.2:
104  131073    131073    <1.1.2:59828181>    59828181    cluster
104  33555200  33555200  <1.1.2:2193437365>  2193437365  cluster

So in this case 1 sees all addresses 2 has published, while 2 is not seeing the addresses from 1. 2 was rebooted to make this happen.

Is there a possibility I'm calling tipc-config too early, and the interface is not yet up, or is this still the same problem I saw before?

There are some dropped nametable update messages in the kernel log:

Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 16600} from <1.1.1> key=4294915584

but they do not mention port 104.

If I restart the application on 1 having 104:1025 open, it shows up on 2.
Re: [tipc-discussion] tipc nametable update problem
Using TIPC_CLUSTER_SCOPE will work.
This was new system bring-up, and the code was ported from an older system, which used the TIPC 1.7.7 driver.
A quick search-and-replace of TIPC_ZONE_SCOPE is not a bad workaround.

From: Jon Maloy [jon.ma...@ericsson.com]
Sent: Saturday, March 19, 2016 10:57 AM
To: Erik Hugne
Cc: tipc-discussion@lists.sourceforge.net
Subject: Re: [tipc-discussion] tipc nametable update problem

Maybe not completely trivial, but not very complex either. I know I failed to describe this verbally to you at one moment, but I can put it on paper, and you will realize it is not a big deal.
If you or anybody else are interested, I can make an effort to describe this next week. I don't have time to implement it myself at the moment.

///jon

From: Erik Hugne [mailto:erik.hu...@gmail.com]
Sent: Friday, 18 March, 2016 12:38
To: Jon Maloy
Subject: RE: [tipc-discussion] tipc nametable update problem

Agree.
But implementing a new lookup mechanism is not trivial.. :)
@Rune afaik there is no functional limitation on using cluster scoped publications, so i hope that's an acceptable workaround for you.

//E

On Mar 18, 2016 16:46, "Jon Maloy" <jon.ma...@ericsson.com> wrote:
Still weird that this starts happening now, when this issue is supposed to be remedied, and not earlier, when it wasn't. We really need that "permit overlapping publications" solution I have been preaching about.

Br ///jon
Re: [tipc-discussion] tipc nametable update problem
Yes I have. There are quite a few at the same time like this: Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1853110816, 1952998688, 1801810542} from <1.1.1> key=1633905523 Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {542000723, 544613732, 544437616} from <1.1.1> key=167800175 Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {544239474, 1953325424, 543582572} from <1.1.1> key=1930035237 Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1933189232, 1869771885, 1634738291} from <1.1.1> key=1768843040 Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1717660012, 1701054976, 628308512} from <1.1.1> key=1869881446 Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {16397, 1073741824, 16397} from <1.1.1> key=29285 Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1633943667, 1752134260, 544367969} from <1.1.1> key=1679834144 Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1869771808, 2003986804, 1698300018} from <1.1.1> key=4294915584 Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1073741824, 65279, 4294902016} from <1.1.1> key=1073741824 Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {65279, 4294901760, 59154} from <1.1.1> key=65023 From: Erik Hugne [mailto:erik.hu...@gmail.com] Sent: Friday, March 18, 2016 1:48 AM To: Rune Torgersen Cc: tipc-discussion@lists.sourceforge.net Subject: Re: [tipc-discussion] tipc nametable update problem Hi Rune. When the problem occurs, have you seen any traces like "tipc: Dropping name table update" ? //E On Mar 18, 2016 02:11, "Rune Torgersen" <ru...@innovsys.com<mailto:ru...@innovsys.com>> wrote: More info. The failing ports are all opened as TIPC_ZONE_SCOPE. Addresses of the two computers are 1.1.1 and 1.1.2. If I change the open param to TIPC_CLUSTER_SCOPE, the nametable seems to update correctly. 
-Original Message- From: Rune Torgersen [mailto:ru...@innovsys.com<mailto:ru...@innovsys.com>] Sent: Thursday, March 17, 2016 7:06 PM To: 'tipc-discussion@lists.sourceforge.net<mailto:tipc-discussion@lists.sourceforge.net>' Subject: [tipc-discussion] tipc nametable update problem Hi. The product I work on uses TIPC for communication between different computers on a network. We've actually been using older version (1.7.7 and older ) for nearly 10 years. On a new product, we're using the latest Ubuntu server (16.04, still in beta) using kernel 4.4.0. On several occasions now, after boot, programs that open TIPC sockets during the boot process, have ports that does not show in the nametable on the other computer. This of course causes the programs to not being able to talk. If we restart the program, reopening the TIPC port, then it shows up on both sides. I know this is somewhat sparse info, but I am not sure where to start to look at this. One piece of info that might be useful, is that we kind of require the old interface naming on our interfaces, so we have turned off systemd's ethernet naming scheme, and use udev to name the devices. This should be done well before we initializer the tipc driver module and give it a netid and address and enable the bearer links. -- Transform Data into Opportunity. Accelerate data analysis in your applications with Intel Data Analytics Acceleration Library. Click to learn more. http://pubads.g.doubleclick.net/gampad/clk?id=278785231=/4140 ___ tipc-discussion mailing list tipc-discussion@lists.sourceforge.net<mailto:tipc-discussion@lists.sourceforge.net> https://lists.sourceforge.net/lists/listinfo/tipc-discussion -- Transform Data into Opportunity. Accelerate data analysis in your applications with Intel Data Analytics Acceleration Library. Click to learn more. 
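[Editorial aside: the numeric fields in the dropped updates above turn into plain ASCII when viewed as big-endian 32-bit words, which fits the missing-buffer-linearization diagnosis reached later in the thread — the node is parsing arbitrary message text as name table entries. A quick sanity check, assuming the kernel printed the fields after ntohl():]

```python
import struct

# First dropped entry from the log above: {1853110816, 1952998688, 1801810542}
vals = (1853110816, 1952998688, 1801810542)

# Repack the three 32-bit fields in network byte order and view them as bytes
text = b"".join(struct.pack(">I", v) for v in vals)
print(text)  # b'nt: the kern' -- readable ASCII, not plausible {type, lower, upper} fields
```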
Re: [tipc-discussion] tipc nametable update problem
Hi Rune.
When the problem occurs, have you seen any traces like "tipc: Dropping name table update" ?

//E

On Mar 18, 2016 02:11, "Rune Torgersen" wrote:
> More info.
> The failing ports are all opened as TIPC_ZONE_SCOPE.
> Addresses of the two computers are 1.1.1 and 1.1.2.
>
> If I change the open param to TIPC_CLUSTER_SCOPE, the nametable seems to
> update correctly.
>
> -----Original Message-----
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Thursday, March 17, 2016 7:06 PM
> To: 'tipc-discussion@lists.sourceforge.net'
> Subject: [tipc-discussion] tipc nametable update problem
>
> Hi.
>
> The product I work on uses TIPC for communication between different
> computers on a network. We have actually been using older versions (1.7.7
> and older) for nearly 10 years.
>
> On a new product, we are using the latest Ubuntu server (16.04, still in
> beta) with kernel 4.4.0.
>
> On several occasions now, after boot, programs that open TIPC sockets
> during the boot process have ports that do not show up in the nametable on
> the other computer. This of course prevents the programs from talking to
> each other.
> If we restart the program, reopening the TIPC port, it then shows up on
> both sides.
>
> I know this is somewhat sparse info, but I am not sure where to start
> looking at this.
>
> One piece of info that might be useful: we require the old interface
> naming on our interfaces, so we have turned off systemd's ethernet naming
> scheme and use udev to name the devices.
>
> This should be done well before we initialize the tipc driver module and
> give it a netid and address and enable the bearer links.
Re: [tipc-discussion] tipc nametable update problem
More info.
The failing ports are all opened as TIPC_ZONE_SCOPE.
Addresses of the two computers are 1.1.1 and 1.1.2.

If I change the open param to TIPC_CLUSTER_SCOPE, the nametable seems to update correctly.

-----Original Message-----
From: Rune Torgersen [mailto:ru...@innovsys.com]
Sent: Thursday, March 17, 2016 7:06 PM
To: 'tipc-discussion@lists.sourceforge.net'
Subject: [tipc-discussion] tipc nametable update problem

Hi.

The product I work on uses TIPC for communication between different computers on a network. We have actually been using older versions (1.7.7 and older) for nearly 10 years.

On a new product, we are using the latest Ubuntu server (16.04, still in beta) with kernel 4.4.0.

On several occasions now, after boot, programs that open TIPC sockets during the boot process have ports that do not show up in the nametable on the other computer. This of course prevents the programs from talking to each other. If we restart the program, reopening the TIPC port, it then shows up on both sides.

I know this is somewhat sparse info, but I am not sure where to start looking at this.

One piece of info that might be useful: we require the old interface naming on our interfaces, so we have turned off systemd's ethernet naming scheme and use udev to name the devices. This should be done well before we initialize the tipc driver module, give it a netid and address, and enable the bearer links.
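[Editorial aside: the scope Rune is changing is just the last element of the AF_TIPC bind address, which Python's socket module exposes directly. A minimal sketch of the workaround — the service type 18888 and instance 17 are made-up values for illustration, and the bind only succeeds with the tipc kernel module loaded:]

```python
import socket

SERVICE_TYPE = 18888   # hypothetical service type, for illustration only
LOWER = UPPER = 17     # publish a single instance of the name

try:
    sock = socket.socket(socket.AF_TIPC, socket.SOCK_RDM)
    # TIPC_CLUSTER_SCOPE instead of TIPC_ZONE_SCOPE: the workaround above
    sock.bind((socket.TIPC_ADDR_NAMESEQ, SERVICE_TYPE, LOWER, UPPER,
               socket.TIPC_CLUSTER_SCOPE))
    print("published {%d, %d, %d} with cluster scope"
          % (SERVICE_TYPE, LOWER, UPPER))
except OSError as exc:
    # AF_TIPC sockets require the tipc module (modprobe tipc)
    print("TIPC unavailable:", exc)
```

With cluster scope, the publication is distributed to all nodes in the cluster (here 1.1.1 and 1.1.2), so it sidesteps whatever is going wrong with the zone-scope updates.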
Re: [tipc-discussion] tipc nametable update problem
Still weird that this starts happening now, when this issue is supposed to be remedied, and not earlier, when it wasn't. We really need that "permit overlapping publications" solution I have been preaching about.

Br ///jon

> -----Original Message-----
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Friday, 18 March, 2016 10:25
> To: 'Erik Hugne'
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
>
> Yes I have.
> There are quite a few at the same time like this:
>
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1853110816, 1952998688, 1801810542} from <1.1.1> key=1633905523
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {542000723, 544613732, 544437616} from <1.1.1> key=167800175
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {544239474, 1953325424, 543582572} from <1.1.1> key=1930035237
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1933189232, 1869771885, 1634738291} from <1.1.1> key=1768843040
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1717660012, 1701054976, 628308512} from <1.1.1> key=1869881446
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {16397, 1073741824, 16397} from <1.1.1> key=29285
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1633943667, 1752134260, 544367969} from <1.1.1> key=1679834144
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1869771808, 2003986804, 1698300018} from <1.1.1> key=4294915584
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1073741824, 65279, 4294902016} from <1.1.1> key=1073741824
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {65279, 4294901760, 59154} from <1.1.1> key=65023
>
> From: Erik Hugne [mailto:erik.hu...@gmail.com]
> Sent: Friday, March 18, 2016 1:48 AM
> To: Rune Torgersen
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
>
> Hi Rune.
> When the problem occurs, have you seen any traces like "tipc: Dropping name
> table update" ?
>
> //E
>
> On Mar 18, 2016 02:11, "Rune Torgersen" <ru...@innovsys.com> wrote:
> More info.
> The failing ports are all opened as TIPC_ZONE_SCOPE.
> Addresses of the two computers are 1.1.1 and 1.1.2.
>
> If I change the open param to TIPC_CLUSTER_SCOPE, the nametable seems to
> update correctly.
>
> -----Original Message-----
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Thursday, March 17, 2016 7:06 PM
> To: 'tipc-discussion@lists.sourceforge.net'
> Subject: [tipc-discussion] tipc nametable update problem
>
> Hi.
>
> The product I work on uses TIPC for communication between different
> computers on a network. We have actually been using older versions (1.7.7
> and older) for nearly 10 years.
>
> On a new product, we are using the latest Ubuntu server (16.04, still in
> beta) with kernel 4.4.0.
>
> On several occasions now, after boot, programs that open TIPC sockets during
> the boot process have ports that do not show up in the nametable on the other
> computer. This of course prevents the programs from talking to each other.
> If we restart the program, reopening the TIPC port, it then shows up on both
> sides.
>
> I know this is somewhat sparse info, but I am not sure where to start
> looking at this.
>
> One piece of info that might be useful: we require the old interface
> naming on our interfaces, so we have turned off systemd's ethernet naming
> scheme, and use udev to name the devices.
>
> This should be done well before we initialize the tipc driver module and
> give it a netid and address and enable the bearer links.