Re: [tipc-discussion] tipc nametable update problem

2016-04-26 Thread Jon Maloy
Yes. And I issued a bug report and followed it up, so it is now also fixed in 
kernel 4.4, which will be in mainline 16.04 very soon.

BR
///jon


> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Tuesday, 26 April, 2016 13:53
> To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: GUNA (gbala...@gmail.com); Ying Xue
> Subject: RE: [tipc-discussion] tipc nametable update problem
> 
> Testing latest Ubuntu ppa kernel (4.6.0rc5), looks like the issues have been 
> fixed!
> Thanks!
> 
> (Hopefully that hits mainline 16.04 kernel too).
> 
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Wednesday, April 06, 2016 1:16 PM
> To: Jon Maloy; Rune Torgersen; 'Jon Maloy'; tipc-
> discuss...@lists.sourceforge.net
> Cc: GUNA (gbala...@gmail.com); Ying Xue
> Subject: RE: [tipc-discussion] tipc nametable update problem
> 
> 
> 
> > -Original Message-
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: Wednesday, 06 April, 2016 14:07
> > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: GUNA (gbala...@gmail.com); Ying Xue
> > Subject: Re: [tipc-discussion] tipc nametable update problem
> >
> > Hi Rune,
> > I finally found what it is: a missing buffer linearization, and it turns
> > out to be a problem that has already been solved. I checked it in last
> > November, and it is present in kernel 4.5, but not in 4.4.
> > The reason I didn't realize this right away was that I found and solved
> > this as a UDP-bearer specific problem, and posted the patch as such. Since
> > UDP support is relatively new in TIPC, I didn't realize the need to have
> > this correction applied further back.
> 
> The solution as such was in generic code, so it really is solved even for your
> Ethernet based system.
> Sorry for not being clear enough about this.
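[Editor's note: the missing-linearization failure mode described above can be modeled outside the kernel. If a parser assumes the whole name-distributor payload sits in one contiguous buffer while the buffer is actually fragmented (in the kernel, `skb_linearize()` is what makes a paged skb contiguous), it silently sees only the items in the first fragment. A toy Python sketch, with hypothetical item sizes and function names, not the actual TIPC code:]

```python
ITEM_SIZE = 20  # hypothetical size of one name-table item

def parse_items_broken(fragments):
    """Buggy parser: assumes the payload is one contiguous buffer,
    so it only ever reads items from the first fragment."""
    head = fragments[0]
    end = len(head) // ITEM_SIZE * ITEM_SIZE
    return [head[i:i + ITEM_SIZE] for i in range(0, end, ITEM_SIZE)]

def parse_items_fixed(fragments):
    """Fixed parser: 'linearize' (join) the fragments first, mirroring
    what skb_linearize() does for a paged kernel skb."""
    data = b"".join(fragments)
    end = len(data) // ITEM_SIZE * ITEM_SIZE
    return [data[i:i + ITEM_SIZE] for i in range(0, end, ITEM_SIZE)]

# A bulk publication of six items, split across two fragments:
items = [bytes([n]) * ITEM_SIZE for n in range(6)]
payload = b"".join(items)
frags = [payload[:2 * ITEM_SIZE], payload[2 * ITEM_SIZE:]]

print(len(parse_items_broken(frags)))  # 2 -- four bindings silently lost
print(len(parse_items_fixed(frags)))   # 6 -- all bindings applied
```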
> 
> ///j
> 
> > I will create a new patch and try to get it applied on the "stable" branch.
> >
> > Regards
> > ///jon
> >
> >
> > > -Original Message-
> > > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > > Sent: Tuesday, 05 April, 2016 18:10
> > > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> > > (ying.x...@gmail.com); Ying Xue
> > > Subject: RE: [tipc-discussion] tipc nametable update problem
> > >
> > > So after trying various things to get it to fail again, I finally got it.
> > > I even got a corrupted update...
> > >
> > > So I rebooted 1.1.1, and started TIPC when it was up and running.
> > > Only 1.1.1 ports show in tipc-config -nt (see log_1_1_1.txt).
> > > Then I rebooted 1.1.2 (ended up doing that twice, I think).
> > >
> > > Normally I could not talk from 1.1.1 to 1.1.2, but this time that worked
> > > fine. However, 1.1.2 got bad updates after the reboot, and 1.1.2 does NOT
> > > see the ports open on 1.1.1.
> > > (See log_1_1_2.txt and 1_1_1.txt. The last tipc-config -nt outputs were
> > > taken within 30 seconds of each other.)
> > >
> > > The ports I was testing were 104,65537 and 104,131073.
> > >
> > > This was done using Ubuntu 4.4.0-15 kernel (4.4.0-15-generic)
> > >
> > > -Original Message-
> > > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > > Sent: Tuesday, April 05, 2016 11:27 AM
> > > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> > > (ying.x...@gmail.com); Ying Xue
> > > Subject: RE: [tipc-discussion] tipc nametable update problem
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > > > Sent: Tuesday, 05 April, 2016 12:12
> > > > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue
> Ying
> > > > (ying.x...@gmail.com); Ying Xue
> > > > Subject: RE: [tipc-discussion] tipc nametable update problem
> > > >
> > > > I should be able to do more testing.
> > > >
> > > > I do not know for sure whether the mappings were missing before the
> > > > reboot.
> > > > If I had restarted applications, then the mappings would be there
> > > > before the reboot.

Re: [tipc-discussion] tipc nametable update problem

2016-04-26 Thread Rune Torgersen
Testing latest Ubuntu ppa kernel (4.6.0rc5), looks like the issues have been 
fixed! 
Thanks!

(Hopefully that hits mainline 16.04 kernel too).

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Wednesday, April 06, 2016 1:16 PM
To: Jon Maloy; Rune Torgersen; 'Jon Maloy'; 
tipc-discussion@lists.sourceforge.net
Cc: GUNA (gbala...@gmail.com); Ying Xue
Subject: RE: [tipc-discussion] tipc nametable update problem



> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Wednesday, 06 April, 2016 14:07
> To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: GUNA (gbala...@gmail.com); Ying Xue
> Subject: Re: [tipc-discussion] tipc nametable update problem
> 
> Hi Rune,
> I finally found what it is: a missing buffer linearization, and it turns
> out to be a problem that has already been solved. I checked it in last
> November, and it is present in kernel 4.5, but not in 4.4.
> The reason I didn't realize this right away was that I found and solved this
> as a UDP-bearer specific problem, and posted the patch as such. Since UDP
> support is relatively new in TIPC, I didn't realize the need to have this
> correction applied further back.

The solution as such was in generic code, so it really is solved even for your 
Ethernet based system.
Sorry for not being clear enough about this.

///j

> I will create a new patch and try to get it applied on the "stable" branch.
> 
> Regards
> ///jon
> 
> 
> > -Original Message-
> > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > Sent: Tuesday, 05 April, 2016 18:10
> > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> > (ying.x...@gmail.com); Ying Xue
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > So after trying various things to get it to fail again, I finally got it.
> > I even got a corrupted update...
> >
> > So I rebooted 1.1.1, and started TIPC when it was up and running.
> > Only 1.1.1 ports show in tipc-config -nt (see log_1_1_1.txt).
> > Then I rebooted 1.1.2 (ended up doing that twice, I think).
> >
> > Normally I could not talk from 1.1.1 to 1.1.2, but this time that worked
> > fine. However, 1.1.2 got bad updates after the reboot, and 1.1.2 does NOT
> > see the ports open on 1.1.1.
> > (See log_1_1_2.txt and 1_1_1.txt. The last tipc-config -nt outputs were
> > taken within 30 seconds of each other.)
> >
> > The ports I was testing were 104,65537 and 104,131073.
> >
> > This was done using the Ubuntu 4.4.0-15 kernel (4.4.0-15-generic)
> >
> > -Original Message-
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: Tuesday, April 05, 2016 11:27 AM
> > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> > (ying.x...@gmail.com); Ying Xue
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> >
> >
> > > -Original Message-
> > > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > > Sent: Tuesday, 05 April, 2016 12:12
> > > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> > > (ying.x...@gmail.com); Ying Xue
> > > Subject: RE: [tipc-discussion] tipc nametable update problem
> > >
> > > I should be able to do more testing.
> > >
> > > I do not know for sure whether the mappings were missing before the
> > > reboot.
> > > If I had restarted applications, then the mappings would be there before
> > > the reboot.
> > >
> > > I do know that they are definitely missing after reboot. That is how I 
> > > first
> > > discovered it, namely by not seeing the application registration from 
> > > 1.1.2
> after
> > > reboot.
> > >
> > > Looked to me like most mappings were not present, but I'll recheck.
> > >
> > > I'll reboot both with a kernel that I know has a problem;
> > > then start wireshark on 1, restart applications on 1.1.2, and make sure 
> > > they
> can
> > > talk.
> > > print out nametable on both
> > > Then reboot 1.1.2 and see.
> > >
> > > Anything else you'd want to see (short of running diag code)?
> >
> > That sounds like a plan. What I am most interested in right now is whether
> > it is only the "bulk" (pre-establishment) bindings that are lost, or all
> > of them.

Re: [tipc-discussion] tipc nametable update problem

2016-04-05 Thread Jon Maloy


> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Tuesday, 05 April, 2016 12:12
> To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> (ying.x...@gmail.com); Ying Xue
> Subject: RE: [tipc-discussion] tipc nametable update problem
> 
> I should be able to do more testing.
> 
> I do not know for sure whether the mappings were missing before the reboot.
> If I had restarted applications, then the mappings would be there before the
> reboot.
> 
> I do know that they are definitely missing after reboot. That is how I first
> discovered it, namely by not seeing the application registration from 1.1.2 
> after
> reboot.
> 
> Looked to me like most mappings were not present, but I'll recheck.
> 
> I'll reboot both with a kernel that I know has a problem;
> then start wireshark on 1, restart applications on 1.1.2, and make sure they 
> can
> talk.
> print out nametable on both
> Then reboot 1.1.2 and see.
> 
> Anything else you'd want to see (short of running diag code)?

That sounds like a plan. What I am most interested in right now is whether it 
is only the "bulk" (pre-establishment) bindings that are lost, or all of them.
If we can confirm that this is the case, we will have a very specific packet 
(#2) to trace on, and it should be possible to find out what happens to it and 
its contents.
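[Editor's note: the "bulk" distribution Jon describes can be sketched as a toy model. This is illustrative Python, not the kernel implementation; the classes and names are hypothetical. The point it shows: bindings made before link establishment travel together in one bulk publication (the link's message #2), so if that single message is not applied, exactly the pre-establishment bindings go missing while later ones arrive fine.]

```python
class Node:
    """Toy model of a TIPC node's name table and bulk distribution."""
    def __init__(self):
        self.local_bindings = set()   # names bound by local applications
        self.table = set()            # full name table (local + remote)

    def bind(self, name, peer=None):
        self.local_bindings.add(name)
        self.table.add(name)
        if peer is not None:          # link already up: publish incrementally
            peer.table.add(name)

    def link_up(self, peer, bulk_lost=False):
        """On link establishment, send all pre-existing bindings in one
        'bulk' publication (sequence #2). If that one message is not
        applied, every pre-establishment binding is missing on the peer."""
        if not bulk_lost:
            peer.table |= self.local_bindings

a, b = Node(), Node()
b.bind(("104", 65537))                # bound before the link exists
b.link_up(a, bulk_lost=True)          # bulk message #2 not applied
b.bind(("104", 131073), peer=a)       # bound after link up: distributed fine

print(("104", 65537) in a.table)      # False -- only the bulk bindings are lost
print(("104", 131073) in a.table)     # True
```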

///jon


> 
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Monday, April 04, 2016 3:47 PM
> To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying
> (ying.x...@gmail.com); Ying Xue
> Subject: RE: [tipc-discussion] tipc nametable update problem
> 
> Hi Rune,
> I don't get any further with this without more input.
> 
> First a question: were *all* bindings from 1.1.2 missing after the reboot, or
> only the first six (the ones in the "bulk" publication of message #12646 in
> the big pcap file)?
> If the latter is the case, then we know that it is the content of this 
> particular
> message that is not being applied. This message is special, because it 
> contains all
> bindings that were made on 1.1.2 prior to the link being established. This
> message is always sent with sequence #2, and we can see from the dump that it
> was received (after a couple of retransmissions) and acknowledged by 1.1.1,
> which means it was delivered (?) up to the binding table.
> 
> If the bindings were missing in 1.1.1 before the reboot, but not after (which
> seems to be contrary to what  you state) my theory may still be valid. The
> Wireshark dump does not go far enough back to see what happened to the
> original publications; only that they were missing when you tried to remove
> them. I wonder if you (or anybody else who is able to reproduce the problem)
> could still make the effort to apply our patches and see what happens. But of
> course, if you are 100% sure that the bindings were missing even after the 
> reboot
> run you sent me, then the problem must be something else, and I don't see how
> I can get further without instrumenting the code.
> 
> Regards
> ///jon
> 
> > -Original Message-
> > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > Sent: Monday, 04 April, 2016 13:23
> > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > They were not in there after the reboot, might not have been there before
> > either.
> > Only way to actually get it working was to restart whichever application 
> > has the
> > missing registration on 1.1.2.
> >
> >
> > -Original Message-
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: Monday, April 04, 2016 11:44 AM
> > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > Thank you Rune,
> > I think my theory was wrong. I can now see that the dropped items actually
> > were withdrawals, not publications, sent out just before 1.1.2 rebooted, of
> > course because the server application was being killed at that moment.
> > They were probably queued because the corresponding publications could not
> > be found in the table. Were those entries visible in the table of 1.1.1
> > before you rebooted? My guess is not...

Re: [tipc-discussion] tipc nametable update problem

2016-04-05 Thread Rune Torgersen
I should be able to do more testing.

I do not know for sure whether the mappings were missing before the reboot.
If I had restarted applications, then the mappings would be there before the reboot.

I do know that they are definitely missing after reboot. That is how I first 
discovered it, namely by not seeing the application registration from 1.1.2 
after reboot.

Looked to me like most mappings were not present, but I'll recheck.

I'll reboot both with a kernel that I know has a problem;
then start wireshark on 1, restart applications on 1.1.2, and make sure they 
can talk.
print out nametable on both
Then reboot 1.1.2 and see.

Anything else you'd want to see (short of running diag code)?

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Monday, April 04, 2016 3:47 PM
To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying 
(ying.x...@gmail.com); Ying Xue
Subject: RE: [tipc-discussion] tipc nametable update problem

Hi Rune,
I don't get any further with this without more input. 

First a question: were *all* bindings from 1.1.2 missing after the reboot, or 
only the first six (the ones in the "bulk" publication of message #12646 in the 
big pcap file)?
If the latter is the case, then we know that it is the content of this 
particular message that is not being applied. This message is special, because 
it contains all bindings that were made on 1.1.2 prior to the link being 
established. This message is always sent with sequence #2, and we can see from 
the dump that it was received (after a couple of retransmissions) and 
acknowledged by 1.1.1, which means it was delivered (?) up to the binding table.

If the bindings were missing in 1.1.1 before the reboot, but not after (which 
seems to be contrary to what  you state) my theory may still be valid. The 
Wireshark dump does not go far enough back to see what happened to the original 
publications; only that they were missing when you tried to remove them. I 
wonder if you (or anybody else who is able to reproduce the problem) could 
still make the effort to apply our patches and see what happens. But of course, 
if you are 100% sure that the bindings were missing even after the reboot run 
you sent me, then the problem must be something else, and I don't see how I can 
get further without instrumenting the code.

Regards
///jon

> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Monday, 04 April, 2016 13:23
> To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
> 
> They were not in there after the reboot, might not have been there before
> either.
> Only way to actually get it working was to restart whichever application has 
> the
> missing registration on 1.1.2.
> 
> 
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Monday, April 04, 2016 11:44 AM
> To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
> 
> Thank you Rune,
> I think my theory was wrong. I can now see that the dropped items actually
> were withdrawals, not publications, sent out just before 1.1.2 rebooted, of
> course because the server application was being killed at that moment.
> They were probably queued because the corresponding publications could not
> be found in the table. Were those entries visible in the table of 1.1.1 
> before you
> rebooted? My guess is not...
> 
> ///jon
> 
> 
> > -Original Message-
> > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > Sent: Monday, 04 April, 2016 11:11
> > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > Here is the full capture.
> > (If this is too big, I'll make it available on a dropbox share).
> >
> > Reboot happened approx 21:31:48, 2016-03-30 UTC.
> >
> > -Original Message-
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: Monday, April 04, 2016 9:57 AM
> > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> >
> >
> > > -Original Message-
> > > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > > Sent: Monday, 04 April, 2016 09:53

Re: [tipc-discussion] tipc nametable update problem

2016-04-04 Thread Rune Torgersen
They were not in there after the reboot, might not have been there before 
either.
Only way to actually get it working was to restart whichever application has 
the missing registration on 1.1.2.


-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Monday, April 04, 2016 11:44 AM
To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
Subject: RE: [tipc-discussion] tipc nametable update problem

Thank you Rune,
I think my theory was wrong. I can now see that the dropped items actually were 
withdrawals, not publications, sent out just before 1.1.2 rebooted, of course 
because the server application was being killed at that moment.
They were probably queued because the corresponding publications could not be 
found in the table. Were those entries visible in the table of 1.1.1 before you 
rebooted? My guess is not...

///jon


> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Monday, 04 April, 2016 11:11
> To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
> 
> Here is the full capture.
> (If this is too big, I'll make it available on a dropbox share).
> 
> Reboot happened approx 21:31:48, 2016-03-30 UTC.
> 
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Monday, April 04, 2016 9:57 AM
> To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
> 
> 
> 
> > -Original Message-
> > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > Sent: Monday, 04 April, 2016 09:53
> > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > The test setup I have is two servers with SuperMicro X10DRL-i motherboards,
> > each having two Xeon E5-2630V3 8-core CPUs and 64 GB of memory.
> > I am running Ubuntu 16.04 (beta). Each server also has ten 1G Ethernet
> > interfaces, but only one was active in this case, and I only use one as a
> > bearer.
> >
> > There are other server pairs running on the same subnet with different 
> > netids.
> >
> > This particular issue happens when I reboot one of the two servers. The
> > reboot (full cold reboot) takes almost 5 minutes because of POST, with 10
> > NICs trying to do PXE boot.
> 
> My guess is that in this particular run you rebooted node 1.1.2?
> 
> If so, it doesn't contradict my theory. The dropped entries may quite well
> have been lingering in 1.1.1's backlog during the five minutes it took to
> reboot the peer, if there otherwise was no activity (bind/unbind) on 1.1.1
> during that period.
> It is first when we try to make an additional insertion (probably the one of
> the link going up) that the expired backlog items are discovered and purged.
> So, I am still very interested in what happened before the reboot, since I
> believe that the dropped entries are just a late symptom of a problem that
> manifested itself much earlier.
> 
> ///jon
> 
> >
> > So far, I had to go back to Ubuntu's 3.14.1 kernel (from their PPA,
> > http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14.1-trusty/) to find one
> > that does not have the problem.
> > All others I have tried (4.2, 4.4 and 4.5) have shown this problem.
> >
> > I should be able to compile a kernel and try.
> >
> > -Original Message-
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: Friday, April 01, 2016 7:07 PM
> > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > Hi Rune,
> > I am totally unable to reproduce your scenario, and the Wireshark dump does
> not
> > in itself show anything wrong.
> >
> > But by comparing the dropped publications with the ones arriving in the
> > messages, I could quickly confirm that those are not the same, although
> > they partially contain the same port names.
> > Those of type 101 that are dropped have instance numbers that are not the
> > same as in the arriving messages (you are probably generating new ones for
> > each session), while those which are the same (103, 105..) have different
> > publication key numbers.

Re: [tipc-discussion] tipc nametable update problem

2016-04-04 Thread Rune Torgersen
They might not have been. 

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Monday, April 04, 2016 11:44 AM
To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
Subject: RE: [tipc-discussion] tipc nametable update problem

Thank you Rune,
I think my theory was wrong. I can now see that the dropped items actually were 
withdrawals, not publications, sent out just before 1.1.2 rebooted, of course 
because the server application was being killed at that moment.
They were probably queued because the corresponding publications could not be 
found in the table. Were those entries visible in the table of 1.1.1 before you 
rebooted? My guess is not...

///jon


> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Monday, 04 April, 2016 11:11
> To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
> 
> Here is the full capture.
> (If this is too big, I'll make it available on a dropbox share).
> 
> Reboot happened approx 21:31:48, 2016-03-30 UTC.
> 
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Monday, April 04, 2016 9:57 AM
> To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
> 
> 
> 
> > -Original Message-
> > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > Sent: Monday, 04 April, 2016 09:53
> > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > The test setup I have is two servers with SuperMicro X10DRL-i motherboards,
> > each having two Xeon E5-2630V3 8-core CPUs and 64 GB of memory.
> > I am running Ubuntu 16.04 (beta). Each server also has ten 1G Ethernet
> > interfaces, but only one was active in this case, and I only use one as a
> > bearer.
> >
> > There are other server pairs running on the same subnet with different 
> > netids.
> >
> > This particular issue happens when I reboot one of the two servers. The
> > reboot (full cold reboot) takes almost 5 minutes because of POST, with 10
> > NICs trying to do PXE boot.
> 
> My guess is that in this particular run you rebooted node 1.1.2?
> 
> If so, it doesn't contradict my theory. The dropped entries may quite well
> have been lingering in 1.1.1's backlog during the five minutes it took to
> reboot the peer, if there otherwise was no activity (bind/unbind) on 1.1.1
> during that period.
> It is first when we try to make an additional insertion (probably the one of
> the link going up) that the expired backlog items are discovered and purged.
> So, I am still very interested in what happened before the reboot, since I
> believe that the dropped entries are just a late symptom of a problem that
> manifested itself much earlier.
> 
> ///jon
> 
> >
> > So far, I had to go back to Ubuntu's 3.14.1 kernel (from their PPA,
> > http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14.1-trusty/) to find one
> > that does not have the problem.
> > All others I have tried (4.2, 4.4 and 4.5) have shown this problem.
> >
> > I should be able to compile a kernel and try.
> >
> > -Original Message-
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: Friday, April 01, 2016 7:07 PM
> > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > Hi Rune,
> > I am totally unable to reproduce your scenario, and the Wireshark dump does
> not
> > in itself show anything wrong.
> >
> > But by comparing the dropped publications with the ones arriving in the
> > messages, I could quickly confirm that those are not the same, although
> > they partially contain the same port names.
> > Those of type 101 that are dropped have instance numbers that are not the
> > same as in the arriving messages (you are probably generating new ones for
> > each session), while those which are the same (103, 105..) have different
> > publication key numbers.
> >
> > The conclusion can only be that those are leftovers from a previous
> > session which have not been purged when the contact with node 1.1.2 was
> > lost.

Re: [tipc-discussion] tipc nametable update problem

2016-04-04 Thread Jon Maloy
Thank you Rune,
I think my theory was wrong. I can now see that the dropped items actually were 
withdrawals, not publications, sent out just before 1.1.2 rebooted, of course 
because the server application was being killed at that moment.
They were probably queued because the corresponding publications could not be 
found in the table. Were those entries visible in the table of 1.1.1 before you 
rebooted? My guess is not...

///jon


> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Monday, 04 April, 2016 11:11
> To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
> 
> Here is the full capture.
> (If this is too big, I'll make it available on a dropbox share).
> 
> Reboot happened approx 21:31:48, 2016-03-30 UTC.
> 
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Monday, April 04, 2016 9:57 AM
> To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
> 
> 
> 
> > -Original Message-
> > From: Rune Torgersen [mailto:ru...@innovsys.com]
> > Sent: Monday, 04 April, 2016 09:53
> > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > The test setup I have is two servers with SuperMicro X10DRL-i motherboards,
> > each having two Xeon E5-2630V3 8-core CPUs and 64 GB of memory.
> > I am running Ubuntu 16.04 (beta). Each server also has ten 1G Ethernet
> > interfaces, but only one was active in this case, and I only use one as a
> > bearer.
> >
> > There are other server pairs running on the same subnet with different 
> > netids.
> >
> > This particular issue happens when I reboot one of the two servers. The
> > reboot (full cold reboot) takes almost 5 minutes because of POST, with 10
> > NICs trying to do PXE boot.
> 
> My guess is that in this particular run you rebooted node 1.1.2?
> 
> If so, it doesn't contradict my theory. The dropped entries may quite well
> have been lingering in 1.1.1's backlog during the five minutes it took to
> reboot the peer, if there otherwise was no activity (bind/unbind) on 1.1.1
> during that period.
> It is first when we try to make an additional insertion (probably the one of
> the link going up) that the expired backlog items are discovered and purged.
> So, I am still very interested in what happened before the reboot, since I
> believe that the dropped entries are just a late symptom of a problem that
> manifested itself much earlier.
> 
> ///jon
> 
> >
> > So far, I had to go back to Ubuntu's 3.14.1 kernel (from their PPA,
> > http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14.1-trusty/) to find one
> > that does not have the problem.
> > All others I have tried (4.2, 4.4 and 4.5) have shown this problem.
> >
> > I should be able to compile a kernel and try.
> >
> > -Original Message-
> > From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> > Sent: Friday, April 01, 2016 7:07 PM
> > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> > Subject: RE: [tipc-discussion] tipc nametable update problem
> >
> > Hi Rune,
> > I am totally unable to reproduce your scenario, and the Wireshark dump does
> not
> > in itself show anything wrong.
> >
> > But by comparing the dropped publications with ones arriving in the 
> > messages, I
> > could quickly confirm that those are not the same, although they partially
> contain
> > the same port names.
> > Those of type 101 that are dropped have instance numbers that are not the
> > same as in the arriving messages (you are probably generating new ones for
> > each session), while those which are the same (103, 105..) have different
> > publication key numbers.
> >
> > The conclusion can only be that those are leftovers from a previous
> > session which have not been purged when the contact with node 1.1.2 was
> > lost. A quick check of the code confirms this; entries in the name table
> > backlog are only purged based on expiration time, and not on loss of
> > contact with their originating node.

Re: [tipc-discussion] tipc nametable update problem

2016-04-04 Thread Jon Maloy


> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Monday, 04 April, 2016 09:53
> To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
> 
> The test setup I have is two servers with SuperMicro X10DRL-i motherboards,
> each having two Xeon E5-2630V3 8-core CPUs, and 64GB of memory.
> I am running Ubuntu 16.04 (beta). Each server also has 10 1G Ethernet
> interfaces, but only one was active in this case, and I only use one as a
> bearer.
> 
> There are other server pairs running on the same subnet with different netids.
> 
> This particular issue happens when I reboot one of the two servers. The reboot
> (full cold reboot) takes almost 5 minutes because of POST with 10 NICs trying
> to do PXE boot.

My guess is that in this particular run you rebooted node 1.1.2? 

If so, it doesn't contradict my theory. The dropped entries may well have been 
lingering in 1.1.1's backlog during the five minutes it took to reboot the 
peer, if there was otherwise no activity (bind/unbind) on 1.1.1 during that 
period. It is only when we try to make an additional insertion (probably the 
one for the link going up) that the expired backlog items are discovered and 
purged. 
So, I am still very interested in what happened before the reboot, since I 
believe that the dropped entries are just a late symptom of a problem that 
manifested itself much earlier.
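The lifecycle described above can be modeled in a few lines of Python. This is purely an illustration of the claimed behavior (expiry-based purging that only runs when a later insertion arrives), not the kernel code; the class and field names are invented:

```python
class NameTableBacklog:
    """Toy model of the backlog behavior described above: items carry an
    expiry deadline, but expired items are only discovered and purged as a
    side effect of the *next* insertion -- there is no independent sweep."""

    def __init__(self, timeout_s=2.0):  # mirrors the 2000 ms named_timeout default
        self.timeout_s = timeout_s
        self.items = []  # list of (deadline, publication)

    def insert(self, publication, now):
        # Expired entries are dropped here, on insertion, and nowhere else.
        dropped = [p for d, p in self.items if d <= now]
        self.items = [(d, p) for d, p in self.items if d > now]
        self.items.append((now + self.timeout_s, publication))
        return dropped  # would be logged as "Dropping name table update ..."

backlog = NameTableBacklog()
backlog.insert({"type": 101, "lower": 10, "upper": 10}, now=0.0)
# Peer reboots; no bind/unbind activity for ~5 minutes, so nothing is purged.
# The stale entry only gets dropped when the link-up insertion arrives:
dropped = backlog.insert({"type": 0, "lower": 0, "upper": 0}, now=300.0)
print(dropped)  # [{'type': 101, 'lower': 10, 'upper': 10}]
```

With an idle backlog, the stale entry survives well past its nominal timeout, which matches the five-minute reboot scenario.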

///jon

> 
> So far, I had to go back to Ubuntu's 3.14.1 kernel (from their PPA,
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14.1-trusty/) to find one
> that does not have a problem.
> All others I have tried (4.2, 4.4 and 4.5) have shown this problem.
> 
> I should be able to compile a kernel and try.
> 
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: Friday, April 01, 2016 7:07 PM
> To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
> Subject: RE: [tipc-discussion] tipc nametable update problem
> 
> Hi Rune,
> I am totally unable to reproduce your scenario, and the Wireshark dump does 
> not
> in itself show anything wrong.
> 
> But by comparing the dropped publications with ones arriving in the messages, 
> I
> could quickly confirm that those are not the same, although they partially 
> contain
> the same port names.
> Those of type 101 that are dropped have instance numbers that are not the same
> as in the arriving messages (you are probably generating new ones for each
> session), while those which are the same (103, 105..) have different 
> publication
> key numbers.
> 
> The conclusion can only be that those are leftovers from a previous session 
> which
> have not been purged when the contact with node 1.1.2 was lost. A quick check
> of the code confirms this; entries in the name table backlog are only purged
> based on expiration time, and not on loss of contact with their originating 
> node,
> as they should be. This is clearly a bug that somebody has to fix (Partha,
> Richard?).
> 
> It remains to understand how those entries got into the backlog in the first 
> place.
> Something must have happened in the previous session that prevented them
> from being applied. Since you never use instance sequences, overlapping
> sequences cannot be the problem. If it were a memory allocation problem this
> would be visible in the log. One possibility I see is that we have a race 
> condition
> between the purging of binding table from the pre-previous session and the
> previous one. The call to the purging function tipc_publ_notify() is done 
> outside
> any lock protection, so it is fully possible that a link that quickly goes 
> down and
> comes back may be able to deliver a new batch of publications before the 
> purging
> action is finished. This becomes particularly likely if the number of 
> publications is
> large, and we are running in a multi-VM or multi-namespace environment on the
> same host. (Can you confirm, Rune?)
> If only the interface or link is cycled, while the same application server 
> continues
> running on 1.1.2, and 1.1.1 still is intact, this is a possible scenario.
> The newly delivered publications will find a set of exactly equal 
> publications from
> the previous session in the name table, and hence be put in the backlog.
> 
> How do we resolve this? My first idea was to just run a process_backlog() on 
> the
> flank of tipc_publ_notify(). But unfortunately we first need to run a purge 
> on the
> backlog, according to the above, and this purge would be unab

Re: [tipc-discussion] tipc nametable update problem

2016-04-04 Thread Rune Torgersen
The test setup I have is two servers with SuperMicro X10DRL-i motherboards, 
each having two Xeon E5-2630V3 8-core CPUs, and 64GB of memory.
I am running Ubuntu 16.04 (beta). Each server also has 10 1G Ethernet 
interfaces, but only one was active in this case, and I only use one as a 
bearer.

There are other server pairs running on the same subnet with different netids.

This particular issue happens when I reboot one of the two servers. The reboot 
(full cold reboot) takes almost 5 minutes because of POST with 10 NICs trying 
to do PXE boot.

So far, I had to go back to Ubuntu's 3.14.1 kernel (from their PPA, 
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14.1-trusty/) to find one that 
does not have a problem.
All others I have tried (4.2, 4.4 and 4.5) have shown this problem.

I should be able to compile a kernel and try.

-Original Message-
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Friday, April 01, 2016 7:07 PM
To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan
Subject: RE: [tipc-discussion] tipc nametable update problem

Hi Rune,
I am totally unable to reproduce your scenario, and the Wireshark dump does not 
in itself show anything wrong.

But by comparing the dropped publications with ones arriving in the messages, I 
could quickly confirm that those are not the same, although they partially 
contain the same port names.
Those of type 101 that are dropped have instance numbers that are not the same 
as in the arriving messages (you are probably generating new ones for each 
session), while those which are the same (103, 105..) have different 
publication key numbers.

The conclusion can only be that those are leftovers from a previous session 
which have not been purged when the contact with node 1.1.2 was lost. A quick 
check of the code confirms this; entries in the name table backlog are only 
purged based on expiration time, and not on loss of contact with their 
originating node, as they should be. This is clearly a bug that somebody has to 
fix (Partha, Richard?).

It remains to understand how those entries got into the backlog in the first 
place. Something must have happened in the previous session that prevented them 
from being applied. Since you never use instance sequences, overlapping 
sequences cannot be the problem. If it were a memory allocation problem this 
would be visible in the log. One possibility I see is that we have a race 
condition between the purging of binding table from the pre-previous session 
and the previous one. The call to the purging function tipc_publ_notify() is 
done outside any lock protection, so it is fully possible that a link that 
quickly goes down and comes back may be able to deliver a new batch of 
publications before the purging action is finished. This becomes particularly 
likely if the number of publications is large, and we are running in a multi-VM 
or multi-namespace environment on the same host. (Can you confirm, Rune?)
If only the interface or link is cycled, while the same application server 
continues running on 1.1.2, and 1.1.1 still is intact, this is a possible 
scenario.
The newly delivered publications will find a set of exactly equal publications 
from the previous session in the name table, and hence be put in the backlog. 

How do we resolve this? My first idea was to just run a process_backlog() on 
the flank of tipc_publ_notify(). But unfortunately we first need to run a purge 
on the backlog, according to the above, and this purge would be unable to 
distinguish between "old" and "new" backlog items, and would have to purge them 
all.

A better, but maybe not so neat, solution would be to use a similar approach to 
the one we use for socket wakeup. We create a pseudo message with a new message type 
PURGER, and append that to the tail of the node's namedq when we lose contact 
with a node, but this time *before* we release the node write lock. We could 
then test for this type, in addition to the PUBLICATION and WITHDRAWAL types, 
inside tipc_update_nametbl(), and call tipc_publ_notify(), still inside the 
name table lock, whenever this message type is encountered. This would 
guarantee that things happen in sequential order, since any new publications 
would end up behind the PURGER message in the node's namedq.

Who has time to implement this? 

Also, do you Rune build your own kernel, so you could try out a patch from us 
and confirm my theory before we deliver such a solution upstream?

Regards
///jon

> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Thursday, 31 March, 2016 14:56
> To: 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
> 
> Have not been able to capture a corrupted update yet, but did manage to get
> one where it dropped the updates.
> 
> Here is the 

Re: [tipc-discussion] tipc nametable update problem

2016-04-01 Thread Jon Maloy
Hi Rune,
I am totally unable to reproduce your scenario, and the Wireshark dump does not 
in itself show anything wrong.

But by comparing the dropped publications with ones arriving in the messages, I 
could quickly confirm that those are not the same, although they partially 
contain the same port names.
Those of type 101 that are dropped have instance numbers that are not the same 
as in the arriving messages (you are probably generating new ones for each 
session), while those which are the same (103, 105..) have different 
publication key numbers.

The conclusion can only be that those are leftovers from a previous session 
which have not been purged when the contact with node 1.1.2 was lost. A quick 
check of the code confirms this; entries in the name table backlog are only 
purged based on expiration time, and not on loss of contact with their 
originating node, as they should be. This is clearly a bug that somebody has to 
fix (Partha, Richard?).

It remains to understand how those entries got into the backlog in the first 
place. Something must have happened in the previous session that prevented them 
from being applied. Since you never use instance sequences, overlapping 
sequences cannot be the problem. If it were a memory allocation problem this 
would be visible in the log. One possibility I see is that we have a race 
condition between the purging of binding table from the pre-previous session 
and the previous one. The call to the purging function tipc_publ_notify() is 
done outside any lock protection, so it is fully possible that a link that 
quickly goes down and comes back may be able to deliver a new batch of 
publications before the purging action is finished. This becomes particularly 
likely if the number of publications is large, and we are running in a multi-VM 
or multi-namespace environment on the same host. (Can you confirm, Rune?)
If only the interface or link is cycled, while the same application server 
continues running on 1.1.2, and 1.1.1 still is intact, this is a possible 
scenario.
The newly delivered publications will find a set of exactly equal publications 
from the previous session in the name table, and hence be put in the backlog. 

How do we resolve this? My first idea was to just run a process_backlog() on 
the flank of tipc_publ_notify(). But unfortunately we first need to run a purge 
on the backlog, according to the above, and this purge would be unable to 
distinguish between "old" and "new" backlog items, and would have to purge them 
all.

A better, but maybe not so neat, solution would be to use a similar approach to 
the one we use for socket wakeup. We create a pseudo message with a new message type 
PURGER, and append that to the tail of the node's namedq when we lose contact 
with a node, but this time *before* we release the node write lock. We could 
then test for this type, in addition to the PUBLICATION and WITHDRAWAL types, 
inside tipc_update_nametbl(), and call tipc_publ_notify(), still inside the 
name table lock, whenever this message type is encountered. This would 
guarantee that things happen in sequential order, since any new publications 
would end up behind the PURGER message in the node's namedq.
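The ordering argument above can be sketched with a toy queue model. This is not the kernel implementation; the message-type constants and table layout are invented, with PURGER standing in for the proposed pseudo message:

```python
from collections import deque

PUBLICATION, WITHDRAWAL, PURGER = 0, 1, 2  # toy message types

def process_namedq(namedq):
    """Drain a per-node namedq in FIFO order. Because the PURGER entry is
    appended before the node write lock is released, any publications from
    the new session are guaranteed to sit behind it in the queue, so the
    old session is purged before the new one is applied."""
    table, session = set(), 0
    for mtype, name in namedq:
        if mtype == PURGER:
            table.clear()       # purge everything from the lost session
            session += 1
        elif mtype == PUBLICATION:
            table.add((session, name))
        elif mtype == WITHDRAWAL:
            table.discard((session, name))
    return table

q = deque([(PUBLICATION, "old-101"), (PURGER, None), (PUBLICATION, "new-101")])
print(process_namedq(q))  # {(1, 'new-101')} -- only the post-PURGER entry survives
```

The point of the sketch is only the FIFO guarantee: sequential processing makes the race between purging and new publications impossible by construction.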

Who has time to implement this? 

Also, do you Rune build your own kernel, so you could try out a patch from us 
and confirm my theory before we deliver such a solution upstream?

Regards
///jon

> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Thursday, 31 March, 2016 14:56
> To: 'Jon Maloy'; tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
> 
> Have not been able to capture a corrupted update yet, but did manage to get
> one where it dropped the updates.
> 
> Here is the dmesg output (times are in CST).
> Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {103,
> 1003, 1003} from <1.1.2> key=4271114002
> Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {103, 
> 3, 3}
> from <1.1.2> key=3675117576
> Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {101,
> 10, 10} from <1.1.2> key=2005280282
> Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of
> {13762562, 0, 0} from <1.1.2> key=3568185108
> Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {206, 
> 9, 9}
> from <1.1.2> key=3641103006
> Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {101,
> 133398, 133398} from <1.1.2> key=2675546830
> Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {101,
> 133138, 133138} from <1.1.2> key=2939408752
> Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {105, 
> 104,
> 104} from <1.1.2> key=140803529
> Mar 30 16:31:48 testserv218 kernel: Dropping name t

Re: [tipc-discussion] tipc nametable update problem

2016-03-31 Thread Rune Torgersen
Have not been able to capture a corrupted update yet, but did manage to get one 
where it dropped the updates.

Here is the dmesg output (times are in CST).
Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {103, 
1003, 1003} from <1.1.2> key=4271114002
Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {103, 3, 
3} from <1.1.2> key=3675117576
Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {101, 
10, 10} from <1.1.2> key=2005280282
Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of 
{13762562, 0, 0} from <1.1.2> key=3568185108
Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {206, 9, 
9} from <1.1.2> key=3641103006
Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {101, 
133398, 133398} from <1.1.2> key=2675546830
Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {101, 
133138, 133138} from <1.1.2> key=2939408752
Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {105, 
104, 104} from <1.1.2> key=140803529
Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {105, 4, 
4} from <1.1.2> key=3695579549
Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) of {101, 
133386, 133386} from <1.1.2> key=808970575

Attached are the TIPC packets received on 1.1.1 (where the log above is from, 
during the same time period).
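For anyone correlating such logs with a capture, the quoted dmesg lines are regular enough to parse mechanically. A small Python helper, where the labeling of the returned fields is my own:

```python
import re

# Matches lines like:
#   Dropping name table update (1) of {103, 1003, 1003} from <1.1.2> key=4271114002
LINE_RE = re.compile(
    r"Dropping name table update \((\d+)\) of "
    r"\{(\d+), (\d+), (\d+)\} from <([\d.]+)> key=(\d+)"
)

def parse_drop(line):
    """Return (update, (type, lower, upper), node, key), or None if no match."""
    m = LINE_RE.search(line)
    if not m:
        return None
    upd, ntype, lower, upper, node, key = m.groups()
    return int(upd), (int(ntype), int(lower), int(upper)), node, int(key)

sample = ("Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) "
          "of {103, 1003, 1003} from <1.1.2> key=4271114002")
print(parse_drop(sample))  # (1, (103, 1003, 1003), '1.1.2', 4271114002)
```

Running every dropped line through this and comparing the (type, instance, key) triples against the publications in the capture is essentially the comparison Jon describes doing by hand.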


-Original Message-
From: Jon Maloy [mailto:ma...@donjonn.com] 
Sent: Saturday, March 26, 2016 8:51 AM
To: tipc-discussion@lists.sourceforge.net
Subject: Re: [tipc-discussion] tipc nametable update problem

Hi Rune,
I assume you are still using Ethernet/L2 as a bearer, and not UDP? UDP is 
relatively new as a supported bearer, and may incur new problems.
Anyway, name table updates are never fragmented.
As far as I can see you have five bindings on each node, and that 
amounts to either one "bulk" update message with all bindings in one 
message (100 bytes in your case)
or five individual update messages of 20 bytes each. All depending on 
whether your application was started, and the bindings made, before or 
after the link between the nodes is established.

To me it looks like the dropped bindings are severely corrupted, and 
that may be a starting point for our troubleshooting. Could you start 
Wireshark and have a look at the messages being exchanged when this 
happens? If you only look for NAME_DISTRIBUTOR messages, the number of 
messages to analyze should be very limited, and we can at least see if 
our bug is on the sending or the receiving side.

God påske
///jon


On 03/25/2016 12:05 PM, Rune Torgersen wrote:
> Is it possible for the update messages to be greater than 1 MTU?
>
> Because we're doing a lot of video multicast, we're turning on UDP RSS hashing 
> to get messages to different receive queues (via ethtool -N ethN rx-flow-hash 
> udp4 sdfn).
> Because of that, there is a kernel warning per interface, and I am curious if 
> that is what is causing this:
>
> igb :07:00.0: enabling UDP RSS: fragmented packets may arrive out of 
> order to the stack above
>
>
>
>
> From: Erik Hugne [mailto:erik.hu...@gmail.com]
> Sent: Wednesday, March 23, 2016 12:07 PM
> To: Rune Torgersen
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
>
>
> When an update is received that cannot immediately be applied to the local 
> nametable, we retain it for a few seconds in a backlog queue.
> Then for each subsequent update received (that may have cleared up the 
> conflict) we try to apply any update stored in the backlog.
> The timeout can be set with sysctl -w tipc.named_timeout=xxx
> Default is 2000ms.
>
> So clock drift does not matter.
>
> I'm guessing that the nametable updates are dropped on the sending side.
> Are there any interface renaming going on after tipc is enabled?
>
> //E
> On Mar 23, 2016 17:04, "Rune Torgersen" 
> <ru...@innovsys.com<mailto:ru...@innovsys.com>> wrote:
> How much clock drift between units does the nametable update allow?
>
> On one of the test units, the clock was off by about a second between them.
>
> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com<mailto:ru...@innovsys.com>]
> Sent: Tuesday, March 22, 2016 10:58 AM
> To: 
> tipc-discussion@lists.sourceforge.net<mailto:tipc-discussion@lists.sourceforge.net>
> Subject: Re: [tipc-discussion] tipc nametable update problem
>
> Still having nametable update problems (Using TIPC_CLUSTER_SCOPE)
>
> Here is an excerpt of tipc-config -nt on both systems:
> address 1.1.1:
>
> 1041025   1025   <1.1.1:3540751351> 3540751351  
> cluster
> 10465537   

Re: [tipc-discussion] tipc nametable update problem

2016-03-26 Thread Jon Maloy
Hi Rune,
I assume you are still using Ethernet/L2 as a bearer, and not UDP? UDP is 
relatively new as a supported bearer, and may incur new problems.
Anyway, name table updates are never fragmented.
As far as I can see you have five bindings on each node, and that 
amounts to either one "bulk" update message with all bindings in one 
message (100 bytes in your case)
or five individual update messages of 20 bytes each. All depending on 
whether your application was started, and the bindings made, before or 
after the link between the nodes is established.
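The arithmetic in the paragraph above (20 bytes per binding, 100 bytes for a bulk message with five bindings) corresponds to a distribution item of five 32-bit words. The exact field list below is my reading of the layout and should be treated as an assumption; the sizes are what the text states:

```python
import struct

# One name-distribution item modeled as five big-endian 32-bit words
# (assumed fields: type, lower, upper, port reference, key).
ITEM = struct.Struct("!IIIII")

def pack_item(ntype, lower, upper, port, key):
    return ITEM.pack(ntype, lower, upper, port, key)

item = pack_item(101, 10, 10, 0x1234, 2005280282)              # one binding
bulk = b"".join(pack_item(101, i, i, 0, i) for i in range(5))  # five bindings
print(ITEM.size, len(item), len(bulk))  # 20 20 100
```

So a capture showing a 100-byte NAME_DISTRIBUTOR payload is consistent with one bulk update carrying all five bindings.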

To me it looks like the dropped bindings are severely corrupted, and 
that may be a starting point for our troubleshooting. Could you start 
Wireshark and have a look at the messages being exchanged when this 
happens? If you only look for NAME_DISTRIBUTOR messages, the number of 
messages to analyze should be very limited, and we can at least see if 
our bug is on the sending or the receiving side.

God påske
///jon


On 03/25/2016 12:05 PM, Rune Torgersen wrote:
> Is it possible for the update messages to be greater than 1 MTU?
>
> Because we're doing a lot of video multicast, we're turning on UDP RSS hashing 
> to get messages to different receive queues (via ethtool -N ethN rx-flow-hash 
> udp4 sdfn).
> Because of that, there is a kernel warning per interface, and I am curious if 
> that is what is causing this:
>
> igb :07:00.0: enabling UDP RSS: fragmented packets may arrive out of 
> order to the stack above
>
>
>
>
> From: Erik Hugne [mailto:erik.hu...@gmail.com]
> Sent: Wednesday, March 23, 2016 12:07 PM
> To: Rune Torgersen
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
>
>
> When an update is received that cannot immediately be applied to the local 
> nametable, we retain it for a few seconds in a backlog queue.
> Then for each subsequent update received (that may have cleared up the 
> conflict) we try to apply any update stored in the backlog.
> The timeout can be set with sysctl -w tipc.named_timeout=xxx
> Default is 2000ms.
>
> So clock drift does not matter.
>
> I'm guessing that the nametable updates are dropped on the sending side.
> Are there any interface renaming going on after tipc is enabled?
>
> //E
> On Mar 23, 2016 17:04, "Rune Torgersen" 
> <ru...@innovsys.com<mailto:ru...@innovsys.com>> wrote:
> How much clock drift between units does the nametable update allow?
>
> On one of the test units, the clock was off by about a second between them.
>
> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com<mailto:ru...@innovsys.com>]
> Sent: Tuesday, March 22, 2016 10:58 AM
> To: 
> tipc-discussion@lists.sourceforge.net<mailto:tipc-discussion@lists.sourceforge.net>
> Subject: Re: [tipc-discussion] tipc nametable update problem
>
> Still having nametable update problems (Using TIPC_CLUSTER_SCOPE)
>
> Here is an excerpt of tipc-config -nt on both systems:
> address 1.1.1:
>
> 1041025   1025   <1.1.1:3540751351> 3540751351  
> cluster
> 10465537  65537  <1.1.1:4046699456> 4046699456  
> cluster
> 104131073 131073 <1.1.2:59828181>   59828181
> cluster
> 10416777984   16777984   <1.1.1:3135589675> 3135589675  
> cluster
> 10433555200   33555200   <1.1.2:2193437365> 2193437365  
> cluster
>
> Address 1.1.2:
> 104131073 131073 <1.1.2:59828181>   59828181
> cluster
> 10433555200   33555200   <1.1.2:2193437365> 2193437365  
> cluster
>
> So in this case 1 sees all addresses 2 has published, while 2 is not seeing the 
> addresses from 1.
> 2 was rebooted to make this happen.
>
> Is there a possibility I'm calling tipc-config too early, and the interface is 
> not yet up, or is this still the same problem I saw before?
>
> There are some dropped nametable update messages in the kernel log:
>
> Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of 
> {0, 0, 0} from <1.1.1> key=0
> Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of 
> {0, 0, 0} from <1.1.1> key=0
> Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of 
> {0, 0, 0} from <1.1.1> key=0
> Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of 
> {0, 0, 0} from <1.1.1> key=0
> Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of 
> {0, 0, 0} from <1.1.1> key=0
> Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of 
> {0, 0, 0} from <1.1.1> key=0
> Ma

Re: [tipc-discussion] tipc nametable update problem

2016-03-24 Thread Rune Torgersen
It looks to me from my dmesg dump that I'm actually crashing on line 285 in 
subscr.c.

From: Jon Maloy [jon.ma...@ericsson.com]
Sent: Thursday, March 24, 2016 1:56 PM
To: Rune Torgersen; tipc-discussion@lists.sourceforge.net
Subject: RE: [tipc-discussion] tipc nametable update problem

Hi Rune,
As far as I can see the fix is present in the 4.5.0 code (subscr.c, line 299), 
so it may be that there still is a problem.
I suspect you will have to wait until Partha is back from Easter leave to get a 
better answer to this.

Regards
///jon


> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Thursday, 24 March, 2016 10:08
> To: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
>
> 4.5.0 kernel still gets a NULL ptr. I have kernel core dumps if anyone knows 
> what
> to look for.
> I am now trying 4.2 kernel.
>
> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Wednesday, March 23, 2016 12:48 PM
> To: 'Erik Hugne'
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
>
> Interfaces are renamed, yes, but that should all have been done before TIPC is
> loaded and configured.
>
> So I have been testing different kernels, and the nametable update problems
> only seem to occur with the Ubuntu 4.4.0 kernel, which is based on the 4.4.5
> mainline.
> Older kernels (3.14.1 and 4.2.0) do not seem to have problems (at least after
> reboot, no nametable issues, and all entries seem to be on both sides).
>
> Also the 4.5.0 kernel seems to work. I'll give it a try and see if it is stable
> enough (looks like it also might have a fix for the NULL ptr crash I saw).
>
> Another interesting thing I saw on one reboot, was that one unit got 
> completely
> invalid entries from the other.
>
> [   29.916004] Dropping name table update (0) of {0, 0, 0} from <1.1.2>
> key=3210283144
> [   29.916010] Dropping name table update (0) of {442042504, 4294901760, 0} 
> from
> <1.1.2> key=0
> [   29.916013] Dropping name table update (0) of {3260614792, 4294955122,
> 3243837576} from <1.1.2> key=2743205887
> [   29.916016] Dropping name table update (0) of {0, 0, 0} from <1.1.2> key=0
> [   29.916019] Dropping name table update (0) of {0, 0, 0} from <1.1.2> 
> key=2184
> [   29.916022] Dropping name table update (0) of {4294911232, 0, 29952} from
> <1.1.2> key=57427
> [   29.916025] Dropping name table update (0) of {3831105672, 4294959155,
> 442042504} from <1.1.2> key=89786504
>
>
> on 1.1.1:
> 294907136 0  49389  <1.1.2:3260614792> 3260614792  cluster
> 2560  0  <1.1.2:0>  0   
> cluster
> 4294911232 0  8418048<1.1.2:268959744>  268959744   
> cluster
> 4294957547 3260614792 4294934691 <1.1.2:1312882922> 1312882922  
> cluster
> 4294928877 3260614792 4294901760 <1.1.2:0>  0   
> cluster
> 4294904576 0  38 <1.1.2:1062668424> 1062668424  
> cluster
> 2816   0  63724  <1.1.2:3260614792> 3260614792  
> cluster
> 1097732991 18669  3260614792 <1.1.2:4294949099>     4294949099  
> cluster
>
> while on 1.1.2 those addresses does not exist.
>
>
> From: Erik Hugne [mailto:erik.hu...@gmail.com]
> Sent: Wednesday, March 23, 2016 12:07 PM
> To: Rune Torgersen
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
>
>
> When an update is received that cannot immediately be applied to the local
> nametable, we retain it for a few seconds in a backlog queue.
> Then for each subsequent update received (that may have cleared up the
> conflict) we try to apply any update stored in the backlog.
> The timeout can be set with sysctl -w tipc.named_timeout=xxx
> Default is 2000ms.
>
> So clock drift does not matter.
>
> I'm guessing that the nametable updates are dropped on the sending side.
> Are there any interface renaming going on after tipc is enabled?
>
> //E
> On Mar 23, 2016 17:04, "Rune Torgersen"
> <ru...@innovsys.com<mailto:ru...@innovsys.com>> wrote:
> How much clock drift between units does the nametable update allow?
>
> On one of the test units, the clock was off by about a second between them.
>
> -Original Message-
> From: Rune Torgersen
> [mailto:ru...@innovsys.com<mailto:ru...@innovsys.com>]
> Sent: Tuesday, March 22, 2016 10:58 AM
> To: tipc-discussion@lists.sourceforge.net

Re: [tipc-discussion] tipc nametable update problem

2016-03-24 Thread Jon Maloy
Hi Rune,
As far as I can see the fix is present in the 4.5.0 code (subscr.c, line 299), 
so it may be that there still is a problem.
I suspect you will have to wait until Partha is back from Easter leave to get a 
better answer to this.

Regards
///jon


> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Thursday, 24 March, 2016 10:08
> To: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
> 
> 4.5.0 kernel still gets a NULL ptr. I have kernel core dumps if anyone knows 
> what
> to look for.
> I am now trying 4.2 kernel.
> 
> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Wednesday, March 23, 2016 12:48 PM
> To: 'Erik Hugne'
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
> 
> Interfaces are renamed, yes, but that should all have been done before TIPC is
> loaded and configured.
> 
> So I have been testing different kernels, and the nametable update problems
> only seem to occur with the Ubuntu 4.4.0 kernel, which is based on the 4.4.5
> mainline.
> Older kernels (3.14.1 and 4.2.0) do not seem to have problems (at least after
> reboot, no nametable issues, and all entries seem to be on both sides).
> 
> Also the 4.5.0 kernel seems to work. I'll give it a try and see if it is stable
> enough (looks like it also might have a fix for the NULL ptr crash I saw).
> 
> Another interesting thing I saw on one reboot, was that one unit got 
> completely
> invalid entries from the other.
> 
> [   29.916004] Dropping name table update (0) of {0, 0, 0} from <1.1.2>
> key=3210283144
> [   29.916010] Dropping name table update (0) of {442042504, 4294901760, 0} 
> from
> <1.1.2> key=0
> [   29.916013] Dropping name table update (0) of {3260614792, 4294955122,
> 3243837576} from <1.1.2> key=2743205887
> [   29.916016] Dropping name table update (0) of {0, 0, 0} from <1.1.2> key=0
> [   29.916019] Dropping name table update (0) of {0, 0, 0} from <1.1.2> 
> key=2184
> [   29.916022] Dropping name table update (0) of {4294911232, 0, 29952} from
> <1.1.2> key=57427
> [   29.916025] Dropping name table update (0) of {3831105672, 4294959155,
> 442042504} from <1.1.2> key=89786504
> 
> 
> on 1.1.1:
> 294907136 0  49389  <1.1.2:3260614792> 3260614792  cluster
> 2560  0  <1.1.2:0>  0   
> cluster
> 4294911232 0  8418048<1.1.2:268959744>  268959744   
> cluster
> 4294957547 3260614792 4294934691 <1.1.2:1312882922> 1312882922  
> cluster
> 4294928877 3260614792 4294901760 <1.1.2:0>  0   
> cluster
> 4294904576 0  38 <1.1.2:1062668424> 1062668424  
> cluster
> 2816   0  63724  <1.1.2:3260614792> 3260614792  
> cluster
> 1097732991 18669  3260614792 <1.1.2:4294949099>     4294949099  
> cluster
> 
> while on 1.1.2 those addresses does not exist.
> 
> 
> From: Erik Hugne [mailto:erik.hu...@gmail.com]
> Sent: Wednesday, March 23, 2016 12:07 PM
> To: Rune Torgersen
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
> 
> 
> When an update is received that cannot immediately be applied to the local
> nametable, we retain it for a few seconds in a backlog queue.
> Then for each subsequent update received (that may have cleared up the
> conflict) we try to apply any update stored in the backlog.
> The timeout can be set with sysctl -w tipc.named_timeout=xxx
> Default is 2000ms.
> 
> So clock drift does not matter.
> 
> I'm guessing that the nametable updates are dropped on the sending side.
> Are there any interface renaming going on after tipc is enabled?
> 
> //E
> On Mar 23, 2016 17:04, "Rune Torgersen"
> <ru...@innovsys.com<mailto:ru...@innovsys.com>> wrote:
> How much clock drift between units does the nametable update allow?
> 
> On one of the test units, the clock was off by about a second between them.
> 
> -Original Message-
> From: Rune Torgersen
> [mailto:ru...@innovsys.com<mailto:ru...@innovsys.com>]
> Sent: Tuesday, March 22, 2016 10:58 AM
> To: tipc-discussion@lists.sourceforge.net<mailto:tipc-
> discuss...@lists.sourceforge.net>
> Subject: Re: [tipc-discussion] tipc nametable update problem
> 
> Still having nametable update problems (Using TIPC_CLUSTER_SCOPE)
> 
> Here is an except of tipc-config -nt on both systems:
> address 1.1.1:
> 
> 10410

Re: [tipc-discussion] tipc nametable update problem

2016-03-24 Thread Rune Torgersen
The 4.5.0 kernel still gets a NULL pointer dereference. I have kernel core dumps if anyone knows 
what to look for.
I am now trying the 4.2 kernel.


Re: [tipc-discussion] tipc nametable update problem

2016-03-23 Thread Rune Torgersen
Logs indicate all renames were done about 15 seconds before TIPC starts:

Mar 23 12:37:58 testserv218 kernel: igb :04:00.1 rename3: renamed from eth1
Mar 23 12:37:58 testserv218 kernel: igb :04:00.0 rename2: renamed from eth0
Mar 23 12:37:58 testserv218 kernel: igb :04:00.2 rename4: renamed from eth2
Mar 23 12:37:58 testserv218 kernel: igb :04:00.3 rename5: renamed from eth3
Mar 23 12:37:58 testserv218 kernel: igb :08:00.0 eth1: renamed from eth5
Mar 23 12:37:58 testserv218 kernel: igb :07:00.0 eth0: renamed from eth4
Mar 23 12:37:58 testserv218 kernel: igb :04:00.1 eth3: renamed from rename3
Mar 23 12:37:58 testserv218 kernel: igb :04:00.0 eth2: renamed from rename2
Mar 23 12:37:58 testserv218 kernel: igb :04:00.2 eth4: renamed from rename4
Mar 23 12:37:58 testserv218 kernel: igb :04:00.3 eth5: renamed from rename5

Mar 23 12:38:13 testserv218 kernel: tipc: Activated (version 2.0.0)
Mar 23 12:38:13 testserv218 kernel: NET: Registered protocol family 30
Mar 23 12:38:13 testserv218 kernel: tipc: Started in single node mode
Mar 23 12:38:13 testserv218 kernel: Started in network mode
Mar 23 12:38:13 testserv218 kernel: Own node address <1.1.1>, network identity 
3013
Mar 23 12:38:13 testserv218 kernel: Enabled bearer , discovery domain 
<1.1.0>, priority 10


Re: [tipc-discussion] tipc nametable update problem

2016-03-23 Thread Rune Torgersen
Interfaces are renamed, yes, but that should all have been done before TIPC is 
loaded and configured.

So I have been testing different kernels, and the nametable update problems 
only seem to occur with the Ubuntu 4.4.0 kernel, which is based on the 4.4.5 
mainline.
Older kernels (3.14.1 and 4.2.0) do not seem to have problems (at least after 
a reboot there are no nametable issues, and all entries appear on both sides).

The 4.5.0 kernel also seems to work. I'll give it a try and see if it is stable 
enough (it looks like it might also have a fix for the NULL pointer crash I saw).

Another interesting thing I saw on one reboot was that one unit got completely 
invalid entries from the other.

[   29.916004] Dropping name table update (0) of {0, 0, 0} from <1.1.2> key=3210283144
[   29.916010] Dropping name table update (0) of {442042504, 4294901760, 0} from <1.1.2> key=0
[   29.916013] Dropping name table update (0) of {3260614792, 4294955122, 3243837576} from <1.1.2> key=2743205887
[   29.916016] Dropping name table update (0) of {0, 0, 0} from <1.1.2> key=0
[   29.916019] Dropping name table update (0) of {0, 0, 0} from <1.1.2> key=2184
[   29.916022] Dropping name table update (0) of {4294911232, 0, 29952} from <1.1.2> key=57427
[   29.916025] Dropping name table update (0) of {3831105672, 4294959155, 442042504} from <1.1.2> key=89786504


on 1.1.1:
294907136 0  49389  <1.1.2:3260614792> 3260614792  cluster
2560  0  <1.1.2:0>  0   cluster
4294911232 0  8418048<1.1.2:268959744>  268959744   cluster
4294957547 3260614792 4294934691 <1.1.2:1312882922> 1312882922  cluster
4294928877 3260614792 4294901760 <1.1.2:0>  0   cluster
4294904576 0  38 <1.1.2:1062668424> 1062668424  cluster
2816   0  63724  <1.1.2:3260614792> 3260614792  cluster
1097732991 18669  3260614792 <1.1.2:4294949099> 4294949099  cluster

while on 1.1.2 those addresses do not exist.
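As a side note, some of the bogus triples reported in this thread decode to printable ASCII when read as big-endian 32-bit words, which fits the missing-buffer-linearization diagnosis from this thread (the name distributor reading unrelated memory past the linear part of a buffer). A small sketch, using one of the Mar 17 20:08:58 entries quoted elsewhere in the thread:

```python
import struct

def as_ascii(triple):
    # Reinterpret the three 32-bit "type/lower/upper" fields as
    # big-endian bytes and render them as text.
    raw = b"".join(struct.pack(">I", v) for v in triple)
    return raw.decode("latin-1")

# First bogus entry from the Mar 17 20:08:58 logs in this thread:
print(as_ascii((1853110816, 1952998688, 1801810542)))  # -> 'nt: the kern'
```

The twelve bytes spell out a fragment of log text, not a name table entry, which is what you would expect if stray memory were being parsed as publication records.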



Re: [tipc-discussion] tipc nametable update problem

2016-03-23 Thread Erik Hugne
When an update is received, it cannot always be applied immediately to the local
nametable; we retain it for a few seconds in a backlog queue.
Then, for each subsequent update received (which may have cleared up the
conflict), we try to apply any updates stored in the backlog.
The timeout can be set with sysctl -w tipc.named_timeout=xxx
Default is 2000ms.
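In mainline kernels this knob lives under the net.tipc sysctl tree; the exact proc path below is an assumption based on the sysctl name quoted above. A minimal sketch that reads the current value and falls back to the documented default:

```python
from pathlib import Path

# Assumed location of the knob mentioned above (net.tipc.named_timeout).
NAMED_TIMEOUT = Path("/proc/sys/net/tipc/named_timeout")

def named_timeout_ms(default_ms=2000):
    # Fall back to the documented 2000 ms default when the tipc
    # module is not loaded (the proc file does not exist then).
    try:
        return int(NAMED_TIMEOUT.read_text())
    except (OSError, ValueError):
        return default_ms
```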

So clock drift does not matter.

I'm guessing that the nametable updates are dropped on the sending side.
Is any interface renaming going on after TIPC is enabled?

//E

Re: [tipc-discussion] tipc nametable update problem

2016-03-23 Thread Rune Torgersen
How much clock drift between units does the nametable update allow?

On one of the test setups, the clocks were off by about a second between the units.


Re: [tipc-discussion] tipc nametable update problem

2016-03-22 Thread Rune Torgersen
Still having nametable update problems (Using TIPC_CLUSTER_SCOPE)

Here is an excerpt of tipc-config -nt on both systems:
address 1.1.1:

104  1025      1025      <1.1.1:3540751351>  3540751351  cluster
104  65537     65537     <1.1.1:4046699456>  4046699456  cluster
104  131073    131073    <1.1.2:59828181>    59828181    cluster
104  16777984  16777984  <1.1.1:3135589675>  3135589675  cluster
104  33555200  33555200  <1.1.2:2193437365>  2193437365  cluster

Address 1.1.2:
104  131073    131073    <1.1.2:59828181>    59828181    cluster
104  33555200  33555200  <1.1.2:2193437365>  2193437365  cluster

So in this case 1 sees all the addresses 2 has published, while 2 is not seeing 
the addresses from 1.
2 was rebooted to make this happen.

Is there a possibility I'm calling tipc-config too early, before the interface 
is up, or is this still the same problem I saw before?

There are some dropped nametable update messages in the kernel log:

Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 0} from <1.1.1> key=0
Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table update (0) of {0, 0, 16600} from <1.1.1> key=4294915584

but they do not mention port 104.

If I restart the application on 1 having 104:1025 open, it shows up on 2.
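A quick way to spot this kind of asymmetry is to diff the two nametable dumps programmatically. A sketch, assuming the whitespace-separated type/lower/upper columns that tipc-config -nt prints (the sample strings below are abbreviated copies of the excerpts above):

```python
def publications(nametable_text):
    # Collect (type, lower, upper) keys from a `tipc-config -nt` dump,
    # skipping header or address lines that do not start with a number.
    keys = set()
    for line in nametable_text.strip().splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0].isdigit():
            keys.add((int(parts[0]), int(parts[1]), int(parts[2])))
    return keys

node1 = """
104  1025    1025    <1.1.1:3540751351> 3540751351  cluster
104  131073  131073  <1.1.2:59828181>   59828181    cluster
"""
node2 = """
104  131073  131073  <1.1.2:59828181>   59828181    cluster
"""

# Publications node 1 sees that node 2 never received:
missing_on_2 = publications(node1) - publications(node2)
print(missing_on_2)  # -> {(104, 1025, 1025)}
```

Running this against full dumps from both nodes immediately pinpoints which publications (here, 104:1025) never made it across.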


Re: [tipc-discussion] tipc nametable update problem

2016-03-20 Thread Rune Torgersen
Using TIPC_CLUSTER_SCOPE will work.
This was a new system bring-up, and the code was ported from an older system, which 
used the TIPC 1.7.7 driver.
A quick search-and-replace of TIPC_ZONE_SCOPE is not a bad workaround.
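For reference, the workaround amounts to a one-argument change at bind time. A sketch of a cluster-scope publication in Python (the TIPC constants are hard-coded from linux/tipc.h in case the interpreter was built without TIPC support; actually binding requires the tipc module to be loaded):

```python
import socket

# Values from linux/tipc.h (hard-coded fallbacks for builds without TIPC).
AF_TIPC = getattr(socket, "AF_TIPC", 30)
TIPC_ADDR_NAMESEQ = 1
TIPC_ZONE_SCOPE = 1
TIPC_CLUSTER_SCOPE = 2

def publish(service_type, lower, upper, scope=TIPC_CLUSTER_SCOPE):
    # The workaround is exactly this default: publish the name sequence
    # with TIPC_CLUSTER_SCOPE instead of TIPC_ZONE_SCOPE.
    sock = socket.socket(AF_TIPC, socket.SOCK_RDM)
    sock.bind((TIPC_ADDR_NAMESEQ, service_type, lower, upper, scope))
    return sock
```

For example, publish(104, 1025, 1025) would publish the 104:1025 name seen in the excerpts above.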

From: Jon Maloy [jon.ma...@ericsson.com]
Sent: Saturday, March 19, 2016 10:57 AM
To: Erik Hugne
Cc: tipc-discussion@lists.sourceforge.net
Subject: Re: [tipc-discussion] tipc nametable update problem

Maybe not completely trivial, but not very complex either. I know I failed to 
describe this verbally to you at one moment, but I can put it on paper, and you 
will realize it is not a big deal.
If you or anybody else are interested I can make an effort to describe this 
next week. I don’t have time to implement it myself at the moment.

///jon


From: Erik Hugne [mailto:erik.hu...@gmail.com]
Sent: Friday, 18 March, 2016 12:38
To: Jon Maloy
Subject: RE: [tipc-discussion] tipc nametable update problem


Agree.
But implementing a new lookup mechanism is not trivial.. :)

@Rune afaik there is no functional limitation on using cluster scoped 
publications, so i hope that's an acceptable workaround for you.

//E
On Mar 18, 2016 16:46, "Jon Maloy" <jon.ma...@ericsson.com> wrote:
Still weird that this starts happening now, when this issue is supposed to be 
remedied, and not earlier, when it wasn't.
We really need that "permit overlapping publications"  solution I have been 
preaching about.

Br
///jon


> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Friday, 18 March, 2016 10:25
> To: 'Erik Hugne'
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
>
> Yes I have.
> There are quite a few at the same time like this:
>
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of 
> {1853110816,
> 1952998688, 1801810542} from <1.1.1> key=1633905523
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of 
> {542000723,
> 544613732, 544437616} from <1.1.1> key=167800175
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of 
> {544239474,
> 1953325424, 543582572} from <1.1.1> key=1930035237
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of 
> {1933189232,
> 1869771885, 1634738291} from <1.1.1> key=1768843040
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of 
> {1717660012,
> 1701054976, 628308512} from <1.1.1> key=1869881446
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {16397,
> 1073741824, 16397} from <1.1.1> key=29285
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of 
> {1633943667,
> 1752134260, 544367969} from <1.1.1> key=1679834144
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of 
> {1869771808,
> 2003986804, 1698300018} from <1.1.1> key=4294915584
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of 
> {1073741824,
> 65279, 4294902016} from <1.1.1> key=1073741824
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {65279,
> 4294901760, 59154} from <1.1.1> key=65023
>
>
> From: Erik Hugne [mailto:erik.hu...@gmail.com]
> Sent: Friday, March 18, 2016 1:48 AM
> To: Rune Torgersen
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
>
>
> Hi Rune.
> When the problem occurs, have you seen any traces like "tipc: Dropping name
> table update" ?
>
> //E
> On Mar 18, 2016 02:11, "Rune Torgersen" <ru...@innovsys.com> wrote:
> More info.
> The failing ports are all opened as TIPC_ZONE_SCOPE.
> Addresses of the two computers are 1.1.1 and 1.1.2.
>
> If I change the open param to TIPC_CLUSTER_SCOPE, the nametable seems to
> update correctly.
>
>
> -Original Message-
> From: Rune Torgersen
> [mailto:ru...@innovsys.com<mailto:ru...@innovsys.com><mailto:ru...@innovsys.com<mailto:ru...@innovsys.com>>]
> Sent: Thursday, March 17, 2016 7:06 PM
> To: 
> 'tipc-discussion@lists.sourceforge.net<mailto:tipc-discussion@lists.sourceforge.net><mailto:tipc-<mailto:tipc->
> discuss...@lists.sourceforge.net<mailto:discuss...@lists.sourceforge.net>>'
> Subject: [tipc-discussion] tipc nametable update problem
>
> Hi.
>
> The product I work on uses TIPC for communicati

Re: [tipc-discussion] tipc nametable update problem

2016-03-19 Thread Rune Torgersen
Yes I have.
There are quite a few at the same time like this:

Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1853110816, 1952998688, 1801810542} from <1.1.1> key=1633905523
Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {542000723, 544613732, 544437616} from <1.1.1> key=167800175
Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {544239474, 1953325424, 543582572} from <1.1.1> key=1930035237
Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1933189232, 1869771885, 1634738291} from <1.1.1> key=1768843040
Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1717660012, 1701054976, 628308512} from <1.1.1> key=1869881446
Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {16397, 1073741824, 16397} from <1.1.1> key=29285
Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1633943667, 1752134260, 544367969} from <1.1.1> key=1679834144
Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1869771808, 2003986804, 1698300018} from <1.1.1> key=4294915584
Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1073741824, 65279, 4294902016} from <1.1.1> key=1073741824
Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {65279, 4294901760, 59154} from <1.1.1> key=65023
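[Editor's note: a quick sketch, not from the thread, of why these dropped updates look bogus. The {type, lower, upper} values are not plausible name sequences; interpreted as big-endian ASCII, the first entry's fields decode to fragments of ordinary text, which fits a receiver reading from the wrong part of a non-linearized buffer, the bug later identified.]

```python
def as_ascii(n):
    """Show a 32-bit value as its four big-endian bytes when all are printable ASCII."""
    b = n.to_bytes(4, "big")
    return b.decode("ascii") if all(32 <= c < 127 for c in b) else hex(n)

# {type, lower, upper} from the first "Dropping name table update" line above
for value in (1853110816, 1952998688, 1801810542):
    print(repr(as_ascii(value)))  # 'nt: ', 'the ', 'kern'
```

The three fields spell "nt: the kern", which looks like part of a kernel log string rather than a real publication.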


From: Erik Hugne [mailto:erik.hu...@gmail.com]
Sent: Friday, March 18, 2016 1:48 AM
To: Rune Torgersen
Cc: tipc-discussion@lists.sourceforge.net
Subject: Re: [tipc-discussion] tipc nametable update problem


Hi Rune.
When the problem occurs, have you seen any traces like "tipc: Dropping name 
table update" ?

//E
On Mar 18, 2016 02:11, "Rune Torgersen" <ru...@innovsys.com> wrote:
More info.
The failing ports are all opened as TIPC_ZONE_SCOPE.
Addresses of the two computers are 1.1.1 and 1.1.2.

If I change the open param to TIPC_CLUSTER_SCOPE, the nametable seems to update 
correctly.


-Original Message-
From: Rune Torgersen [mailto:ru...@innovsys.com]
Sent: Thursday, March 17, 2016 7:06 PM
To: 'tipc-discussion@lists.sourceforge.net'
Subject: [tipc-discussion] tipc nametable update problem

Hi.

The product I work on uses TIPC for communication between different computers 
on a network. We've actually been using older versions (1.7.7 and earlier) for 
nearly 10 years.

On a new product, we're using the latest Ubuntu server (16.04, still in beta) 
using kernel 4.4.0.

On several occasions now, programs that open TIPC sockets during the boot 
process end up with ports that do not show in the nametable on the other 
computer. This of course leaves the programs unable to talk.
If we restart the program, reopening the TIPC port, it then shows up on both 
sides.


I know this is somewhat sparse info, but I am not sure where to start looking 
at this.

One piece of info that might be useful: we require the old interface naming on 
our interfaces, so we have turned off systemd's ethernet naming scheme and use 
udev to name the devices.

This should be done well before we initialize the tipc driver module, give it 
a netid and address, and enable the bearer links.

--
Transform Data into Opportunity.
Accelerate data analysis in your applications with
Intel Data Analytics Acceleration Library.
Click to learn more.
http://pubads.g.doubleclick.net/gampad/clk?id=278785231=/4140
___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net<mailto:tipc-discussion@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/tipc-discussion



Re: [tipc-discussion] tipc nametable update problem

2016-03-19 Thread Erik Hugne
Hi Rune.
When the problem occurs, have you seen any traces like "tipc: Dropping name
table update" ?

//E
On Mar 18, 2016 02:11, "Rune Torgersen" wrote:

> More info.
> The failing ports are all opened as TIPC_ZONE_SCOPE.
> Addresses of the two computers are 1.1.1 and 1.1.2.
>
> If I change the open param to TIPC_CLUSTER_SCOPE, the nametable seems to
> update correctly.
>
>
> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Thursday, March 17, 2016 7:06 PM
> To: 'tipc-discussion@lists.sourceforge.net'
> Subject: [tipc-discussion] tipc nametable update problem
>
> Hi.
>
> The product I work on uses TIPC for communication between different
> computers on a network. We've actually been using older versions (1.7.7 and
> earlier) for nearly 10 years.
>
> On a new product, we're using the latest Ubuntu server (16.04, still in
> beta) using kernel 4.4.0.
>
> On several occasions now, programs that open TIPC sockets during the boot
> process end up with ports that do not show in the nametable on the other
> computer. This of course leaves the programs unable to talk.
> If we restart the program, reopening the TIPC port, it then shows up on
> both sides.
>
>
> I know this is somewhat sparse info, but I am not sure where to start
> looking at this.
>
> One piece of info that might be useful: we require the old interface
> naming on our interfaces, so we have turned off systemd's ethernet naming
> scheme and use udev to name the devices.
>
> This should be done well before we initialize the tipc driver module, give
> it a netid and address, and enable the bearer links.
>
>


Re: [tipc-discussion] tipc nametable update problem

2016-03-18 Thread Rune Torgersen
More info.
The failing ports are all opened as TIPC_ZONE_SCOPE.
Addresses of the two computers are 1.1.1 and 1.1.2.

If I change the open param to TIPC_CLUSTER_SCOPE, the nametable seems to update 
correctly.
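[Editor's note: a minimal sketch, not from the thread, of what the two bind variants look like using Python's built-in TIPC support; the service type and instance numbers below are made up for illustration. TIPC_ZONE_SCOPE publications were the ones failing to propagate, while TIPC_CLUSTER_SCOPE worked.]

```python
import socket

# Hypothetical service type/instance values, for illustration only.
SERVICE_TYPE, LOWER, UPPER = 18888, 17, 17

def publish(scope):
    """Bind (publish) a TIPC name sequence with the given visibility scope."""
    s = socket.socket(socket.AF_TIPC, socket.SOCK_RDM)
    # TIPC name-sequence address tuple: (addr_type, type, lower, upper, scope)
    s.bind((socket.TIPC_ADDR_NAMESEQ, SERVICE_TYPE, LOWER, UPPER, scope))
    return s

try:
    # The workaround: TIPC_CLUSTER_SCOPE instead of TIPC_ZONE_SCOPE
    publish(socket.TIPC_CLUSTER_SCOPE).close()
except OSError:
    pass  # requires the tipc kernel module loaded and the node configured
```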


-Original Message-
From: Rune Torgersen [mailto:ru...@innovsys.com] 
Sent: Thursday, March 17, 2016 7:06 PM
To: 'tipc-discussion@lists.sourceforge.net'
Subject: [tipc-discussion] tipc nametable update problem

Hi.

The product I work on uses TIPC for communication between different computers 
on a network. We've actually been using older versions (1.7.7 and earlier) for 
nearly 10 years.

On a new product, we're using the latest Ubuntu server (16.04, still in beta) 
using kernel 4.4.0.

On several occasions now, programs that open TIPC sockets during the boot 
process end up with ports that do not show in the nametable on the other 
computer. This of course leaves the programs unable to talk.
If we restart the program, reopening the TIPC port, it then shows up on both 
sides.


I know this is somewhat sparse info, but I am not sure where to start looking 
at this.

One piece of info that might be useful: we require the old interface naming on 
our interfaces, so we have turned off systemd's ethernet naming scheme and use 
udev to name the devices.

This should be done well before we initialize the tipc driver module, give it 
a netid and address, and enable the bearer links.



Re: [tipc-discussion] tipc nametable update problem

2016-03-18 Thread Jon Maloy
Still weird that this starts happening now, when this issue is supposed to be 
remedied, and not earlier, when it wasn't.
We really need that "permit overlapping publications" solution I have been 
preaching about.

Br
///jon


> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Friday, 18 March, 2016 10:25
> To: 'Erik Hugne'
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
> 
> Yes I have.
> There are quite a few at the same time like this:
> 
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1853110816, 1952998688, 1801810542} from <1.1.1> key=1633905523
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {542000723, 544613732, 544437616} from <1.1.1> key=167800175
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {544239474, 1953325424, 543582572} from <1.1.1> key=1930035237
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1933189232, 1869771885, 1634738291} from <1.1.1> key=1768843040
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1717660012, 1701054976, 628308512} from <1.1.1> key=1869881446
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {16397, 1073741824, 16397} from <1.1.1> key=29285
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1633943667, 1752134260, 544367969} from <1.1.1> key=1679834144
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1869771808, 2003986804, 1698300018} from <1.1.1> key=4294915584
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {1073741824, 65279, 4294902016} from <1.1.1> key=1073741824
> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) of {65279, 4294901760, 59154} from <1.1.1> key=65023
> 
> 
> From: Erik Hugne [mailto:erik.hu...@gmail.com]
> Sent: Friday, March 18, 2016 1:48 AM
> To: Rune Torgersen
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc nametable update problem
> 
> 
> Hi Rune.
> When the problem occurs, have you seen any traces like "tipc: Dropping name
> table update" ?
> 
> //E
> On Mar 18, 2016 02:11, "Rune Torgersen" <ru...@innovsys.com> wrote:
> More info.
> The failing ports are all opened as TIPC_ZONE_SCOPE.
> Addresses of the two computers are 1.1.1 and 1.1.2.
> 
> If I change the open param to TIPC_CLUSTER_SCOPE, the nametable seems to
> update correctly.
> 
> 
> -Original Message-
> From: Rune Torgersen [mailto:ru...@innovsys.com]
> Sent: Thursday, March 17, 2016 7:06 PM
> To: 'tipc-discussion@lists.sourceforge.net'
> Subject: [tipc-discussion] tipc nametable update problem
> 
> Hi.
> 
> The product I work on uses TIPC for communication between different
> computers on a network. We've actually been using older versions (1.7.7 and
> earlier) for nearly 10 years.
> 
> On a new product, we're using the latest Ubuntu server (16.04, still in
> beta) using kernel 4.4.0.
> 
> On several occasions now, programs that open TIPC sockets during the boot
> process end up with ports that do not show in the nametable on the other
> computer. This of course leaves the programs unable to talk.
> If we restart the program, reopening the TIPC port, it then shows up on
> both sides.
> 
> 
> I know this is somewhat sparse info, but I am not sure where to start
> looking at this.
> 
> One piece of info that might be useful: we require the old interface
> naming on our interfaces, so we have turned off systemd's ethernet naming
> scheme, and use udev to name the devices.
> 
> This should be done well before we initialize the tipc driver module, give
> it a netid and address, and enable the bearer links.
> 