Re: Bizarre arp entry corruption

2017-03-10 Thread Sebastian Benoit
Joe Holden(m...@m.jwh.me.uk) on 2017.03.09 13:41:26 +:
> On 09/03/2017 11:51, Martin Pieuchot wrote:
> >On 07/03/17(Tue) 19:38, Joe Holden wrote:
> >>On 12/12/2016 16:55, Joe Holden wrote:
> >>>On 12/12/2016 10:27, Martin Pieuchot wrote:
> >>>>On 11/12/16(Sun) 00:50, Joe Holden wrote:
> >>>>>On 10/12/2016 08:43, Mihai Popescu wrote:
> >>>>>>>>seeing some bizarre behaviour on one box, on one specific interface:
> >>>>>>
> >>>>>>Hello,
> >>>>>>
> >>>>>>This looks like some stupid TV game, where contesters are given some
> >>>>>>clues from time to time and they have to guess what is the real shit.
> >>>>>>
> >>>>>>Do post your FULL dmesg and configurations for network if you really
> >>>>>>want someone to even think at your issue. Isn't that obvious?
> >>>>>>
> >>>>>>Bye!
> >>>>>>
> >>>>>
> >>>>>Appreciate the useless response (but still better than nothing!), the
> >>>>>affected box has since been reverted to older snapshot and thus no more
> >>>>>debugging can be done - someone else will have to do it.
> >>>>
> >>>>I'd appreciate to see the output of 'netstat -rnf inet' when it is
> >>>>relevant.  Without that information it's hard to understand.
> >>>>
> >>>>But there's a bug somewhere, it has to be fixed.
> >>>>
> >>>>>Not that dmesg is even relevant since it is a userland bug not a kernel
> >>>>>problem but anyway:
> >>>>
> >>>>It's a kernel problem.
> >>>>
> >>>I'll see if I can recreate it but I'm not holding my breath - it only
> >>>breaks once BGP loaded the table which leads me to think it is actually
> >>>bgpd that is updating the llinfo with bogus info and even though I have
> >>>a feed in my lab it doesn't do the same thing.
> >>>
> >>Ok so, inadvertently recreated this (pretty much exactly the same) issue on
> >>a lab/test setup:
> >>
> >>For the purposes of debug, ignore the fact that the interfaces are tap
> >>interfaces, they're still emulated ethernet...
> >>
> >>Wall of text incoming, various info...
> >>
> >>box#1:
> >>
> >>tap1: flags=8843 mtu 1500
> >>lladdr fe:e1:ba:d1:be:f3
> >>index 7 priority 0 llprio 3
> >>groups: tap
> >>status: active
> >>inet 172.20.230.72 netmask 0xfffe
> >>
> >>box#2:
> >>
> >>tap1: flags=8843 mtu 1500
> >>lladdr fe:e1:ba:d1:cf:92
> >>index 7 priority 0 llprio 3
> >>groups: tap
> >>status: active
> >>inet 172.20.230.73 netmask 0xfffe
> >>
> >>All is fine after starting ospfd, but as soon as I start bgpd, box#2 shows
> >>the following:
> >>
> >>Host                 Ethernet Address   Netif Expire     Flags
> >>172.20.230.72        00:00:00:00:20:12      ? 12m30s
> >>
> >># route -n get 172.20.230.72
> >>   route to: 172.20.230.72
> >>destination: 172.20.230.72
> >>       mask: 255.255.255.255
> >>  interface: tap1
> >> if address: 172.20.230.73
> >>   priority: 3 ()
> >>      flags:
> >>     use       mtu    expire
> >>      20         0       702
> >>
> >>flags destination  gateway  lpref   med aspath origin
> >>IS*>  172.20.230.72/31 172.20.230.64  200 0 i
> >>
> >>.64 is the loopback on one of its connected boxes that doesn't have broken
> >>entries
> >>
> >>tcpdump looks ok, afterwards:
> >>
> >>19:14:23.723876 arp who-has 172.20.230.72 tell 172.20.230.73
> >>19:14:23.901883 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3
> >>19:14:24.022948 arp who-has 172.20.230.72 tell 172.20.230.73
> >>19:14:24.201095 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3
> >>
> >>but the correct entry is never installed, after I delete the broken arp
> >>entry it never readds a new one.
> >>
> >>This only happens with redist connected as far as I can tell, but bgpd
> >>probably shouldn't be able to mangle arp entries and prevent the correct one
> >>being added.
> >
> >Here's the fix.
> >
> >Index: net/rtsock.c
> >===
> >RCS file: /cvs/src/sys/net/rtsock.c,v
> >retrieving revision 1.232
> >diff -u -p -r1.232 rtsock.c
> >--- net/rtsock.c 7 Mar 2017 09:23:27 -   1.232
> >+++ net/rtsock.c 8 Mar 2017 16:06:22 -
> >@@ -895,10 +895,22 @@ rtm_output(struct rt_msghdr *rtm, struct
> > 				}
> > 			}
> > change:
> >-			if (info->rti_info[RTAX_GATEWAY] != NULL && (error =
> >-			    rt_setgate(rt, info->rti_info[RTAX_GATEWAY],
> >-			    tableid)))
> >-				break;
> >+			if (info->rti_info[RTAX_GATEWAY] != NULL) {
> >+				/*
> >+				 * When updating the gateway, make sure it's
> >+				 * valid.
> >+				 */
> >+				if (!newgate && rt->rt_gateway->sa_family !=
> >+				    info->rti_info[RTAX_GATEWAY]->sa_family) {
> >+					error = EINVAL;
>

Re: Bizarre arp entry corruption

2017-03-09 Thread Joe Holden

On 09/03/2017 11:51, Martin Pieuchot wrote:

On 07/03/17(Tue) 19:38, Joe Holden wrote:

On 12/12/2016 16:55, Joe Holden wrote:

On 12/12/2016 10:27, Martin Pieuchot wrote:

On 11/12/16(Sun) 00:50, Joe Holden wrote:

On 10/12/2016 08:43, Mihai Popescu wrote:

seeing some bizarre behaviour on one box, on one specific interface:


Hello,

This looks like some stupid TV game, where contesters are given some
clues from time to time and they have to guess what is the real shit.

Do post your FULL dmesg and configurations for network if you really
want someone to even think at your issue. Isn't that obvious?

Bye!



Appreciate the useless response (but still better than nothing!), the
affected box has since been reverted to older snapshot and thus no more
debugging can be done - someone else will have to do it.


I'd appreciate to see the output of 'netstat -rnf inet' when it is
relevant.  Without that information it's hard to understand.

But there's a bug somewhere, it has to be fixed.


Not that dmesg is even relevant since it is a userland bug not a kernel
problem but anyway:


It's a kernel problem.


I'll see if I can recreate it but I'm not holding my breath - it only
breaks once BGP loaded the table which leads me to think it is actually
bgpd that is updating the llinfo with bogus info and even though I have
a feed in my lab it doesn't do the same thing.


Ok so, inadvertently recreated this (pretty much exactly the same) issue on
a lab/test setup:

For the purposes of debug, ignore the fact that the interfaces are tap
interfaces, they're still emulated ethernet...

Wall of text incoming, various info...

box#1:

tap1: flags=8843 mtu 1500
lladdr fe:e1:ba:d1:be:f3
index 7 priority 0 llprio 3
groups: tap
status: active
inet 172.20.230.72 netmask 0xfffe

box#2:

tap1: flags=8843 mtu 1500
lladdr fe:e1:ba:d1:cf:92
index 7 priority 0 llprio 3
groups: tap
status: active
inet 172.20.230.73 netmask 0xfffe

All is fine after starting ospfd, but as soon as I start bgpd, box#2 shows
the following:

Host                 Ethernet Address   Netif Expire     Flags
172.20.230.72        00:00:00:00:20:12      ? 12m30s

# route -n get 172.20.230.72
   route to: 172.20.230.72
destination: 172.20.230.72
       mask: 255.255.255.255
  interface: tap1
 if address: 172.20.230.73
   priority: 3 ()
      flags:
     use       mtu    expire
      20         0       702

flags destination  gateway  lpref   med aspath origin
IS*>  172.20.230.72/31 172.20.230.64  200 0 i

.64 is the loopback on one of its connected boxes that doesn't have broken
entries

tcpdump looks ok, afterwards:

19:14:23.723876 arp who-has 172.20.230.72 tell 172.20.230.73
19:14:23.901883 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3
19:14:24.022948 arp who-has 172.20.230.72 tell 172.20.230.73
19:14:24.201095 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3

but the correct entry is never installed, after I delete the broken arp
entry it never readds a new one.

This only happens with redist connected as far as I can tell, but bgpd
probably shouldn't be able to mangle arp entries and prevent the correct one
being added.


Here's the fix.

Index: net/rtsock.c
===
RCS file: /cvs/src/sys/net/rtsock.c,v
retrieving revision 1.232
diff -u -p -r1.232 rtsock.c
--- net/rtsock.c7 Mar 2017 09:23:27 -   1.232
+++ net/rtsock.c8 Mar 2017 16:06:22 -
@@ -895,10 +895,22 @@ rtm_output(struct rt_msghdr *rtm, struct
 				}
 			}
 change:
-			if (info->rti_info[RTAX_GATEWAY] != NULL && (error =
-			    rt_setgate(rt, info->rti_info[RTAX_GATEWAY],
-			    tableid)))
-				break;
+			if (info->rti_info[RTAX_GATEWAY] != NULL) {
+				/*
+				 * When updating the gateway, make sure it's
+				 * valid.
+				 */
+				if (!newgate && rt->rt_gateway->sa_family !=
+				    info->rti_info[RTAX_GATEWAY]->sa_family) {
+					error = EINVAL;
+					break;
+				}
+
+				error = rt_setgate(rt,
+				    info->rti_info[RTAX_GATEWAY], tableid);
+				if (error)
+					break;
+			}
 #ifdef MPLS
 			if ((rtm->rtm_flags & RTF_MPLS) &&
 			    info->rti_info[RTAX_SRC] != NULL) {

Looking good - have tried to break it since and it's fine, thanks for 
your help!


Will this make it into 6.1?



Re: Bizarre arp entry corruption

2017-03-09 Thread Martin Pieuchot
On 07/03/17(Tue) 19:38, Joe Holden wrote:
> On 12/12/2016 16:55, Joe Holden wrote:
> > On 12/12/2016 10:27, Martin Pieuchot wrote:
> > > On 11/12/16(Sun) 00:50, Joe Holden wrote:
> > > > On 10/12/2016 08:43, Mihai Popescu wrote:
> > > > > > > seeing some bizarre behaviour on one box, on one specific 
> > > > > > > interface:
> > > > > 
> > > > > Hello,
> > > > > 
> > > > > This looks like some stupid TV game, where contesters are given some
> > > > > clues from time to time and they have to guess what is the real shit.
> > > > > 
> > > > > Do post your FULL dmesg and configurations for network if you really
> > > > > want someone to even think at your issue. Isn't that obvious?
> > > > > 
> > > > > Bye!
> > > > > 
> > > > 
> > > > Appreciate the useless response (but still better than nothing!), the
> > > > affected box has since been reverted to older snapshot and thus no more
> > > > debugging can be done - someone else will have to do it.
> > > 
> > > I'd appreciate to see the output of 'netstat -rnf inet' when it is
> > > relevant.  Without that information it's hard to understand.
> > > 
> > > But there's a bug somewhere, it has to be fixed.
> > > 
> > > > Not that dmesg is even relevant since it is a userland bug not a kernel
> > > > problem but anyway:
> > > 
> > > It's a kernel problem.
> > > 
> > I'll see if I can recreate it but I'm not holding my breath - it only
> > breaks once BGP loaded the table which leads me to think it is actually
> > bgpd that is updating the llinfo with bogus info and even though I have
> > a feed in my lab it doesn't do the same thing.
> > 
> Ok so, inadvertently recreated this (pretty much exactly the same) issue on
> a lab/test setup:
> 
> For the purposes of debug, ignore the fact that the interfaces are tap
> interfaces, they're still emulated ethernet...
> 
> Wall of text incoming, various info...
> 
> box#1:
> 
> tap1: flags=8843 mtu 1500
> lladdr fe:e1:ba:d1:be:f3
> index 7 priority 0 llprio 3
> groups: tap
> status: active
> inet 172.20.230.72 netmask 0xfffe
> 
> box#2:
> 
> tap1: flags=8843 mtu 1500
> lladdr fe:e1:ba:d1:cf:92
> index 7 priority 0 llprio 3
> groups: tap
> status: active
> inet 172.20.230.73 netmask 0xfffe
> 
> All is fine after starting ospfd, but as soon as I start bgpd, box#2 shows
> the following:
> 
> Host                 Ethernet Address   Netif Expire     Flags
> 172.20.230.72        00:00:00:00:20:12      ? 12m30s
> 
> # route -n get 172.20.230.72
>    route to: 172.20.230.72
> destination: 172.20.230.72
>        mask: 255.255.255.255
>   interface: tap1
>  if address: 172.20.230.73
>    priority: 3 ()
>       flags:
>      use       mtu    expire
>       20         0       702
> 
> flags destination  gateway  lpref   med aspath origin
> IS*>  172.20.230.72/31 172.20.230.64  200 0 i
> 
> .64 is the loopback on one of its connected boxes that doesn't have broken
> entries
> 
> tcpdump looks ok, afterwards:
> 
> 19:14:23.723876 arp who-has 172.20.230.72 tell 172.20.230.73
> 19:14:23.901883 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3
> 19:14:24.022948 arp who-has 172.20.230.72 tell 172.20.230.73
> 19:14:24.201095 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3
> 
> but the correct entry is never installed, after I delete the broken arp
> entry it never readds a new one.
> 
> This only happens with redist connected as far as I can tell, but bgpd
> probably shouldn't be able to mangle arp entries and prevent the correct one
> being added.

Here's the fix.

Index: net/rtsock.c
===
RCS file: /cvs/src/sys/net/rtsock.c,v
retrieving revision 1.232
diff -u -p -r1.232 rtsock.c
--- net/rtsock.c7 Mar 2017 09:23:27 -   1.232
+++ net/rtsock.c8 Mar 2017 16:06:22 -
@@ -895,10 +895,22 @@ rtm_output(struct rt_msghdr *rtm, struct
 				}
 			}
 change:
-			if (info->rti_info[RTAX_GATEWAY] != NULL && (error =
-			    rt_setgate(rt, info->rti_info[RTAX_GATEWAY],
-			    tableid)))
-				break;
+			if (info->rti_info[RTAX_GATEWAY] != NULL) {
+				/*
+				 * When updating the gateway, make sure it's
+				 * valid.
+				 */
+				if (!newgate && rt->rt_gateway->sa_family !=
+				    info->rti_info[RTAX_GATEWAY]->sa_family) {
+					error = EINVAL;
+					break;
+				}
+
+				error = rt_setgate(rt,
+				    info->rti_info[RTAX_GATEWAY], tableid);
+				if (error)
+					break;
+			}
 #ifdef MPLS
 			if ((rtm->rtm_flags & RTF_MPLS) &&
 			    info->rti_info[RTAX_SRC] != NULL) {

Re: Bizarre arp entry corruption

2017-03-07 Thread Joe Holden

On 12/12/2016 16:55, Joe Holden wrote:

On 12/12/2016 10:27, Martin Pieuchot wrote:

On 11/12/16(Sun) 00:50, Joe Holden wrote:

On 10/12/2016 08:43, Mihai Popescu wrote:

seeing some bizarre behaviour on one box, on one specific interface:


Hello,

This looks like some stupid TV game, where contesters are given some
clues from time to time and they have to guess what is the real shit.

Do post your FULL dmesg and configurations for network if you really
want someone to even think at your issue. Isn't that obvious?

Bye!



Appreciate the useless response (but still better than nothing!), the
affected box has since been reverted to older snapshot and thus no more
debugging can be done - someone else will have to do it.


I'd appreciate to see the output of 'netstat -rnf inet' when it is
relevant.  Without that information it's hard to understand.

But there's a bug somewhere, it has to be fixed.


Not that dmesg is even relevant since it is a userland bug not a kernel
problem but anyway:


It's a kernel problem.


I'll see if I can recreate it but I'm not holding my breath - it only
breaks once BGP loaded the table which leads me to think it is actually 
bgpd that is updating the llinfo with bogus info and even though I have
a feed in my lab it doesn't do the same thing.

Ok so, inadvertently recreated this (pretty much exactly the same) issue 
on a lab/test setup:


For the purposes of debug, ignore the fact that the interfaces are tap 
interfaces, they're still emulated ethernet...


Wall of text incoming, various info...

box#1:

tap1: flags=8843 mtu 1500
lladdr fe:e1:ba:d1:be:f3
index 7 priority 0 llprio 3
groups: tap
status: active
inet 172.20.230.72 netmask 0xfffe

box#2:

tap1: flags=8843 mtu 1500
lladdr fe:e1:ba:d1:cf:92
index 7 priority 0 llprio 3
groups: tap
status: active
inet 172.20.230.73 netmask 0xfffe

All is fine after starting ospfd, but as soon as I start bgpd, box#2 
shows the following:


Host                 Ethernet Address   Netif Expire     Flags
172.20.230.72        00:00:00:00:20:12      ? 12m30s

# route -n get 172.20.230.72
   route to: 172.20.230.72
destination: 172.20.230.72
       mask: 255.255.255.255
  interface: tap1
 if address: 172.20.230.73
   priority: 3 ()
      flags:
     use       mtu    expire
      20         0       702

flags destination  gateway  lpref   med aspath origin
IS*>  172.20.230.72/31 172.20.230.64  200 0 i

.64 is the loopback on one of its connected boxes that doesn't have 
broken entries


tcpdump looks ok, afterwards:

19:14:23.723876 arp who-has 172.20.230.72 tell 172.20.230.73
19:14:23.901883 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3
19:14:24.022948 arp who-has 172.20.230.72 tell 172.20.230.73
19:14:24.201095 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3

but the correct entry is never installed, after I delete the broken arp 
entry it never readds a new one.


This only happens with redist connected as far as I can tell, but bgpd 
probably shouldn't be able to mangle arp entries and prevent the correct 
one being added.


If someone thinks they can diag/fix it then hit me up off-list and I can 
fire over ssh details.


Thanks



Re: Bizarre arp entry corruption

2016-12-12 Thread Joe Holden

On 12/12/2016 10:27, Martin Pieuchot wrote:

On 11/12/16(Sun) 00:50, Joe Holden wrote:

On 10/12/2016 08:43, Mihai Popescu wrote:

seeing some bizarre behaviour on one box, on one specific interface:


Hello,

This looks like some stupid TV game, where contesters are given some
clues from time to time and they have to guess what is the real shit.

Do post your FULL dmesg and configurations for network if you really
want someone to even think at your issue. Isn't that obvious?

Bye!



Appreciate the useless response (but still better than nothing!), the
affected box has since been reverted to older snapshot and thus no more
debugging can be done - someone else will have to do it.


I'd appreciate to see the output of 'netstat -rnf inet' when it is
relevant.  Without that information it's hard to understand.

But there's a bug somewhere, it has to be fixed.


Not that dmesg is even relevant since it is a userland bug not a kernel
problem but anyway:


It's a kernel problem.

I'll see if I can recreate it but I'm not holding my breath - it only 
breaks once BGP loaded the table which leads me to think it is actually 
bgpd that is updating the llinfo with bogus info and even though I have 
a feed in my lab it doesn't do the same thing.




Re: Bizarre arp entry corruption

2016-12-12 Thread Martin Pieuchot
On 11/12/16(Sun) 00:50, Joe Holden wrote:
> On 10/12/2016 08:43, Mihai Popescu wrote:
> > > > seeing some bizarre behaviour on one box, on one specific interface:
> > 
> > Hello,
> > 
> > This looks like some stupid TV game, where contesters are given some
> > clues from time to time and they have to guess what is the real shit.
> > 
> > Do post your FULL dmesg and configurations for network if you really
> > want someone to even think at your issue. Isn't that obvious?
> > 
> > Bye!
> > 
> 
> Appreciate the useless response (but still better than nothing!), the
> affected box has since been reverted to older snapshot and thus no more
> debugging can be done - someone else will have to do it.

I'd appreciate to see the output of 'netstat -rnf inet' when it is
relevant.  Without that information it's hard to understand.

But there's a bug somewhere, it has to be fixed.

> Not that dmesg is even relevant since it is a userland bug not a kernel
> problem but anyway:

It's a kernel problem.



Re: Bizarre arp entry corruption

2016-12-10 Thread Joe Holden

On 10/12/2016 08:43, Mihai Popescu wrote:

seeing some bizarre behaviour on one box, on one specific interface:


Hello,

This looks like some stupid TV game, where contesters are given some
clues from time to time and they have to guess what is the real shit.

Do post your FULL dmesg and configurations for network if you really
want someone to even think at your issue. Isn't that obvious?

Bye!



Appreciate the useless response (but still better than nothing!), the 
affected box has since been reverted to older snapshot and thus no more 
debugging can be done - someone else will have to do it.


Not that dmesg is even relevant since it is a userland bug not a kernel 
problem but anyway:


OpenBSD 6.0-current (GENERIC.MP) #19: Wed Dec  7 12:07:13 MST 2016
bu...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4273471488 (4075MB)
avail mem = 4139397120 (3947MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0x9d000 (74 entries)
bios0: vendor American Megatrends Inc. version "1ADQW068" date 11/16/2010
bios0: Sun Microsystems SUN FIRE X4150
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S5
acpi0: tables DSDT FACP APIC SPCR MCFG SSDT OEMB HPET EINJ BERT ERST HEST
acpi0: wakeup devices SPE4(S1) SPE2(S1) SPE1(S1) P8PC(S1) P0P1(S1) UAR1(S1) P0P5(S1) P0P6(S1) P0P7(S1) NPE4(S1) NPE5(S1) NPE6(S1) NPE7(S1) USB0(S1) USB1(S1) USB2(S1) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5450 @ 3.00GHz, 4189.89 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE,LONG,LAHF,PERF,SENSOR
cpu0: 6MB 64b/line 16-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 7 var ranges, 88 fixed ranges
cpu0: apic clock running at 332MHz
cpu0: mwait min=64, max=64, C-substates=0.2.2.2, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Xeon(R) CPU E5450 @ 3.00GHz, 2992.51 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE,LONG,LAHF,PERF,SENSOR
cpu1: 6MB 64b/line 16-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Xeon(R) CPU E5450 @ 3.00GHz, 2992.51 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE,LONG,LAHF,PERF,SENSOR
cpu2: 6MB 64b/line 16-way L2 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Xeon(R) CPU E5450 @ 3.00GHz, 2992.52 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE,LONG,LAHF,PERF,SENSOR
cpu3: 6MB 64b/line 16-way L2 cache
cpu3: smt 0, core 3, package 0
ioapic0 at mainbus0: apid 4 pa 0xfec0, version 20, 24 pins
ioapic1 at mainbus0: apid 5 pa 0xfec8, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xe000, bus 0-255
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (NPES)
acpiprt2 at acpi0: bus 2 (SPE4)
acpiprt3 at acpi0: bus -1 (SPE2)
acpiprt4 at acpi0: bus 3 (SPE1)
acpiprt5 at acpi0: bus 4 (P8PC)
acpiprt6 at acpi0: bus 15 (P0P1)
acpiprt7 at acpi0: bus -1 (P0P5)
acpiprt8 at acpi0: bus -1 (P0P6)
acpiprt9 at acpi0: bus -1 (P0P7)
acpiprt10 at acpi0: bus 7 (NPE4)
acpiprt11 at acpi0: bus 11 (NPE5)
acpiprt12 at acpi0: bus 12 (NPE6)
acpiprt13 at acpi0: bus 13 (NPE7)
acpiprt14 at acpi0: bus 14 (P0P4)
acpiprt15 at acpi0: bus -1 (BR1E)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpicpu2 at acpi0: C1(@1 halt!)
acpicpu3 at acpi0: C1(@1 halt!)
"PNP0501" at acpi0 not configured
"PNP0501" at acpi0 not configured
acpibtn0 at acpi0: PWRB
"IPI0001" at acpi0 not configured
ipmi at mainbus0 not configured
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 5000P Host" rev 0xb1
ppb0 at pci0 dev 2 function 0 "Intel 5000 PCIE" rev 0xb1
pci1 at ppb0 bus 1
ppb1 at pci1 dev 0 function 0 "Intel 6321ESB PCIE" rev 0x01
pci2 at ppb1 bus 2
ppb2 at pci2 dev 0 function 0 "Intel 6321ESB PCIE" rev 0x01
pci3 at ppb2 bus 3
ppb3 at pci2 dev 2 function 0 "Intel 6321ESB PCIE" rev 0x01
pci4 at ppb3 bus 4
em0 at pci4 dev 0 function 0 "Intel 80003ES2" rev 0x01: msi, address 00:23:8b:57:b4:9e
em1 at pci4 dev 0 function 1 "Intel 80003ES2" rev 0x01: msi, address 00:23:8b:57:b4:9f
ppb4 at pci1 dev 0 function 3 "Intel 6321ESB PCIE-PCIX" rev 0x01
pci5 at ppb4 bus 5
ppb5 at pci0 dev 3 function 0 "Intel 5000 PCIE" rev 0xb1
pci6 at ppb5 bus 6

Re: Bizarre arp entry corruption

2016-12-10 Thread Mihai Popescu
>> seeing some bizarre behaviour on one box, on one specific interface:

Hello,

This looks like some stupid TV game, where contesters are given some
clues from time to time and they have to guess what is the real shit.

Do post your FULL dmesg and configurations for network if you really
want someone to even think at your issue. Isn't that obvious?

Bye!



Re: Bizarre arp entry corruption

2016-12-09 Thread Joe Holden

On 08/12/2016 14:35, Joe Holden wrote:

On 08/12/2016 13:56, Joe Holden wrote:

Hi guys,

I've just updated a couple of boxes to the Dec 7th snapshot and I'm
seeing some bizarre behaviour on one box, on one specific interface:

The box in question is an OSPF and BGP speaker, and the following
happens when booted:

After OSPF and BGP tables load, a couple of minutes later the following
appear:

Dec  8 06:33:03 edge-pe-2 /bsd: arp_rtrequest: bad gateway value: em0
Dec  8 06:33:03 edge-pe-2 last message repeated 2 times
Dec  8 06:33:04 edge-pe-2 /bsd: arpresolve: X.X.X.X: incorrect arp
information

Then some seconds later:

Dec  8 06:41:41 edge-pe-2 /bsd: arpresolve: unresolved and rt_expire == 0

At this point the arp entry for the neighbour in question has been
updated so that the lladdr is all zeros and the interface is simply '?'
according to arp -n.

The box it is paired with that has a pretty much identical config
doesn't exhibit the same problem and this only occurs on the single em0
interface (the box has about 6 active in total, mix of em and ix).


I should clarify that this isn't CARP, but rather the box it is directly
connected to.


OpenBSD 6.0-current (GENERIC.MP) #19: Wed Dec  7 12:07:13 MST 2016
bu...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP


I don't see any odd behaviour on the wire, according to pcap the who-has
and associated reply is seen once as expected with the correct lladdr,
but at some point it gets overwritten with the above.

Previous kernel was about 2 months old which leaves a large number of
commits to check through - I can't see anything that might cause this
from a quick look though so I was hoping someone might have an idea.

For now I've had to add a static arp entry with permanent to prevent it
misbehaving but that has stopped working at least once so far.

I also have limited debug ability as the box is part of a live network
and obviously it causes disruption, and I can't recreate it in a lab
with identical configurations.

Any pointers appreciated!

Cheers


Actually looks like it breaks when BGP comes up, a route -nvd get  
looks ok, but what else should I be checking?


After it breaks it doesn't seem to want to do any arp resolution on the 
interface until I do down/up...




Re: Bizarre arp entry corruption

2016-12-08 Thread Joe Holden

On 08/12/2016 13:56, Joe Holden wrote:

Hi guys,

I've just updated a couple of boxes to the Dec 7th snapshot and I'm
seeing some bizarre behaviour on one box, on one specific interface:

The box in question is an OSPF and BGP speaker, and the following
happens when booted:

After OSPF and BGP tables load, a couple of minutes later the following
appear:

Dec  8 06:33:03 edge-pe-2 /bsd: arp_rtrequest: bad gateway value: em0
Dec  8 06:33:03 edge-pe-2 last message repeated 2 times
Dec  8 06:33:04 edge-pe-2 /bsd: arpresolve: X.X.X.X: incorrect arp
information

Then some seconds later:

Dec  8 06:41:41 edge-pe-2 /bsd: arpresolve: unresolved and rt_expire == 0

At this point the arp entry for the neighbour in question has been
updated so that the lladdr is all zeros and the interface is simply '?'
according to arp -n.

The box it is paired with that has a pretty much identical config
doesn't exhibit the same problem and this only occurs on the single em0
interface (the box has about 6 active in total, mix of em and ix).

I should clarify that this isn't CARP, but rather the box it is directly 
connected to.



OpenBSD 6.0-current (GENERIC.MP) #19: Wed Dec  7 12:07:13 MST 2016
bu...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP


I don't see any odd behaviour on the wire, according to pcap the who-has
and associated reply is seen once as expected with the correct lladdr,
but at some point it gets overwritten with the above.

Previous kernel was about 2 months old which leaves a large number of
commits to check through - I can't see anything that might cause this
from a quick look though so I was hoping someone might have an idea.

For now I've had to add a static arp entry with permanent to prevent it
misbehaving but that has stopped working at least once so far.

I also have limited debug ability as the box is part of a live network
and obviously it causes disruption, and I can't recreate it in a lab
with identical configurations.

Any pointers appreciated!

Cheers




Bizarre arp entry corruption

2016-12-08 Thread Joe Holden

Hi guys,

I've just updated a couple of boxes to the Dec 7th snapshot and I'm 
seeing some bizarre behaviour on one box, on one specific interface:


The box in question is an OSPF and BGP speaker, and the following 
happens when booted:


After OSPF and BGP tables load, a couple of minutes later the following 
appear:


Dec  8 06:33:03 edge-pe-2 /bsd: arp_rtrequest: bad gateway value: em0
Dec  8 06:33:03 edge-pe-2 last message repeated 2 times
Dec  8 06:33:04 edge-pe-2 /bsd: arpresolve: X.X.X.X: incorrect arp 
information


Then some seconds later:

Dec  8 06:41:41 edge-pe-2 /bsd: arpresolve: unresolved and rt_expire == 0

At this point the arp entry for the neighbour in question has been 
updated so that the lladdr is all zeros and the interface is simply '?' 
according to arp -n.


The box it is paired with that has a pretty much identical config 
doesn't exhibit the same problem and this only occurs on the single em0 
interface (the box has about 6 active in total, mix of em and ix).


OpenBSD 6.0-current (GENERIC.MP) #19: Wed Dec  7 12:07:13 MST 2016
bu...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP


I don't see any odd behaviour on the wire, according to pcap the who-has 
and associated reply is seen once as expected with the correct lladdr, 
but at some point it gets overwritten with the above.


Previous kernel was about 2 months old which leaves a large number of 
commits to check through - I can't see anything that might cause this 
from a quick look though so I was hoping someone might have an idea.


For now I've had to add a static arp entry with permanent to prevent it 
misbehaving but that has stopped working at least once so far.


I also have limited debug ability as the box is part of a live network 
and obviously it causes disruption, and I can't recreate it in a lab 
with identical configurations.


Any pointers appreciated!

Cheers