OK, this smells like a buggy implementation of OSPF on the dot-com
vendor side.
Upgrading the firmware on both Cisco Nexus 3000-series switches to
NX-OS version 7.0(3)I4(4) fixed my problem with
OSPF stuck in EXCHG/EXSTA.

The setup involving the Dell switch shows the following when ospfd on the
OpenBSD side is run with '-dvvv':

spf_calc: area 0.0.0.0 calculated
nbr_fsm: event HELLO_RECEIVED resulted in action START_INACTIVITY_TIMER and
changing state for neighbor ID 10.4.255.26 from DOWN to INIT
nbr_fsm: event 2_WAY_RECEIVED resulted in action EVAL and changing state for
neighbor ID 10.4.255.26 from INIT to 2-WAY
if_fsm: event NEIGHBORCHANGE resulted in action NOTHING and changing state for
interface trunk1 from WAIT to WAIT
nbr_fsm: event HELLO_RECEIVED resulted in action START_INACTIVITY_TIMER and
changing state for neighbor ID 10.4.255.29 from DOWN to INIT
nbr_fsm: event 2_WAY_RECEIVED resulted in action EVAL and changing state for
neighbor ID 10.4.255.29 from INIT to 2-WAY
if_fsm: event NEIGHBORCHANGE resulted in action NOTHING and changing state for
interface trunk1 from WAIT to WAIT
recv_db_description: neighbor ID 10.4.255.29: packet ignored in state 2-WAY
if_act_elect: interface trunk1 old dr none new dr 10.4.255.29, old bdr none
new bdr 10.4.255.26
nbr_fsm: event ADJ_OK resulted in action EVAL and changing state for neighbor
ID 10.4.255.29 from 2-WAY to EXSTA
nbr_fsm: event ADJ_OK resulted in action EVAL and changing state for neighbor
ID 10.4.255.26 from 2-WAY to EXSTA
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface trunk1
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface trunk1
if_fsm: event BACKUPSEEN resulted in action ELECT and changing state for
interface trunk1 from WAIT to OTHER
nbr_fsm: event NEGOTIATION_DONE resulted in action SNAPSHOT and changing state
for neighbor ID 10.4.255.29 from EXSTA to SNAP
nbr_fsm: event SNAPSHOT_DONE resulted in action SNAPSHOT_DONE and changing
state for neighbor ID 10.4.255.29 from SNAP to EXCHG
recv_db_description: dupe from neighbor ID 10.4.255.29
recv_db_description: neighbor ID 10.4.255.29: seq num mismatch, bad flags
nbr_fsm: event SEQ_NUM_MISMATCH resulted in action RESET_DD and changing state
for neighbor ID 10.4.255.29 from EXCHG to EXSTA
nbr_fsm: event NEGOTIATION_DONE resulted in action SNAPSHOT and changing state
for neighbor ID 10.4.255.29 from EXSTA to SNAP
nbr_fsm: event SNAPSHOT_DONE resulted in action SNAPSHOT_DONE and changing
state for neighbor ID 10.4.255.29 from SNAP to EXCHG
recv_db_description: dupe from neighbor ID 10.4.255.29
recv_db_description: neighbor ID 10.4.255.29: seq num mismatch, bad flags

The relevant lines, e.g.:
recv_db_description: dupe from neighbor ID 10.4.255.29
recv_db_description: neighbor ID 10.4.255.29: seq num mismatch, bad flags
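
To see how often a neighbor falls back from EXCHG to EXSTA, the debug
output can be summarized with a small script. This is only a sketch: it
assumes each log entry is on a single line (not wrapped as in this mail),
and the function name count_dd_resets is made up for illustration.

```python
import re
from collections import Counter

# Matches ospfd nbr_fsm debug lines in the format shown in the log above,
# with each entry unwrapped onto a single line.
NBR_FSM = re.compile(
    r"nbr_fsm: event (?P<event>\S+) resulted in action (?P<action>\S+) "
    r"and changing state for neighbor ID (?P<nbr>\S+) "
    r"from (?P<old>\S+) to (?P<new>\S+)"
)


def count_dd_resets(lines):
    """Count EXCHG -> EXSTA fallbacks caused by SEQ_NUM_MISMATCH, per neighbor."""
    resets = Counter()
    for line in lines:
        m = NBR_FSM.search(line)
        if m is None:
            continue
        if (m["event"] == "SEQ_NUM_MISMATCH"
                and m["old"] == "EXCHG" and m["new"] == "EXSTA"):
            resets[m["nbr"]] += 1
    return resets


# Lines taken from the debug output above, joined onto single lines.
SAMPLE = [
    "nbr_fsm: event NEGOTIATION_DONE resulted in action SNAPSHOT and "
    "changing state for neighbor ID 10.4.255.29 from EXSTA to SNAP",
    "nbr_fsm: event SNAPSHOT_DONE resulted in action SNAPSHOT_DONE and "
    "changing state for neighbor ID 10.4.255.29 from SNAP to EXCHG",
    "recv_db_description: dupe from neighbor ID 10.4.255.29",
    "recv_db_description: neighbor ID 10.4.255.29: seq num mismatch, bad flags",
    "nbr_fsm: event SEQ_NUM_MISMATCH resulted in action RESET_DD and "
    "changing state for neighbor ID 10.4.255.29 from EXCHG to EXSTA",
]

if __name__ == "__main__":
    for nbr, n in count_dd_resets(SAMPLE).items():
        print(nbr, n)
```

A count that keeps growing for one neighbor while the others settle points
at the exchange with that particular peer rather than at the local setup.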



> On 14 Feb. 2017, at 11:56, Maxim Bourmistrov <m...@alumni.chalmers.se> wrote:
>
>
>> On 14 Feb. 2017, at 11:33, Jeremie Courreges-Anglas <j...@wxcvbn.org> wrote:
>>
>> I have no idea why you're getting this kind of error, but maybe you
>> can simplify your setup a bit more.  Can you reproduce when using just
>> em1 (out of the trunk) instead of trunk1?  Just bnx1?
>
> I’ll try to modify this setup.
>
> Anyhow, I see almost exactly the same behavior with another setup
> involving Cisco Nexus 3000-series switches.
> The similarity between the two is that a trunk is used in both locations.
> However, a reboot does not solve the problem with the Nexus.
>
> [fw1]-[11:33:02]# ospfctl sh nei
> ID              Pri State        DeadTime Address         Iface     Uptime
> 10.6.255.1      1   2-WAY/OTHER  00:00:38 10.6.255.1      trunk1    -
> 10.6.255.28     1   2-WAY/OTHER  00:00:35 10.6.255.28     trunk1    -
> 10.6.255.2      1   2-WAY/OTHER  00:00:38 10.6.255.2      trunk1    -
> 10.6.255.30     1   EXSTA/DR     00:00:31 10.6.255.30     trunk1    -
> 10.6.255.29     1   FULL/BCKUP   00:00:35 10.6.255.29     trunk1    01:24:41
>
>
> [fw2]-[11:45:00]# ospfctl sh nei
> ID              Pri State        DeadTime Address         Iface     Uptime
> 10.6.255.1      1   2-WAY/OTHER  00:00:37 10.6.255.1      trunk1    -
> 10.6.255.27     1   2-WAY/OTHER  00:00:37 10.6.255.27     trunk1    -
> 10.6.255.2      1   2-WAY/OTHER  00:00:37 10.6.255.2      trunk1    -
> 10.6.255.30     1   EXCHG/DR     00:00:39 10.6.255.30     trunk1    -
> 10.6.255.29     1   EXSTA/BCKUP  00:00:31 10.6.255.29     trunk1    -
>
> fw1/fw2 - openbsd 6.0-stable
> fw1 local IP on trunk1 - 10.6.255.27
> fw2 local IP on trunk1 - 10.6.255.28
>
> 10.6.255.{1,2} - openbsd 5.9-stable VMs
>
> 10.6.255.{29,30} - two Nexus with vPC (MLAG)
>
> fw1/fw2 are connected to both switches and form a vPC (vPC on top of LACP;
> LACP is required for vPC to work).
> Both are identical in hardware as well as in configuration.
>
> trunk1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
>         lladdr a0:36:9f:37:d3:60
>         description: VLAN990
>         index 8 priority 0 llprio 3
>         trunk: trunkproto lacp
>         trunk id: [(8000,a0:36:9f:37:d3:60,4045,0000,0000),
>                  (7F9B,00:23:04:ee:be:01,802E,0000,0000)]
>                 trunkport ix1 active,collecting,distributing
>                 trunkport ix2 active,collecting,distributing
>         groups: trunk
>         media: Ethernet autoselect
>         status: active
>         inet 10.6.255.28 netmask 0xffffffe0 broadcast 10.6.255.31
>
>
> Sometimes fw1/fw2 get connected to both switches. Sometimes not. Sometimes
> to only one, sometimes to none.
>
> [fw2]-[11:52:00]# tcpdump -n -i trunk1 proto ospf
> tcpdump: listening on trunk1, link-type EN10MB
> 11:52:09.039855 10.6.255.1 > 224.0.0.5: OSPFv2-hello  64: rtrid 10.6.255.1 backbone dr 10.6.255.30 bdr 10.6.255.29 [tos 0xc0] [ttl 1]
> 11:52:09.039901 10.6.255.28 > 224.0.0.5: OSPFv2-hello  64: rtrid 10.6.255.28 backbone dr 10.6.255.30 bdr 10.6.255.29 [tos 0xc0] [ttl 1]
> 11:52:09.039981 10.6.255.27 > 224.0.0.5: OSPFv2-hello  64: rtrid 10.6.255.27 backbone dr 10.6.255.30 bdr 10.6.255.29 [tos 0xc0] [ttl 1]
> 11:52:09.040108 10.6.255.2 > 224.0.0.5: OSPFv2-hello  64: rtrid 10.6.255.2 backbone dr 10.6.255.30 bdr 10.6.255.29 [tos 0xc0] [ttl 1]
> 11:52:09.463707 10.6.255.30 > 10.6.255.28: OSPFv2-dd  32: rtrid 10.6.255.30 backbone E I/M/MS mtu 1500 S 3522476A [tos 0xc0] [ttl 1]
> 11:52:09.463798 10.6.255.28 > 10.6.255.30: OSPFv2-dd  32: rtrid 10.6.255.28 backbone E I/M/MS mtu 1500 S 352265AF [tos 0xc0] [ttl 1]
> 11:52:09.955800 10.6.255.29 > 10.6.255.28: OSPFv2-dd  32: rtrid 10.6.255.29 backbone E I/M/MS mtu 1500 S 84F508D [tos 0xc0] [ttl 1]
> 11:52:09.955838 10.6.255.28 > 10.6.255.29: OSPFv2-dd  32: rtrid 10.6.255.28 backbone E I/M/MS mtu 1500 S 84F932C [tos 0xc0] [ttl 1]
> 11:52:10.832978 10.6.255.2 > 224.0.0.6: OSPFv2-ls_upd  64: rtrid 10.6.255.2 backbone [tos 0xc0] [ttl 1]
> 11:52:11.278971 10.6.255.1 > 224.0.0.6: OSPFv2-ls_upd  72: rtrid 10.6.255.1 backbone [tos 0xc0] [ttl 1]
> 11:52:11.560311 10.6.255.30 > 224.0.0.5: OSPFv2-hello  64: rtrid 10.6.255.30 backbone dr 10.6.255.30 bdr 10.6.255.29 [tos 0xc0] [ttl 1]
> 11:52:11.596931 10.6.255.30 > 224.0.0.5: OSPFv2-ls_ack  64: rtrid 10.6.255.30 backbone [tos 0xc0] [ttl 1]
> 11:52:14.475690 10.6.255.28 > 10.6.255.30: OSPFv2-dd  32: rtrid 10.6.255.28 backbone E I/M/MS mtu 1500 S 352265AF [tos 0xc0] [ttl 1]
> 11:52:14.730459 10.6.255.30 > 10.6.255.28: OSPFv2-dd  32: rtrid 10.6.255.30 backbone E I/M/MS mtu 1500 S 3522476A [tos 0xc0] [ttl 1]
> 11:52:14.730613 10.6.255.28 > 10.6.255.30: OSPFv2-dd  132: rtrid 10.6.255.28 backbone E M mtu 1500 S 3522476A [tos 0xc0] [ttl 1]
> 11:52:14.965713 10.6.255.28 > 10.6.255.29: OSPFv2-dd  32: rtrid 10.6.255.28 backbone E I/M/MS mtu 1500 S 84F932C [tos 0xc0] [ttl 1]
> 11:52:15.019118 10.6.255.29 > 10.6.255.28: OSPFv2-dd  32: rtrid 10.6.255.29 backbone E I/M/MS mtu 1500 S 84F508D [tos 0xc0] [ttl 1]
> 11:52:15.019252 10.6.255.28 > 10.6.255.29: OSPFv2-dd  132: rtrid 10.6.255.28 backbone E M mtu 1500 S 84F508D [tos 0xc0] [ttl 1]
> 11:52:16.287215 10.6.255.1 > 224.0.0.6: OSPFv2-ls_upd  72: rtrid 10.6.255.1 backbone [tos 0xc0] [ttl 1]
> ^C
> 68 packets received by filter
> 0 packets dropped by kernel
>
> [fw2]-[11:53:09]# tail -10 /var/log/daemon
> Feb 14 11:52:20 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.30: seq num mismatch, bad flags
> Feb 14 11:52:20 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.29: seq num mismatch, bad flags
> Feb 14 11:52:31 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.30: seq num mismatch, bad flags
> Feb 14 11:52:31 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.29: seq num mismatch, bad flags
> Feb 14 11:52:42 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.30: seq num mismatch, bad flags
> Feb 14 11:52:43 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.29: seq num mismatch, bad flags
> Feb 14 11:52:53 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.29: seq num mismatch, bad flags
> Feb 14 11:52:54 prdfwl0002 ospfd[32106]: recv_db_description: neighbor ID 10.6.255.30: seq num mismatch, bad flags
>
>
> Any clues?
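
For reference, the master/slave negotiation visible in the tcpdump output
can be checked against the ExStart rules of RFC 2328, section 10.6. The
sketch below is a simplified model of those two rules only, not ospfd's
actual code; the function dd_role and its parameters are made up for
illustration.

```python
import ipaddress


def dd_role(own_id, nbr_id, init, more, master, has_lsa_headers,
            seq, own_seq):
    """Return our role after a DD packet received in ExStart, or None.

    Simplified from RFC 2328 section 10.6:
    - neighbor sends I/M/MS, no LSA headers, and has the larger router ID:
      we become slave and adopt the neighbor's sequence number;
    - neighbor sends I and MS clear, echoes our sequence number, and has
      the smaller router ID: we become master.
    Anything else is ignored and the state stays ExStart.
    """
    own = int(ipaddress.IPv4Address(own_id))
    nbr = int(ipaddress.IPv4Address(nbr_id))
    if init and more and master and not has_lsa_headers and nbr > own:
        return "slave"
    if not init and not master and seq == own_seq and nbr < own:
        return "master"
    return None


if __name__ == "__main__":
    # fw2 (10.6.255.28) receiving the 32-byte I/M/MS packet from 10.6.255.30.
    print(dd_role("10.6.255.28", "10.6.255.30", True, True, True, False,
                  0x3522476A, 0x352265AF))
    # 10.6.255.30 receiving fw2's 132-byte reply (flag M only, same sequence).
    print(dd_role("10.6.255.30", "10.6.255.28", False, True, False, True,
                  0x3522476A, 0x3522476A))
```

Under these rules fw2 correctly answers as slave with the switch's sequence
number (3522476A), so the switch would be expected to proceed as master. If
it keeps sending I/M/MS packets instead, that would be consistent with the
"seq num mismatch, bad flags" messages logged by ospfd.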
