Re: OpenOSPFD crashes when using mpls traffic-eng on Cisco

2012-04-04 Thread Claudio Jeker
On Wed, Apr 04, 2012 at 07:01:14AM -0500, Chris Wopat wrote:
> > From: Claudio Jeker 
> >
> > Thanks for the log and tcpdumps. It seems you're the first person to try
> > opaque LSA against ospfd. Can you give the following diff a spin?
> > I think this will solve the problems.
> 
> Claudio,
> 
> Thanks for the patch. I've compiled this in a lab and things are
> indeed stable.
> 
> lab# ospfctl show neigh | grep  FULL
> 1.0.0.80        200 FULL/DR      00:00:37 10.1.1.80   em0   18:41:01
> 1.0.0.72        100 FULL/BCKUP   00:00:30 10.1.1.72   em0   18:41:0
> 
> Out of curiosity, why is the default to terminate instead of ignoring
> the invalid LSA?

Invalid LSAs should not make it into the LSDB and therefore not into the SPF
calculation. The problem was that I added opaque LSA support to ospfd
without any way to test it properly (my bad) and forgot that having
opaque LSAs inside the LSDB causes the SPF calculation to run into those
nodes when recalculating, even though they're not referenced by any other
node. I guess we could ignore these nodes, but at the same time they are an
indication of a bigger problem, and that should be fixed. So in the end the
fatals are there to generate bug reports in case something unexpected
happens.
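
To make the trade-off above concrete, here is a minimal sketch of the
"skip what is known to be harmless, treat everything else as a bug" idea.
It is not ospfd code: area_visit(), area_walk() and the VISIT_* codes are
hypothetical names for this example, and the sketch returns an error where
ospfd calls fatalx(). Only the LSA type numbers are taken from RFC 2328 and
RFC 5250.

```c
#include <stddef.h>

/* LSA type codes as assigned by RFC 2328 (1-5) and RFC 5250 (9-11). */
enum {
	LSA_ROUTER = 1,
	LSA_NETWORK = 2,
	LSA_SUM_NETWORK = 3,
	LSA_SUM_ROUTER = 4,
	LSA_AS_EXTERNAL = 5,
	LSA_OPAQUE_LINK = 9,
	LSA_OPAQUE_AREA = 10,
	LSA_OPAQUE_AS = 11
};

/* Hypothetical outcome codes for this sketch; ospfd itself calls fatalx(). */
enum visit { VISIT_ROUTE, VISIT_SKIP, VISIT_BUG };

/* Decide what an area SPF pass does with one vertex of the given type. */
enum visit
area_visit(int type)
{
	switch (type) {
	case LSA_ROUTER:
	case LSA_NETWORK:
	case LSA_SUM_NETWORK:
	case LSA_SUM_ROUTER:
		return VISIT_ROUTE;	/* contributes routes */
	case LSA_OPAQUE_AREA:
		return VISIT_SKIP;	/* stored and flooded, nothing to calculate */
	default:
		return VISIT_BUG;	/* not expected in the area tree */
	}
}

/*
 * Walk the vertex types of one area. Returns the number of route-producing
 * vertices, or -1 at the first unexpected type (its index is stored in *bad).
 */
int
area_walk(const int *types, size_t n, size_t *bad)
{
	int routes = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		switch (area_visit(types[i])) {
		case VISIT_ROUTE:
			routes++;
			break;
		case VISIT_SKIP:
			break;
		case VISIT_BUG:
			*bad = i;
			return -1;
		}
	}
	return routes;
}
```

Calling area_walk() on the types { 1, 2, 10, 3 } counts three route-producing
vertices and silently skips the opaque one, while { 1, 5, 2 } stops at index 1.
Keeping the default branch loud is what turns an unexpected LSDB entry into a
bug report instead of a silent miscalculation.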
-- 
:wq Claudio



Re: OpenOSPFD crashes when using mpls traffic-eng on Cisco

2012-04-04 Thread Chris Wopat
> From: Claudio Jeker 
>
> Thanks for the log and tcpdumps. It seems you're the first person to try
> opaque LSA against ospfd. Can you give the following diff a spin?
> I think this will solve the problems.

Claudio,

Thanks for the patch. I've compiled this in a lab and things are
indeed stable.

lab# ospfctl show neigh | grep  FULL
1.0.0.80        200 FULL/DR      00:00:37 10.1.1.80   em0   18:41:01
1.0.0.72        100 FULL/BCKUP   00:00:30 10.1.1.72   em0   18:41:0

Out of curiosity, why is the default to terminate instead of ignoring
the invalid LSA?



Re: OpenOSPFD crashes when using mpls traffic-eng on Cisco

2012-03-30 Thread Claudio Jeker
On Sun, Mar 25, 2012 at 08:58:53PM -0500, Chris Wopat wrote:
> Claudio and crew,
> 
> When you enable OSPF-TE (http://tools.ietf.org/html/rfc3630) on a
> Cisco router, OpenOSPFD crashes with "invalid LSA type".  Assuming you
> have a functional setup, adding the last line below will reproduce it:
> 
> router ospf 1
>  mpls traffic-eng router-id Loopback0
>  mpls traffic-eng area 0
> 
> 
> OpenBSD Log:
> 
> Mar 25 19:54:27 openbsd-lab ospfd[223]: fatal in rde: rt_calc: invalid LSA type
> Mar 25 19:54:27 openbsd-lab ospfd[12417]: lost child: route decision engine exited
> 

Thanks for the log and tcpdumps. It seems you're the first person to try
opaque LSA against ospfd. Can you give the following diff a spin?
I think this will solve the problems.

-- 
:wq Claudio

Index: rde_spf.c
===================================================================
RCS file: /cvs/src/usr.sbin/ospfd/rde_spf.c,v
retrieving revision 1.73
diff -u -p -r1.73 rde_spf.c
--- rde_spf.c   24 May 2011 20:21:51 -  1.73
+++ rde_spf.c   30 Mar 2012 20:24:42 -
@@ -262,6 +262,9 @@ rt_calc(struct vertex *v, struct area *a
 		}
 
 		break;
+	case LSA_TYPE_AREA_OPAQ:
+		/* nothing to calculate */
+		break;
 	default:
 		/* as-external LSA are stored in a different tree */
 		fatalx("rt_calc: invalid LSA type");
@@ -338,6 +341,9 @@ asext_calc(struct vertex *v)
 		rt_update(addr, mask2prefixlen(v->lsa->data.asext.mask),
 		    &v->nexthop, v->type, v->cost, cost2, a, adv_rtr, type,
 		    DT_NET, 0, ntohl(v->lsa->data.asext.ext_tag));
+		break;
+	case LSA_TYPE_AS_OPAQ:
+		/* nothing to calculate */
 		break;
 	default:
 		fatalx("asext_calc: invalid LSA type");
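
The diff adds a no-op case in two places because opaque LSAs come in
different flooding scopes (RFC 5250): type 10 is area-scoped and type 11 is
AS-scoped. The sketch below only restates that mapping as seen in the diff.
The enum names and opaque_pass() are made up for this example and do not
exist in ospfd. Type 9 (link-local scope) is not touched by the diff, so the
sketch makes no claim about it.

```c
/* Opaque LSA type codes and their flooding scopes, per RFC 5250. */
enum {
	OPAQ_LINK = 9,	/* link-local scope */
	OPAQ_AREA = 10,	/* area scope */
	OPAQ_AS = 11	/* AS scope */
};

/* Hypothetical labels for the two functions patched by the diff. */
enum calc_pass { PASS_NONE, PASS_RT_CALC, PASS_ASEXT_CALC };

/* Return which patched function gained a no-op case for this LSA type. */
enum calc_pass
opaque_pass(int lsa_type)
{
	switch (lsa_type) {
	case OPAQ_AREA:
		return PASS_RT_CALC;	/* case LSA_TYPE_AREA_OPAQ in rt_calc() */
	case OPAQ_AS:
		return PASS_ASEXT_CALC;	/* case LSA_TYPE_AS_OPAQ in asext_calc() */
	default:
		return PASS_NONE;	/* not covered by the diff */
	}
}
```

The TE LSAs from RFC 3630 that triggered the crash are area-scoped (type 10),
which matches the "rt_calc: invalid LSA type" message in the log.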



OpenOSPFD crashes when using mpls traffic-eng on Cisco

2012-03-25 Thread Chris Wopat
Claudio and crew,

When you enable OSPF-TE (http://tools.ietf.org/html/rfc3630) on a
Cisco router, OpenOSPFD crashes with "invalid LSA type".  Assuming you
have a functional setup, adding the last line below will reproduce it:

router ospf 1
 mpls traffic-eng router-id Loopback0
 mpls traffic-eng area 0


OpenBSD Log:

Mar 25 19:54:27 openbsd-lab ospfd[223]: fatal in rde: rt_calc: invalid LSA type
Mar 25 19:54:27 openbsd-lab ospfd[12417]: lost child: route decision engine exited


More info:
openbsd-lab# ospfd -d
WARNING: IP forwarding NOT enabled, running as stub router
startup
rde_asext_get: 1.0.0.10/32 is net LSA
rde_asext_get: 10.1.1.0/24 is net LSA
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface em0
if_fsm: event UP resulted in action START and changing state for
interface em0 from DOWN to WAIT
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface em0
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface em0
if_fsm: event UP resulted in action START and changing state for
interface lo1 from DOWN to LOOP
nbr_fsm: event HELLO_RECEIVED resulted in action
START_INACTIVITY_TIMER and changing state for neighbor ID 1.0.0.72
from DOWN to INIT
nbr_fsm: event 2_WAY_RECEIVED resulted in action EVAL and changing
state for neighbor ID 1.0.0.72 from INIT to 2-WAY
if_act_elect: interface em0 old dr none new dr 10.1.1.72, old bdr none
new bdr 10.1.1.10
nbr_fsm: event ADJ_OK resulted in action EVAL and changing state for
neighbor ID 1.0.0.72 from 2-WAY to EXSTA
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface em0
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface em0
if_fsm: event BACKUPSEEN resulted in action ELECT and changing state
for interface em0 from WAIT to BCKUP
if_act_elect: interface em0 old dr 10.1.1.72 new dr 10.1.1.72, old bdr
10.1.1.10 new bdr 10.1.1.10
if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing
state for interface em0 from BCKUP to BCKUP
nbr_fsm: event NEGOTIATION_DONE resulted in action SNAPSHOT and
changing state for neighbor ID 1.0.0.72 from EXSTA to SNAP
nbr_fsm: event SNAPSHOT_DONE resulted in action SNAPSHOT_DONE and
changing state for neighbor ID 1.0.0.72 from SNAP to EXCHG
nbr_fsm: event EXCHANGE_DONE resulted in action EXCHANGE_DONE and
changing state for neighbor ID 1.0.0.72 from EXCHG to LOAD
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: transit net, interface em0
nbr_fsm: event LOADING_DONE resulted in action NOTHING and changing
state for neighbor ID 1.0.0.72 from LOAD to FULL
spf_calc: w id 1.0.0.72 type 1 has
no link to v id 10.1.1.72 type 2
spf_calc: area 0.0.0.0 calculated
spf_calc: area 0.0.0.0 calculated

## ADDED "mpls traffic-eng area 0" HERE:

spf_calc: area 0.0.0.0 calculated
fatal in rde: rt_calc: invalid LSA type
lost child: route decision engine exited
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface em0
if_act_elect: interface em0 old dr 10.1.1.72 new dr 10.1.1.10, old bdr
10.1.1.10 new bdr none
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface em0
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface em0
if_fsm: event NEIGHBORCHANGE resulted in action ELECT and changing
state for interface em0 from BCKUP to DR
nbr_fsm: event KILL_NBR resulted in action DELETE and changing state
for neighbor ID 1.0.0.72 from FULL to DOWN
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface em0
if_fsm: event DOWN resulted in action RESET and changing state for
interface em0 from BCKUP to DOWN
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface em0
orig_rtr_lsa: area 0.0.0.0
orig_rtr_lsa: stub net, interface em0
if_fsm: event DOWN resulted in action RESET and changing state for
interface lo1 from LOOP to DOWN
ospf engine exiting
kernel routing table decoupled
terminating


I have some packet captures if you're interested. .72 is a Cisco 7200, .10 is OpenBSD:

* Good: http://falz.net/static/ospfd-sniff-ok.pcap
* Bad: http://falz.net/static/ospfd-sniff-mpls-te.pcap

Of course, this also happens with other vendors (tested with Juniper).
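
For anyone looking through captures like these, the following is a rough
sketch of how a TE LSA can be recognised from the 20-byte LSA header alone.
It assumes the header layout from RFC 2328 and the opaque link-state ID
split from RFC 5250 (upper 8 bits opaque type, lower 24 bits opaque ID),
with opaque type 1 being the TE LSA of RFC 3630. The struct and function
names are hypothetical and not taken from ospfd or tcpdump.

```c
#include <stddef.h>
#include <stdint.h>

#define LSA_HDR_LEN	20	/* fixed LSA header size, RFC 2328 */
#define LSA_OPAQUE_AREA	10	/* area-scoped opaque LSA, RFC 5250 */
#define OPAQUE_TYPE_TE	1	/* traffic engineering LSA, RFC 3630 */

/* Subset of the LSA header fields, converted to host byte order. */
struct lsa_hdr_info {
	uint16_t age;
	uint8_t  type;
	uint32_t ls_id;
	uint32_t adv_rtr;
	uint16_t len;
};

/* Read big-endian 16-bit and 32-bit values from a byte buffer. */
static uint16_t
rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t
rd32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parse an LSA header; returns 0 on success, -1 if the data is too short. */
int
parse_lsa_hdr(const uint8_t *buf, size_t buflen, struct lsa_hdr_info *out)
{
	if (buflen < LSA_HDR_LEN)
		return -1;
	out->age = rd16(buf);		/* bytes 0-1: LS age */
	out->type = buf[3];		/* byte 2 is options, byte 3 is LS type */
	out->ls_id = rd32(buf + 4);	/* bytes 4-7: link state ID */
	out->adv_rtr = rd32(buf + 8);	/* bytes 8-11: advertising router */
	out->len = rd16(buf + 18);	/* bytes 18-19: length incl. header */
	if (out->len < LSA_HDR_LEN)
		return -1;
	return 0;
}

/* True if the header describes an area-scoped opaque LSA of TE type. */
int
is_te_lsa(const struct lsa_hdr_info *h)
{
	return h->type == LSA_OPAQUE_AREA &&
	    (h->ls_id >> 24) == OPAQUE_TYPE_TE;
}
```

Feeding it a header with LS type 10 and link-state ID 1.0.0.0 makes
is_te_lsa() return true. A router LSA (type 1) whose link-state ID happens
to start with 1, such as 1.0.0.72, is not matched because the LS type is
checked first.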

--Chris