On Sun, Jun 21, 2015 at 11:39:30AM +0100, Paul Jakma wrote:
> Hi Ondrej,
> 
> On Thu, 18 Jun 2015, Ondrej Zajicek wrote:
> 
> >Hello
> >
> >We noticed an incompatibility between Quagga and BIRD in
> >OSPF loading phase:
> >
> >In short:
> >
> >1) BIRD sends LSREQ packet with ~60 LSR.
> >
> >2) Quagga answers with one LSUPD packet with ~20 LSA.
> >
> >3) BIRD waits for remaining LSUPD packets (to cover all requested LSAs)
> >before sending new LSREQ.
> >
> >4) Usually no more LSUPD is sent by Quagga, therefore BIRD timeouts
> >(RxmtInterval) before sending next LSREQ packet. This leads to
> >significantly longer loading phase.
> 
> That doesn't sound good. Do you still have the pcap?

I don't have pcap, i have log from the BIRD side (from Andrew
<[email protected]>), describing sent and received packets and neighbor
events.


> >We noticed this problem with Quagga 0.99.22 and BIRD 1.5.0
> >
> >My interpretation of RFC 2328 10.9 is that when LSREQ is received, it
> >should be immediately answered by multiple LSUPD packets to cover all LSRs
> >from LSREQ. And when LSREQ is sent, it should be completely answered
> >before sending the next one. But i would say that this aspect of RFC 2328
> >is especially vague and i see other alternative interpretations. What is
> >the opinion of Quagga developers w.r.t. this issue?
> 
> The code concerned in Quagga has not changed in a long long time I think. We
> did implement Ogier's optimisation to the DD Exchange process, but that
> should be compatible with normal OSPF, and shouldn't affect the LSA
> Req/Update process thereafter regardless I think.
> 
> Quagga builds a list of all the LSAs to send when parsing the LSA Req. Then,
> in a seperate green-thread that is scheduled without delay after that, it
> should turn that list into a series of LSA Updates packets which are
> scheduled to be written out in another green-thread.

Glad to see that at least everybody agrees on interpretation of RFC 2328 10.9 .

Are there some kind of artificial delays in the last thread (like to
maintaining max packet rate per) between such LSA Update transmissions?

> Quagga shouldn't delay sending those.
> 
> Possibilities:
> 
> * One of the LSAs sent by BIRD was not in Quagga's LSDB, so Quagga raised
>   a BadLSReq event, and should have returned to the ExStart neighbour-state.
>   You should see DD negotiation packets from Quagga then.

That did not happen, as that would reset neighbor state even on the other
side, which did not happen:

2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed 
state from Down to Init
2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed 
state from Init to 2-Way
2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed 
state from 2-Way to ExStart
2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed 
state from ExStart to Exchange
2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed 
state from Exchange to Loading
2015-06-16 22:56:40 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed 
state from Loading to Full


> * One of the LSAs was too big to be sent in one MTU packet, Quagga tried
>   to send it in a fragmented LSA Update packet, but that didn't work and
>   got dropped somewhere under Quagga. Though, you should still have seen
>   the rest of the LSA Updates then - unless all remaining were very large
>   LSAs.
> 
>   Was it a network where routers had /lots/ of links?
> 
>   Unlikely...
> 
> * Similar, all the remaining LSAs were too large to fit in an IP+OSPF
>   packet. We drop the packet and warn the admin. Unlikely...

This is very unlikely, as all Type 1 LSAs from the first LSREQ packet
were sent in the first LSUPD packet (together with others) and remaining
LSAs were Type 3 and Type 4.


> * Some kind of unfairness issue somewhere meant Quagga never got around to
>   either the "build the LSA Update response packets" green-thread, or else
>   the "write out the packet queue for that neighbour" green-thread (we do
>   prioritise some packets in the write-out-queue, e.g. Hellos - so
>   there's potential unfairness there that could affect things under high
>   loads).

Unlikely, as the first LSA Update packets triggered by LS Request packets were
sent immediately.


> Or something else....
> 
> I don't see anything obvious unfortunately.
> 
> Do you have more details on the network used? Do you remember if it was
> reproducible?

Well, i did not experience the issue, i am just BIRD developer trying to
resolve the bug in BIRD/Quagga loading reported on BIRD mailing list.

Here is the commented relevant part of the log (sent separately):

2015-06-16 22:54:55 <TRACE> ospf_ua: HELLO packet received from nbr 
10.255.192.102 on vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: New neighbor 10.255.192.102 on vlan1, IP 
address 192.168.255.93
2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed 
state from Down to Init
2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed 
state from Init to 2-Way
2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed 
state from 2-Way to ExStart
2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on 
vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet received from nbr 
10.255.192.102 on vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed 
state from ExStart to Exchange
 ... the concerned exchange started

2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on 
vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet received from nbr 
10.255.192.102 on vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on 
vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: LSREQ packet sent to nbr 10.255.192.102 on 
vlan1
 ... initial LSREQ sent

2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet received from nbr 
10.255.192.102 on vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on 
vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: LSUPD packet received from nbr 
10.255.192.102 on vlan1
 ... first part of answer to the initial LSREQ containing proper LSAs

2015-06-16 22:54:55 <TRACE> ospf_ua: LSUPD packet flooded via vlan1
 ... propagation of upgraded self-originated LSA triggered by the old
     one received in the LSUPD above

2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet received from nbr 
10.255.192.102 on vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on 
vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet received from nbr 
10.255.192.102 on vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on 
vlan1
 ... long sequence of DBDES sent/received exchange
 .., I would already expect remaining LSUPD packets mixed-in here

2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet received from nbr 
10.255.192.102 on vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on 
vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet received from nbr 
10.255.192.102 on vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.102 on 
vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: Neighbor 10.255.192.102 on vlan1 changed 
state from Exchange to Loading
 ... here the DBDES exchange is completed

2015-06-16 22:54:55 <TRACE> ospf_ua: LSUPD packet received from nbr 
10.255.192.102 on vlan1
2015-06-16 22:54:55 <TRACE> ospf_ua: LSUPD packet received from nbr 
10.255.192.102 on vlan1
 ... now we received two more LSUPD packets [*], but these LSUPDs contain only
     some freshly originated Type 5 LSAs not required by the initial LSREQ.

     This is probably triggered by DBES exchange completion. Not sure if this
     may be the cause of problem, but probably no - they should be already sent

2015-06-16 22:54:55 <TRACE> ospf_ua: Non-HELLO packet received from unknown nbr 
10.255.192.101 on vlan1, src 192.168.255.66
2015-06-16 22:54:56 <TRACE> ospf_ua: HELLO packet received from nbr 
xxx.xxx.132.13 on vlan1
2015-06-16 22:54:56 <TRACE> ospf_ua: HELLO packet received from nbr 
10.255.192.101 on vlan1
2015-06-16 22:54:57 <TRACE> ospf_ua: DBDES packet sent to nbr 10.255.192.101 on 
vlan1
 ... here starts some other initial neighbor exchange, but there is already
     1-2 sec interval where nbr 10.255.192.102 sent nothing while it should
     send remaining LSUPDs

2015-06-16 22:54:57 <TRACE> ospf_ua: LSACK packet received from nbr 
10.255.192.102 on vlan1
2015-06-16 22:54:59 <TRACE> ospf_ua: LSACK packet received from nbr 
10.255.192.102 on vlan1
 ... here are acknowledgemens for LSAs from the

2015-06-16 22:55:00 <TRACE> ospf_ua: LSUPD packet sent to nbr 10.255.192.102 on 
vlan1
2015-06-16 22:55:00 <TRACE> ospf_ua: LSUPD packet sent to nbr 10.255.192.102 on 
vlan1
2015-06-16 22:55:00 <TRACE> ospf_ua: LSUPD packet sent to nbr 10.255.192.102 on 
vlan1
2015-06-16 22:55:00 <TRACE> ospf_ua: LSREQ packet sent to nbr 10.255.192.102 on 
vlan1
 ... another LSREQ sent after RxmtInterval


-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: [email protected])
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."

Attachment: signature.asc
Description: Digital signature

_______________________________________________
Quagga-dev mailing list
[email protected]
https://lists.quagga.net/mailman/listinfo/quagga-dev

Reply via email to