Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-28 Thread Ondrej Zajicek
On Thu, Jan 28, 2010 at 12:54:03PM +0300, Mikhail A. Grishin wrote:
 First of all, thank you for the patch and for the fast response!

 After applying both patches (date patch and well-known communities) on the
 production server, we got some strange errors:

 Jan 28 12:02:04 msk-rsm2 bird: R34485x1: Error: Finite state machine error
 Jan 28 12:02:17 msk-rsm2 bird: R13174x1: Error: Finite state machine error
 Jan 28 12:02:28 msk-rsm2 bird: R3218x1: Error: Finite state machine error
 Jan 28 12:02:30 msk-rsm2 bird: R41842x1: Error: Finite state machine error
 Jan 28 12:03:09 msk-rsm2 bird: R34485x1: Error: Finite state machine error
 Jan 28 12:03:26 msk-rsm2 bird: R41842x1: Error: Finite state machine error
 Jan 28 12:03:29 msk-rsm2 bird: R13174x1: Error: Finite state machine error
 Jan 28 12:03:37 msk-rsm2 bird: R3218x1: Error: Finite state machine error

 What does it mean?

These error messages mean that BIRD received BGP messages (packets) that
were unexpected with regard to the current state of the BGP session.
According to the debug log you sent, it seems that the neighbor sent an
UPDATE message immediately after it sent its OPEN message, but it should
have sent a KEEPALIVE message first.
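
To illustrate the message ordering described above, here is a minimal
Python sketch of the relevant part of the BGP session state machine as
specified in RFC 4271. It is a simplification, not BIRD's code: the names
State, TRANSITIONS, receive and run are made up for this example, and only
OPEN, KEEPALIVE and UPDATE are modelled.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


# (current state, received message type) -> next state.
# Any pair not listed here is treated as unexpected (simplified model).
TRANSITIONS = {
    (State.OPEN_SENT, "OPEN"): State.OPEN_CONFIRM,
    (State.OPEN_CONFIRM, "KEEPALIVE"): State.ESTABLISHED,
    (State.ESTABLISHED, "KEEPALIVE"): State.ESTABLISHED,
    (State.ESTABLISHED, "UPDATE"): State.ESTABLISHED,
}

# NOTIFICATION error code "Finite State Machine Error" (RFC 4271).
FSM_ERROR = 5


def receive(state, msg_type):
    """Return (next_state, error_code); error_code is None if msg_type is expected."""
    next_state = TRANSITIONS.get((state, msg_type))
    if next_state is None:
        # Unexpected message for this state: report FSM error, drop to Idle.
        return State.IDLE, FSM_ERROR
    return next_state, None


def run(messages, state=State.OPEN_SENT):
    """Feed received messages in order; stop at the first FSM error."""
    for msg_type in messages:
        state, error = receive(state, msg_type)
        if error is not None:
            return state, error
    return state, None
```

With this model, run(["OPEN", "KEEPALIVE", "UPDATE"]) ends in Established
without an error, while run(["OPEN", "UPDATE"]) fails in OpenConfirm with
the FSM error code, which is the situation the debug log suggests.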

I have no idea how such a problem might be caused by these patches.
Does this problem affect only a small number of neighbors, while the
other neighbors work well?

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
To err is human -- to blame it on a computer is even more so.



Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-28 Thread Mikhail A. Grishin

Hi,

We found that the "Finite state machine error" problem is not related to
your patches. It occurs randomly on our production server at daemon
startup :((


The problem occurs on a small number of peers (2, 3 or 4 out of ~280).
Some of the affected peers are the same at the next startup, some are not.
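
To compare which peers are affected from one startup to the next, the
errors can be tallied per protocol name from syslog. The following Python
snippet is only a sketch: the function name count_fsm_errors is made up,
and the regular expression assumes the log line format quoted earlier in
this thread.

```python
import re
from collections import Counter

# Matches lines such as:
#   Jan 28 12:02:04 msk-rsm2 bird: R34485x1: Error: Finite state machine error
LINE_RE = re.compile(r"bird: (?P<proto>\S+): Error: Finite state machine error")


def count_fsm_errors(lines):
    """Count 'Finite state machine error' lines per BIRD protocol name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group("proto")] += 1
    return counts


sample = [
    "Jan 28 12:02:04 msk-rsm2 bird: R34485x1: Error: Finite state machine error",
    "Jan 28 12:02:17 msk-rsm2 bird: R13174x1: Error: Finite state machine error",
    "Jan 28 12:03:09 msk-rsm2 bird: R34485x1: Error: Finite state machine error",
]

for proto, count in sorted(count_fsm_errors(sample).items()):
    print(proto, count)
```

Running it once per startup and comparing the sets of protocol names would
show which peers fail repeatedly and which only occasionally.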

On a test server with a small number of active peers (and the same
config) we don't see this issue.


What can be done? Right now we see the problem on the plain 1.2.0 release...

Regarding the UPDATE message sent immediately after OPEN: we asked one of
our customers (who hit that problem) to collect debug output on his side.

See the attachments (3 files).


One more thing: after applying the well-known communities patch, BIRD
dumped core twice (5-30 seconds after startup) :(

Do you need the last core file?
Without the well-known communities patch we don't see this (yet).

--
Mikhail A. Grishin  E-mail: m...@ripn.net
Phone: +7 (495) 737-0685  MSK-IX  Russian Institute for Public Networks
Phone: +7 (499) 192-9179  Network Operations Center
c7606s-m9-1#show version
Cisco IOS Software, c7600rsp72043_rp Software (c7600rsp72043_rp-ADVIPSERVICESK9-M), Version 12.2(33)SRB4, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2008 by Cisco Systems, Inc.
Compiled Wed 23-Jul-08 19:23 by prod_rel_team

ROM: System Bootstrap, Version 12.2(33r)SRB4, RELEASE SOFTWARE (fc1)

 c7606s-m9-1 uptime is 27 weeks, 2 days, 1 hour, 45 minutes
Uptime for this control processor is 27 weeks, 2 days, 2 hours, 19 minutes
Time since c7606s-m9-1 switched to active is 27 weeks, 2 days, 2 hours, 21 minutes
System returned to ROM by s/w reset (SP by bus error at PC 0x8273DCC, address 0x0)
System restarted at 11:48:30 MSD Tue Jul 21 2009
System image file is bootdisk:c7600rsp72043-advipservicesk9-mz.122-33.SRB4.bin
Last reload type: Normal Reload


Jan 28 13:02:43.008: BGP: ses global 193.232.246.100 (0) act read request no-op
.Jan 28 13:02:43.008: BGP: ses global 193.232.246.100 (0) act Adding topology IPv4 Unicast:base
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active went from Active to OpenSent
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active sending OPEN, version 4, my as: 41842, holdtime 180 ID 4DF660A0seconds
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active send message type 1, length (incl. header) 50
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active rcv message type 1, length (excl. header) 26
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active rcv OPEN, version 4, holdtime 180 seconds
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active rcv OPEN w/ OPTION parameter len: 16
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active rcvd OPEN w/ optional parameter type 2 (Capability) len 14
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active OPEN has CAPABILITY code: 1, length 4
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active OPEN has MP_EXT CAP for afi/safi: 1/1
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active OPEN has CAPABILITY code: 2, length 0
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active OPEN has ROUTE-REFRESH capability(new) for all address-families
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active OPEN has CAPABILITY code: 65, length 4
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active unrecognized capability code: 65 - ingored
.Jan 28 13:02:43.008: BGP: nbr global 193.232.246.100 neighbor does not have IPv4 MDT topology activated
.Jan 28 13:02:43.008: BGP: nbr global 193.232.246.100 BGP nbr does not have BGP_AF_IPv4MDT topology activated
.Jan 28 13:02:43.008: BGP: ses global 193.232.246.100 (0) act IPv4
Unicast:base mdt prepare old peer: BGP_MDT_STYLE_NONE or