Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-02-01 Thread Mikhail A. Grishin

Ondrej Zajicek wrote:
> On Fri, Jan 29, 2010 at 06:17:43PM +0300, Mikhail A. Grishin wrote:
>> However, the startup process of the daemon is still unstable.
>> It can crash within 5-40 seconds of startup.
>> After that time (if it has not crashed) all seems to be fine.
>>
>> Do you need the latest core file and the latest binary?
>
> I cannot get useful info from the core you sent me.
> The best thing would be to:
>
> 1) Use an unstripped bird binary. I don't know whether stripping
> is part of 'make install' or whether you stripped it explicitly,
> but it should be sufficient to skip 'make install' and just copy
> the bird binary to its final destination after 'make'. You can
> see the size difference between the unstripped and stripped binaries.
>
> 2) Enable all logging in bird.conf using the global option
> 'debug protocols all;'. This is probably too much logging for
> production use, but it would be useful for analyzing the crash.
>
> Then, after a crash, send me the (unstripped) binary, the core
> and the bird log.
>
>> I think that maybe this problem isn't related to the patches.
>
> I would also expect that. In particular, the community patch was
> too simple to break anything.

We will try to reproduce this behavior and collect debug output when we
enable BIRD on another of our route servers. I hope that will happen
this week.







--
Mikhail A. Grishin        E-mail: m...@ripn.net
Phone: +7 (495) 737-0685  MSK-IX & Russian Institute for Public Networks
Phone: +7 (499) 192-9179  Network Operations Center


Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-29 Thread Ondrej Zajicek
On Fri, Jan 29, 2010 at 06:17:43PM +0300, Mikhail A. Grishin wrote:
> However, startup process of the daemon still unstable.
> It could crash in period 5-40 seconds.
> After that time (if not crashed) all seems to be fine.
>
> Do you need latest core file and latest binary?

I cannot get useful info from the core you sent me.
The best thing would be to:

1) Use an unstripped bird binary. I don't know whether stripping
is part of 'make install' or whether you stripped it explicitly,
but it should be sufficient to skip 'make install' and just copy
the bird binary to its final destination after 'make'. You can
see the size difference between the unstripped and stripped binaries.

2) Enable all logging in bird.conf using the global option
'debug protocols all;'. This is probably too much logging for
production use, but it would be useful for analyzing the crash.

Then, after a crash, send me the (unstripped) binary, the core
and the bird log.

> I think that may be this problem isn't related to patches.

I would also expect that. In particular, the community patch was
too simple to break anything.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."


signature.asc
Description: Digital signature


Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-29 Thread Mikhail A. Grishin

Ondrej Zajicek wrote:
> On Thu, Jan 28, 2010 at 07:02:46PM +0100, Ondrej Zajicek wrote:
>>> But why at another daemon run the session is up with the same peer?
>>> Does the direction of session establishment matter?
>>
>> Perhaps direction, or perhaps the different daemon is less strict with
>> regard to state changes and accepts this even if it is contrary to the
>> BGP specification.
>
> Here is a patch that will cause BIRD to be less strict and accept such
> behavior.

Hi Ondrej,

After applying the patch this error disappeared. Thanks again!

However, the startup process of the daemon is still unstable.
It can crash within 5-40 seconds of startup.
After that time (if it has not crashed) all seems to be fine.

Do you need the latest core file and the latest binary?

I think that maybe this problem isn't related to the patches.
We started the pure 1.2.0 release only a few times on the production
server, and maybe those were just lucky attempts...


--
Mikhail A. Grishin        E-mail: m...@ripn.net
Phone: +7 (495) 737-0685  MSK-IX & Russian Institute for Public Networks
Phone: +7 (499) 192-9179  Network Operations Center


Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-28 Thread Ondrej Zajicek
On Thu, Jan 28, 2010 at 07:02:46PM +0100, Ondrej Zajicek wrote:
> > But why at another daemon run the session is up with the same peer?
> > Direction of session establishing make sense?
> 
> Perhaps direction, or perhaps the different daemon is less strict with regard
> to state changes and accept this even if it is contrary to the BGP
> specification.
> 

Here is a patch that will cause BIRD to be less strict and accept such
behavior.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
diff -uprN bird-1.2.0/proto/bgp/packets.c bird-1.2.0-new/proto/bgp/packets.c
--- bird-1.2.0/proto/bgp/packets.c	2010-01-14 11:06:27.0 +0100
+++ bird-1.2.0-new/proto/bgp/packets.c	2010-01-28 19:14:39.0 +0100
@@ -1001,6 +1001,10 @@ bgp_rx_update(struct bgp_conn *conn, byt
 
   BGP_TRACE_RL(&rl_rcv_update, D_PACKETS, "Got UPDATE");
 
+  /* Workaround for some BGP implementations that skip initial KEEPALIVE */
+  if (conn->state == BS_OPENCONFIRM)
+    bgp_conn_enter_established_state(conn);
+
   if (conn->state != BS_ESTABLISHED)
     { bgp_error(conn, 5, 0, NULL, 0); return; }
   bgp_start_timer(conn->hold_timer, conn->hold_time);
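
The effect of the workaround in the patch above can be modeled in isolation. The following is a minimal sketch, not BIRD's code: `struct conn`, the `S_*` states and `rx_update_tolerant` are hypothetical stand-ins for `struct bgp_conn`, the `BS_*` states and `bgp_rx_update`, and entering the Established state is reduced to a plain assignment.

```c
/* Hypothetical, simplified stand-ins for BIRD's connection state. */
enum conn_state { S_OPENSENT, S_OPENCONFIRM, S_ESTABLISHED };

struct conn {
  enum conn_state state;
  int fsm_error;      /* set when an FSM error (code 5) would be sent */
  int updates_seen;   /* number of UPDATEs accepted */
};

/* Model of the patched UPDATE handler. Returns 1 if the UPDATE is accepted. */
int rx_update_tolerant(struct conn *c)
{
  /* Workaround: treat an early UPDATE as if the missing KEEPALIVE had arrived. */
  if (c->state == S_OPENCONFIRM)
    c->state = S_ESTABLISHED;

  /* In any other non-established state the UPDATE is still an FSM error. */
  if (c->state != S_ESTABLISHED)
    {
      c->fsm_error = 1;
      return 0;
    }

  c->updates_seen++;
  return 1;
}
```

In this model an UPDATE received in OpenConfirm moves the session to Established and is processed, while an UPDATE received before the peer's OPEN (OpenSent) is still rejected.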




Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-28 Thread Ondrej Zajicek
On Thu, Jan 28, 2010 at 06:26:38PM +0300, Mikhail A. Grishin wrote:
>> I can't find the KEEPALIVE message in the log, but i don't know Cisco
>> enough to be sure (perhaps it just does not log it).
>>
>> The best thing would be to run on route server:
>>
>> tcpdump -i eth0 -s 0 -v -n  ip host 192.168.1.1 > logfile
>
> See attach (dump with another peer, 193.232.246.198, R34485x1)
>
> As far as I see, the first one from 193.232.246.198 (18:15:06.785501) is  
> Update, the second one (18:15:06.788230) is Keepalive

Yes, that is pretty clear.
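
Since BGP runs over a TCP byte stream, the order of messages is determined by the message headers rather than by packet boundaries, which is why the capture has to be read message by message. As a minimal sketch of how the message type can be read from a buffer: the function name `bgp_peek_type` is hypothetical, while the header layout (16-byte marker of all ones, 2-byte length, 1-byte type) and the type codes follow RFC 4271.

```c
#include <stddef.h>
#include <stdint.h>

#define BGP_HEADER_LEN 19     /* 16-byte marker + 2-byte length + 1-byte type */
#define BGP_MAX_MSG_LEN 4096  /* maximum message length in RFC 4271 */

/*
 * Returns the BGP message type (1 = OPEN, 2 = UPDATE, 3 = NOTIFICATION,
 * 4 = KEEPALIVE) found at the start of buf, or -1 if the buffer is too
 * short or the header is malformed. On success *msg_len receives the
 * declared message length, so the caller can skip to the next message.
 */
int bgp_peek_type(const uint8_t *buf, size_t avail, unsigned *msg_len)
{
  if (avail < BGP_HEADER_LEN)
    return -1;

  /* The marker field must be all ones. */
  for (size_t i = 0; i < 16; i++)
    if (buf[i] != 0xff)
      return -1;

  /* The length is big-endian and includes the header itself. */
  unsigned len = ((unsigned) buf[16] << 8) | buf[17];
  if (len < BGP_HEADER_LEN || len > BGP_MAX_MSG_LEN)
    return -1;

  *msg_len = len;
  return buf[18];
}
```

A KEEPALIVE is just the bare header, which matches the "Keepalive Message (4), length: 19" lines in the dump.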

> But why at another daemon run the session is up with the same peer?
> Direction of session establishing make sense?

Perhaps direction, or perhaps the different daemon is less strict with regard
to state changes and accepts this even if it is contrary to the BGP
specification.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."




Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-28 Thread Mikhail A. Grishin

Ondrej Zajicek wrote:
> On Thu, Jan 28, 2010 at 04:40:54PM +0300, Mikhail A. Grishin wrote:
>> Hi,
>>
>> We found that the "Finite state machine error" problem is not related
>> to your patches. It occurs randomly on our production server at daemon
>> startup :((
>>
>> The problem occurs on a small number of peers (2, 3 or 4 out of ~280).
>> Some of the problem peers are the same at the next startup, some are not.
>>
>> On a test server with a small number of active peers (and the same
>> config) we don't see this issue.
>>
>> What can be done? Right now we see the problem on the pure 1.2.0 release...
>
> This might be a buggy firmware version in the neighbor, or just as well
> some strange bug in BIRD.
>
>> About "UPDATE message immediately after it sent OPEN": we asked one of
>> our customers (who hit that problem) to collect debug output on his side.
>>
>> See the attachments (3 files).
>
> I can't find the KEEPALIVE message in the log, but I don't know Cisco
> well enough to be sure (perhaps it just does not log it).
>
> The best thing would be to run this on the route server:
>
> tcpdump -i eth0 -s 0 -v -n  ip host 192.168.1.1 > logfile

See the attachment (a dump with another peer, 193.232.246.198, R34485x1).

As far as I can see, the first message from 193.232.246.198
(18:15:06.785501) is an UPDATE and the second one (18:15:06.788230) is
a KEEPALIVE.

But why at another daemon run the session is up with the same peer?
Does the direction of session establishment matter?

> (with appropriate network device and IP address of one of problematic
> neighbors)
>
> and send me that logfile.




--
Mikhail A. Grishin        E-mail: m...@ripn.net
Phone: +7 (495) 737-0685  MSK-IX & Russian Institute for Public Networks
Phone: +7 (499) 192-9179  Network Operations Center
18:15:06.782511 IP (tos 0xc0, ttl 64, id 9139, offset 0, flags [DF], proto TCP (6), length 44) 193.232.246.100.179 > 193.232.246.198.29675: S, cksum 0x711b (incorrect (-> 0xf6c2), 1674454884:1674454884(0) ack 791544152 win 65535
18:15:06.784043 IP (tos 0xc0, ttl 1, id 48499, offset 0, flags [DF], proto TCP (6), length 40) 193.232.246.198.29675 > 193.232.246.100.179: ., cksum 0xce7f (correct), ack 1 win 16384
18:15:06.784227 IP (tos 0xc0, ttl 1, id 9140, offset 0, flags [DF], proto TCP (6), length 85) 193.232.246.100.179 > 193.232.246.198.29675: P, cksum 0x7144 (incorrect (-> 0x52f9), 1:46(45) ack 1 win 65535: BGP, length: 45
    Open Message (1), length: 45
      Version 4, my AS 8631, Holdtime 180s, ID 193.232.246.100
      Optional parameters, length: 16
        Option Capabilities Advertisement (2), length: 14
          Multiprotocol Extensions (1), length: 4
            AFI IPv4 (1), SAFI Unicast (1)
18:15:06.785012 IP (tos 0xc0, ttl 1, id 48500, offset 0, flags [DF], proto TCP (6), length 90) 193.232.246.198.29675 > 193.232.246.100.179: P, cksum 0x8384 (correct), 1:51(50) ack 1 win 16384: BGP, length: 50
    Open Message (1), length: 50
      Version 4, my AS 34485, Holdtime 180s, ID 89.16.63.3
      Optional parameters, length: 21
        Option Capabilities Advertisement (2), length: 6
          Multiprotocol Extensions (1), length: 4
            AFI IPv4 (1), SAFI Unicast (1)
        Option Capabilities Advertisement (2), length: 2
          Route Refresh (Cisco) (128), length: 0
        Option Capabilities Advertisement (2), length: 2
          Route Refresh (2), length: 0
        Option Capabilities Advertisement (2), length: 3
          Unknown (131), length: 1
            no decoder for Capability 131
            0x:  00
18:15:06.785501 IP (tos 0xc0, ttl 1, id 48501, offset 0, flags [DF], proto TCP (6), length 1450) 193.232.246.198.29675 > 193.232.246.100.179: ., cksum 0xe348 (correct), 51:1461(1410) ack 1 win 16384: BGP, length: 1410
    [|BGP Update]
18:15:06.785516 IP (tos 0xc0, ttl 1, id 9144, offset 0, flags [DF], proto TCP (6), length 59) 193.232.246.100.179 > 193.232.246.198.29675: P, cksum 0x712a (incorrect (-> 0x094e), 46:65(19) ack 1461 win 64290: BGP, length: 19
    Keepalive Message (4), length: 19
18:15:06.787400 IP (tos 0xc0, ttl 1, id 48502, offset 0, flags [DF], proto TCP (6), length 1500) 193.232.246.198.29675 > 193.232.246.100.179: ., cksum 0xb370 (correct), 1461:2921(1460) ack 65 win 16320: BGP, length: 1460
18:15:06.788230 IP (tos 0xc0, ttl 1, id 48503, offset 0, flags [DF], proto TCP (6), length 1280) 193.232.246.198.29675 > 193.232.246.100.179: P, cksum 0x4f34 (correct), 2921:4161(1240) ack 65 win 16320: BGP, length: 1240 [|BGP]
    Keepalive Message (4), length: 19
18:15:06.788243 IP (tos 0xc0, ttl 1, id 9145, offset 0, flags [DF], proto TCP (6), length 40) 193.232.246.100.179 > 193.232.246.198.29675: ., cksum 0x7117 (incorrect (-> 0x0233), ack 4161 win 64460
18:15:06.788346 IP (tos 0xc0, ttl 1, id 9146, offset 0, flags [DF], proto TCP (6), length 61) 193.232.246.100.179 > 193.232.246.198.29675: P, cksum 0x712c (incorrect (-> 0xfac8), 65:86(21) ack 4161 win 65535: BGP, length: 21
    Notification Message (3), length

Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-28 Thread Ondrej Zajicek
On Thu, Jan 28, 2010 at 04:40:54PM +0300, Mikhail A. Grishin wrote:
> Hi,
>
> We found that "Finite state machine error" problem is not related to  
> your patches. It randomly occurs on our production server at the time of  
> daemon startup :((
>
> The problem is occurs on small number of peers.
> (2 or 3 or 4 from ~280)
> Some problem peers are the same at next startup, some - not.
>
> On test server with small number of active peers (and same config) we  
> doesn't see this issue.
>
> What can be done? Right now we see the problem on pure 1.2.0 release...

This might be a buggy firmware version in the neighbor, or just as well
some strange bug in BIRD.
>
> About "UPDATE message immediately after it sent OPEN" - we ask one of  
> our customers (which hit that problem) to collect debug from his side.
> See the attachments (3 files).

I can't find the KEEPALIVE message in the log, but I don't know Cisco
well enough to be sure (perhaps it just does not log it).

The best thing would be to run this on the route server:

tcpdump -i eth0 -s 0 -v -n  ip host 192.168.1.1 > logfile

(with appropriate network device and IP address of one of problematic
neighbors)

and send me that logfile.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."




Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-28 Thread Ondrej Zajicek
On Thu, Jan 28, 2010 at 04:40:54PM +0300, Mikhail A. Grishin wrote:
> We found that "Finite state machine error" problem is not related to  
> your patches. It randomly occurs on our production server at the time of  
> daemon startup :((
>
> The problem is occurs on small number of peers.
> (2 or 3 or 4 from ~280)
> Some problem peers are the same at next startup, some - not.
>
> On test server with small number of active peers (and same config) we  
> doesn't see this issue.
>
> What can be done? Right now we see the problem on pure 1.2.0 release...
>
> About "UPDATE message immediately after it sent OPEN" - we ask one of  
> our customers (which hit that problem) to collect debug from his side.
> See the attachments (3 files).

I will look at it.

> One more, after applying the "well-known communities" patch, the BIRD  
> goes to .core two times ( 5-30 seconds after startup):(
> Do you need the last core file?

Yes, please send me the core file and the bird binary.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."




Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-28 Thread Mikhail A. Grishin

Ondrej Zajicek wrote:
> On Thu, Jan 28, 2010 at 12:54:03PM +0300, Mikhail A. Grishin wrote:
>> First of all, thank you for the patch and for the fast response!
>>
>> After applying both patches (the date patch and the well-known
>> communities patch) on the production server, we got some strange errors:
>>
>> Jan 28 12:02:04 msk-rsm2 bird: R34485x1: Error: Finite state machine error
>> Jan 28 12:02:17 msk-rsm2 bird: R13174x1: Error: Finite state machine error
>> Jan 28 12:02:28 msk-rsm2 bird: R3218x1: Error: Finite state machine error
>> Jan 28 12:02:30 msk-rsm2 bird: R41842x1: Error: Finite state machine error
>> Jan 28 12:03:09 msk-rsm2 bird: R34485x1: Error: Finite state machine error
>> Jan 28 12:03:26 msk-rsm2 bird: R41842x1: Error: Finite state machine error
>> Jan 28 12:03:29 msk-rsm2 bird: R13174x1: Error: Finite state machine error
>> Jan 28 12:03:37 msk-rsm2 bird: R3218x1: Error: Finite state machine error
>>
>> What do they mean?
>
> These error messages mean that BIRD received BGP messages (packets) that
> were unexpected with regard to the current state of the BGP session.
> According to the debug log you sent, it seems that the neighbor sent an
> UPDATE message immediately after it sent its OPEN message, but it should
> have sent a KEEPALIVE message first.
>
> I have no idea how such a problem could be caused by these patches.
> Does the problem affect just a small number of neighbors while the
> other neighbors work well?

Hi,

We found that the "Finite state machine error" problem is not related to
your patches. It occurs randomly on our production server at daemon
startup :((

The problem occurs on a small number of peers (2, 3 or 4 out of ~280).
Some of the problem peers are the same at the next startup, some are not.

On a test server with a small number of active peers (and the same
config) we don't see this issue.

What can be done? Right now we see the problem on the pure 1.2.0 release...

About "UPDATE message immediately after it sent OPEN": we asked one of
our customers (who hit that problem) to collect debug output on his side.

See the attachments (3 files).

One more thing: after applying the "well-known communities" patch, BIRD
dumped core twice (5-30 seconds after startup) :(

Do you need the last core file?
Without the "well-known communities" patch we don't see this (yet).

--
Mikhail A. Grishin        E-mail: m...@ripn.net
Phone: +7 (495) 737-0685  MSK-IX & Russian Institute for Public Networks
Phone: +7 (499) 192-9179  Network Operations Center
c7606s-m9-1#show version 
Cisco IOS Software, c7600rsp72043_rp Software
(c7600rsp72043_rp-ADVIPSERVICESK9-M), Version 12.2(33)SRB4, RELEASE SOFTWARE
(fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2008 by Cisco Systems, Inc.
Compiled Wed 23-Jul-08 19:23 by prod_rel_team

ROM: System Bootstrap, Version 12.2(33r)SRB4, RELEASE SOFTWARE (fc1)

 c7606s-m9-1 uptime is 27 weeks, 2 days, 1 hour, 45 minutes
Uptime for this control processor is 27 weeks, 2 days, 2 hours, 19 minutes
Time since c7606s-m9-1 switched to active is 27 weeks, 2 days, 2 hours, 21
minutes
System returned to ROM by s/w reset (SP by bus error at PC 0x8273DCC,
address 0x0)
System restarted at 11:48:30 MSD Tue Jul 21 2009
System image file is
"bootdisk:c7600rsp72043-advipservicesk9-mz.122-33.SRB4.bin"
Last reload type: Normal Reload


Jan 28 13:02:43.008: BGP: ses global 193.232.246.100 (0) act read request no-op
.Jan 28 13:02:43.008: BGP: ses global 193.232.246.100 (0) act Adding topology IPv4 Unicast:base
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active went from Active to OpenSent
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active sending OPEN, version 4, my as: 41842, holdtime 180 ID 4DF660A0seconds
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active send message type 1, length (incl. header) 50
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active rcv message type 1, length (excl. header) 26
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active rcv OPEN, version 4, holdtime 180 seconds
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active rcv OPEN w/ OPTION parameter len: 16
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active rcvd OPEN w/ optional parameter type 2 (Capability) len 14
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active OPEN has CAPABILITY code: 1, length 4
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active OPEN has MP_EXT CAP for afi/safi: 1/1
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active OPEN has CAPABILITY code: 2, length 0
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active OPEN has ROUTE-REFRESH capability(new) for all address-families
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active OPEN has CAPABILITY code: 65, length 4
.Jan 28 13:02:43.008: BGP: 193.232.246.100 active unrecognized capability code: 65 - ingored
.Jan 28 13:02:43.008: BGP: nbr global 193.232.246.100 neighbor does not have IPv4 MDT topology activated
.Jan 28 13:02:43.008: BGP: nbr global 193.232.246.100 BGP nbr does not have BGP_AF_IPv4MDT topology activated
.Jan 28 13:02:43.008: BGP: ses global 193.232.246.100 (0) act IPv4 Unicast:base mdt prepare old peer: BGP_MDT_STYLE_NO

Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-28 Thread Ondrej Zajicek
On Thu, Jan 28, 2010 at 12:54:03PM +0300, Mikhail A. Grishin wrote:
> First of all, thank you for patch and for fast respond!
>
> After applying both patches (date patch and well-known communities) on  
> production server, we got some strange errors:
>
> Jan 28 12:02:04 msk-rsm2 bird: R34485x1: Error: Finite state machine error
> Jan 28 12:02:17 msk-rsm2 bird: R13174x1: Error: Finite state machine error
> Jan 28 12:02:28 msk-rsm2 bird: R3218x1: Error: Finite state machine error
> Jan 28 12:02:30 msk-rsm2 bird: R41842x1: Error: Finite state machine error
> Jan 28 12:03:09 msk-rsm2 bird: R34485x1: Error: Finite state machine error
> Jan 28 12:03:26 msk-rsm2 bird: R41842x1: Error: Finite state machine error
> Jan 28 12:03:29 msk-rsm2 bird: R13174x1: Error: Finite state machine error
> Jan 28 12:03:37 msk-rsm2 bird: R3218x1: Error: Finite state machine error
>
> What does it mean?

These error messages mean that BIRD received BGP messages (packets) that
were unexpected with regard to the current state of the BGP session.
According to the debug log you sent, it seems that the neighbor sent an
UPDATE message immediately after it sent its OPEN message, but it should
have sent a KEEPALIVE message first.
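
To illustrate the strict behavior described above, here is a minimal C sketch of the relevant part of the BGP session state machine. It is a deliberately simplified model and not BIRD's code: the enum and function names are hypothetical, and NOTIFICATION handling, timers and connection collision handling are left out. The only values taken from the BGP specification are the message type codes and the NOTIFICATION error code 5 (Finite State Machine Error), which is what the log reports.

```c
/* Hypothetical names for this sketch; they are not BIRD's identifiers. */
enum bgp_state { ST_OPENSENT, ST_OPENCONFIRM, ST_ESTABLISHED };

/* Message type codes as used on the wire (NOTIFICATION omitted here). */
enum bgp_msg { MSG_OPEN = 1, MSG_UPDATE = 2, MSG_KEEPALIVE = 4 };

#define ERR_NONE 0
#define ERR_FSM  5   /* NOTIFICATION error code 5: Finite State Machine Error */

/*
 * Strict handling of one received message.
 * On a legal transition, updates *state and returns ERR_NONE.
 * Returns ERR_FSM when the message is unexpected in the current state.
 */
int fsm_receive_strict(enum bgp_state *state, enum bgp_msg msg)
{
  switch (*state)
    {
    case ST_OPENSENT:
      /* Waiting for the peer's OPEN. */
      if (msg == MSG_OPEN)
        { *state = ST_OPENCONFIRM; return ERR_NONE; }
      return ERR_FSM;

    case ST_OPENCONFIRM:
      /* Waiting for the peer's first KEEPALIVE; an UPDATE here is an error. */
      if (msg == MSG_KEEPALIVE)
        { *state = ST_ESTABLISHED; return ERR_NONE; }
      return ERR_FSM;

    case ST_ESTABLISHED:
      /* Normal operation in this simplified model. */
      if (msg == MSG_UPDATE || msg == MSG_KEEPALIVE)
        return ERR_NONE;
      return ERR_FSM;
    }
  return ERR_FSM;
}
```

Feeding this model the sequence OPEN, UPDATE reproduces the reported situation: the OPEN moves the session to OpenConfirm, and the UPDATE that follows without a KEEPALIVE yields the FSM error.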

I have no idea how such a problem could be caused by these patches.
Does the problem affect just a small number of neighbors while the
other neighbors work well?

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."




Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-28 Thread Mikhail A. Grishin

Ondrej Zajicek wrote:
> On Wed, Jan 27, 2010 at 04:49:45PM +0300, Mikhail A. Grishin wrote:
>>> Handling of the no-export community is hardcoded in BIRD, so such routes
>>> are not exported to external neighbors, as expected. I can send you a
>>> patch that causes BIRD to ignore well-known communities (and leaves such
>>> behavior to configured filters) and we will make this behavior
>>> configurable in the next version.
>>
>> Yes, that would be nice, please send me the patch.
>> We expected that we could choose which well-known communities go
>> through the RS and which do not.
>>
>> Right now we are only interested in transparency for "no-export".
>
> Here is the patch that removes hardcoded handling of well-known
> communities. If you want the other well-known communities to
> work, you have to add appropriate code to export filters.

Hi,

First of all, thank you for the patch and for the fast response!

After applying both patches (the date patch and the well-known
communities patch) on the production server, we got some strange errors:

Jan 28 12:02:04 msk-rsm2 bird: R34485x1: Error: Finite state machine error
Jan 28 12:02:17 msk-rsm2 bird: R13174x1: Error: Finite state machine error
Jan 28 12:02:28 msk-rsm2 bird: R3218x1: Error: Finite state machine error
Jan 28 12:02:30 msk-rsm2 bird: R41842x1: Error: Finite state machine error
Jan 28 12:03:09 msk-rsm2 bird: R34485x1: Error: Finite state machine error
Jan 28 12:03:26 msk-rsm2 bird: R41842x1: Error: Finite state machine error
Jan 28 12:03:29 msk-rsm2 bird: R13174x1: Error: Finite state machine error
Jan 28 12:03:37 msk-rsm2 bird: R3218x1: Error: Finite state machine error

What do they mean?
These peers worked fine before the patches were applied.

Debug:
===
Jan 28 12:40:17 msk-rsm2 bird: R41842x1: Incoming connection from 193.232.246.200 (port 20880) rejected
Jan 28 12:40:19 msk-rsm2 bird: R41842x1: Started
Jan 28 12:40:19 msk-rsm2 bird: R41842x1: Connect delayed by 60 seconds
Jan 28 12:40:36 msk-rsm2 bird: R41842x1: Incoming connection from 193.232.246.200 (port 33675) accepted
Jan 28 12:40:36 msk-rsm2 bird: R41842x1: Sending OPEN(ver=4,as=8631,hold=180,id=c1e8f664)
Jan 28 12:40:36 msk-rsm2 bird: R41842x1: Got OPEN(as=41842,hold=180,id=4df660a0)
Jan 28 12:40:36 msk-rsm2 bird: R41842x1: Sending KEEPALIVE
Jan 28 12:40:36 msk-rsm2 bird: R41842x1: Got UPDATE
Jan 28 12:40:36 msk-rsm2 bird: R41842x1: Error: Finite state machine error
Jan 28 12:40:36 msk-rsm2 bird: R41842x1: Sending NOTIFICATION(code=5.0)
Jan 28 12:40:36 msk-rsm2 bird: R41842x1: Down
Jan 28 12:40:36 msk-rsm2 bird: R41842x1: Starting
Jan 28 12:40:36 msk-rsm2 bird: R41842x1: Startup delayed by 300 seconds
===

Config for one of those peers:
#-
# Client 'media', AS 41842
table T41842;

filter bgp_in_AS41842
prefix set allnet;

# AS_PATH filter is temporary disabled
#int set allas;

{
  if ! (avoid_martians()) then reject;
  if (bgp_path.first != 41842 ) then reject;

# AS_PATH filter is temporary disabled
#  allas = [ 47445, 45018, 44522, 45029, 44597, 42728, 42139, 42385 ];
#  if ! (bgp_path.last ~ allas) then reject;

  allnet = [ 62.105.32.0/19, 62.105.48.0/20, 62.109.0.0/20, 62.109.0.0/21,
    62.109.8.0/21, 62.109.16.0/21, 62.109.24.0/22, 62.109.28.0/22,
    77.236.224.0/19, 77.236.224.0/20, 77.236.240.0/21, 77.236.248.0/21,
    77.246.96.0/21, 77.246.104.0/21, 77.246.144.0/21, 77.246.148.0/22,
    78.24.216.0/21, 79.98.136.0/21, 79.174.32.0/19, 79.174.32.0/20,
    79.174.48.0/20, 80.244.224.0/20, 80.253.30.0/24, 80.253.31.0/24,
    82.114.96.0/19, 82.114.96.0/20, 82.114.99.0/24, 82.114.112.0/21,
    82.114.120.0/21, 82.146.32.0/21, 82.146.37.0/24, 82.146.40.0/21,
    82.146.48.0/21, 82.146.56.0/21, 85.249.0.0/21, 87.255.0.0/23,
    87.255.2.0/23, 87.255.2.0/24, 87.255.3.0/24, 87.255.4.0/22,
    87.255.8.0/22, 87.255.8.0/24, 87.255.9.0/24, 87.255.9.252/30,
    87.255.10.0/23, 87.255.12.0/22, 87.255.16.0/21, 87.255.24.0/21,
    88.210.52.0/22, 89.255.64.0/21, 89.255.68.0/22, 89.255.72.0/21,
    89.255.80.0/22, 89.255.94.0/23, 89.255.95.0/24, 91.192.244.0/22,
    91.200.28.0/22, 91.200.28.0/23, 91.200.30.0/23, 91.204.108.0/22,
    91.206.14.0/23, 91.210.84.0/22, 91.210.228.0/22, 91.210.228.0/23,
    91.210.230.0/24, 91.210.231.0/24, 92.63.96.0/21, 92.63.104.0/22,
    92.63.108.0/22, 92.63.108.0/24, 93.92.32.0/21, 93.186.48.0/20,
    94.28.112.0/22, 94.28.116.0/22, 94.158.160.0/20, 94.158.160.0/21,
    94.158.168.0/21, 94.159.0.0/17, 95.128.176.0/22, 95.128.178.0/23,
    188.120.32.0/20, 188.120.32.0/21, 188.120.40.0/22, 188.120.44.0/22,
    188.120.224.0/20, 188.120.240.0/21, 188.120.248.0/21, 188.133.136.0/21,
    188.133.152.0/21, 193.169.32.0/23, 193.169.96.0/23, 193.169.174.0/23,
    193.192.128.0/20, 193.192.144.0/20, 193.192.144.0/24, 193.192.144.0/25,
    193.192.145.0/24, 194.9.224.0/20, 194.54.176.0/22, 194.107.23.0/24,
    194.110.253.0/24, 195.62.62.0/23, 195.62.62.0/24, 195.62.63.0/24,
    195.88.92.0/23, 195.88.170.0/23, 195.88.170.0/24, 195.88.171.0/24,
    195.216.241.0/24, 195.218.134.0/24, 212.16.0.0/19, 212.16.0.0/2

Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-27 Thread Ondrej Zajicek
On Wed, Jan 27, 2010 at 04:49:45PM +0300, Mikhail A. Grishin wrote:
>> Handling of no-export community is hardcoded in BIRD, so such routes are not
>> exported to the external neighbors, as it is expected. I can send you a patch
>> that causes BIRD to ignore well-known communities (and leaves such behavior
>> to configured filters) and we will make this behavior configurable in the
>> next version.
>>
>
> Yes, it would be nice, please send me a patch.
> We expected that we could choose, which well-known communities must go  
> through RS, which is not.
> Right now we only interested in transparency for "no-export".

Here is the patch that removes hardcoded handling of well-known
communities. If you want the other well-known communities to
work, you have to add appropriate code to export filters.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
diff --git a/proto/bgp/attrs.c b/proto/bgp/attrs.c
index 5316481..bd716c8 100644
--- a/proto/bgp/attrs.c
+++ b/proto/bgp/attrs.c
@@ -679,6 +679,7 @@ bgp_export_check(struct bgp_proto *p, ea_list *new)
     }
 
   /* Check if we aren't forbidden to export the route by communities */
+/*
   a = ea_find(new, EA_CODE(EAP_BGP, BA_COMMUNITY));
   if (a)
     {
@@ -696,7 +697,7 @@ bgp_export_check(struct bgp_proto *p, ea_list *new)
 	  return 0;
 	}
     }
-
+*/
   return 1;
 }
 




Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-27 Thread Mikhail A. Grishin

Ondrej Zajicek wrote:
> On Mon, Jan 25, 2010 at 07:12:50PM +0300, Mikhail A. Grishin wrote:
>> Ondrej Zajicek wrote:
>>> On Mon, Jan 25, 2010 at 03:15:14PM +0300, Mikhail A. Grishin wrote:
>>>> Yes, we had some filters on pipes for every client.
>>>> Following your explanation, we moved the filters from the pipes to
>>>> the "protocol bgp" sections of the config.
>>>>
>>>> Now the "show protocols all" output shows all the information we need.
>>>
>>> Just a note that moving filters from pipes to bgp protocols has several
>>> other consequences, some positive, some negative:
>>>
>>> 1) You see just what is accepted and you cannot directly see rejected
>>> routes (but they can be seen in the log, if enabled).
>>
>> Yes, for that case we planned to temporarily move the filters back to
>> the pipes for some peer... But if it can be done with logs, maybe you
>> could show an example?
>
> You can use 'debug { filters }' in a bgp protocol section and it will
> log all routes in filters of that bgp protocol.
>
>> Very valuable info for us!
>> In general, we want to minimize the causes of BGP session restarts with
>> our clients.
>> Does this task (the new behavior of the 'configure' command) have a
>> high priority for the developers?
>
> Other IXes that use BIRD use (AFAIK) the 'configure soft' and 'reload'
> approach, so there has not been much demand for this change yet. But it
> should not be hard to make that change, and I would like to have it soon.
>
>> One more question, about routes with the "no-export" community ==
>> (65535,65281).
>>
>> We need to transparently advertise such routes through the RS to RS
>> clients.
>>
>> We tried to advertise them transparently. These routes are seen in
>> 'show routes all' and in 'show routes all table Tanother_peer', but the
>> peer 'another_peer' doesn't receive routes with community (65535,65281).
>
> Handling of the no-export community is hardcoded in BIRD, so such routes
> are not exported to external neighbors, as expected. I can send you a
> patch that causes BIRD to ignore well-known communities (and leaves such
> behavior to configured filters) and we will make this behavior
> configurable in the next version.

Yes, that would be nice, please send me the patch.
We expected that we could choose which well-known communities go
through the RS and which do not.

Right now we are only interested in transparency for "no-export".


--
Mikhail A. Grishin        E-mail: m...@ripn.net
Phone: +7 (495) 737-0685  MSK-IX & Russian Institute for Public Networks
Phone: +7 (499) 192-9179  Network Operations Center


Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-27 Thread Ondrej Zajicek
On Mon, Jan 25, 2010 at 07:12:50PM +0300, Mikhail A. Grishin wrote:
> Ondrej Zajicek wrote:
>> On Mon, Jan 25, 2010 at 03:15:14PM +0300, Mikhail A. Grishin wrote:
>>> Yes, we had some filters on pipes for every client.
>>> Upon your explanation, we transfered filters from pipes to the 
>>> "protocol  bgp" section of the config.
>>>
>>> Now "show protocols all" output shows all needed information.
>>
>> Just a note that moving filters from pipes to bgp protocols have several
>> other consequences, some positive, some negative:
>>
>> 1) You see just what is accepted and you cannot directly see rejected routes
>> (but they can be seen in log, if enabled).
>
> Yes, we planned for that case temporary transfer filters back to pipes  
> for some peer... But if it could be done by logs, may be you can show an  
> example?

You can use 'debug { filters }' in a bgp protocol section and it will
log all routes in filters of that bgp protocol.

> Very valuable info for us!
> In general, we want to minimize causes of restarting BGP sessions with  
> our clients.
> Is this task (new behavior of 'configure' command) have a good priority  
> for developers?

Other IXes that use BIRD use (AFAIK) the 'configure soft' and 'reload'
approach, so there has not been much demand for this change yet. But it
should not be hard to make that change, and I would like to have it soon.

> One more question, about routes with the "no-export" community ==
> ((65535,65281))
>
> We need to transparently advertise such routes through the RS to RS clients.
>
> We tried to advertise them transparently. These routes are visible in
> 'show routes all' and in 'show routes all table Tanother_peer', but the peer
> 'another_peer' doesn't receive routes with community (65535,65281).

Handling of the no-export community is hardcoded in BIRD, so such routes are
not exported to external neighbors, as expected. I can send you a patch
that causes BIRD to ignore well-known communities (and leaves such behavior
to configured filters) and we will make this behavior configurable in the
next version.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."


Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-25 Thread Ondrej Zajicek
On Mon, Jan 25, 2010 at 03:15:14PM +0300, Mikhail A. Grishin wrote:
> Yes, we had some filters on pipes for every client.
> Following your explanation, we transferred the filters from the pipes
> to the "protocol bgp" sections of the config.
>
> Now "show protocols all" output shows all needed information.

Just a note that moving filters from pipes to bgp protocols has several
other consequences, some positive, some negative:

1) You see just what is accepted and you cannot directly see rejected routes
(but they can be seen in log, if enabled).

2) Rejected routes do not consume memory.

3) If you change filters and call 'configure', the appropriate protocol
is restarted. If it is a pipe, it causes a route flap (routes propagated
through the pipe are withdrawn and immediately repropagated); if it is a bgp
protocol, it also causes a session flap (the BGP session is restarted).
A possible workaround is to use the 'configure soft' command (which ignores
changes related to filters) and then the 'reload' command for protocols with
changed filters. We will change the behavior of the 'configure' command to
do this automatically in future versions.
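
The workaround in 3) could be scripted roughly as follows. This is only a
sketch in Python: it builds the command sequence described above
('configure soft', then 'reload' for each protocol whose filters changed).
The function name and the protocol names are placeholders, and feeding
several commands to birdc on standard input, one per line, is an assumption
that should be checked on your installation.

```python
def soft_reconfigure_commands(changed_protocols):
    """Return the birdc commands for a filter-only config change, in order."""
    # 'configure soft' re-reads the config but ignores changes related to
    # filters, so no protocol is restarted.
    commands = ["configure soft"]
    # 'reload' is then issued only for protocols whose filters changed.
    commands.extend(f"reload {name}" for name in changed_protocols)
    return commands


if __name__ == "__main__":
    # R1234x1 and R15835x1 are placeholder protocol names.
    print("\n".join(soft_reconfigure_commands(["R1234x1", "R15835x1"])))
```

The printed lines are meant to be piped into birdc in the same way as a
single command would be.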

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."


Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-22 Thread Ondrej Zajicek
On Fri, Jan 22, 2010 at 10:18:24AM +0300, Alexander Ilin wrote:
> On Thu, Jan 21, 2010 at 11:09:52PM +0100, Ondrej Zajicek wrote:
> > On Fri, Jan 22, 2010 at 12:14:51AM +0300, Alexander Ilin wrote:
> > > > The line with 'Routes:' is probably what you want.
> > > 
> > >  Hi Ondrej.
> > > 
> > > You are not right - these values are not the same. Look at our output:
> > 
> > These values should be the same. It is strange that they are not in your
> > case. But I see that your R15835x1 is connected to table T15835, while
> > you run the 'show' command on table master. What if you use the command:
> > 
> > show route protocol R15835x1 table T15835 count
> > 
> > Is it the same as the value in 'show protocols all'?
> 
>  Hi Ondrej, thanks for the fast reply. Indeed, your command shows a different
> number of prefixes,

Is that value the same as the value in 'show protocols all' ?

> but we want to see the number of prefixes accepted from the peer, and as far
> as we can see, 'show route protocol R15835x1 count' shows exactly that
> number. So, I have two questions:
> 
> 1. Could you help us understand what the difference between the commands is,
> and why we see slightly more prefixes with 'show route protocol R15835x1
> table T15835 count' than with 'show route protocol R15835x1 count'?

In your BIRD config, there are several routing tables and pipes between
them. Prefixes accepted from protocol R15835x1 are put into table T15835
and then propagated through a pipe to table master. The difference
between the commands is that each shows the number of prefixes in a
different table. I am not sure why the numbers differ;
perhaps you have a filter configured on that pipe?

> 2. Is there any way to see the number of accepted prefixes (show route
> protocol R15835x1 count) in 'show protocols all'?

You can look at the 'Routes: ' line in 'show protocols all' not for R15835x1
but for the pipe connecting T15835 and master. But this is also
something slightly different (if there are more protocols connected to
table T15835), and statistics for pipes work in one direction only
(which is a bug and will be fixed).

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."


Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-21 Thread Ondrej Zajicek
On Fri, Jan 22, 2010 at 12:14:51AM +0300, Alexander Ilin wrote:
> > The line with 'Routes:' is probably what you want.
> 
>  Hi Ondrej.
> 
> You are not right - these values are not the same. Look at our output:

These values should be the same. It is strange that they are not in your case.
But I see that your R15835x1 is connected to table T15835, while you run the
'show' command on table master. What if you use the command:

show route protocol R15835x1 table T15835 count

Is it the same as the value in 'show protocols all'?

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."


Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-21 Thread Ondrej Zajicek
On Thu, Jan 21, 2010 at 06:41:15PM +0300, Mikhail A. Grishin wrote:
> Hi,
>
> One more question regarding looking glass output.
> Is it possible to show the current prefix count in 'show protocols' output?
>
> Right now, if we want to collect prefix counters, we need to run a 'show
> route protocol R1234x1 count' command for every router which is in the
> "Established" state in 'show protocols' output.
>

You can use 'show protocols all' command:

ospf1    OSPF     master   up     Jan07  Running
  Preference:     150
  Input filter:   ACCEPT
  Output filter:  REJECT
  Routes:         1 imported, 0 exported, 0 preferred
  Route change stats:     received   rejected   filtered    ignored   accepted
    Import updates:              4          0          0          3          1
    Import withdraws:            0          0        ---          0          0
    Export updates:              2          1          1        ---          0
    Export withdraws:            0        ---        ---        ---          0

The line with 'Routes:' is probably what you want.
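
For collecting the counters from a script, the 'Routes:' lines can be
extracted from the saved output. The following Python sketch is only an
illustration: the function name is made up, and it assumes that each
protocol block starts with an unindented line whose first field is the
protocol name and that the 'Routes:' line has the wording shown above.
Other unindented lines (banner, column header) are harmless because no
'Routes:' line follows them.

```python
import re

# Matches e.g. "  Routes:         1 imported, 0 exported, 0 preferred"
ROUTES_RE = re.compile(
    r"^\s+Routes:\s+(\d+) imported, (\d+) exported, (\d+) preferred")


def parse_route_counts(output):
    """Map protocol name -> route counters from 'show protocols all' text."""
    counts = {}
    current = None
    for line in output.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new protocol block; its first
            # field is taken as the protocol name.
            current = line.split()[0]
            continue
        match = ROUTES_RE.match(line)
        if match and current is not None:
            imported, exported, preferred = (int(n) for n in match.groups())
            counts[current] = {"imported": imported,
                               "exported": exported,
                               "preferred": preferred}
    return counts


# Shortened copy of the output quoted above.
SAMPLE = """\
ospf1    OSPF     master   up     Jan07  Running
  Preference:     150
  Input filter:   ACCEPT
  Output filter:  REJECT
  Routes:         1 imported, 0 exported, 0 preferred
"""

if __name__ == "__main__":
    print(parse_route_counts(SAMPLE))
```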

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."


Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-21 Thread Ondrej Zajicek
On Tue, Jan 19, 2010 at 12:54:43PM +0300, Mikhail A. Grishin wrote:
> About full time and date info in "show" output -- yes, please send us
> the patch. And thank you very much for it!

Here is one.

> Do you have plans to integrate that patch into the main source code in
> future releases?

We will probably implement some configurable time format.

> About running BIRD under a "non-root" uid: we'll wait for a stable
> solution. If it requires big changes in the source code (or not so big in
> the RS case), we think that quick patches could reduce the overall stability
> of the product, so they should be well tested.

OK

> P.S. MSK-RSM2, which is running BIRD for IPv4, is stable and we haven't seen
> any major issues.

That is nice.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
diff --git a/nest/proto.c b/nest/proto.c
index 9f0311f..da64a7f 100644
--- a/nest/proto.c
+++ b/nest/proto.c
@@ -730,12 +730,12 @@ proto_state_name(struct proto *p)
 static void
 proto_do_show(struct proto *p, int verbose)
 {
-  byte buf[256], reltime[TM_RELTIME_BUFFER_SIZE];
+  byte buf[256], reltime[TM_DATETIME_BUFFER_SIZE];
 
   buf[0] = 0;
   if (p->proto->get_status)
     p->proto->get_status(p, buf);
-  tm_format_reltime(reltime, p->last_state_change);
+  tm_format_datetime(reltime, p->last_state_change);
   cli_msg(-1002, "%-8s %-8s %-8s %-5s %-5s  %s",
 	  p->name,
 	  p->proto->name,
diff --git a/nest/rt-table.c b/nest/rt-table.c
index df2834a..3927c94 100644
--- a/nest/rt-table.c
+++ b/nest/rt-table.c
@@ -1114,11 +1114,11 @@ static void
 rt_show_rte(struct cli *c, byte *ia, rte *e, struct rt_show_data *d, ea_list *tmpa)
 {
   byte via[STD_ADDRESS_P_LENGTH+32], from[STD_ADDRESS_P_LENGTH+6];
-  byte tm[TM_RELTIME_BUFFER_SIZE], info[256];
+  byte tm[TM_DATETIME_BUFFER_SIZE], info[256];
   rta *a = e->attrs;
 
   rt_format_via(e, via);
-  tm_format_reltime(tm, e->lastmod);
+  tm_format_datetime(tm, e->lastmod);
   if (ipa_nonzero(a->from) && !ipa_equal(a->from, a->gw))
     bsprintf(from, " from %I", a->from);
   else


Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-15 Thread Martin Mares
Hello!

> It is not possible. I have this feature in my TODO list, but it is a pretty
> big change. OTOH, for just the route server case (just BGP without
> kernel routing table sync) that might be easy.

It might be also possible to drop root UID, but retain a subset of root's
capabilities, e.g., binding to a low port.

Have a nice fortnight
-- 
Martin `MJ' Mares http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Black holes are where God divided by zero.


Re: [Euro-ix-rs-vwg] New release 1.2.0

2010-01-15 Thread Ondrej Zajicek
On Fri, Jan 15, 2010 at 05:23:48PM +0300, Mikhail A. Grishin wrote:
> Hi, Ondrej
>
> BIRD on the production server is still stable and VERY fast (compared with
> Quagga).
>
> We have some questions about BIRD, and maybe some bugs.
> Hope you can help us with these issues.
>
> 1. Is it possible to run the daemon not as root, but as some unprivileged
> user (like quagga does)? This is very important for security reasons.
> (binding to port 179/tcp with root privileges, other tasks without root)

It is not possible. I have this feature in my TODO list, but it is a pretty
big change. OTOH, for just the route server case (just BGP without
kernel routing table sync) that might be easy.

> 2. Is it possible to make the birdc interface work in a "read only"
> mode, with a limited set of commands, like "show ...", "help" and "exit"?
> This is for duty staff, and for looking glass access.
> (There are also many security reasons behind this question.)
>
> 3. Is it possible to implement an option in the config file that specifies
> permissions for the socket file (bird.ctl)?
> This is for access to the birdc console as a non-root user (as long as 2. is
> unresolved).

No, it is not implemented. A workaround might be to use chmod/chown on
bird.ctl in the start script, or to define sudo commands for appropriate tasks.
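
As an illustration of the chmod/chown workaround, a start script could call
something like the following after the daemon has created its socket. This
is a sketch in Python with a made-up function name; the socket path and the
group name depend on the installation and are not taken from BIRD itself.

```python
import os
import shutil


def open_control_socket(path, group=None, mode=0o660):
    """Relax ownership and permissions of the control socket at `path`."""
    if group is not None:
        # Hand the socket to a group such as the duty staff.
        # Changing the group normally requires root.
        shutil.chown(path, group=group)
    # 0o660: owner and group may connect, everyone else may not.
    os.chmod(path, mode)


# Example call (both the path and the group are assumptions):
# open_control_socket("/var/run/bird.ctl", group="birdops")
```

Note that anybody who can connect to the socket can still issue any
command, so this does not replace the "read only" mode asked for in 2.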

> 4. How do you run automatic reconfiguration of BIRD from cron?
> How can we issue "reconfigure" through the birdc interface from scripts?
> Do you have working examples? We plan to do it from a remote machine via ssh.

echo configure | birdc

> 5. Lack of text output filters.
> If we need to view some very big output (1+ routes from some peer),
> we want to apply search filters to the text output (like "| grep", "| grep
> -v", "| begin" (the last one is from cisco)).

You can use integrated filters:

show route where ...

or redirect output

echo show route | birdc | grep ...

> 6. Lack of text output redirection to external file.
> If we want to save large output into text file for further analysis, we  
> want to do something like: "show route all > file.txt"

echo show route all | birdc > file.txt

> 7. How can we turn off paging (more) inside the birdc console?

It is turned off if output is redirected to a file/pipe.
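
The same pattern (command on standard input, output through a pipe) can
also be used from a script instead of a shell pipeline. A small Python
sketch, assuming birdc is in PATH and accepts the command on stdin as in
the examples above; the helper names are made up for illustration:

```python
import subprocess


def birdc(command, birdc_path="birdc", run=subprocess.run):
    """Send one command to birdc on stdin and return its text output."""
    # Equivalent of: echo <command> | birdc
    # Output goes to a pipe, so paging is not in the way.
    result = run([birdc_path], input=command + "\n",
                 capture_output=True, text=True, check=True)
    return result.stdout


def grep(text, needle, invert=False):
    """Keep lines containing `needle` (or lacking it, if invert is set)."""
    return [line for line in text.splitlines() if (needle in line) != invert]


# Usage (needs a running BIRD; "some-pattern" is a placeholder):
# lines = grep(birdc("show route"), "some-pattern")        # like "| grep"
# rest = grep(birdc("show route"), "some-pattern", True)   # like "| grep -v"
# with open("file.txt", "w") as f:                         # like "> file.txt"
#     f.write(birdc("show route all"))
```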

> 8. (Bug?) On a test BIRD installation, with only 3 peers, at 11am
> today (15 Jan) I saw that the session with some peer was up since "15:36"
> (with no date). I understood that this means 15:36 on 14 Jan.
> After 12:30 (90 minutes later) the same session shows "Jan14" (there is
> no "15:36" any more). Why?

The limit is 20 hours. After that, just the day is shown. Rather strange
behavior, I acknowledge.
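
To make the rule concrete, here is a sketch in Python of the behavior as
described (time of day for events younger than 20 hours, month and day
otherwise). It is a model of the description, not the BIRD source; the
function name is made up, and what happens at exactly 20 hours is a guess.

```python
from datetime import datetime, timedelta

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
LIMIT = timedelta(hours=20)


def format_since(event, now):
    """Format the time of `event` the way the 'since' column is described."""
    if now - event < LIMIT:
        # Younger than 20 hours: only the time of day is shown.
        return f"{event.hour:02d}:{event.minute:02d}"
    # Otherwise: only month and day are shown.
    return f"{MONTHS[event.month - 1]}{event.day:02d}"


# The session from question 8: up since 15:36 on 14 Jan.
up_since = datetime(2010, 1, 14, 15, 36)
print(format_since(up_since, datetime(2010, 1, 15, 11, 0)))   # 19h24m old
print(format_since(up_since, datetime(2010, 1, 15, 12, 30)))  # 20h54m old
```

This prints "15:36" for the 11:00 check and "Jan14" for the 12:30 check,
which matches what was observed.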

> 9. (Addition to 8.) In general, we want to see time and date output for
> every session and every route. Is it possible?
> This is VERY important for looking glass tasks.

There is no config option for it, but it could be done by a simple change
in the source code. I could send you a patch, if you want.

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."

