RE: Illegal header length in BGP error

2009-02-24 Thread Matthew Huff
Yep, got a reply from Cisco. It's a Cisco bug:

 CSCsj36133
Internally found severe defect: Resolved (R)
Invalid header length BGP notification when sending withdraw


The router that is running the affected software generates enough  
withdraws to fill an entire BGP update message and can generate an  
update message that is 1 or 2 bytes too large when formatting  
withdraws close to the 4096 size boundary.  The error message you  
attached to the service request indicates that you're receiving the  
BGP update with the illegal header length from the provider, correct?

This issue was caused when new features were introduced into the  
12.4(20)T train.  The fix has been integrated into 12.4(20)T2 and will  
also be integrated into 12.4(24)T, when it is released on CCO.

The 12.4(15)T train is unaffected.  So the affected routers could also  
safely move to the latest 12.4(15)T image.
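
To illustrate the boundary condition Cisco describes, here is a minimal Python sketch of packing withdrawn IPv4 prefixes into UPDATE messages without crossing the 4096-byte limit from RFC 4271. It is not Cisco's code; the function names (encode_prefix, pack_withdraws, build_update) are made up for illustration, and it only handles withdraw-only IPv4 updates. The point is that the fixed overhead (19-byte header plus two 2-byte length fields) has to be counted in the budget, otherwise a full message ends up a few bytes too large.

```python
import ipaddress
import struct

BGP_MAX_MESSAGE = 4096   # maximum BGP message size (RFC 4271)
BGP_HEADER_LEN = 19      # 16-byte marker + 2-byte length + 1-byte type
UPDATE_FIXED_LEN = 4     # withdrawn-routes length (2) + path-attribute length (2)
MSG_TYPE_UPDATE = 2


def encode_prefix(prefix: str) -> bytes:
    """Encode an IPv4 prefix as <length, prefix>, using the minimum number of octets."""
    net = ipaddress.ip_network(prefix)
    n_octets = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:n_octets]


def build_update(withdrawn: bytes) -> bytes:
    """Build a withdraw-only UPDATE (hypothetical helper, no path attributes)."""
    total = BGP_HEADER_LEN + UPDATE_FIXED_LEN + len(withdrawn)
    assert total <= BGP_MAX_MESSAGE, "message would exceed the 4096-byte limit"
    header = b"\xff" * 16 + struct.pack("!HB", total, MSG_TYPE_UPDATE)
    body = struct.pack("!H", len(withdrawn)) + withdrawn + struct.pack("!H", 0)
    return header + body


def pack_withdraws(prefixes):
    """Yield withdraw-only UPDATE messages, each at most 4096 bytes long."""
    # The budget for withdrawn routes must exclude ALL fixed overhead.
    budget = BGP_MAX_MESSAGE - BGP_HEADER_LEN - UPDATE_FIXED_LEN
    chunk = b""
    for prefix in prefixes:
        encoded = encode_prefix(prefix)
        if len(chunk) + len(encoded) > budget:
            yield build_update(chunk)
            chunk = b""
        chunk += encoded
    if chunk:
        yield build_update(chunk)


if __name__ == "__main__":
    # 2000 example /24 prefixes, enough to spill into a second message.
    prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(2000)]
    for msg in pack_withdraws(prefixes):
        print(len(msg))
```

If the budget were computed without subtracting UPDATE_FIXED_LEN, the first message in this example would come out over 4096 bytes, which is the kind of off-by-a-few-bytes error the bug description mentions.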




Matthew Huff   | One Manhattanville Rd
OTA Management LLC | Purchase, NY 10577
http://www.ox.com  | Phone: 914-460-4039
aim: matthewbhuff  | Fax:   914-460-4139



 -Original Message-
 From: Renaud RAKOTOMALALA [mailto:ren...@rakotomalala.com]
 Sent: Tuesday, February 24, 2009 10:49 AM
 To: Matthew Huff; 'nanog@nanog.org'
 Subject: Re: Illegal header length in BGP error
 
 Hello Matthew,
 
 We changed the motherboard of one of our Cisco routers from 7206VXR (NPE-G1)
 to 7206VXR (NPE-G2).
 
 Due to an incompatibility with IOS 12.3(4r)T3, we upgraded the IOS to
 12.4(12.2r)T. In the end we got the same problem as you between one
 of our 7200s on 12.3 and the new one on 12.4.
 
 We solved the problem by changing the IOS on the Cisco from
 12.4(12.2r) to 12.4(4)XD10, and the BGP session came back alive.
 
 So now everything works fine between our 7200 (IOS 12.3) and the other
 7200 on IOS 12.4(4)XD10.
 
 I hope this helps.
 
 Cheers,
 Renaud
 
 
 Matthew Huff wrote:
  One of our upstream providers flapped this morning, and since then they are
  sending corrupted BGP data. I'm running 12.4(22)T on Cisco 7200s. I'm
  getting no BGP errors from that provider, and the number of routes and basic
  sanity checks look okay. However, when it tries to redistribute the BGP
  routes via iBGP to our other border routers, we get:
 
  003372: Feb 24 09:17:13.963 EST: %BGP-5-ADJCHANGE: neighbor x.x.x.x Down BGP Notification sent
  003373: Feb 24 09:17:13.963 EST: %BGP-3-NOTIFICATION: sent to neighbor x.x.x.x 1/2 (illegal header length) 2 bytes
 
 
  All routers have identical hardware and IOS versions. My Google and Cisco
  search fu leads me to the AS path length bug, but the interesting thing is
  that since we have bgp maxas-limit 75 configured and a recent IOS, we
  haven't had the problem before when other people were reporting issues. I've
  also looked at the path MTU issue, and although we haven't had a problem
  before, I disabled BGP path MTU discovery, but have the same issues.
 
  Anyone seeing something like this today, and/or does anyone have a
  suggestion on finding out more specific info (which AS path, for example,
  so I can filter it)?
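
For reference, the "1/2 ... 2 bytes" in the log above lines up with RFC 4271: error code 1 is Message Header Error, subcode 2 is Bad Message Length, and the NOTIFICATION data field carries the offending 2-byte Length value. Here is a rough Python sketch of the receive-side check, assuming only the generic 19..4096 bounds (real implementations also enforce per-message-type minimums). The function name check_header is hypothetical.

```python
import struct

BGP_HEADER_LEN = 19          # 16-byte marker + 2-byte length + 1-byte type
BGP_MAX_MESSAGE = 4096       # maximum BGP message size (RFC 4271)
ERR_MESSAGE_HEADER = 1       # NOTIFICATION error code: Message Header Error
SUB_BAD_MESSAGE_LENGTH = 2   # NOTIFICATION subcode: Bad Message Length


def check_header(header: bytes):
    """Return None if the Length field is acceptable, otherwise a
    (code, subcode, data) tuple describing the NOTIFICATION to send."""
    if len(header) < BGP_HEADER_LEN:
        raise ValueError("need at least 19 bytes to parse a BGP header")
    length, _msg_type = struct.unpack("!HB", header[16:19])
    if length < BGP_HEADER_LEN or length > BGP_MAX_MESSAGE:
        # The data field echoes the erroneous Length field (2 bytes).
        return (ERR_MESSAGE_HEADER, SUB_BAD_MESSAGE_LENGTH, struct.pack("!H", length))
    return None


if __name__ == "__main__":
    # A header claiming 4097 bytes, i.e. one byte over the limit.
    oversized = b"\xff" * 16 + struct.pack("!HB", 4097, 2)
    print(check_header(oversized))
```

Under these assumptions, a peer that emits an UPDATE even 1 byte over 4096 would trigger exactly this code/subcode pair on the receiving router.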
 





RE: Illegal header length in BGP error

2009-02-24 Thread Mills, Charles
I ran into exactly the same thing during a code upgrade a few weeks ago.

I wrote it off as a bug in BGP and backed off the code until a new release was 
out.  I was also running 12.4(22)T
On an NPE-G2.

Chuck




This e-mail message and any files transmitted with it contain confidential 
information intended only for the person(s) to whom this email message is 
addressed. If you have received this e-mail message in error, please notify the 
sender immediately by telephone or e-mail and destroy the original message 
without making a copy.  Thank you.
Neither this information block, the typed name of the sender, nor anything else 
in this message is intended to constitute an electronic signature unless a 
specific statement to the contrary is included in this message.





Re: Illegal header length in BGP error

2009-02-24 Thread Paul Cosgrove

Are you using PMTUD?

We saw this on a couple of our route reflectors and on one occasion 
picked it up in a capture.   So I can say that the issue is due to bad 
packets being sent, rather than an inaccurate error.  It can be reported 
differently according to where the corruption occurs (e.g. unsupported 
message type, update malformed etc.). 

Two production BGP sessions were affected at different times, and one 
showed errors every few days, the other weeks apart.  Both sessions were 
from route reflectors to other routers receiving full tables, and both 
traversed multiple hops. All other sessions of these routers were fine.  
Whilst investigating we identified that different MTUs were being used 
on the device interfaces at each end of the sessions.  The session on 
which we saw most errors also had lower MTUs on intervening links, so 
PMTUD was suspected to be a factor. 

I replaced one of the paths with a direct link, using identical MTUs, 
and that stopped the errors on that session (since PMTUD had nothing to 
do anymore).  Just to be sure we recreated a multiple hop topology from 
our production route reflectors to isolated lab routers, with low 
intervening link MTUs and ACLs to keep out other unwanted traffic -  
which also produced the same error on those sessions (but only once each 
over three months). 

After correcting all the MTUs in the production network the errors 
ceased completely.  Our test routers shared these links, but also used 
an additional link with a low MTU which we deliberately did not fix; as 
it turned out we did not see it again there either, so the trigger was 
not entirely clear.
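
As a back-of-the-envelope illustration of why mismatched MTUs change how a large BGP message is carried, here is a small Python sketch of the arithmetic. It assumes plain IPv4 and TCP headers with no options (20 bytes each), and uses 536 as the classic TCP default MSS for the case where path MTU discovery is off; whether a given router actually falls back to that value is an assumption. TCP is a byte stream, so message and segment boundaries do not really line up; the segment counts are only a rough guide.

```python
import math

IPV4_HEADER = 20   # bytes, no IP options assumed
TCP_HEADER = 20    # bytes, no TCP options assumed


def path_mtu(link_mtus):
    """The usable MTU of a multi-hop path is the smallest link MTU on it."""
    return min(link_mtus)


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def segments_needed(message_len: int, mss: int) -> int:
    """Minimum number of full-MSS segments needed to carry message_len bytes."""
    return math.ceil(message_len / mss)


if __name__ == "__main__":
    # Hypothetical paths: a direct link, a multi-hop path with one low-MTU
    # hop, and a session stuck at the 536-byte default MSS.
    for label, mss in [
        ("direct, MTU 1500", mss_for_mtu(path_mtu([1500]))),
        ("multi-hop, min MTU 1400", mss_for_mtu(path_mtu([1500, 1400, 1500]))),
        ("default MSS 536", 536),
    ]:
        print(label, mss, segments_needed(4096, mss))
```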


One other thing to note is that, at the time, we were seeing some other 
problems with these production routers, which Cisco believed may have 
been due to SNMP polling of BGP stats.  If you have been changing that 
recently I would also consider it a possibility.


Paul.








RE: Illegal header length in BGP error

2009-02-24 Thread Matthew Huff
We were using PMTUD. However:

1) The link was iBGP and ran over a crossover cable, with both ends at the default MTU
2) I tried disabling PMTUD, with no difference
3) Cisco admitted it was a known bug, and down-revving to 12.4(15)T
resolved the issue.






