On Jun 6, 2006, at 4:42 PM, Nick Burke wrote:
How many of you have actually use(d) Zebra/Linux as a routing
device (core and/or regional, I'd be interested in both) in a
production (read: 99.999% required, hsrp, bgp, dot1q, other
goodies) environment?
And, if you care to spend this much time, what pitfalls/benefits
did you find out about after implementation?
We started out on a FreeBSD/Zebra routing solution for our company
(content provider). While it did work acceptably for many years, it
wasn't what I'd call robust.
The "router" was a single P4 2.4GHz server. We had 4 GigE ports to 4
uplinks, each giving us a full BGP feed. Then two more GigE ports to
our switches. We could route over 750 Mbps easily, without any packet
loss or added latency.
The biggest issue we'd have was Zebra's single-threadedness. After a
restart of bgpd, it would spend so much CPU time handling the BGP
updates that it would fall far behind in processing BGP
keepalives, and our sessions would time out before it had finished
handling the initial burst. I'd have to shut down all sessions, then
bring them up one at a time. The problem wasn't so much bgpd needing
that much CPU as bgpd not having much left while the server was
forwarding a few hundred Mbps of traffic. Perhaps a dual CPU server
would have worked better, but we never tried.
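To put rough numbers on why one session at a time worked when four at
once didn't, here is a minimal sketch of a single-threaded FIFO loop.
Every figure in it is hypothetical (the update counts, the processing
rate, and the 180 second hold time are assumptions for illustration,
not measurements from our box):

```python
# Back-of-the-envelope model: in a single-threaded FIFO loop, a keepalive
# queued behind a backlog of updates waits until the backlog is drained.
# All numbers used with these functions are hypothetical.

def keepalive_delay(backlog_updates, updates_per_second):
    """Seconds a keepalive waits behind the queued updates."""
    return backlog_updates / updates_per_second


def session_survives(backlog_updates, updates_per_second, hold_time=180.0):
    """True if the keepalive is handled before the hold timer expires.

    hold_time=180.0 is an assumed value; use whatever your peers negotiate.
    """
    return keepalive_delay(backlog_updates, updates_per_second) < hold_time


if __name__ == "__main__":
    rate = 3000                # hypothetical updates/second left over for bgpd
    per_peer = 200_000         # hypothetical updates in one full feed
    print(session_survives(4 * per_peer, rate))  # all four feeds at once
    print(session_survives(per_peer, rate))      # one feed at a time
```

With those made-up figures, four simultaneous feeds leave a keepalive
waiting about 267 seconds, while a single feed clears in about 67. The
less CPU is left over for bgpd, the lower the rate and the worse it gets.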
There were also issues where you could get two Zebra routers
deadlocked: each would have many megabytes of BGP updates to send
the other, and each would insist on sending its full update to
completion before reading any incoming data. Mucking with the kernel
to allow TCP sockets to have a 16MB receive buffer helped, but still
wasn't a cure.
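The deadlock condition is easy to sketch. This is a simplified model,
not how bgpd is actually written: it assumes each peer refuses to read
until its own send completes, and it ignores send buffers and data in
flight. The function name and the sizes are made up for illustration:

```python
# Simplified model of two peers that each send their whole update before
# reading anything. Unread data piles up in the other side's receive buffer.

MB = 1024 * 1024


def exchange_completes(to_send_a, to_send_b, rcvbuf_a, rcvbuf_b):
    """True if the exchange finishes, False if both peers block forever."""
    # A's update lands in B's receive buffer, and B's update in A's.
    a_finishes_send = to_send_a <= rcvbuf_b
    b_finishes_send = to_send_b <= rcvbuf_a
    # If either side finishes sending, it starts reading, which drains its
    # receive buffer and lets the other side finish too.
    return a_finishes_send or b_finishes_send


if __name__ == "__main__":
    # Small default-sized buffers, large updates both ways: stuck.
    print(exchange_completes(10 * MB, 10 * MB, 64 * 1024, 64 * 1024))
    # 16MB buffers absorb a 10MB update: fine.
    print(exchange_completes(10 * MB, 10 * MB, 16 * MB, 16 * MB))
    # Updates bigger than the buffer in both directions: stuck again.
    print(exchange_completes(20 * MB, 20 * MB, 16 * MB, 16 * MB))
```

The last case is why a bigger buffer only moves the threshold. The
actual fix is for the sender to keep reading while it writes.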
You're also giving up things like RIBs, fancy queuing/rate limiting,
and any kind of hardware acceleration. Doing hundreds of megabits is
easy, but software-based routers seem to fall over sooner under DoS
conditions (lots of tiny packets).
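The tiny-packet problem is mostly arithmetic: the same bit rate means
far more packets per second when the frames are small, and a software
router pays a roughly fixed cost per packet. A quick sketch, assuming
standard Ethernet framing (the 20 bytes of per-frame wire overhead are
the preamble and inter-frame gap; the function name is mine):

```python
# Packets per second for a given bit rate and Ethernet frame size.
# Each frame also occupies 20 extra bytes on the wire:
# 8 bytes of preamble/start delimiter plus a 12 byte inter-frame gap.

WIRE_OVERHEAD_BYTES = 20


def packets_per_second(bits_per_second, frame_bytes):
    """Frames per second that fit in bits_per_second of Ethernet capacity."""
    return bits_per_second / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


if __name__ == "__main__":
    for frame in (64, 1518):
        pps = packets_per_second(750e6, frame)
        print(f"{frame:>5} byte frames at 750 Mbps: {pps:,.0f} pps")
```

At 750 Mbps that is roughly 61,000 packets per second with full-size
frames, but over 1.1 million with minimum-size ones.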
However, it was about as close to free as you could get. We re-used
an old server, and only had to buy some 2-port Ethernet cards.
Support for Zebra is pretty iffy though. More often than not, I'd
post a message to the Zebra mailing list to report a bug, and would
get a "Yeah, known bug!" reply. The original author has all but
abandoned development, leading to a fork called Quagga. Quagga is
better (we still use it in a few places), but is still mostly a
polished-up Zebra.
In the end, we needed to start pushing more traffic than we were able
to get our Zebra box to do. A couple 20+ minute outages during peak
usage because of deadlocked bgpd processes helped my case that we
needed to buy some Junipers instead.
I know you're not giving specifics, but any kind of description of
just how much traffic you're intending to push and how many ports you
need would help in giving relevant advice. If you're talking about one
BGP feed and 10 Mbps, I'd say go for it. If you're talking about a
dozen sessions and 2 Gbps of traffic... no way. Where you are between
those is what really matters.