Hi all,

  We are experiencing a very strange problem and would like some help.
We have a LEAF-based box (actually a Lince box with kernel 2.4.26)
running as a bridge with eight gigabit ethernets, a 3 GHz Pentium IV
and 2 GB of RAM. Four of the ethernets share one PCI Express bus and
the other four share a different PCI bus. NAPI is enabled on all
ethernets, as is (dynamic) IRQ moderation.
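To make the NAPI part concrete, here is a toy model of budgeted polling. It is hypothetical and is NOT the kernel's net_rx_action. Each softirq round takes at most `quota` packets per device and `budget` packets in total. Whatever is left over is deferred to the next round, which on a 2.4 kernel is roughly the work that ends up in the ksoftirqd thread. The values 300 and 64 are illustrative assumptions.

```python
from collections import deque

# Toy model (hypothetical, NOT kernel code) of NAPI-style budgeted polling.
# budget and quota defaults are illustrative values only.

def poll_round(queues, budget=300, quota=64):
    """One softirq round: a single pass over the devices, taking at most
    `quota` packets per device and `budget` packets in total.
    Returns (packets_handled, work_remaining)."""
    handled = 0
    for q in queues.values():
        take = min(quota, budget - handled, len(q))
        for _ in range(take):
            q.popleft()
        handled += take
        if handled >= budget:
            break
    return handled, any(queues.values())

def simulate(arrivals_per_round, rounds=10, budget=300, quota=64):
    """Feed a fixed number of packets per device each round.
    Returns (rounds that ended with work left over, final backlog).
    There is no ring-buffer limit here, so overload shows up as an
    ever-growing backlog; a real NIC would start dropping instead."""
    queues = {dev: deque() for dev in arrivals_per_round}
    busy_rounds = 0
    for _ in range(rounds):
        for dev, n in arrivals_per_round.items():
            queues[dev].extend(range(n))
        _, more = poll_round(queues, budget, quota)
        if more:
            busy_rounds += 1
    return busy_rounds, sum(len(q) for q in queues.values())
```

With light traffic, `simulate({"eth0": 10, "eth1": 10})` drains everything every round. With four ports each receiving 200 packets per round, `simulate({"eth4": 200, "eth5": 200, "eth6": 200, "eth7": 200})` leaves work behind in all 10 rounds. In that situation the CPU never gets out of softirq processing.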

  Some ASCII art before proceeding.

     Router 1               Router 2
        |                       |
        --------- Switch --------
                     |
                     |
                  Firewall

  Both routers use Cisco's HSRP to exchange information about which of
them is alive. HSRP uses multicast UDP packets sent to address
224.0.0.1, port 1985.
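Because the flood is on UDP port 1985, decoding a few captured payloads (e.g. from tcpdump) can show which group, state and virtual IP they carry. RFC 2281 specifies the all-routers group 224.0.0.2 for these messages, so it may also be worth checking which group actually appears in the capture. The sketch below assumes the packets are HSRP version 1 in the 20-byte layout of RFC 2281. `decode_hsrp` is a hypothetical helper.

```python
import ipaddress
import struct

# HSRP v1 message layout (RFC 2281): eight one-byte fields,
# 8 bytes of authentication data, then the 4-byte virtual IP.
HSRP_V1 = struct.Struct("!8B8s4s")
OPCODES = {0: "hello", 1: "coup", 2: "resign"}
STATES = {0: "initial", 1: "learn", 2: "listen",
          4: "speak", 8: "standby", 16: "active"}

def decode_hsrp(payload: bytes) -> dict:
    """Decode the UDP payload of a port-1985 packet (assumed HSRP v1)."""
    if len(payload) < HSRP_V1.size:
        raise ValueError("payload too short for HSRP v1")
    (version, opcode, state, hellotime, holdtime,
     priority, group, _reserved, auth, vip) = HSRP_V1.unpack_from(payload)
    return {
        "version": version,
        "opcode": OPCODES.get(opcode, opcode),  # unknown codes kept as ints
        "state": STATES.get(state, state),
        "hellotime": hellotime,
        "holdtime": holdtime,
        "priority": priority,
        "group": group,
        "auth": auth.rstrip(b"\x00").decode("ascii", "replace"),
        "virtual_ip": str(ipaddress.IPv4Address(vip)),
    }
```

HSRP routers normally send a hello only every `hellotime` seconds (3 by default). If identical decoded packets repeat at a very high rate, that points to the same frames circulating rather than to fresh hellos.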

  The problem is that after a while (1 or 2 minutes) the CPU reaches
100% (load 0.99, 99% system time), with the process ksoftirqd_CPU0 at
99%. Using iptraf we discovered that ethernets 4 to 7 (the ones that
share the PCI bus) are running at full speed. The traffic is on port
1985 and comes from the two virtual IPs of the redundant routers. The
packets seem to enter an infinite loop and completely kill the system.
BTW, the only ethernets in use are eth0 and eth1, both on the PCI-X
bus; eth2 and eth3 seem unaffected (no traffic). Bear in mind that real
traffic on eth0 and eth1 doesn't exceed 1 Mbps. Also, no service is
provided at this point, not even firewalling.
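To cross-check what iptraf shows, sample the kernel's per-interface counters in /proc/net/dev twice and turn them into rates. The sketch also computes the theoretical Gigabit Ethernet frame rate, which is the scale "full speed" implies: about 1.49 million minimum-size frames per second per port. The function names are hypothetical. The parser assumes the usual layout of two header lines followed by eight receive and eight transmit columns per interface.

```python
# Sketch: compare two snapshots of /proc/net/dev.
# Assumes the standard column layout; 32-bit counter wrap-around is ignored.

def parse_net_dev(text):
    """Return {iface: (rx_bytes, rx_packets, tx_bytes, tx_packets)}."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        # Split on the colon: large counters can touch it ("eth0:12345").
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        stats[name.strip()] = (fields[0], fields[1], fields[8], fields[9])
    return stats

def line_rate_pps(frame_bytes=64, link_bps=1_000_000_000):
    """Maximum frames per second; each frame also occupies 8 bytes of
    preamble/SFD and a 12-byte inter-frame gap on the wire."""
    return link_bps // ((frame_bytes + 8 + 12) * 8)

def rx_rates(before, after, seconds):
    """Per-interface (rx packets/s, rx Mbit/s, fraction of the
    64-byte-frame gigabit maximum)."""
    out = {}
    for iface, (rb0, rp0, _, _) in before.items():
        if iface not in after:
            continue
        rb1, rp1, _, _ = after[iface]
        pps = (rp1 - rp0) / seconds
        mbit = (rb1 - rb0) * 8 / seconds / 1e6
        out[iface] = (pps, mbit, pps / line_rate_pps())
    return out
```

To use it on the box:

1. Read /proc/net/dev.
2. Wait a fixed interval, say 10 seconds.
3. Read /proc/net/dev again.
4. Pass both parsed snapshots and the interval to `rx_rates()`.

A fraction close to 1.0 means the port is receiving nearly as many small frames as gigabit can carry. For a software bridge, the load is roughly a matter of packets per second rather than link speed.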

  The problem appears with and without STP enabled, and we have
verified that there is no loop in the network.
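One way to double-check the cabling for loops is to treat each switch, and the bridge as a whole (all its ports together), as a single node, and each cable as an edge. A loop is then a cycle in that graph, which union-find detects as a cable whose two ends are already connected. This is a minimal sketch: `find_loop` and the node names are hypothetical, and it only checks the cabling you write down, not VLAN or port configuration inside a switch.

```python
def find_loop(links):
    """Return the first cable that closes a cycle, or None if loop-free.

    Nodes are switches/bridges, edges are cables (parallel cables allowed).
    Union-find: a cable whose two ends already share a root closes a loop."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return (a, b)
        parent[root_a] = root_b
    return None
```

The topology in the ASCII art corresponds to `[("router1", "switch"), ("router2", "switch"), ("switch", "firewall")]`, which has no loop. A second cable between the switch and the firewall (say, a hypothetical one from eth4) would be reported as `("switch", "firewall")`.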

  If we disable ethernets 4 to 7 (ip link set ethx down) the problem
seems to disappear, but we are not sure, as we didn't want to disturb
the client any longer (with them down the problem did not appear for 15
minutes, whereas with them up it appeared in well under 5 minutes). In
this state, even enabling things like a NetFlow probe on eth0 doesn't
disturb the system at all.

  The same problem seems to appear on a VIA 1 GHz box with 4 Realtek
ethernets and around 4 Mbps of traffic (this system was put under the
heavier load first; when the problem appeared, we moved on to testing
with the big one). When the problem appeared, this box was so slow we
could not even open an SSH session, so we don't know whether it is the
same problem (but we bet it is).

  So, some questions:

  1) Is this related to running as a bridge? Would the problem
disappear if we used a pseudo-bridge (proxy ARP)?

  2) Can such a beast sustain 8 ethernets in a single bridge? Bear in
mind they don't carry gigabit traffic, they are just gigabit ethernets :)
What's the limit for a Linux bridge?

  3) As this traffic is only needed between the two routers and doesn't
need to pass through the firewall, will dropping it on eth0 solve the
problem? (That way the packets have no way to reach the other ethernet
ports.) What would happen with other multicast-based apps? Would their
traffic need to be dropped too?
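Questions 1 and 3 can be made concrete with a small model. It assumes the bridge does no multicast (IGMP) snooping, so every multicast packet is flooded to all ports except the one it arrived on. A router, which is what a proxy-ARP pseudo-bridge is, does not forward the link-local block 224.0.0.0/24 at all. All names are hypothetical. The ingress rule only stands in for whatever filter would actually be used on the bridge (e.g. ebtables), and the model says nothing about why the packets loop.

```python
import ipaddress

# Hypothetical model for questions 1 and 3; not LEAF, kernel or ebtables code.
LINK_LOCAL_MCAST = ipaddress.ip_network("224.0.0.0/24")
HSRP_PORT = 1985

def is_hsrp(dst_ip, proto, dst_port):
    """UDP to port 1985 on a 224.0.0.0/24 group."""
    return (proto == "udp" and dst_port == HSRP_PORT
            and ipaddress.ip_address(dst_ip) in LINK_LOCAL_MCAST)

def bridge_egress(ports, in_port, dst_ip, proto, dst_port, drop_hsrp_on=None):
    """Ports a bridge without multicast snooping floods a multicast packet to.

    If the packet is HSRP and arrived on the port named by drop_hsrp_on,
    it is dropped at ingress and never flooded."""
    if in_port == drop_hsrp_on and is_hsrp(dst_ip, proto, dst_port):
        return []
    return [p for p in ports if p != in_port]

def router_egress(dst_ip):
    """A router (e.g. a proxy-ARP pseudo-bridge) does not forward 224.0.0.0/24."""
    if ipaddress.ip_address(dst_ip) not in LINK_LOCAL_MCAST:
        raise ValueError("this sketch only models 224.0.0.0/24 destinations")
    return []
```

With eight ports, one HSRP packet entering eth0 is copied to seven ports. With `drop_hsrp_on="eth0"` it is copied to none. Other link-local multicast, such as RIPv2 on 224.0.0.9 port 520, is still flooded under this rule. Whether that traffic also needs dropping depends on whether anything behind the firewall needs it.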

  Thanks very much in advance. Regards.

-- 
Jaime Nebrera - [EMAIL PROTECTED]
Consultor TI - ENEO Tecnologia SL
Telf.- 95 455 40 62 - 619 04 55 18



------------------------------------------------------------------------
leaf-user mailing list: leaf-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/leaf-user
Support Request -- http://leaf-project.org/
