OK, more information. I hope someone who is up on the kernel side of IPVS is listening!

If a backup machine receives an IPVS state update packet (the ones sent to
224.0.0.81) with a certain number of connections in it (anywhere from two to
eight, inclusive, will trigger it), then soft-interrupt (SI) CPU time goes to
100% on the backup immediately.

Firewalling 224.0.0.81 insulates you from the problem (although, of course,
that is unsuitable for a live deployment).

Feeding in only one connection at a time (slowly enough that each gets its
own IPVS packet) doesn't trigger the problem.

This happens with Linux 2.6.35.4, but not 2.6.27.45.

On 13 September 2010 11:40, JL <[email protected]> wrote:
> On 13 September 2010 03:43, 楷子狐 <[email protected]> wrote:
>> I have seen this problem before:
>>
>> http://hi.baidu.com/higkoo/blog/item/f8943c60d16843d28cb10d17.html
>> ------------------
> Looks like the same thing.
>
> I suspect that the LVS service receives updates from the master, and
> then sticks them in some netfilter table, but with some error that
> makes the table huge. Maybe multiple entries appear?
>
> 楷子狐, are you using MARK firewall rules, or a different method to
> select packets for LVS?
>
> If I change /proc/sys/net/ipv4/vs/sync_threshold to "3 100000", it
> does *not* fix the problem. Which kind of throws any theory I have had
> out the window.
>
> "ipvsadm -l -c" gives a lot of kernel messages "Detected stall on CPU
> x". Eventually, however, we get the list (which is currently only about
> a dozen entries).
>
> It was fine at Linux 2.6.27.45.
>
> # /proc/sys/net/ipv4/vs# grep -H "" *
> am_droprate:10
> amemthresh:1024
> cache_bypass:0
> drop_entry:0
> drop_packet:0
> expire_nodest_conn:0
> expire_quiescent_template:0
> nat_icmp_send:0
> secure_tcp:0
> sync_threshold:3 50
>
> Does anyone have an idea what might be happening here?
>
>> ------------------ Original ------------------
>> From: "JL"<[email protected]>;
>> Date: Sun, Sep 12, 2010 06:29 PM
>> To: "LinuxVirtualServer.org users mailing
>> list."<[email protected]>;
>>
>> Subject: [lvs-users] Kernel 2.6.35 and 100% S.I. CPU Time
>>
>>
>> Hi,
>>
>> I have recently upgraded from kernel 2.6.27.45 to 2.6.35.4.
>>
>> Now, any machine which is a backup (that is, receiving connection
>> updates from another machine) goes to nearly 100% CPU time in Soft
>> Interrupt.
>>
>> Profiling the kernel shows the largest portion of time is spent in
>> nf_iterate.
>>
>> We are using FWMARK rules to specify traffic for LVS.
>>
>> Is this problem something people are aware of? Does anyone know of a
>> fix or workaround?
>>
>> Thanks,
>> --
>> Jarrod Lowe
>>
>> _______________________________________________
>> Please read the documentation before posting - it's available at:
>> http://www.linuxvirtualserver.org/
>>
>> LinuxVirtualServer.org mailing list - [email protected]
>> Send requests to [email protected]
>> or go to http://lists.graemef.net/mailman/listinfo/lvs-users
>>
>
> --
> Jarrod Lowe

--
Jarrod Lowe
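
One way to check the "two to eight connections per packet" observation from the top of this message is to read the connection count out of each sync packet sent to 224.0.0.81. The following is only a minimal sketch. It assumes the 2.6.x sync message begins with a 4-byte header (nr_conns, syncid, then a 16-bit size in network byte order) and that the sync daemon uses UDP port 8848; both are assumptions taken from the kernel source rather than from this thread, so verify them against ip_vs_sync.c for the kernel in question. The names parse_sync_header and watch are hypothetical.

```python
import socket
import struct

SYNC_GROUP = "224.0.0.81"  # multicast group named in the report
SYNC_PORT = 8848           # ASSUMPTION: IP_VS_SYNC_PORT in the kernel source


def parse_sync_header(packet: bytes):
    """Return (nr_conns, syncid, size) from the first 4 bytes of a sync packet.

    ASSUMPTION: header layout is u8 nr_conns, u8 syncid, u16 size
    (network byte order), as in the 2.6.x sync message format.
    """
    if len(packet) < 4:
        raise ValueError("packet too short for a sync header")
    return struct.unpack("!BBH", packet[:4])


def watch(count: int = 20) -> None:
    """Join the sync multicast group and print the header of `count` packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", SYNC_PORT))
    # Join the group on the default interface (0.0.0.0).
    mreq = struct.pack(
        "4s4s", socket.inet_aton(SYNC_GROUP), socket.inet_aton("0.0.0.0")
    )
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    try:
        for _ in range(count):
            data, addr = sock.recvfrom(65535)
            nr_conns, syncid, size = parse_sync_header(data)
            print(f"{addr[0]}: nr_conns={nr_conns} syncid={syncid} size={size}")
    finally:
        sock.close()


if __name__ == "__main__":
    watch()
```

On a backup the kernel sync daemon already listens on that port, so it is probably simpler to run this on a third host on the same segment, or to pass payloads from a packet capture to parse_sync_header directly. Comparing the printed nr_conns values against the moments SI jumps to 100% would confirm or rule out the per-packet connection count as the trigger.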
