Re: [lvs-users] Kernel 2.6.35 and 100% S.I. CPU Time

JL Mon, 13 Sep 2010 07:58:22 -0700

OK, More information:

I hope someone who is up on ipvs kernel side is listening!



If a backup machine receives an IPVS state update packet (the ones
sent to 224.0.0.81) carrying a certain number of connections (anywhere
between two and eight, inclusive, triggers it), soft interrupt (SI) CPU
time goes to 100% on the backup immediately.

Firewalling 224.0.0.81 insulates you from the problem (although, of
course, that is unsuitable for a live deployment).

Feeding in only one connection at a time (slowly enough that each gets
its own IPVS packet) doesn't trigger the problem.
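
For anyone who wants to check how many connections each sync packet
carries, here is a rough Python sketch that joins the multicast group and
prints the count from each message header. It is based on assumptions, not
on anything confirmed in this thread: that the sync daemon uses UDP port
8848, and that each message starts with a 4-byte header (one byte
connection count, one byte sync ID, two bytes total size in network byte
order). The names parse_sync_header and watch are made up for this example.

```python
import socket
import struct

SYNC_GROUP = "224.0.0.81"  # multicast group named in this thread
SYNC_PORT = 8848           # assumption: the IPVS sync daemon's UDP port


def parse_sync_header(datagram):
    """Return (nr_conns, syncid, size) from the first 4 bytes of a message.

    Assumed layout: u8 nr_conns, u8 syncid, u16 size (network byte order).
    """
    if len(datagram) < 4:
        raise ValueError("datagram too short for a sync header")
    return struct.unpack("!BBH", datagram[:4])


def watch(count=20):
    """Join the sync multicast group and print the header of `count` packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", SYNC_PORT))
    # Join the group on the default interface (0.0.0.0).
    mreq = struct.pack("4s4s", socket.inet_aton(SYNC_GROUP),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    try:
        for _ in range(count):
            data, (src, _port) = sock.recvfrom(65535)
            nr_conns, syncid, size = parse_sync_header(data)
            print("%s: nr_conns=%d syncid=%d size=%d (received %d bytes)"
                  % (src, nr_conns, syncid, size, len(data)))
    finally:
        sock.close()


if __name__ == "__main__":
    watch()
```

It is probably safest to run this on a separate machine on the same
segment rather than on the backup itself, and it will see nothing if
224.0.0.81 is firewalled on that machine.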

This happens with linux 2.6.35.4, but not 2.6.27.45.


On 13 September 2010 11:40, JL <[email protected]> wrote:
> On 13 September 2010 03:43, 楷子狐 <[email protected]> wrote:
>> I have seen this problem before:
>>
>>  http://hi.baidu.com/higkoo/blog/item/f8943c60d16843d28cb10d17.html
>>  ------------------
> Looks like the same thing.
>
> I suspect that the LVS service receives updates from the master, and
> then sticks them in some netfilter table, but with some error that
> makes the table huge. Maybe multiple entries appear?
>
> 楷子狐, Are you using MARK firewall rules, or a different method to
> select packets for LVS?
>
> If I change /proc/sys/net/ipv4/vs/sync_threshold to "3 100000", it
> does *not* fix the problem. Which kind of throws any theory I have had
> out the window.
>
> "ipvsadm -l -c" gives a lot of kernel messages "Detected stall on CPU
> x". Eventually, however, we get the list (which is currently only about
> a dozen entries).
>
> It was fine on Linux 2.6.27.45.
>
> # /proc/sys/net/ipv4/vs# grep -H "" *
> am_droprate:10
> amemthresh:1024
> cache_bypass:0
> drop_entry:0
> drop_packet:0
> expire_nodest_conn:0
> expire_quiescent_template:0
> nat_icmp_send:0
> secure_tcp:0
> sync_threshold:3        50
>
> Does anyone have an idea what might be happening here?
>
>>  ------------------ Original ------------------
>>  From:  "JL"<[email protected]>;
>>  Date:  Sun, Sep 12, 2010 06:29 PM
>>  To:  "LinuxVirtualServer.org users mailing list."<[email protected]>;
>>
>>  Subject:  [lvs-users] Kernel 2.6.35 and 100% S.I. CPU Time
>>
>>
>> Hi,
>>
>> I have recently upgraded from kernel 2.6.27.45 to 2.6.35.4.
>>
>> Now, any machine which is a backup (that is, receiving connection
>> updates from another machine) goes to nearly 100% CPU time in Soft
>> Interrupt.
>>
>> Profiling the kernel shows the largest portion of time is spent in 
>> nf_iterate.
>>
>> We are using FWMARK rules to specify traffic for LVS.
>>
>> Is this problem something people are aware of? Does anyone know of a
>> fix or workaround?
>>
>> Thanks,
>> --
>> Jarrod Lowe
>>
>> _______________________________________________
>> Please read the documentation before posting - it's available at:
>> http://www.linuxvirtualserver.org/
>>
>> LinuxVirtualServer.org mailing list - [email protected]
>> Send requests to [email protected]
>> or go to http://lists.graemef.net/mailman/listinfo/lvs-users
>>
>
>
>
> --
> Jarrod Lowe
>



-- 
Jarrod Lowe

