Hi,
First of all, let me thank you for your help and attention, and for pointing me to the LVS users mailing list. I wasn't aware of it.

On Wed, Nov 9, 2011 at 12:58 PM, Graeme Fowler <[email protected]> wrote:

> [copying in the LVS users list]
>
> On Wed, 2011-11-09 at 12:04 -0200, Rodrigo Severo wrote:
> > I have been using keepalived for some years now.
> >
> > For some time now keepalived has started to fail when updating VS on
> > the kernel. This kind of thing happens after some time where
> > keepalived is working perfectly, i.e., failed servers been succesfully
> > removed and returned servers successfully added to VSs. Just after
> > keepalived is started everything works fine. After some time it starts
> > to fail to update the VSs on the kernel.
> >
> > To make it work again I just have to restart keepalived.
> >
> > The error message I get on these failures are like:
> >
> > [Keepalived_healthcheckers] IPVS: Invalid operation. Possibly wrong
> > module version, address not unicast, ...
> >
> >
> > It's important to observe that the same exact operation that works
> > fine just after keepalived is started will fail with the above error
> > after some time (one or two hours) so the suggestions on the error
> > message - wrong module version, wrong kind of address - can be safely
> > discarded as causes of the problem.
> >
> > I'm using Gentoo with kernel 3.0.6 and keepalived 1.2.2.
> >
> > Any suggestions on how I can further debug this issue?
>
> Yes. Please grab the log lines which indicate keepalived starting, doing
> stuff to servers, then failing to do stuff to servers and send it to
> [email protected]. I think we need to see timing, the
> number of operations done and so on.
>

Here is an example: http://pastebin.com/uwzKKGXh

Please observe that all VS updates up to 11:51 worked fine. Both updates after 14:14 failed with the above error message. You will also see that there aren't many updates happening.

> Your kernel is "out there" some way ahead of large numbers of the rest
> of the world who lag behind on the 2.6.x branch. I suspect something
> isn't quite right in the IPVS code in 3.0.x but I couldn't say what it
> is.
>

If you believe the kernel version might be to blame, I can try an older one. Do you have a suggestion for which version to test? Versions 2.6.39, 2.6.38 and 2.6.32 are especially easy to test, but I can test any other version you believe is important.

I forgot to mention in my first message what I believe is causing the problem: some kind of timeout on the socket used by keepalived to communicate with the kernel. I don't have any particular evidence pointing to this, except that everything works for some time after keepalived is started and then stops working. Unfortunately, I don't know how I would test this hypothesis.

--
Rodrigo Severo
Fábrica de Idéias
SBS Quadra 2 - Bloco S - Ed. Empire Center - Sala 1.301
Brasília - DF - CEP 70070-904
Tel. (61) 3321-1357 Fax (61) 3223-1712
