On 22/03/13 09:57, Roman Haefeli wrote:
> Actually, your post brings up two options I've been wondering about
> for quite a while and which I didn't know about:
>
> * the --wait flag for vzctl start
> * batch-limit
>
> It appears to me that those would solve the issues we've been
> experiencing on our clusters. 'crm node online <node>' causes so many
> containers to start simultaneously that the IO for the shared NFS
> where our CTs are hosted is saturated for quite a while, which in
> some cases has even led to some nodes being fenced.
>
> What you are suggesting would drastically mitigate the problems we're
> experiencing and you're describing in your post.

OK, that's good.
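For reference, this is roughly what those two knobs look like in
practice; the CT ID, resource name, batch-limit value and timeouts
below are purely illustrative, not taken from either of our setups:

    # Pacemaker cluster property: cap how many actions the cluster
    # executes in parallel, so 'crm node online <node>' doesn't fire
    # off every container start at once.
    crm configure property batch-limit=2

    # vzctl's --wait flag makes 'start' block until the CT has actually
    # finished booting, rather than returning as soon as it is spawned.
    vzctl start 101 --wait

    # If the RA is changed to use --wait, the operation timeouts on the
    # ManageVE primitives need to be generous enough to cover a real
    # boot (and a real shutdown):
    crm configure primitive ve_101 ocf:heartbeat:ManageVE \
        params veid=101 \
        op start timeout=300s \
        op stop timeout=180s \
        op monitor interval=60s timeout=60s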
I suppose the only issue with changing the default to "vzctl start
--wait" is that it might cause problems for people with existing
setups who then upgrade to a newer version of ManageVE. The possible
problem scenarios would be:

1. A pre-existing start operation timeout is too short and causes an
   error to be logged. (I believe that in this case a second attempt
   to start the container will be a NOOP, as the monitor operation
   will already say that the resource is running if a previous
   invocation of the start operation timed out and was killed by the
   cluster daemons - killing vzctl mid-wait shouldn't stop a CT that
   has already been launched.) Would be good to test this tho' (I can
   test with pacemaker - what else would need testing?).

2. Related to the above, the Debian 6.0 OpenVZ kernels have faulty
   --wait support (although OpenVZ upstream doesn't recommend you use
   these kernels, and the Debian OpenVZ maintainer has suggested
   trying the Debian 7.0 OpenVZ kernels from
   http://download.openvz.org/debian/ as an alternative workaround).
   Still, this situation will probably be relatively benign, and will
   only cause the same one-off timeout behaviour as in 1. above.

> Also what you say about the default time for stopping the CT sounds
> reasonable. Luckily, we already set a higher timeout in our clusters
> (without having known about vzctl's behavior).

OK, good to know.

> In my opinion, the --wait option would improve the current situation
> significantly enough to justify the change in the agent's behavior.
> Of course, this needs to be tested first. However, the current RA
> has flaws (at least in certain setups like ours) and I'd like to
> help improve it.

OK, good to know there's someone else at least using it...

> Are you also running your CTs on an NFS share? I could imagine some
> problems we experience might be related to that.

Nope, we're using DRBD as the backing store. Each hardware node has a
single Intel SSD, and pairs of cluster nodes co-host a particular DRBD
resource on top of the SSD - approx 10 VEs per DRBD, and multiple
DRBDs per SSD.

Cheers,

Tim.

-- 
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53
http://seoss.co.uk/    +44-(0)1273-808309

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/