Hi,

implementing a management communication channel between the lb and the backend is on the roadmap for jk3. It is somewhat unlikely that this will help in your situation, because during a GC the JVM will no longer respond on the management channel. From the outside, a traditional mark-sweep-compact GC is indistinguishable from a halted backend. Of course we could think of a webapp that uses the JMX info on memory consumption to estimate GC activity in advance, but I doubt that this would be a stable solution. There are notifications when GCs happen, but at the moment I'm not sure whether such events exist before a GC or only after it.
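
Just to illustrate what such a webapp-side check could look like: the sketch below uses only the standard java.lang.management memory pool usage thresholds, nothing jk specific. The pool name matching and the 80% threshold are my assumptions and depend on the JVM and collector in use, so take it as a rough idea, not as something we ship or recommend as stable.

import java.lang.management.*;
import javax.management.*;

public class OldGenWatcher {
    public static void main(String[] args) throws Exception {
        // Find the old generation pool; its name depends on the collector
        // ("Tenured Gen", "PS Old Gen", "CMS Old Gen", ...).
        MemoryPoolMXBean oldGen = null;
        for (MemoryPoolMXBean p : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = p.getName().toLowerCase();
            if (p.getType() == MemoryType.HEAP && p.isUsageThresholdSupported()
                    && (name.contains("old") || name.contains("tenured"))) {
                oldGen = p;
            }
        }
        if (oldGen == null) return;

        // Assumed threshold: get notified when old gen occupancy crosses 80% of max.
        oldGen.setUsageThreshold((long) (oldGen.getUsage().getMax() * 0.8));

        NotificationEmitter emitter =
                (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener(new NotificationListener() {
            public void handleNotification(Notification n, Object handback) {
                if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(n.getType())) {
                    // A major collection is probably not far away. This is where
                    // a webapp could try to tell the balancer to back off, but no
                    // such channel exists in mod_jk today.
                    System.out.println("old gen above threshold: " + n.getMessage());
                }
            }
        }, null, null);

        Thread.sleep(Long.MAX_VALUE); // keep the demo JVM alive
    }
}

Note that this only tells you the old gen is getting full, not when the collector will actually run, which is exactly why I doubt it would be a stable basis for balancing decisions.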

I think a first step (and a better solution) would be to use a modern GC algorithm like Concurrent Mark Sweep (CMS), which most of the time reduces the GC pause times to some tens or hundreds of milliseconds (depending on heap size). CMS comes at a cost, a little more memory and a little more CPU, but the dramatically shorter pauses are worth it. It has also been quite robust for about 1-2 years now.
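
For example, switching the Sun JVM to CMS is a matter of adding something like the following to CATALINA_OPTS (the occupancy fraction of 70 is only a placeholder and has to be tuned for your heap; the two Print flags just add GC logging so you can verify that the pause times really go down):

  -XX:+UseConcMarkSweepGC
  -XX:+UseParNewGC
  -XX:CMSInitiatingOccupancyFraction=70
  -XX:+UseCMSInitiatingOccupancyOnly
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps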

Other components do not like long GC pauses either, for instance cluster replication. There you configure the longest pause you will accept for missing heartbeat packets before assuming a node is dead. Declaring a node dead because of a GC pause, and then having the node suddenly come back without ever noticing that its outside world has changed, is a very bad situation too.
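
If you use the standard Tomcat 5.5 cluster, that value is the mcastDropTime of the membership service in server.xml. The values below are just the shipped defaults as far as I remember; note that a default dropTime of 3 seconds is far below a 20 second full GC, so such a node would already be declared dead:

  <Membership className="org.apache.catalina.cluster.mcast.McastService"
              mcastAddr="228.0.0.4"
              mcastPort="45564"
              mcastFrequency="500"
              mcastDropTime="3000"/>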

What we plan as a first step for jk3 is to base mod_jk on the Apache APR libraries. Then we can relatively easily use our own management threads to monitor the backend status and influence the balancing decisions. As long as we do everything on top of the request handling threads, we can't do complex things in a stable way.

Getting jk3 out of the door will take some more time (maybe 6 to 12 months for a release). People willing to help are welcome.

Concerning the SLAs: it always makes sense to put a percentage limit on the maximum response times and error rates. A "100% below some limit" clause will always be too expensive. But of course, if you can't reduce the GC times and the GC runs too often, there will be no acceptable percentage for long-running requests.

Thank you for sharing your experiences at Langen with us!

Regards,

Rainer

Yefym Dmukh wrote:
Hi all,
sorry for the stress, but it seems it is time to come back to the discussion about load balancing for JVMs (Tomcat).

Prehistory:
Recently we ran benchmark and smoke tests of our product at the Sun High Tech Centre in Langen (Germany). Apache 2.2.4 was used as the web server, 10x Tomcat 5.5.25 as the containers, and the JK connector 1.2.23 with the busyness algorithm as the load balancer. Under high load we observed strange behaviour: some Tomcat workers temporarily received a disproportionate load, often 10 times higher than the others, for relatively long periods. As a result, response times that usually stay under 500ms went up to 20+ seconds, which in turn made the overall test results almost two times worse than estimated.

At the beginning we were quite confused: we were sure it was not a problem of the JVM configuration and suspected that the reason was in the LB logic of mod_jk, and both suppositions turned out to be right. What actually happened is this: the LB sends requests and the sessions become sticky, so the upcoming requests are continuously sent to the same cluster node. At a certain point the JVM started a major garbage collection (full GC) and spent the above-mentioned 20 seconds in it. During that time jk continued to send new requests as well as the session-sticky requests to that node, which led to the situation where this one node broke the SLA on response times. I've been searching the web for a while for a load balancer implementation that takes GC activity into account and reduces the load when the JVM is close to a major collection, but found nothing.

Once again, load balancing of JVMs under load is a real issue in production; with an optimally distributed load you are able not only to lower costs, but also to prevent a bad customer experience, not to mention broken SLAs.
Feature request:

All lb algorithms should be extended with a bidirectional connection to the JVM:
  JVM -> LB: old gen size and current occupancy
  LB -> JVM: prevent node overload and advise a GC, depending on a parameterized threshold of free old gen space in %.
All the ideas and comments are appreciated.

Regards,
Yefym.
