Re: mod_proxy hooks for clustering and load balancing
On 06.05.2009 22:31, Jim Jagielski wrote:
> On May 6, 2009, at 4:20 PM, Graham Leggett wrote:
>> Jim Jagielski wrote:
>>> I'll stop worrying about 2.2 when 2.4 comes closer to being a reality.
>>>
>>> Not saying that releasing 2.4 isn't worth it, but there have been stops and
>>> starts all along the way, and I think we need to be clear on what we
>>> expect 2.4 to be. Until then, we have no clear defining line on when
>>> 2.4 is "done."
>>
>> Is there anything additional that we want v2.4 to do over and above what
>> it does now?
>
> Well, that's the question, isn't it? I can't align the idea
> of trunk being a candidate for 2.4 and trunk being a place for
> people to experiment...
>
> What do we want 2.4 to be and do. And how.
>
> Once we define (and agree) to that, we know how close (or far)
> trunk is. It sounds like we have some set that wants to break
> trunk apart and totally refactor a lot of it, and that's a big +1.
> It's also not a 3-4 month effort :) It also sounds like there
> are people who want 2.4 to be an upgrade to 2.2 as 2.2 was compared
> to 2.0, and a big +1 to that as well. But BOTH of these are using
> the exact same dev branch, and there's no general agreement on which
> we want... if you get my point ;)
>
> If we branch off 2.4 right now from trunk, and say that this becomes
> our next main release, and the idea is to clean up what is there,
> and, for new experimental stuff, develop on trunk 1st and then
> backport to 2.4, I'll jump right on in, since it means code will
> be out and released and *used* sooner!

I expect there is already enough interesting new material in trunk right now, and users will find it a valuable 2.4. So yes, the focus should be on getting this out the door. If any trunk experiments turn out to be ready before 2.4.0 and the backport is low risk, we can backport them; experiments that stay compatible with 2.4.x can also be backported after 2.4.0.
To be clear: the OP (new mod_proxy hooks) might not be compatible with the current trunk (it would break configuration), so before branching the question would also be: does anyone plan to do something important soon that they want to be in 2.4 and that will be incompatible, API-wise or configuration-wise, with the current trunk?

Regards,
Rainer
Re: mod_proxy hooks for clustering and load balancing
Graham Leggett wrote:
> Jim Jagielski wrote:
>> Once we define (and agree) to that, we know how close (or far)
>> trunk is. It sounds like we have some set that wants to break
>> trunk apart and totally refactor a lot of it, and that's a big +1.
>> It's also not a 3-4 month effort :) It also sounds like there
>> are people who want 2.4 to be an upgrade to 2.2 as 2.2 was compared
>> to 2.0, and a big +1 to that as well. But BOTH of these are using
>> the exact same dev branch, and there's no general agreement on which
>> we want... if you get my point ;)

Understandable; when we have something suitable for consideration as an alpha, it's time to fork. If we have something in trunk that is clearly not slated for the next version, then we must fork, rm, and tag.

Right now I'm not clear that anything in trunk is destined for svn rm from the 2.4 branch. E.g. the mod_proxy devs are just as keen as the aaa refactoring and core httpd devs to put their changes out there in the public sphere, and see what happens. But the moment we have anything that looks like a beta, there is going to need to be a fork.

Do you intend the 2.3.x branch -> 2.4.x branch to be C-T-R or R-T-C?

> I think the bit that divides these in two is APR v2.0.

APR v2.0 is irrelevant, IMHO. It would be nice, but there is plenty of work to do there: there are plenty of interfaces in need of renaming and deprecation, out-of-sorts arg lists which can't decide if apr_pool_t * is the first or last arg, by convention (yes, there was a convention), and so on. I don't think we should 'wait' for APR 2. If it beats httpd to release, great. If not, oh well :)
Re: mod_proxy hooks for clustering and load balancing
Jim Jagielski wrote:
> Well, that's the question, isn't it? I can't align the idea
> of trunk being a candidate for 2.4 and trunk being a place for
> people to experiment...
>
> What do we want 2.4 to be and do. And how.
>
> Once we define (and agree) to that, we know how close (or far)
> trunk is. It sounds like we have some set that wants to break
> trunk apart and totally refactor a lot of it, and that's a big +1.
> It's also not a 3-4 month effort :) It also sounds like there
> are people who want 2.4 to be an upgrade to 2.2 as 2.2 was compared
> to 2.0, and a big +1 to that as well. But BOTH of these are using
> the exact same dev branch, and there's no general agreement on which
> we want... if you get my point ;)

I think the bit that divides these in two is APR v2.0.

People have begun refactoring APR to produce APR v2.0, and alongside this will be a corresponding refactoring of httpd, which I think should be httpd v3.0.

I think httpd v2.4 should be what we have now, against the latest APR v1 we have now, which is the APR v1.4 branch. Practically, I think we should focus on getting httpd v2.4 out the door, and make httpd ready for v3.0 to happen against APR v2.0.

Regards,
Graham
Re: svn commit: r771998
Ruediger Pluem wrote:
> The issue happened on RHEL 4 64 Bit.

FWIW - FC10 x86_64 is my own default testing platform.
Re: mod_proxy hooks for clustering and load balancing
On 05/06/2009 10:31 PM, Jim Jagielski wrote:
> On May 6, 2009, at 4:20 PM, Graham Leggett wrote:
>> Jim Jagielski wrote:
>>> I'll stop worrying about 2.2 when 2.4 comes closer to being a reality.
>>>
>>> Not saying that releasing 2.4 isn't worth it, but there have been
>>> stops and starts all along the way, and I think we need to be clear
>>> on what we expect 2.4 to be. Until then, we have no clear defining
>>> line on when 2.4 is "done."
>>
>> Is there anything additional that we want v2.4 to do over and above what
>> it does now?
>
> Well, that's the question, isn't it? I can't align the idea
> of trunk being a candidate for 2.4 and trunk being a place for
> people to experiment...
>
> What do we want 2.4 to be and do. And how.
>
> Once we define (and agree) to that, we know how close (or far)
> trunk is. It sounds like we have some set that wants to break
> trunk apart and totally refactor a lot of it, and that's a big +1.
> It's also not a 3-4 month effort :) It also sounds like there
> are people who want 2.4 to be an upgrade to 2.2 as 2.2 was compared
> to 2.0, and a big +1 to that as well. But BOTH of these are using
> the exact same dev branch, and there's no general agreement on which
> we want... if you get my point ;)
>
> If we branch off 2.4 right now from trunk, and say that this becomes
> our next main release, and the idea is to clean up what is there,
> and, for new experimental stuff, develop on trunk 1st and then
> backport to 2.4, I'll jump right on in, since it means code will
> be out and released and *used* sooner!

Again a full +1 to all this.

Regards
Rüdiger
Re: mod_proxy hooks for clustering and load balancing
On May 6, 2009, at 4:20 PM, Graham Leggett wrote:

> Jim Jagielski wrote:
>> I'll stop worrying about 2.2 when 2.4 comes closer to being a reality.
>>
>> Not saying that releasing 2.4 isn't worth it, but there have been
>> stops and starts all along the way, and I think we need to be clear
>> on what we expect 2.4 to be. Until then, we have no clear defining
>> line on when 2.4 is "done."
>
> Is there anything additional that we want v2.4 to do over and above what
> it does now?

Well, that's the question, isn't it? I can't align the idea of trunk being a candidate for 2.4 and trunk being a place for people to experiment...

What do we want 2.4 to be and do. And how.

Once we define (and agree) to that, we know how close (or far) trunk is. It sounds like we have some set that wants to break trunk apart and totally refactor a lot of it, and that's a big +1. It's also not a 3-4 month effort :) It also sounds like there are people who want 2.4 to be an upgrade to 2.2 as 2.2 was compared to 2.0, and a big +1 to that as well. But BOTH of these are using the exact same dev branch, and there's no general agreement on which we want... if you get my point ;)

If we branch off 2.4 right now from trunk, and say that this becomes our next main release, and the idea is to clean up what is there, and, for new experimental stuff, develop on trunk 1st and then backport to 2.4, I'll jump right on in, since it means code will be out and released and *used* sooner!
Re: Calling usage() from the rewrite args hook?
On 05/06/2009 10:13 PM, Rainer Jung wrote: > On 06.05.2009 21:33, Rainer Jung wrote: >> While working on additional windows commandline options I noticed, that >> there is no consistent checking for validity of the "-k" arguments. >> >> Those arguments are handled by the rewrite args hook, and some MPMs seem >> to care somehow about invalid or duplicate "-k" arguments (e.g. Unix >> outputs a somewhat wrong error message about -k being unknown), windows >> seems to start the service when an unknown command is given. >> >> I would like to output a better error message for unknown or duplicate >> commands (easy) and then call usage(). At the moment usage() is static >> in main.c and there is not a single header file directly in the server/ >> directory. >> >> So I would like to add a private header file in the server directory, at >> the moment only containing usage() and switch usage() from static to >> AP_DECLARE. Would that be right? I guess we have no reason to include it >> in the public include files contained in includes/. >> >> Another possibility would be to let the rewrite args hook return a >> value, stop processing as soon as one module doesn't return OK and call >> the usage whenever rewrite args for a module do not return OK. I think >> that's a bit fragile, because in the long run, more return values might >> show up and cases, where usage() is not the right way to react, so >> making usage() available in the mpm-specific module argument handling >> seems to be the better way. >> >> Of course the usage message at the moment doesn't really reflect the MPM >> architecture. The commandline specialities are reflected by ifdefs in >> the usage, but that's another topic. For the above needed hardening of >> parsing the -k flags, the usage message as it exists today is enough. >> >> Comments? > > While experimenting with that: on Windows mpm_winnt.c can't use > something in main.c, only vice versa. 
> So usage() would have to be moved from main.c into any other file
> included in libhttpd (or a new one). Still the question is: is this
> the right way to go? If so, should I add a small new file with
> usage(), since the names of the existing ones do not really make a
> good fit?

How about the existing util.c?

Regards
Rüdiger
Re: mod_proxy hooks for clustering and load balancing
On 05/06/2009 10:09 PM, Jim Jagielski wrote:
> On May 6, 2009, at 3:32 PM, William A. Rowe, Jr. wrote:
>> We should experiment freely on trunk/ to come up with the right
>> solutions, and also freely discard those solutions from the next
>> release branch. But we shouldn't throw changes willy nilly over to
>> 2.2, but as Paul says, let's focus on making 2.4 "the best available
>> version of Apache" with all of the new facilities that come with it.
>
> What new facilities? Are we moving to serf, for example?
>
> It's for these reasons that I suggested that before we start breaking
> trunk by adding stuff willy-nilly we branch off a 2.4 tree... If we
> would say "this is 2.4... clean it up and fix it" then we could get
> 2.4 out soon. Instead, we have trunk being both an experimental
> sandbox for new stuff and the *only* place these will see the light
> anytime soon is in backporting to 2.2.
>
> If we are serious about 2.4, we branch now. trunk remains the dev branch
> and we clean up the 2.4 branch and backport from trunk to 2.4...
> We cannot "experiment freely" on trunk and at the same time try
> to focus down a 2.4 release...

+1

Regards
Rüdiger
Re: svn commit: r771998
On 05/06/2009 09:54 PM, William A. Rowe, Jr. wrote:
> Plüm, Rüdiger, VF-Group wrote:
>> This causes trunk to fail compilation with:
>>
>> make[1]: *** No rule to make target `modules/mappers/libmod_so.la', needed
>> by `httpd'. Stop.
>> make: *** [all-recursive] Error 1
>
> Please don't do that, you have everyone chasing down if they owned 771998.
> At least leave a log message with your reply to spare most folks attention.

Sorry for that; I didn't have the original svn mail at hand at that point in time. I will improve my communication next time in the same situation.

> ITMT; I'm reviewing. I didn't see this on retesting linux, will try clean
> on Solaris just for fun (and for the fact that I think I have this VM ready
> for some action).
>
> You did re-./buildconf, make clean and make, right?

Sure. To be more precise, I did:

make extraclean
./buildconf
./config.nice
make

The issue happened on RHEL 4 64 Bit. The same happens on SuSE 10.2 32 Bit.

Regards
Rüdiger
Re: mod_proxy hooks for clustering and load balancing
Graham Leggett wrote: > Jim Jagielski wrote: > >> I'll stop worrying about 2.2 when 2.4 comes closer to being a reality. >> >> Not saying that releasing 2.4 isn't worth it, but there have been stops and >> starts all along the way, and I think we need to be clear on what we >> expect 2.4 to be. Until then, we have no clear defining line on when >> 2.4 is "done." > > Is there anything additional that we want v2.4 to do over and above what > it does now? Conversely, anything that isn't done correctly yet in v2.4 that must be refactored, or anything that is done in v2.4 that we simply wouldn't want to release with 2.4.0?
Re: mod_proxy hooks for clustering and load balancing
Jim Jagielski wrote: > I'll stop worrying about 2.2 when 2.4 comes closer to being a reality. > > Not saying that releasing 2.4 isn't worth it, but there have been stops and > starts all along the way, and I think we need to be clear on what we > expect 2.4 to be. Until then, we have no clear defining line on when > 2.4 is "done." Is there anything additional that we want v2.4 to do over and above what it does now? Regards, Graham
Re: mod_proxy hooks for clustering and load balancing
Paul Querna wrote: > Stop worrying about 2.2, and just focus on doing it right -- then ship > 2.4 in 3-4 months imo, trunk really isn't that far off from being a > decent 2.4, it just needs some cleanup in a few areas. It has already > been 3.5 years since 2.2.0 came out, its time to move on in my > opinion. +1. Regards, Graham
Re: Calling usage() from the rewrite args hook?
On 06.05.2009 21:33, Rainer Jung wrote: > While working on additional windows commandline options I noticed, that > there is no consistent checking for validity of the "-k" arguments. > > Those arguments are handled by the rewrite args hook, and some MPMs seem > to care somehow about invalid or duplicate "-k" arguments (e.g. Unix > outputs a somewhat wrong error message about -k being unknown), windows > seems to start the service when an unknown command is given. > > I would like to output a better error message for unknown or duplicate > commands (easy) and then call usage(). At the moment usage() is static > in main.c and there is not a single header file directly in the server/ > directory. > > So I would like to add a private header file in the server directory, at > the moment only containing usage() and switch usage() from static to > AP_DECLARE. Would that be right? I guess we have no reason to include it > in the public include files contained in includes/. > > Another possibility would be to let the rewrite args hook return a > value, stop processing as soon as one module doesn't return OK and call > the usage whenever rewrite args for a module do not return OK. I think > that's a bit fragile, because in the long run, more return values might > show up and cases, where usage() is not the right way to react, so > making usage() available in the mpm-specific module argument handling > seems to be the better way. > > Of course the usage message at the moment doesn't really reflect the MPM > architecture. The commandline specialities are reflected by ifdefs in > the usage, but that's another topic. For the above needed hardening of > parsing the -k flags, the usage message as it exists today is enough. > > Comments? While experimenting with that: on Windows mpm_winnt.c can't use something in main.c, only vice versa. So usage() would have to be moved from main.c into any other file included in libhttpd (or a new one). 
Still the question is: is this the right way to go? If so, should I add a small new file with usage(), since the names of the existing ones do not really make a good fit? Regards, Rainer
Re: mod_proxy hooks for clustering and load balancing
On May 6, 2009, at 3:32 PM, William A. Rowe, Jr. wrote:

> We should experiment freely on trunk/ to come up with the right
> solutions, and also freely discard those solutions from the next
> release branch. But we shouldn't throw changes willy-nilly over to
> 2.2; as Paul says, let's focus on making 2.4 "the best available
> version of Apache" with all of the new facilities that come with it.

What new facilities? Are we moving to serf, for example?

It's for these reasons that I suggested that before we start breaking trunk by adding stuff willy-nilly, we branch off a 2.4 tree... If we would say "this is 2.4... clean it up and fix it" then we could get 2.4 out soon. Instead, we have trunk being both an experimental sandbox for new stuff, and the *only* place these changes will see the light anytime soon is in backporting to 2.2.

If we are serious about 2.4, we branch now. trunk remains the dev branch, and we clean up the 2.4 branch and backport from trunk to 2.4... We cannot "experiment freely" on trunk and at the same time try to focus down a 2.4 release...
Re: svn commit: r771998
Plüm, Rüdiger, VF-Group wrote:
> This causes trunk to fail compilation with:
>
> make[1]: *** No rule to make target `modules/mappers/libmod_so.la', needed by
> `httpd'. Stop.
> make: *** [all-recursive] Error 1

Please don't do that; you have everyone chasing down whether they owned r771998. At least leave a log message with your reply to spare most folks' attention.

ITMT; I'm reviewing. I didn't see this on retesting Linux; will try a clean build on Solaris just for fun (and for the fact that I think I have this VM ready for some action).

You did re-./buildconf, make clean and make, right?
Calling usage() from the rewrite args hook?
While working on additional Windows command line options I noticed that there is no consistent checking of the validity of the "-k" arguments.

Those arguments are handled by the rewrite args hook, and some MPMs seem to care somehow about invalid or duplicate "-k" arguments (e.g. Unix outputs a somewhat wrong error message about -k being unknown); Windows seems to start the service when an unknown command is given.

I would like to output a better error message for unknown or duplicate commands (easy) and then call usage(). At the moment usage() is static in main.c, and there is not a single header file directly in the server/ directory.

So I would like to add a private header file in the server directory, at the moment only containing usage(), and switch usage() from static to AP_DECLARE. Would that be right? I guess we have no reason to include it in the public include files contained in includes/.

Another possibility would be to let the rewrite args hook return a value, stop processing as soon as one module doesn't return OK, and call usage() whenever rewrite args for a module does not return OK. I think that's a bit fragile, because in the long run more return values might show up, and cases where usage() is not the right way to react; so making usage() available in the MPM-specific module argument handling seems to be the better way.

Of course the usage message at the moment doesn't really reflect the MPM architecture. The command line specialities are reflected by ifdefs in the usage, but that's another topic. For the hardening of "-k" parsing needed above, the usage message as it exists today is enough.

Comments?

Regards,
Rainer
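As a sketch of the proposal: the private header name and the "-k" command list below are only illustrative, and AP_DECLARE is the real httpd export macro, stubbed out here so the fragment stands alone.

```c
#include <stdio.h>
#include <string.h>

#ifndef AP_DECLARE            /* the real macro comes from ap_config.h */
#define AP_DECLARE(type) type
#endif

/* --- server/main_private.h (hypothetical private header) --- */
AP_DECLARE(void) usage(const char *binname);

/* --- main.c: same body as today, just no longer static --- */
AP_DECLARE(void) usage(const char *binname)
{
    fprintf(stderr,
            "Usage: %s [-D name] [-d directory] [-f file] "
            "[-k start|restart|graceful|stop]\n", binname);
}

/* --- MPM-side check an MPM's rewrite args hook could run before
 *     rejecting the argument and calling usage(); the command list
 *     is illustrative (winnt has additional service commands) --- */
static int known_k_command(const char *arg)
{
    static const char *cmds[] = { "start", "restart", "graceful", "stop" };
    size_t i;
    for (i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++)
        if (strcmp(arg, cmds[i]) == 0)
            return 1;
    return 0;
}
```

With this split, a duplicate or unknown "-k" command can produce one consistent error message followed by usage(), instead of each MPM improvising.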
Re: mod_proxy hooks for clustering and load balancing
Jim Jagielski wrote: > > I'll stop worrying about 2.2 when 2.4 comes closer to being a reality. > > Not saying that releasing 2.4 isn't worth it, but there have been stops and > starts all along the way, and I think we need to be clear on what we > expect 2.4 to be. Until then, we have no clear defining line on when > 2.4 is "done." Nice of the mod_proxy pot calling the httpd kettle black :-)
Re: mod_proxy hooks for clustering and load balancing
Paul Querna wrote:
> On Wed, May 6, 2009 at 11:46 AM, Jim Jagielski wrote:
>> The incremental changes are just so we can keep 2.2's proxy somewhat
>> useful and flexible enough to survive until the next revamp.
>
> Stop worrying about 2.2, and just focus on doing it right -- then ship
> 2.4 in 3-4 months imo, trunk really isn't that far off from being a
> decent 2.4, it just needs some cleanup in a few areas. It has already
> been 3.5 years since 2.2.0 came out, its time to move on in my
> opinion.

+1. If there is a 2.2 /bug/, let's fix it. If the mod_proxy crew wants to offer a proof-of-concept 2.4 preview as its own download, terrific! But I'm quite -1 to fundamentally modifying the structure or feature set that ships for 2.2-stable. The recent history in 2.2 of de-stable-izing changes to the released branch, and incomplete features, leaves me pretty frustrated with the lack of review.

Before /not/ voting -1 to bringing in these major changes, I'd need to see issues.apache.org seriously purged of its 95 incidents, many of them new and most of them actively triaged by our users@ community and some dedicated d...@s. I'd need to see the balancer /actually documented/ in a way that users can read and that is partitioned by the module that is loaded.

We should experiment freely on trunk/ to come up with the right solutions, and also freely discard those solutions from the next release branch. But we shouldn't throw changes willy-nilly over to 2.2; as Paul says, let's focus on making 2.4 "the best available version of Apache" with all of the new facilities that come with it.

What is especially scary is that the providers are still all using mod_proxy for their entire config schema, for directives which make no sense to any other proxy provider. It would be great to see some serious cleanup, in addition to all the enthusiasm to expand mod_proxy.
Re: mod_proxy hooks for clustering and load balancing
On 06.05.2009 20:26, Paul Querna wrote:
> There is lots of discussion about fixing mod_proxy and
> mod_proxy_balancer, to try to make it do things that the APIs are just
> broken for, and right now, it seems from the outside to be turning
> into a ball of mud.
>
> I think the right way to frame the discussion is, how should the API
> optimally be structured -- then change the existing one to be closer
> to it, rather than the barrage of incremental changes that seem to be
> creating lots of cruft, and ending up with something that still
> doesn't do what we want.
>
> I think mod_proxy's decisions on what to proxy to, and where, should
> be designed as a series of hooks/providers, specifically:
>
> 1) Provider for a list of backends -- This provider does nothing with
> balancing, just provides a list of Backend Definitions (preferably just
> keep it apr_sockaddr_t?) that a Connection is able to use. -- Backend
> status via multicast or other methods goes here.
> 2) Provider that _sorts_ the list of backends. Input is a list,
> output is a new ordered list. -- Sticky sessions go here, along with
> any load-based balancing.
> 3) Provider that, given a Backend Definition, returns a connection.
> (pools connections, or opens a new one, whatever) -- Some of
> proxy_util and the massive worker objects go here.
>
> Using this structure, you can implement a dynamic load balancer
> without having to modify the core. I think the key is to _stop_
> passing around the gigantic monolithic proxy_worker structures, and go
> to having providers that do simple operations: get a list, sort the
> list, get me a connection.
>
> Thoughts?

Sounds good.
The provider in 2) needs a second functionality/API to feed back the results of a request:

- whether the backend was detected as being broken
- when using piggybacked load data or (as we do today) locally generated load data, the updates must be done against provider 2) after the response has been received (or, in the case of busyness, once before forwarding the request and once after receiving the response)

Those providers look similar to what I called "topology manager" and "state manager", and you want to include the balancing/stickiness decision in the state manager. My remark above indicates that provider 2) needs to make the decision *and* to update the data on which the decision is based. This data update could happen behind the scenes, but in most cases it will need an API driven by the request handling component.

Regards,
Rainer
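A rough sketch of that feedback half, with invented names (none of this is existing httpd API): the state provider exposes report calls that the request-handling side drives once before forwarding (for busyness accounting) and once after the response or failure is seen.

```c
#include <stddef.h>

/* Hypothetical per-backend state kept by the sort/state provider. */
typedef enum { BACKEND_OK, BACKEND_ERROR } backend_status;

typedef struct {
    const char *name;
    int    in_flight;   /* busyness: bumped before, dropped after */
    long   errors;      /* broken-backend detection counter */
    double ewma_ms;     /* smoothed response time for load sorting */
} backend_state;

/* called once before forwarding the request (busyness accounting) */
static void report_start(backend_state *b)
{
    b->in_flight++;
}

/* called once after the response (or failure) has been received */
static void report_result(backend_state *b, backend_status st,
                          double elapsed_ms)
{
    b->in_flight--;
    if (st == BACKEND_ERROR)
        b->errors++;
    else  /* simple exponentially weighted moving average */
        b->ewma_ms = (b->ewma_ms == 0.0)
                         ? elapsed_ms
                         : 0.8 * b->ewma_ms + 0.2 * elapsed_ms;
}
```

The sort provider can then order backends by in_flight or ewma_ms without the core ever seeing how the numbers are maintained, which is the "behind the scenes" update path described above.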
Re: mod_proxy hooks for clustering and load balancing
On Wed, May 6, 2009 at 12:04 PM, Jim Jagielski wrote: > > On May 6, 2009, at 2:53 PM, Paul Querna wrote: > >> >> Stop worrying about 2.2, and just focus on doing it right -- then ship >> 2.4 in 3-4 months imo, trunk really isn't that far off from being a >> decent 2.4, it just needs some cleanup in a few areas. It has already >> been 3.5 years since 2.2.0 came out, its time to move on in my >> opinion. >> > > I'll stop worrying about 2.2 when 2.4 comes closer to being a reality. > > Not saying that releasing 2.4 isn't worth it, but there have been stops and > starts all along the way, and I think we need to be clear on what we > expect 2.4 to be. Until then, we have no clear defining line on when > 2.4 is "done." it can be done today, just start cutting alpha releases :) i'm pretty close to ENOTIME though, so if someone wants to step up and start RMing alphas that would be very nice...
Re: mod_proxy hooks for clustering and load balancing
On May 6, 2009, at 2:53 PM, Paul Querna wrote: Stop worrying about 2.2, and just focus on doing it right -- then ship 2.4 in 3-4 months imo, trunk really isn't that far off from being a decent 2.4, it just needs some cleanup in a few areas. It has already been 3.5 years since 2.2.0 came out, its time to move on in my opinion. I'll stop worrying about 2.2 when 2.4 comes closer to being a reality. Not saying that releasing 2.4 isn't worth it, but there have been stops and starts all along the way, and I think we need to be clear on what we expect 2.4 to be. Until then, we have no clear defining line on when 2.4 is "done."
Re: mod_proxy hooks for clustering and load balancing
On Wed, May 6, 2009 at 11:46 AM, Jim Jagielski wrote: > > On May 6, 2009, at 2:26 PM, Paul Querna wrote: > >> Hi, >> >> I think the right way to frame the discussion is, how should the API >> optimally be structured -- then change the existing one to be closer >> to it, rather than the barrage of incremental changes that seem to be >> creating lots of cruft, and ending up with something that still >> doesn't do what we want. >> >> I think mod_proxy's decisions on what to proxy to, and where, should >> be designed as a series of hooks/providers, specifically: >> >> 1) Provider for a list of backends -- This provider does nothing with >> balancing, just provides a list of Backend Definition (preferably just >> keep it apr_sockaddr_t?) that a Connection is able to use. -- Backend >> status via multicast or other methods go here. >> 2) Provider that _sorts_ the list of backends. Input is a list, >> output is a new ordered list. -- Sticky sesions go here, along with >> any load based balancing. >> 3) Provider that given a Backend Definition, returns a connection. >> (pools connections, or open new one, whatever) -- Some of the >> proxy_util and massive worker objects go here. >> > > I recall at one of the hackthons this being proposed and I > think it's the right one... It's a clear separation of functions > similar to the changes we've done in authn/authz, moving from > monolithic to more structured and defined concerns. > > The incremental changes are just so we can keep 2.2's proxy somewhat > useful and flexible enough to survive until the next revamp. Stop worrying about 2.2, and just focus on doing it right -- then ship 2.4 in 3-4 months imo, trunk really isn't that far off from being a decent 2.4, it just needs some cleanup in a few areas. It has already been 3.5 years since 2.2.0 came out, its time to move on in my opinion.
Re: mod_proxy hooks for clustering and load balancing
On May 6, 2009, at 2:26 PM, Paul Querna wrote:

> Hi,
>
> I think the right way to frame the discussion is, how should the API
> optimally be structured -- then change the existing one to be closer
> to it, rather than the barrage of incremental changes that seem to be
> creating lots of cruft, and ending up with something that still
> doesn't do what we want.
>
> I think mod_proxy's decisions on what to proxy to, and where, should
> be designed as a series of hooks/providers, specifically:
>
> 1) Provider for a list of backends -- This provider does nothing with
> balancing, just provides a list of Backend Definitions (preferably just
> keep it apr_sockaddr_t?) that a Connection is able to use. -- Backend
> status via multicast or other methods goes here.
> 2) Provider that _sorts_ the list of backends. Input is a list,
> output is a new ordered list. -- Sticky sessions go here, along with
> any load-based balancing.
> 3) Provider that, given a Backend Definition, returns a connection.
> (pools connections, or opens a new one, whatever) -- Some of
> proxy_util and the massive worker objects go here.

I recall this being proposed at one of the hackathons, and I think it's the right approach... It's a clear separation of functions, similar to the changes we've done in authn/authz, moving from monolithic to more structured and defined concerns.

The incremental changes are just so we can keep 2.2's proxy somewhat useful and flexible enough to survive until the next revamp.
Re: mod_proxy hooks for clustering and load balancing
Paul Querna wrote: > Using this structure, you can implement a dynamic load balancer > without having to modify the core. I think the key is to _stop_ > passing around the gigantic monolithic proxy_worker structures, and go > to having providers that do simple operations: get a list, sort the > list, get me a connection. > > Thoughts? +1 on all of it. Regards, Graham
mod_proxy hooks for clustering and load balancing
Hi,

There is lots of discussion about fixing mod_proxy and mod_proxy_balancer, to try to make them do things that the APIs are just broken for, and right now, it seems from the outside to be turning into a ball of mud.

I think the right way to frame the discussion is: how should the API optimally be structured -- then change the existing one to be closer to that, rather than the barrage of incremental changes that seem to be creating lots of cruft and ending up with something that still doesn't do what we want.

I think mod_proxy's decisions on what to proxy to, and where, should be designed as a series of hooks/providers, specifically:

1) Provider for a list of backends -- This provider does nothing with balancing; it just provides a list of Backend Definitions (preferably just keep it apr_sockaddr_t?) that a Connection is able to use. -- Backend status via multicast or other methods goes here.

2) Provider that _sorts_ the list of backends. Input is a list, output is a new ordered list. -- Sticky sessions go here, along with any load-based balancing.

3) Provider that, given a Backend Definition, returns a connection (pools connections, or opens a new one, whatever). -- Some of proxy_util and the massive worker objects go here.

Using this structure, you can implement a dynamic load balancer without having to modify the core. I think the key is to _stop_ passing around the gigantic monolithic proxy_worker structures, and to have providers that do simple operations: get a list, sort the list, get me a connection.

Thoughts?

Thanks,
Paul
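To make the split concrete, here is a minimal sketch of the three providers as C callbacks. Everything here is hypothetical: the names, the backend struct (standing in for apr_sockaddr_t plus whatever metadata survives from proxy_worker), and the wiring illustrate the proposal, not existing httpd API.

```c
#include <stdlib.h>

/* Stand-in for "Backend Definition" (apr_sockaddr_t plus metadata). */
typedef struct backend {
    const char *host;
    int         port;
    double      load;    /* load metric maintained elsewhere */
} backend;

/* 1) list provider: the raw set of candidates, no balancing logic */
typedef int  (*backend_list_fn)(backend **out, int *nout);

/* 2) sort provider: reorders candidates; stickiness/load policy lives here */
typedef void (*backend_sort_fn)(backend *list, int n, const char *route);

/* 3) connection provider: turns a definition into a (pooled) connection */
typedef int  (*backend_conn_fn)(const backend *b /* , conn **out */);

/* --- a trivial pair of providers for illustration --- */
static backend pool_[2] = { { "app1", 8009, 0.7 }, { "app2", 8009, 0.2 } };

static int static_list(backend **out, int *nout)
{
    *out = pool_;
    *nout = 2;
    return 0;
}

static int by_load(const void *a, const void *b)
{
    double d = ((const backend *)a)->load - ((const backend *)b)->load;
    return (d > 0) - (d < 0);
}

/* "least load" sort provider; a sticky provider would check the route
 * argument before falling back to load order */
static void sort_by_load(backend *list, int n, const char *route)
{
    (void)route;
    qsort(list, n, sizeof(*list), by_load);
}

/* The core then just chains the providers: get a list, sort it,
 * take the head (and would hand it to the connection provider). */
static const backend *pick_backend(backend_list_fn get, backend_sort_fn sort)
{
    backend *list;
    int n;
    if (get(&list, &n) != 0 || n == 0)
        return NULL;
    sort(list, n, NULL);
    return &list[0];
}
```

A dynamic balancer would then only register a different list or sort provider; the core's get/sort/connect chain never changes, which is the point of the proposal.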
Re: mod_proxy / mod_proxy_balancer
On May 6, 2009, at 1:00 PM, William A. Rowe, Jr. wrote: Jim Jagielski wrote: That's why oob-like health-and-status chatter is nice, because it doesn't interfere with the normal reverse-proxy/host logic. +1, for a backend of unknown status (let's just say it's a few minutes old, effectively useless information now) ping/pong is the right first approach. But... An idea: Instead of asking for this info before sending the request, what about the backend sending it as part of the response, as a response header. You don't know the status of the machine "now", but you do know the status of it right after it handled the last request (the last time you saw it) and, assuming nothing else touched it, that status is likely still "good". Yup; that seems like the only sane approach, add an X-Backend-Status or whatnot to report the load or other health data. For example, how long it took me (the backend server) to handle this request... would be useful to know *that* in addition to the typical "round-trip" time :)
Re: mod_proxy / mod_proxy_balancer
Jim Jagielski wrote: > > That's why oob-like health-and-status chatter is nice, because > it doesn't interfere with the normal reverse-proxy/host logic. +1, for a backend of unknown status (let's just say it's a few minutes old, effectively useless information now) ping/pong is the right first approach. But... > An idea: Instead of asking for this info before sending the > request, what about the backend sending it as part of the response, > as a response header. You don't know the status of the machine > "now", but you do know the status of it right after it handled the last > request (the last time you saw it) and, assuming nothing else touched > it, that status is likely still "good". Yup; that seems like the only sane approach, add an X-Backend-Status or whatnot to report the load or other health data. It's easily consumed (erased) from the front end response. If done correctly in a backend server, it can convey information from the ultimate back end resources that actually cause the congestion (DB servers or whatnot) rather than the default response (CPU or whatnot) at the middle tier.
Re: Backports from trunk to 2.2 proxy-balancers
On May 6, 2009, at 11:15 AM, jean-frederic clere wrote: Jim Jagielski wrote: In that case, we could keep the trunk dir structure for any "extra" balancers we may add in the 2.2 tree and move the old balancer code back into mod_proxy_balancers.c (or, even better, as sep files that aren't sub-modules :) ) +1 This is now done... if we add any other balancers, we can put them in the ./balancers/ subdir; the current 3 are in sep files for ease of backporting, but they are not sub-modules, rather linked support files ;)
Re: mod_proxy / mod_proxy_balancer
Jess Holle wrote: jean-frederic clere wrote: Should general support for a query URL be provided in mod_proxy_balancer? Or should this be left to mod_cluster? Can you explain more? I don't get the question. What I mean is 1. Should mod_proxy_balancer be extended to provide a balancer algorithm in which one specifies a backend URL that will provide a single numeric health metric, throttle the number of such requests via a time-to-live associated with this information, and balance on this basis or 2. Should mod_cluster handle this issue? 3. Or both? * For instance, mod_cluster might leverage special nuances in AJP, JBoss, and Tomcat, whereas mod_proxy_balancer might provide more generic support for health checks on any back end server that can expose a health metric URL. From your response below, it sounds like you're saying it's #2, which is largely fine and good -- but this raises questions: 1. How general is the health check metric in mod_cluster? * I only care about Tomcat backends myself, but control over the metric would be good. 2. Does this require special JBoss nuggets in Tomcat? * I'd hope not, i.e. that this is a simple matter of a pre-designated URL or a very simple standalone socket protocol. 3. When will mod_cluster support health metric based balancing of Tomcat? 4. How "disruptive" to an existing configuration using mod_proxy_balancer/mod_proxy_ajp is mod_cluster? * How much needs to be changed? 5. How portable is the mod_cluster code? * Does it build on Windows? HPUX? AIX? Please ask the mod_cluster questions in the mod_cluster-...@lists.jboss.org list. I will answer there. Cheers Jean-Frederic I say this is largely fine and good as I'd like to see just the health-metric based balancing algorithm in Apache 2.2.x itself. Does mod_cluster provide yet another approach top to bottom (separate from mod_jk and mod_proxy/mod_proxy_ajp)?
Mod_cluster is just a balancer for mod_proxy but due to the dynamic creation of balancers and workers it can't get in the httpd-trunk code right now. It would seem nice to me if mod_jk and/or mod_proxy_balancer could do health checks, but you have to draw the line somewhere on growing any given module and if mod_jk and mod_proxy_balancer are not going in that direction at some point mod_cluster may be in my future. -- Jess Holle
Re: Backports from trunk to 2.2 proxy-balancers
Jim Jagielski wrote: On May 6, 2009, at 11:04 AM, Plüm, Rüdiger, VF-Group wrote: -----Original Message----- From: Jim Jagielski Sent: Wednesday, May 6, 2009 4:59 PM To: dev@httpd.apache.org Subject: Re: Backports from trunk to 2.2 proxy-balancers On May 6, 2009, at 9:54 AM, Plüm, Rüdiger, VF-Group wrote: The problem is that this breaks existing configurations for 2.2.x as the balancers are now in separate modules. How so (the breakage, that is)?? You mean the requirement for them to LoadModule them? Exactly. This is an unpleasant surprise for someone updating an existing installation from 2.2.a to 2.2.b. In that case, we could keep the trunk dir structure for any "extra" balancers we may add in the 2.2 tree and move the old balancer code back into mod_proxy_balancers.c (or, even better, as sep files that aren't sub-modules :) ) +1 Cheers Jean-Frederic
Re: Backports from trunk to 2.2 proxy-balancers
Just an update: Currently httpd-2.2-proxy contains the latest trunk proxy code and the new sub-module layout and passes all framework tests... I'll start moving to sub-files, not sub-modules, for the balancers, maybe after lunch. On May 6, 2009, at 11:10 AM, Jim Jagielski wrote: In that case, we could keep the trunk dir structure for any "extra" balancers we may add in the 2.2 tree and move the old balancer code back into mod_proxy_balancers.c (or, even better, as sep files that aren't sub-modules :) )
Re: Backports from trunk to 2.2 proxy-balancers
> -----Original Message----- > From: Jim Jagielski > Sent: Wednesday, May 6, 2009 5:10 PM > To: dev@httpd.apache.org > Subject: Re: Backports from trunk to 2.2 proxy-balancers > > > On May 6, 2009, at 11:04 AM, Plüm, Rüdiger, VF-Group wrote: > > > > > > >> -----Original Message----- > >> From: Jim Jagielski > >> Sent: Wednesday, May 6, 2009 4:59 PM > >> To: dev@httpd.apache.org > >> Subject: Re: Backports from trunk to 2.2 proxy-balancers > >> > >> > >> On May 6, 2009, at 9:54 AM, Plüm, Rüdiger, VF-Group wrote: > >> > >>> > >>> The problem is that this breaks existing configurations for 2.2.x > >>> as the balancers are now in separate modules. > >> > >> How so (the breakage, that is)?? You mean the requirement for > >> them to LoadModule them? > > > > Exactly. This is an unpleasant surprise for someone updating an > > existing installation from 2.2.a to 2.2.b. > > > > In that case, we could keep the trunk dir structure for any > "extra" balancers we may add in the 2.2 tree and move the old > balancer code back into mod_proxy_balancers.c (or, even better, > as sep files that aren't sub-modules :) ) Sounds fine to me. I am not opposed to the directory structure and separate files per se. Maybe we just link them into mod_proxy_balancer on 2.2.x whereas we keep them as separate modules on trunk. Regards Rüdiger
Re: Backports from trunk to 2.2 proxy-balancers
On May 6, 2009, at 11:04 AM, Plüm, Rüdiger, VF-Group wrote: -----Original Message----- From: Jim Jagielski Sent: Wednesday, May 6, 2009 4:59 PM To: dev@httpd.apache.org Subject: Re: Backports from trunk to 2.2 proxy-balancers On May 6, 2009, at 9:54 AM, Plüm, Rüdiger, VF-Group wrote: The problem is that this breaks existing configurations for 2.2.x as the balancers are now in separate modules. How so (the breakage, that is)?? You mean the requirement for them to LoadModule them? Exactly. This is an unpleasant surprise for someone updating an existing installation from 2.2.a to 2.2.b. In that case, we could keep the trunk dir structure for any "extra" balancers we may add in the 2.2 tree and move the old balancer code back into mod_proxy_balancers.c (or, even better, as sep files that aren't sub-modules :) )
Re: Backports from trunk to 2.2 proxy-balancers
> -----Original Message----- > From: Jim Jagielski > Sent: Wednesday, May 6, 2009 4:59 PM > To: dev@httpd.apache.org > Subject: Re: Backports from trunk to 2.2 proxy-balancers > > > On May 6, 2009, at 9:54 AM, Plüm, Rüdiger, VF-Group wrote: > > > > > The problem is that this breaks existing configurations for 2.2.x > > as the balancers are now in separate modules. > > How so (the breakage, that is)?? You mean the requirement for > them to LoadModule them? Exactly. This is an unpleasant surprise for someone updating an existing installation from 2.2.a to 2.2.b. Regards Rüdiger
Re: mod_proxy / mod_proxy_balancer
Jim Jagielski wrote: On May 6, 2009, at 9:07 AM, Jess Holle wrote: jean-frederic clere wrote: Should general support for a query URL be provided in mod_proxy_balancer? Or should this be left to mod_cluster? Can you explain more? I don't get the question. What I mean is • Should mod_proxy_balancer be extended to provide a balancer algorithm in which one specifies a backend URL that will provide a single numeric health metric, throttle the number of such requests via a time-to-live associated with this information, and balance on this basis or • Should mod_cluster handle this issue? • Or both? Please recall that, afaik, mod_cluster is not AL nor is it part of Apache. So asking for direction for what is basically an external project on the Apache httpd dev list is kinda weird :) Yep there is a JBoss list for that: mod_cluster-...@lists.jboss.org In any case, I think the hope of the ASF is that this capability is part of httpd, and you can see, with mod_heartbeat and the like, efforts in the direction. Yes I am experimenting there too. Cheers Jean-Frederic
Re: Backports from trunk to 2.2 proxy-balancers
> -----Original Message----- > From: jean-frederic clere > Sent: Wednesday, May 6, 2009 4:40 PM > To: dev@httpd.apache.org > Subject: Re: Backports from trunk to 2.2 proxy-balancers > > Plüm, Rüdiger, VF-Group wrote: > > > > > >> -----Original Message----- > >> From: Rainer Jung > >> Sent: Wednesday, May 6, 2009 3:10 PM > >> To: dev@httpd.apache.org > >> Subject: Re: Backports from trunk to 2.2 proxy-balancers > >> > >> On 06.05.2009 14:39, Jim Jagielski wrote: > >>> It would certainly be easier to maintain a 2.2-proxy > >> branch, with the > >>> intent of it actually being folded *into* 2.2, if the > >> branch used the > >>> same dir structure as trunk, that is, a separate directory > >> that includes > >>> the balancer methods (as well as the config magic > >> associated with it). > >>> However, if that will be an impediment to actually *getting* these > >>> backports into 2.2, then I'm willing to keep the old structure... > >>> So my question is: if to be able to easily backport the > >> various trunk > >>> proxy improvements into 2.2, we also need to backport the dir > >>> structure as well, is that OK? I don't want to work down that > >>> path only to have it wasted work because people think that such a > >>> directory restructure doesn't make sense within a 2.2.x release. > >>> PS: NO, I am not considering this for 2.2.12! :) > >> I guess at the heart of this is the question, how likely it is that > we break some > >> part of the users' build process for 2.2.x. My feeling is that the > >> additional sub directory for the balancing method > implementations is a > >> small change and the users' build process should not break due to this > >> one additional directory. > >> On the positive side apart from easier backports: the new > subdirectory > >> might make people more curious about how to add a custom > >> balancing method, > >> so we get slightly better visibility for the existing > >> provider interface. 
> > > > The problem is that this breaks existing configurations for 2.2.x > > as the balancers are now in separate modules. Thus I am -0.5 on > > backporting this directory structure to 2.2.x. > > Maybe we could keep the file structure but change the logic > to the new one. > For the external proxy_balancer_method we could detect old > and new ones > no? (We have the NULL for that). How so? The new structure makes them separate modules which require separate LoadModule lines for each of them. Thus existing configurations simply get broken. IMHO the logic structure (them being providers) is not different between 2.2.x and trunk. Regards Rüdiger
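For illustration, the configuration difference being discussed looks roughly like this; the lbmethod module file names below are hypothetical stand-ins for whatever the split sub-modules end up being called:

```apache
# Sketch only: module file names are illustrative, not necessarily
# the exact trunk names.

# 2.2.x today: the balancing methods are built into
# mod_proxy_balancer, so an existing configuration only needs:
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so

# With the trunk layout, each method is its own sub-module, so an
# upgraded configuration would additionally need lines such as:
LoadModule lbmethod_byrequests_module modules/mod_lbmethod_byrequests.so
LoadModule lbmethod_bytraffic_module  modules/mod_lbmethod_bytraffic.so
LoadModule lbmethod_bybusyness_module modules/mod_lbmethod_bybusyness.so
```

A configuration upgraded from 2.2.a to 2.2.b without the extra lines would fail to find its balancing method, which is the breakage Rüdiger describes.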
Re: Backports from trunk to 2.2 proxy-balancers
On May 6, 2009, at 9:54 AM, Plüm, Rüdiger, VF-Group wrote: The problem is that this breaks existing configurations for 2.2.x as the balancers are now in separate modules. How so (the breakage, that is)?? You mean the requirement for them to LoadModule them?
Re: Backports from trunk to 2.2 proxy-balancers
Plüm, Rüdiger, VF-Group wrote: -----Original Message----- From: Rainer Jung Sent: Wednesday, May 6, 2009 3:10 PM To: dev@httpd.apache.org Subject: Re: Backports from trunk to 2.2 proxy-balancers On 06.05.2009 14:39, Jim Jagielski wrote: It would certainly be easier to maintain a 2.2-proxy branch, with the intent of it actually being folded *into* 2.2, if the branch used the same dir structure as trunk, that is, a separate directory that includes the balancer methods (as well as the config magic associated with it). However, if that will be an impediment to actually *getting* these backports into 2.2, then I'm willing to keep the old structure... So my question is: if to be able to easily backport the various trunk proxy improvements into 2.2, we also need to backport the dir structure as well, is that OK? I don't want to work down that path only to have it wasted work because people think that such a directory restructure doesn't make sense within a 2.2.x release. PS: NO, I am not considering this for 2.2.12! :) I guess at the heart of this is the question, how likely it is that we break some part of the users' build process for 2.2.x. My feeling is that the additional sub directory for the balancing method implementations is a small change and the users' build process should not break due to this one additional directory. On the positive side apart from easier backports: the new subdirectory might make people more curious about how to add a custom balancing method, so we get slightly better visibility for the existing provider interface. The problem is that this breaks existing configurations for 2.2.x as the balancers are now in separate modules. Thus I am -0.5 on backporting this directory structure to 2.2.x. Maybe we could keep the file structure but change the logic to the new one. For the external proxy_balancer_method we could detect old and new ones, no? (We have the NULL for that). Cheers Jean-Frederic Regards Rüdiger
Re: Backports from trunk to 2.2 proxy-balancers
> -----Original Message----- > From: Rainer Jung > Sent: Wednesday, May 6, 2009 3:10 PM > To: dev@httpd.apache.org > Subject: Re: Backports from trunk to 2.2 proxy-balancers > > On 06.05.2009 14:39, Jim Jagielski wrote: > > It would certainly be easier to maintain a 2.2-proxy > branch, with the > > intent of it actually being folded *into* 2.2, if the > branch used the > > same dir structure as trunk, that is, a separate directory > that includes > > the balancer methods (as well as the config magic > associated with it). > > > > However, if that will be an impediment to actually *getting* these > > backports into 2.2, then I'm willing to keep the old structure... > > > > So my question is: if to be able to easily backport the > various trunk > > proxy improvements into 2.2, we also need to backport the dir > > structure as well, is that OK? I don't want to work down that > > path only to have it wasted work because people think that such a > > directory restructure doesn't make sense within a 2.2.x release. > > > > PS: NO, I am not considering this for 2.2.12! :) > > I guess at the heart of this is the question, how likely it is that we break some > part of the users' build process for 2.2.x. My feeling is that the > additional sub directory for the balancing method implementations is a > small change and the users' build process should not break due to this > one additional directory. > > On the positive side apart from easier backports: the new subdirectory > might make people more curious about how to add a custom > balancing method, > so we get slightly better visibility for the existing > provider interface. The problem is that this breaks existing configurations for 2.2.x as the balancers are now in separate modules. Thus I am -0.5 on backporting this directory structure to 2.2.x. Regards Rüdiger
Re: mod_proxy / mod_proxy_balancer
On May 6, 2009, at 9:23 AM, Jess Holle wrote: You're right -- I was being weird. Sorry. No apology needed :) I guess part of the reason for my asking was whether the ASF was basically saying "we're not chasing this problem, see mod_cluster folk if you need it solved" -- and, if so, hoping to get a little starting info as to what I'd be getting into chasing mod_cluster. I'd like to see this capability in httpd itself -- or at least have it very easy to add in a very seamless fashion via a pluggable custom balancer algorithm (without other larger configuration side effects) -- and thus would hope the ASF sees this as within the scope of httpd's core suite of modules. I think it's safe to say that there is enough interest here in the httpd dev team for this capability to be part of httpd itself... What we'd like to do is provide the basic implementation and capability and allow others to build on top of that if needed.
Re: mod_proxy / mod_proxy_balancer
Jim Jagielski wrote: On May 6, 2009, at 9:07 AM, Jess Holle wrote: jean-frederic clere wrote: Should general support for a query URL be provided in mod_proxy_balancer? Or should this be left to mod_cluster? Can you explain more? I don't get the question. What I mean is • Should mod_proxy_balancer be extended to provide a balancer algorithm in which one specifies a backend URL that will provide a single numeric health metric, throttle the number of such requests via a time-to-live associated with this information, and balance on this basis or • Should mod_cluster handle this issue? • Or both? Please recall that, afaik, mod_cluster is not AL nor is it part of Apache. So asking for direction for what is basically an external project on the Apache httpd dev list is kinda weird :) In any case, I think the hope of the ASF is that this capability is part of httpd, and you can see, with mod_heartbeat and the like, efforts in the direction. But the world is big enough for different implementations... You're right -- I was being weird. Sorry. I guess part of the reason for my asking was whether the ASF was basically saying "we're not chasing this problem, see mod_cluster folk if you need it solved" -- and, if so, hoping to get a little starting info as to what I'd be getting into chasing mod_cluster. I'd like to see this capability in httpd itself -- or at least have it very easy to add in a very seamless fashion via a pluggable custom balancer algorithm (without other larger configuration side effects) -- and thus would hope the ASF sees this as within the scope of httpd's core suite of modules. -- Jess Holle
Re: mod_proxy / mod_proxy_balancer
On 06.05.2009 15:08, Jim Jagielski wrote: > > On May 6, 2009, at 4:35 AM, Jess Holle wrote: > >> >> Of course that redoes what a servlet engine would be doing and does so >> with lower fidelity. An ability to ask a backend for its current >> session count and load balance new requests on that basis would be >> really helpful. Whether this ability is buried into AJP, for >> instance, or is simply a separate request to a designated URL is >> another question, but the latter approach seems fairly general and the >> number of such requests could be throttled by a time-to-live setting >> on the last such count obtained. >> >> Actually this could and should be generalized beyond active sessions >> to a back-end health metric. Each backend could compute and respond >> with a relative measure of busyness/health and respond and the load >> balancer could then balance new (session-less) requests to the least >> busy / most healthy backend. This would seem to be *huge* step >> forward in load balancing capability/fidelity. >> > > The trick, of course, at least with HTTP, is that the querying of > the backend is, of course, a request, and so one needs to worry about > such things as keepalives and persistent connections, and how long > do we wait for responses, etc... > > That's why oob-like health-and-status chatter is nice, because > it doesn't interfere with the normal reverse-proxy/host logic. > > An idea: Instead of asking for this info before sending the > request, what about the backend sending it as part of the response, > as a response header. You don't know that status of the machine > "now", but you do know the status of it right after it handled the last > request (the last time you saw it) and, assuming nothing else touched > it, that status is likely still "good". Latency will be an issue, > of course... 
Overlapping requests where you don't have the response > from req1 before you send req2 means that both requests think the > server is at the same state, whereas of course, they aren't, but it > may even out since req3, for example, (which happens after req1 is done) > thinks that the backend has 2 concurrent requests, instead of the 1 > (req2) and so maybe isn't selected... The hysteresis would be interesting > to model :) I think asking each time before sending data is too much overhead in general. Of course it depends on how accurately you try to distribute load. I would expect that in most situations the overhead for a per request accurate decision does not pay off, especially when under high load there is always a time window between getting the data and handling the request, and a lot of concurrent requests will already again have changed the data. I expect in most cases a granularity of status data between once per second and once per minute will be appropriate (still a factor of 60 to decide or configure). When sending the data back as part of the response: some load numbers might be too expensive to retrieve, say, 500 times a second. Other load numbers might not really make sense as a snapshot (per request), only as an average value (like: what does CPU load as a snapshot mean? Since your load data collecting code is on the CPU, a one CPU system will be 100% busy at this point in time. So CPU measurement mostly makes sense as average values over relatively short intervals). So we should already expect the backend to send data which is not necessarily up-to-date w.r.t. each request. I would assume that when data comes with each response, one would use some sort of floating average. Piggybacking will be easier to implement (no real protocol needed etc.), out of band communication will be more flexible. Regards, Rainer
Re: Backports from trunk to 2.2 proxy-balancers
On May 6, 2009, at 9:09 AM, Rainer Jung wrote: On 06.05.2009 14:39, Jim Jagielski wrote: It would certainly be easier to maintain a 2.2-proxy branch, with the intent of it actually being folded *into* 2.2, if the branch used the same dir structure as trunk, that is, a separate directory that includes the balancer methods (as well as the config magic associated with it). However, if that will be a impediment to actually *getting* these backports into 2.2, then I'm willing to keep the old structure... So my question is: if to be able to easily backport the various trunk proxy improvements into 2.2, we also need to backport the dir structure as well, is that OK? I don't want to work down that path only to have it wasted work because people think that such a directory restructure doesn't make sense within a 2.2.x release. PS: NO, I am not considering this for 2.2.12! :) I guess at the heart of this is the question, how likely we break some part of the users build process for 2.2.x. My feeling is, that the additional sub directory for the balancing method implementations is a small change and users build process should not break due to this additional one directory. On the positive side apart from easier backports: the new subdirectory might make people more curious on how to add a custom balancing method, so we get a slightly better visibility for the existing provider interface. My thoughts as well... :)
Re: mod_proxy / mod_proxy_balancer
Jim Jagielski wrote: On May 6, 2009, at 4:35 AM, Jess Holle wrote: Of course that redoes what a servlet engine would be doing and does so with lower fidelity. An ability to ask a backend for its current session count and load balance new requests on that basis would be really helpful. Whether this ability is buried into AJP, for instance, or is simply a separate request to a designated URL is another question, but the latter approach seems fairly general and the number of such requests could be throttled by a time-to-live setting on the last such count obtained. Actually this could and should be generalized beyond active sessions to a back-end health metric. Each backend could compute and respond with a relative measure of busyness/health and respond and the load balancer could then balance new (session-less) requests to the least busy / most healthy backend. This would seem to be *huge* step forward in load balancing capability/fidelity. The trick, of course, at least with HTTP, is that the querying of the backend is, of course, a request, and so one needs to worry about such things as keepalives and persistent connections, and how long do we wait for responses, etc... That's why oob-like health-and-status chatter is nice, because it doesn't interfere with the normal reverse-proxy/host logic. An idea: Instead of asking for this info before sending the request, what about the backend sending it as part of the response, as a response header. You don't know that status of the machine "now", but you do know the status of it right after it handled the last request (the last time you saw it) and, assuming nothing else touched it, that status is likely still "good". Latency will be an issue, of course... 
Overlapping requests where you don't have the response from req1 before you send req2 means that both requests think the server is at the same state, whereas of course, they aren't, but it may even out since req3, for example, (which happens after req1 is done) thinks that the backend has 2 concurrent requests, instead of the 1 (req2) and so maybe isn't selected... The hysteresis would be interesting to model :) There's inherent hysteresis in this sort of thing. Including health information (e.g. via a custom response header) on all responses is an interesting notion. Exposing a URL on Apache through which the backend can push its health information (e.g. upon starting a new session or invalidating a session or detecting a low memory condition) also makes sense. If these do not suffice a watchdog thread (as in mod_jk) could do periodic health checks on the backends in a separate thread or requests could pre-request health information for a backend if that backend's health information is sufficiently old. There's lots of possibilities here. -- Jess Holle
Re: mod_proxy / mod_proxy_balancer
On May 6, 2009, at 9:07 AM, Jess Holle wrote: jean-frederic clere wrote: Should general support for a query URL be provided in mod_proxy_balancer? Or should this be left to mod_cluster? Can you explain more? I don't get the question. What I mean is • Should mod_proxy_balancer be extended to provide a balancer algorithm in which one specifies a backend URL that will provide a single numeric health metric, throttle the number of such requests via a time-to-live associated with this information, and balance on this basis or • Should mod_cluster handle this issue? • Or both? Please recall that, afaik, mod_cluster is not AL nor is it part of Apache. So asking for direction for what is basically an external project on the Apache httpd dev list is kinda weird :) In any case, I think the hope of the ASF is that this capability is part of httpd, and you can see, with mod_heartbeat and the like, efforts in that direction. But the world is big enough for different implementations...
Re: mod_proxy / mod_proxy_balancer
Rainer Jung wrote: On 06.05.2009 14:35, jean-frederic clere wrote: Jess Holle wrote: Rainer Jung wrote: Yes, I think the counter/aging discussion is for the baseline, i.e. when we do not have any information channel to or from the backend nodes. As soon as mod_cluster comes into play, we can use more up-to-date real data and only need to decide how to interpret it and how to interpolate during the update interval. Should general support for a query URL be provided in mod_proxy_balancer? Or should this be left to mod_cluster? Can you explain more? I don't get the question. Does mod_cluster provide yet another approach top to bottom (separate from mod_jk and mod_proxy/mod_proxy_ajp)? Mod_cluster is just a balancer for mod_proxy but due to the dynamic creation of balancers and workers it can't get into the httpd-trunk code right now. It would seem nice to me if mod_jk and/or mod_proxy_balancer could do health checks, but you have to draw the line somewhere on growing any given module and if mod_jk and mod_proxy_balancer are not going in that direction at some point mod_cluster may be in my future. Cool :-) There are several different subsystems, and as I understood mod_cluster it already carefully separates them: 1) Dynamic topology detection (optional) What are our backend nodes? If you do not want to statically configure them, you need some mechanism based on either - registration: backend nodes register at one or multiple topology management nodes; the addresses of those are either configured, or they announce themselves on the network via broad- or multicast. - detection: topology manager receives broad- or multicast packets of the backend nodes. They do not need to know the topology manager, only the multicast address. A more advanced version would also learn the forwarding rules (e.g. URLs to map) from the backend nodes. In the simpler case, the topology would be configured statically. 
2) Dynamic state detection a) Liveness b) Load numbers Both could either be polled by the state manager (maybe scalability issues) or pushed to it. Push could be done by tcp (the address could be sent to the backend, once it was detected in 1) or defined statically). Maybe one would use both ways, e.g. push for active state changes, like when an admin stops a node, poll for state manager driven things. Not sure. 3) Balancing Would be done based on the data collected by the state manager. It's not clear at all whether those three should be glued together tightly, or kept in different pieces. I had the impression the general direction is more about separating them and allowing multiple experiments, like mod_cluster and mod_heartbeat. The interaction would be done via some common data container, e.g. slotmem or, in a distributed (multiple Apaches) situation, memcache or similar. Does this make sense? Yes. I've been working around #1 by using pre-designated port ranges for backends, e.g. configuring for balancing over a port range of 10 and only having a couple of servers running in this range at most given times. That's fine as long as one quiets Apache's error logging so that it only complains about backends that are *newly* unreachable rather than complaining each time a backend is retried. I supplied a patch for this some time back. #2 and #3 are huge, however, and it would be good to see something firm rather than experimental in these areas sooner rather than later. -- Jess Holle
Re: Backports from trunk to 2.2 proxy-balancers
On 06.05.2009 14:39, Jim Jagielski wrote: > It would certainly be easier to maintain a 2.2-proxy branch, with the > intent of it actually being folded *into* 2.2, if the branch used the > same dir structure as trunk, that is, a separate directory that includes > the balancer methods (as well as the config magic associated with it). > > However, if that will be an impediment to actually *getting* these > backports into 2.2, then I'm willing to keep the old structure... > > So my question is: if to be able to easily backport the various trunk > proxy improvements into 2.2, we also need to backport the dir > structure as well, is that OK? I don't want to work down that > path only to have it wasted work because people think that such a > directory restructure doesn't make sense within a 2.2.x release. > > PS: NO, I am not considering this for 2.2.12! :) I guess at the heart of this is the question of how likely it is that we break some part of the users' build process for 2.2.x. My feeling is that the additional subdirectory for the balancing method implementations is a small change, and users' build processes should not break due to this one additional directory. On the positive side, apart from easier backports: the new subdirectory might make people more curious about how to add a custom balancing method, so we get slightly better visibility for the existing provider interface. Regards, Rainer
Re: mod_proxy / mod_proxy_balancer
FWIW, I've been looking into using Tribes for httpd.
Re: mod_proxy / mod_proxy_balancer
On May 6, 2009, at 4:35 AM, Jess Holle wrote: Of course that redoes what a servlet engine would be doing and does so with lower fidelity. An ability to ask a backend for its current session count and load balance new requests on that basis would be really helpful. Whether this ability is buried into AJP, for instance, or is simply a separate request to a designated URL is another question, but the latter approach seems fairly general, and the number of such requests could be throttled by a time-to-live setting on the last such count obtained. Actually this could and should be generalized beyond active sessions to a back-end health metric. Each backend could compute and respond with a relative measure of busyness/health, and the load balancer could then balance new (session-less) requests to the least busy / most healthy backend. This would seem to be a *huge* step forward in load balancing capability/fidelity. The trick, of course, at least with HTTP, is that the querying of the backend is itself a request, and so one needs to worry about such things as keepalives and persistent connections, and how long do we wait for responses, etc... That's why oob-like health-and-status chatter is nice, because it doesn't interfere with the normal reverse-proxy/host logic. An idea: instead of asking for this info before sending the request, what about the backend sending it as part of the response, as a response header? You don't know the status of the machine "now", but you do know the status of it right after it handled the last request (the last time you saw it) and, assuming nothing else touched it, that status is likely still "good". Latency will be an issue, of course...
Overlapping requests, where you don't have the response from req1 before you send req2, mean that both requests think the server is in the same state, whereas of course it isn't. But it may even out, since req3, for example (which happens after req1 is done), thinks that the backend has 2 concurrent requests instead of the 1 (req2), and so maybe isn't selected... The hysteresis would be interesting to model :)
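The header-feedback idea above can be sketched in a few lines of C. All names here are hypothetical (this is not httpd API): each worker remembers the busyness its backend last reported in a response header, and the balancer adds its own unanswered requests on top before picking the minimum, which is exactly the in-flight compensation that makes req3 avoid the backend still running req2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-backend state: the busyness the backend last
 * reported via a response header, plus the requests we have sent
 * but not yet seen answered (the overlap discussed above). */
struct backend {
    int reported_busy;  /* from the last response header seen */
    int in_flight;      /* our own unanswered requests */
};

/* Choose the backend whose (last report + our in-flight count) is
 * smallest; counting in-flight requests partly compensates for the
 * staleness of the last report. */
static size_t pick_backend(const struct backend *b, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        int cur   = b[i].reported_busy + b[i].in_flight;
        int bestv = b[best].reported_busy + b[best].in_flight;
        if (cur < bestv)
            best = i;
    }
    return best;
}
```

With this scheme a backend that looked idle in its last response stops being preferred as soon as several unanswered requests pile up on it, which is the "evening out" effect described above.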
Re: mod_proxy / mod_proxy_balancer
jean-frederic clere wrote: Should general support for a query URL be provided in mod_proxy_balancer? Or should this be left to mod_cluster? Can you explain more? I don't get the question. What I mean is 1. Should mod_proxy_balancer be extended to provide a balancer algorithm in which one specifies a backend URL that will provide a single numeric health metric, throttle the number of such requests via a time-to-live associated with this information, and balance on this basis, or 2. Should mod_cluster handle this issue? 3. Or both? * For instance, mod_cluster might leverage special nuances in AJP, JBoss, and Tomcat, whereas mod_proxy_balancer might provide more generic support for health checks on any back-end server that can expose a health metric URL. From your response below, it sounds like you're saying it's #2, which is *largely* fine and good -- but this raises questions: 1. How general is the health check metric in mod_cluster? * I only care about Tomcat backends myself, but control over the metric would be good. 2. Does this require special JBoss nuggets in Tomcat? * I'd hope not, i.e. that this is a simple matter of a pre-designated URL or a very simple standalone socket protocol. 3. When will mod_cluster support health metric based balancing of Tomcat? 4. How "disruptive" to an existing configuration using mod_proxy_balancer/mod_proxy_ajp is mod_cluster? * How much needs to be changed? 5. How portable is the mod_cluster code? * Does it build on Windows? HPUX? AIX? I say this is largely fine and good as I'd like to see just the health-metric based balancing algorithm in Apache 2.2.x itself. Does mod_cluster provide yet another approach top to bottom (separate from mod_jk and mod_proxy/mod_proxy_ajp)? Mod_cluster is just a balancer for mod_proxy, but due to the dynamic creation of balancers and workers it can't get into the httpd-trunk code right now.
It would seem nice to me if mod_jk and/or mod_proxy_balancer could do health checks, but you have to draw the line somewhere on growing any given module and if mod_jk and mod_proxy_balancer are not going in that direction at some point mod_cluster may be in my future. -- Jess Holle
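The time-to-live throttling in option 1 above reduces the health-query traffic to at most one backend request per TTL interval. A rough sketch of that caching logic (names hypothetical, not mod_proxy_balancer code; the fetch callback stands in for the actual HTTP request to the designated health URL):

```c
#include <assert.h>
#include <time.h>

/* Cached health metric for one backend, refreshed at most once per
 * ttl seconds; lookups in between reuse the stored value. */
struct health_cache {
    int    metric;      /* last value the backend reported */
    time_t fetched_at;  /* when it was fetched */
    int    ttl;         /* seconds the value stays valid */
};

/* fetch() stands in for the real request to the backend's health
 * URL; it is a parameter so the TTL logic can be tested alone. */
static int health_metric(struct health_cache *c, time_t now,
                         int (*fetch)(void))
{
    if (now - c->fetched_at >= c->ttl) {
        c->metric     = fetch();
        c->fetched_at = now;
    }
    return c->metric;
}

/* Test stub: pretends the backend reports a rising metric, so we
 * can observe when a real fetch happened vs. a cache hit. */
static int stub_calls = 0;
static int stub_fetch(void) { return ++stub_calls; }
```

However many balancing decisions fall inside one TTL window, the backend sees only a single health query for that window.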
Re: mod_proxy / mod_proxy_balancer
On 06.05.2009 14:35, jean-frederic clere wrote: > Jess Holle wrote: >> Rainer Jung wrote: >>> Yes, I think the counter/aging discussion is for the baseline, i.e. when >>> we do not have any information channel to or from the backend nodes. >>> >>> As soon as mod_cluster comes into play, we can use more up-to-date real >>> data and only need to decide how to interpret it and how to interpolate >>> during the update interval. >>> >> Should general support for a query URL be provided in >> mod_proxy_balancer? Or should this be left to mod_cluster? > > Can you explain more? I don't get the question. > >> Does mod_cluster provide yet another approach top to bottom (separate >> from mod_jk and mod_proxy/mod_proxy_ajp)? > > Mod_cluster is just a balancer for mod_proxy but due to the dynamic > creation of balancers and workers it can't get into the httpd-trunk code > right now. > >> It would seem nice to me if mod_jk and/or mod_proxy_balancer could do >> health checks, but you have to draw the line somewhere on growing any >> given module and if mod_jk and mod_proxy_balancer are not going in >> that direction at some point mod_cluster may be in my future. > > Cool :-) There are several different subsystems, and as I understood mod_cluster it already carefully separates them: 1) Dynamic topology detection (optional) What are our backend nodes? If you do not want to statically configure them, you need some mechanism based on either - registration: backend nodes register at one or multiple topology management nodes; the addresses of those are either configured, or they announce themselves on the network via broad- or multicast. - detection: the topology manager receives broad- or multicast packets from the backend nodes. They do not need to know the topology manager, only the multicast address. More enhanced would be to also learn the forwarding rules (e.g. URLs to map) from the backend nodes. In the simpler case, the topology would be configured statically.
2) Dynamic state detection a) Liveness b) Load numbers Both could be either polled by a state manager (with possible scalability issues) or pushed to it. Push could be done via TCP (the address could be sent to the backend once it was detected in 1), or defined statically). Maybe one would use both ways, e.g. push for active state changes, like when an admin stops a node, and poll for state-manager-driven things. Not sure. 3) Balancing Would be done based on the data collected by the state manager. It's not clear at all whether those three should be glued together tightly, or kept in different pieces. I had the impression the general direction is more about separating them and allowing multiple experiments, like mod_cluster and mod_heartbeat. The interaction would be done via some common data container, e.g. slotmem, or in a distributed (multiple Apaches) situation memcache or similar. Does this make sense? Regards, Rainer
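For the push variant of 2), a mod_heartbeat-style heartbeat is just a tiny text payload a backend sends periodically (over multicast or TCP). Formatting and parsing such a payload might look like the sketch below; the exact field layout here is an assumption, loosely modeled on mod_heartbeat's ready/busy counters, not the actual wire format.

```c
#include <assert.h>
#include <stdio.h>

/* Build the heartbeat payload a backend would push each interval.
 * Field names (ready = idle workers, busy = active workers) loosely
 * follow mod_heartbeat; the format itself is hypothetical. */
static int hb_format(char *buf, size_t len, int ready, int busy)
{
    return snprintf(buf, len, "v=1&ready=%d&busy=%d", ready, busy);
}

/* Parse a received payload back into the two counters; returns 0
 * on success, -1 if the packet does not match the format. */
static int hb_parse(const char *buf, int *ready, int *busy)
{
    return sscanf(buf, "v=1&ready=%d&busy=%d", ready, busy) == 2
           ? 0 : -1;
}
```

The state manager would call hb_parse() on each received packet and store the counters in the shared container (slotmem, memcache, ...) mentioned above for the balancer to read.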
Backports from trunk to 2.2 proxy-balancers
It would certainly be easier to maintain a 2.2-proxy branch, with the intent of it actually being folded *into* 2.2, if the branch used the same dir structure as trunk, that is, a separate directory that includes the balancer methods (as well as the config magic associated with it). However, if that will be an impediment to actually *getting* these backports into 2.2, then I'm willing to keep the old structure... So my question is: if, to be able to easily backport the various trunk proxy improvements into 2.2, we also need to backport the dir structure as well, is that OK? I don't want to work down that path only to have it be wasted work because people think that such a directory restructure doesn't make sense within a 2.2.x release. PS: NO, I am not considering this for 2.2.12! :)
Re: mod_proxy / mod_proxy_balancer
Jess Holle wrote: Rainer Jung wrote: An ability to balance based on new sessions with an idle timeout on such sessions would be close enough to reality in cases where sessions expire rather than being explicitly invalidated (e.g. by a logout). But then we end up in a stateful situation. This is a serious design decision. If we want to track idleness for sessions, we need to track a list of sessions (session ids) the balancer has seen. This makes things much more complex. Combined with the inability to track logouts, and the errors coming in from a global situation (more than one Apache instance), I think it will be more of a problem than a solution. The more I think about this the more I agree. From the start I preferred the session/health query to the back-end with a time-to-live; on further consideration I *greatly* prefer this approach. Of course that redoes what a servlet engine would be doing and does so with lower fidelity. An ability to ask a backend for its current session count and load balance new requests on that basis would be really helpful. Seems much nicer. Agreed. Actually this could and should be generalized beyond active sessions to a back-end health metric. Each backend could compute and respond with a relative measure of busyness/health, and the load balancer could then balance new (session-less) requests to the least busy / most healthy backend. This would seem to be a *huge* step forward in load balancing capability/fidelity. It's my understanding that mod_cluster is pursuing just this sort of thing to some degree -- but currently only works for JBoss backends. Yes, I think the counter/aging discussion is for the baseline, i.e. when we do not have any information channel to or from the backend nodes. As soon as mod_cluster comes into play, we can use more up-to-date real data and only need to decide how to interpret it and how to interpolate during the update interval.
Should general support for a query URL be provided in mod_proxy_balancer? Or should this be left to mod_cluster? Can you explain more? I don't get the question. Does mod_cluster provide yet another approach top to bottom (separate from mod_jk and mod_proxy/mod_proxy_ajp)? Mod_cluster is just a balancer for mod_proxy, but due to the dynamic creation of balancers and workers it can't get into the httpd-trunk code right now. It would seem nice to me if mod_jk and/or mod_proxy_balancer could do health checks, but you have to draw the line somewhere on growing any given module, and if mod_jk and mod_proxy_balancer are not going in that direction, at some point mod_cluster may be in my future. Cool :-) Cheers Jean-Frederic
Re: mod_proxy / mod_proxy_balancer
Jess Holle wrote: jean-frederic clere wrote: Jess Holle wrote: An ability to balance based on new sessions with an idle timeout on such sessions would be close enough to reality in cases where sessions expire rather than being explicitly invalidated (e.g. by a logout). Storing the session id to share the load depending on the number of active sessions brings a problem of security, no? To the degree that you consider Apache vulnerable to attack to retrieve these, yes. I prefer the health check request approach below for this and other reasons (amount of required bookkeeping, etc). Of course that redoes what a servlet engine would be doing and does so with lower fidelity. An ability to ask a backend for its current session count and load balance new requests on that basis would be really helpful. Whether this ability is buried into AJP, for instance, or is simply a separate request to a designated URL is another question, but the latter approach seems fairly general, and the number of such requests could be throttled by a time-to-live setting on the last such count obtained. Actually this could and should be generalized beyond active sessions to a back-end health metric. Each backend could compute and respond with a relative measure of busyness/health, and the load balancer could then balance new (session-less) requests to the least busy / most healthy backend. This would seem to be a *huge* step forward in load balancing capability/fidelity. It's my understanding that mod_cluster is pursuing just this sort of thing to some degree -- but currently only works for JBoss backends. This is wrong; it works with Tomcat too. mod_cluster works with Tomcat, but according to the docs I've seen, the dynamic (health/session metric based rather than static) load balancing only worked with JBoss backends. Or has this changed? No, it is still like that: the singleton logic used in JBoss AS requires JBoss clustering logic, but it should be available in the next version.
Cheers Jean-Frederic
Re: mod_proxy / mod_proxy_balancer
jean-frederic clere wrote: Jess Holle wrote: An ability to balance based on new sessions with an idle timeout on such sessions would be close enough to reality in cases where sessions expire rather than being explicitly invalidated (e.g. by a logout). Storing the session id to share the load depending on the number of active sessions brings a problem of security, no? To the degree that you consider Apache vulnerable to attack to retrieve these, yes. I prefer the health check request approach below for this and other reasons (amount of required bookkeeping, etc). Of course that redoes what a servlet engine would be doing and does so with lower fidelity. An ability to ask a backend for its current session count and load balance new requests on that basis would be really helpful. Whether this ability is buried into AJP, for instance, or is simply a separate request to a designated URL is another question, but the latter approach seems fairly general, and the number of such requests could be throttled by a time-to-live setting on the last such count obtained. Actually this could and should be generalized beyond active sessions to a back-end health metric. Each backend could compute and respond with a relative measure of busyness/health, and the load balancer could then balance new (session-less) requests to the least busy / most healthy backend. This would seem to be a *huge* step forward in load balancing capability/fidelity. It's my understanding that mod_cluster is pursuing just this sort of thing to some degree -- but currently only works for JBoss backends. This is wrong; it works with Tomcat too. mod_cluster works with Tomcat, but according to the docs I've seen, the dynamic (health/session metric based rather than static) load balancing only worked with JBoss backends. Or has this changed? -- Jess Holle
Re: mod_proxy / mod_proxy_balancer
Rainer Jung wrote: An ability to balance based on new sessions with an idle timeout on such sessions would be close enough to reality in cases where sessions expire rather than being explicitly invalidated (e.g. by a logout). But then we end up in a stateful situation. This is a serious design decision. If we want to track idleness for sessions, we need to track a list of sessions (session ids) the balancer has seen. This makes things much more complex. Combined with the inability to track logouts, and the errors coming in from a global situation (more than one Apache instance), I think it will be more of a problem than a solution. The more I think about this the more I agree. From the start I preferred the session/health query to the back-end with a time-to-live; on further consideration I *greatly* prefer this approach. Of course that redoes what a servlet engine would be doing and does so with lower fidelity. An ability to ask a backend for its current session count and load balance new requests on that basis would be really helpful. Seems much nicer. Agreed. Actually this could and should be generalized beyond active sessions to a back-end health metric. Each backend could compute and respond with a relative measure of busyness/health, and the load balancer could then balance new (session-less) requests to the least busy / most healthy backend. This would seem to be a *huge* step forward in load balancing capability/fidelity. It's my understanding that mod_cluster is pursuing just this sort of thing to some degree -- but currently only works for JBoss backends. Yes, I think the counter/aging discussion is for the baseline, i.e. when we do not have any information channel to or from the backend nodes. As soon as mod_cluster comes into play, we can use more up-to-date real data and only need to decide how to interpret it and how to interpolate during the update interval. Should general support for a query URL be provided in mod_proxy_balancer?
Or should this be left to mod_cluster? Does mod_cluster provide yet another approach top to bottom (separate from mod_jk and mod_proxy/mod_proxy_ajp)? It would seem nice to me if mod_jk and/or mod_proxy_balancer could do health checks, but you have to draw the line somewhere on growing any given module, and if mod_jk and mod_proxy_balancer are not going in that direction, at some point mod_cluster may be in my future. -- Jess Holle
Re: mod_dav WebDAVFS/1.7 and Transfer-Encoding
On Wed, May 6, 2009 at 4:08 AM, paul wrote: > > But reading mod_dav.c around line 2393: > if (tenc) { > if (strcasecmp(tenc, "chunked")) { > /* Use this instead of Apache's default error string */ > ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r, > "Unknown Transfer-Encoding %s", tenc); > return HTTP_NOT_IMPLEMENTED; > } > > uses strcasecmp() so the above theory seems wrong. We're using 2.2.9 and > the above snippet is from 2.2.11. Any chance this has been fixed after > 2.2.9? No; it has used strcasecmp() for eons. > Has anyone encountered the issue, or is this a known problem? Did you check bugzilla? (issues.apache.org/bugzilla)
Re: svn commit: r771998
This causes trunk to fail compilation with: make[1]: *** No rule to make target `modules/mappers/libmod_so.la', needed by `httpd'. Stop. make: *** [all-recursive] Error 1 Reverting it fixes the problem. Regards Rüdiger
Re: mod_proxy / mod_proxy_balancer
On 06.05.2009 10:35, Jess Holle wrote: > Rainer Jung wrote: >> In most situations applications need stickiness. So balancing will not >> happen in an ideal situation, instead it tries to keep load equal >> although most requests are sticky. >> >> Because of the influence of sticky requests it can happen that >> accumulated load distributes very unevenly between the nodes. Should the >> balancer try to correct such accumulated differences? >> Other applications are memory bound. Memory is needed by request >> handling but also by session handling. Data accumulation is more >> important here, because of the sessions. Again, we can not be perfect, >> because we don't get a notification when a session expires or a user >> logs out. So we can only count the "new" sessions. This counter in my >> opinion also needs some aging, so that we won't compensate historic >> inequality without bounds. I must confess that I don't have an example >> here how this inequality can happen for sessions when balancing new >> session requests (stickiness doesn't influence this), but I think >> balancing based on old data is the wrong model here too. >> > An ability to balance based on new sessions with an idle timeout on > such sessions would be close enough to reality in cases where sessions > expire rather than being explicitly invalidated (e.g. by a logout). But then we end up in a stateful situation. This is a serious design decision. If we want to track idleness for sessions, we need to track a list of sessions (session ids) the balancer has seen. This makes things much more complex. Combined with the inability to track logouts, and the errors coming in from a global situation (more than one Apache instance), I think it will be more of a problem than a solution. > Of course that redoes what a servlet engine would be doing and does so > with lower fidelity.
An ability to ask a backend for its current > session count and load balance new requests on that basis would be > really helpful. Seems much nicer. > Actually this could and should be generalized beyond active sessions to > a back-end health metric. Each backend could compute and respond with a > relative measure of busyness/health, and the load balancer > could then balance new (session-less) requests to the least busy / most > healthy backend. This would seem to be a *huge* step forward in load > balancing capability/fidelity. > > It's my understanding that mod_cluster is pursuing just this sort of > thing to some degree -- but currently only works for JBoss backends. Yes, I think the counter/aging discussion is for the baseline, i.e. when we do not have any information channel to or from the backend nodes. As soon as mod_cluster comes into play, we can use more up-to-date real data and only need to decide how to interpret it and how to interpolate during the update interval. Regards, Rainer
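The aging argued for in the counter discussion above can be as simple as an exponential decay applied to each worker's accumulated request count once per balancing interval, so historic imbalance from sticky traffic fades out instead of being compensated without bound. A minimal sketch (not the mod_proxy_balancer code; the halving factor is an arbitrary choice):

```c
#include <assert.h>

/* Exponentially aged request counter: each balancing interval the
 * accumulated count is halved, so load seen long ago loses its
 * influence on future balancing decisions. */
struct aged_counter {
    long count;
};

/* Record requests handled in the current interval. */
static void counter_add(struct aged_counter *c, long requests)
{
    c->count += requests;
}

/* Age the counter; would be called once per interval, e.g. from a
 * periodic maintenance hook. */
static void counter_age(struct aged_counter *c)
{
    c->count /= 2;
}
```

After a burst of 100 sticky requests followed by quiet intervals, the counter drops to 50, 25, 12, ..., so the node soon becomes eligible for new non-sticky traffic again rather than being penalized indefinitely.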
Re: mod_proxy / mod_proxy_balancer
Jess Holle wrote: Rainer Jung wrote: In most situations applications need stickiness. So balancing will not happen in an ideal situation, instead it tries to keep load equal although most requests are sticky. Because of the influence of sticky requests it can happen that accumulated load distributes very unevenly between the nodes. Should the balancer try to correct such accumulated differences? Other applications are memory bound. Memory is needed by request handling but also by session handling. Data accumulation is more important here, because of the sessions. Again, we can not be perfect, because we don't get a notification when a session expires or a user logs out. So we can only count the "new" sessions. This counter in my opinion also needs some aging, so that we won't compensate historic inequality without bounds. I must confess that I don't have an example here how this inequality can happen for sessions when balancing new session requests (stickiness doesn't influence this), but I think balancing based on old data is the wrong model here too. An ability to balance based on new sessions with an idle timeout on such sessions would be close enough to reality in cases where sessions expire rather than being explicitly invalidated (e.g. by a logout). Storing the session id to share the load depending on the number of active sessions brings a problem of security, no? Of course that redoes what a servlet engine would be doing and does so with lower fidelity. An ability to ask a backend for its current session count and load balance new requests on that basis would be really helpful. Whether this ability is buried into AJP, for instance, or is simply a separate request to a designated URL is another question, but the latter approach seems fairly general, and the number of such requests could be throttled by a time-to-live setting on the last such count obtained. Actually this could and should be generalized beyond active sessions to a back-end health metric.
Each backend could compute and respond with a relative measure of busyness/health, and the load balancer could then balance new (session-less) requests to the least busy / most healthy backend. This would seem to be a *huge* step forward in load balancing capability/fidelity. It's my understanding that mod_cluster is pursuing just this sort of thing to some degree -- but currently only works for JBoss backends. This is wrong; it works with Tomcat too. Cheers Jean-Frederic
mod_dav WebDAVFS/1.7 and Transfer-Encoding
Hi folks, I hope this is the correct list to ask, please redirect if not. Recent versions of OS X (WebDAVFS/1.7) have problems accessing WebDAV (mod_dav). Upload creates zero-byte files and the client gets disconnected. According to: http://discussions.apple.com/thread.jspa?messageID=8101932 the issue is: We too ran into this issue with our software (Jungle Disk) and customers. As far as we've been able to tell Apple changed how Finder does PUT requests -- they now do them as Transfer-Encoding: chunked, which apparently a number of WebDAV implementations don't support. By adding support for this we have been able to interoperate with Finder again. Fix was simple. Finder is sending the Transfer-Encoding header as Chunked (note the upper case 'c'). Apache is sensitive to case. I think mod_dav would have the same issue. Minor patch to mod_dav required: it already supports chunking, I think. But reading mod_dav.c around line 2393: if (tenc) { if (strcasecmp(tenc, "chunked")) { /* Use this instead of Apache's default error string */ ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r, "Unknown Transfer-Encoding %s", tenc); return HTTP_NOT_IMPLEMENTED; } uses strcasecmp() so the above theory seems wrong. We're using 2.2.9 and the above snippet is from 2.2.11. Any chance this has been fixed after 2.2.9? Has anyone encountered the issue, or is this a known problem? thanks Paul
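The mod_dav snippet quoted above already compares case-insensitively: strcasecmp() returns 0 for strings that differ only in case, so a capitalized "Chunked" from Finder would be accepted, which is why the case-sensitivity theory doesn't hold. A minimal standalone check of that comparison:

```c
#include <string.h>
#include <strings.h>  /* strcasecmp() on POSIX systems */

/* Mirrors the mod_dav test: returns 0 (i.e. "known encoding") when
 * the Transfer-Encoding value is "chunked" in any capitalization,
 * non-zero otherwise (which would trigger HTTP_NOT_IMPLEMENTED). */
static int dav_unknown_tenc(const char *tenc)
{
    return strcasecmp(tenc, "chunked");
}
```

Since "Chunked" passes this check, the zero-byte uploads reported with WebDAVFS/1.7 must have another cause than header-case handling.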
Re: mod_proxy / mod_proxy_balancer
Rainer Jung wrote: In most situations applications need stickiness. So balancing will not happen in an ideal situation, instead it tries to keep load equal although most requests are sticky. Because of the influence of sticky requests it can happen that accumulated load distributes very unevenly between the nodes. Should the balancer try to correct such accumulated differences? Other applications are memory bound. Memory is needed by request handling but also by session handling. Data accumulation is more important here, because of the sessions. Again, we can not be perfect, because we don't get a notification when a session expires or a user logs out. So we can only count the "new" sessions. This counter in my opinion also needs some aging, so that we won't compensate historic inequality without bounds. I must confess that I don't have an example here how this inequality can happen for sessions when balancing new session requests (stickiness doesn't influence this), but I think balancing based on old data is the wrong model here too. An ability to balance based on new sessions with an idle timeout on such sessions would be close enough to reality in cases where sessions expire rather than being explicitly invalidated (e.g. by a logout). Of course that redoes what a servlet engine would be doing and does so with lower fidelity. An ability to ask a backend for its current session count and load balance new requests on that basis would be really helpful. Whether this ability is buried into AJP, for instance, or is simply a separate request to a designated URL is another question, but the latter approach seems fairly general, and the number of such requests could be throttled by a time-to-live setting on the last such count obtained. Actually this could and should be generalized beyond active sessions to a back-end health metric.
Each backend could compute and respond with a relative measure of busyness/health, and the load balancer could then balance new (session-less) requests to the least busy / most healthy backend. This would seem to be a *huge* step forward in load balancing capability/fidelity. It's my understanding that mod_cluster is pursuing just this sort of thing to some degree -- but currently only works for JBoss backends. -- Jess Holle
Re: mod_proxy / mod_proxy_balancer
Caution: long response! On 05.05.2009 22:41, jean-frederic clere wrote: > Jim Jagielski wrote: >> >> On May 5, 2009, at 3:02 PM, jean-frederic clere wrote: >> >>> Jim Jagielski wrote: On May 5, 2009, at 1:18 PM, jean-frederic clere wrote: > Jim Jagielski wrote: >> On May 5, 2009, at 12:07 PM, jean-frederic clere wrote: >>> Jim Jagielski wrote: On May 5, 2009, at 11:13 AM, jean-frederic clere wrote: > > I am trying to get the worker->id and the scoreboard associated > logic moved in the reset() when using a balancer, those workers > need a different handling if we want to have a shared > information area for them. > The thing is that those workers are not really handled by the balancer itself (nor should be), so the reset() shouldn't apply. IMO, mod_proxy inits the generic forward/reverse workers and m_p_b should handle the balancer-related ones. >>> >>> Ok by running first the m_p_b child_init() the worker is >>> initialised by the m_p_b logic and mod_proxy won't change it later. >>> >> Yeah... a quick test indicates, at least as far as the perl >> framework is concerned, changing so that m_p_b runs 1st in >> child_init >> results in normal and expected behavior. Need to do some more >> tracing to see if we can copy the pointer instead of the whole >> data set with this ordering. > > I have committed the code... It works for my tests. > Beat me to it :) BTW: I did create a proxy-sandbox from 2.2.x in hopes that a lot of what we do in trunk we can backport to 2.2.x >>> >>> Yep but I think we should first have the reset()/age() stuff working >>> in trunk before backporting to httpd-2.2-proxy :-) >>> >> >> For sure!! >> >> BTW: it seems to me that aging is only really needed when the >> environment changes, >> mostly when a worker comes back, or when the actual limits are changed >> in real-time during runtime. Except for these, aging doesn't seem to >> really add much...
long-term steady state only gets invalid when the >> steady-state changes, after all :) >> >> Comments? >> >> > > I think we need it for a few reasons: > - When a worker is idle the information about its load is irrelevant. > - Being able to calculate throughput and load balance using that > information is only valid if you have a kind of ticker. > - In some tests I have made with a mixture of long sessions and single > request "sessions" you need to "forget" the load caused by the long > sessions. Balancing and stickiness are conflicting goals. Stickiness dictates the node once a session is created; balancing tries to distribute load equally, so it needs to choose the least loaded node. In most situations applications need stickiness. So balancing will not happen in an ideal situation, instead it tries to keep load equal although most requests are sticky. Because of the influence of sticky requests it can happen that accumulated load distributes very unevenly between the nodes. Should the balancer try to correct such accumulated differences? It depends (yeah, as always): what we actually mean by "load" varies with the application. Abstractly we are talking about resource usage. The backend nodes have limited resources and we want to make optimal use of them by distributing the resource usage equally. For some applications CPU is the limiting resource. This resource is typically coupled to actual requests in flight and not to longer living objects like sessions. Of course not all requests need an equal amount of CPU, but as long as we can't actually measure the CPU load, balancing the number of requests in the sense of "busyness" (parallel requests) should be best for CPU. Because CPU monitoring is often done on the basis of averages (and not the maximum short term use per interval), some request count accumulation as a basis for the balancing will result in better measured numbers (not necessarily in a better "smallest maximum load").
If we do not age, then strongly unequal historic distribution (caused by stickiness) will result in an opposite unequal distribution as soon as a lot of non-sticky requests come in. I think that's not optimal. Other applications are memory bound. Memory is needed by request handling but also by session handling. Data accumulation is more important here, because of the sessions. Again, we can not be perfect, because we don't get a notification when a session expires or a user logs out. So we can only count the "new" sessions. This counter in my opinion also needs some aging, so that we won't compensate historic inequality without bounds. I must confess that I don't have an example here how this inequality can happen for sessions when balancing new session requests (stickiness doesn't influence this), but I think balancing based on old data is the wrong model here too. Then another important resource is bandwidth. Her