Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Rainer Jung
On 06.05.2009 22:31, Jim Jagielski wrote:
> 
> On May 6, 2009, at 4:20 PM, Graham Leggett wrote:
> 
>> Jim Jagielski wrote:
>>
>>> I'll stop worrying about 2.2 when 2.4 comes closer to being a reality.
>>>
>>> Not saying that releasing 2.4 isn't worth it, but there have been
>>> stops and
>>> starts all along the way, and I think we need to be clear on what we
>>> expect 2.4 to be. Until then, we have no clear defining line on when
>>> 2.4 is "done."
>>
>> Is there anything additional that we want v2.4 to do over and above what
>> it does now?
>>
> 
> Well, that's the question, isn't it? I can't align the idea
> of trunk being a candidate for 2.4 and trunk being a place for
> people to experiment...
> 
> What do we want 2.4 to be and do. And how.
> 
> Once we define (and agree) to that, we know how close (or far)
> trunk is. It sounds like we have some set that wants to break
> trunk apart and totally refactor a lot of it, and that's a big +1.
> It's also not a 3-4 month effort :) It also sounds like there
> are people who want 2.4 to be an upgrade to 2.2 as 2.2 was compared
> to 2.0, and a big +1 to that as well. But BOTH of these are using
> the exact same dev branch, and there's no general agreement on which
> we want... if you get my point ;)
> 
> If we branch off 2.4 right now from trunk, and say that this becomes
> our next main release, and the idea is to clean up what is there,
> and, for new experimental stuff, develop on trunk 1st and then
> backport to 2.4, I'll jump right on in, since it means code will
> be out and released and *used* sooner!

I expect there is already enough interesting new material in trunk right
now that users will find 2.4 valuable. So yes, the focus should be on
getting it out the door.

If any trunk experiments turn out to be ready before 2.4.0 and the backport
is low risk, we can backport them (and if they are compatible with 2.4.x, we
can still backport them after 2.4.0).

To be clear: the original proposal (new mod_proxy hooks) might not be
compatible with the current trunk (it would break configurations), so before
branching, the question would also be: does anyone plan to do something
important soon that they want in 2.4 and that will be incompatible, API-wise
or configuration-wise, with the current trunk?

Regards,

Rainer



Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread William A. Rowe, Jr.
Graham Leggett wrote:
> Jim Jagielski wrote:
> 
>> Once we define (and agree) to that, we know how close (or far)
>> trunk is. It sounds like we have some set that wants to break
>> trunk apart and totally refactor a lot of it, and that's a big +1.
>> It's also not a 3-4 month effort :) It also sounds like there
>> are people who want 2.4 to be an upgrade to 2.2 as 2.2 was compared
>> to 2.0, and a big +1 to that as well. But BOTH of these are using
>> the exact same dev branch, and there's no general agreement on which
>> we want... if you get my point ;)

Understandable; when we have something suitable for consideration as an
alpha, it's time to fork.  If we have something in trunk that is clearly
not slated for the next version, then we must fork, rm, and tag.

Right now I'm not clear that anything in trunk is destined for svn rm
from the 2.4 branch.  E.g. the mod_proxy devs are just as keen as the
aaa  refactoring and core httpd devs to put their changes
out there in the public sphere, and see what happens.

But the moment we have anything that looks like a beta, there is going
to need to be a fork.  Do you intend the 2.3.x branch -> 2.4.x branch
to be C-T-R or R-T-C?

> I think the bit that divides these in two is APR v2.0.

APR v2.0 is irrelevant, IMHO.  It would be nice, but there is plenty of
work to do there, and since there are plenty of interfaces in need of
renaming and deprecation, out-of-sorts arg lists which can't decide if
apr_pool_t * is the first or last arg, by convention (yes, there was a
convention), and so on, I don't think we should 'wait' for apr 2.

If it beats httpd to release, great.  If not, oh well :)



Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Graham Leggett
Jim Jagielski wrote:

> Well, that's the question, isn't it? I can't align the idea
> of trunk being a candidate for 2.4 and trunk being a place for
> people to experiment...
> 
> What do we want 2.4 to be and do. And how.
> 
> Once we define (and agree) to that, we know how close (or far)
> trunk is. It sounds like we have some set that wants to break
> trunk apart and totally refactor a lot of it, and that's a big +1.
> It's also not a 3-4 month effort :) It also sounds like there
>> are people who want 2.4 to be an upgrade to 2.2 as 2.2 was compared
>> to 2.0, and a big +1 to that as well. But BOTH of these are using
> the exact same dev branch, and there's no general agreement on which
> we want... if you get my point ;)

I think the bit that divides these in two is APR v2.0.

People have begun refactoring APR to produce APR v2.0, and alongside
this will be a corresponding refactoring of httpd, which I think should
become httpd v3.0.

I think httpd v2.4 should be what we have now, against the latest APR v1
we have now, which is the APR v1.4 branch.

I think practically we should focus on getting httpd v2.4 out the door,
and make httpd ready for v3.0 to happen, against APR v2.0.

Regards,
Graham
--




Re: svn commit: r771998

2009-05-06 Thread William A. Rowe, Jr.
Ruediger Pluem wrote:
> 
> The issue happened on RHEL 4 64 Bit.

FWIW - FC10 x86_64 is my own default testing schema.


Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Ruediger Pluem


On 05/06/2009 10:31 PM, Jim Jagielski wrote:
> 
> On May 6, 2009, at 4:20 PM, Graham Leggett wrote:
> 
>> Jim Jagielski wrote:
>>
>>> I'll stop worrying about 2.2 when 2.4 comes closer to being a reality.
>>>
>>> Not saying that releasing 2.4 isn't worth it, but there have been
>>> stops and
>>> starts all along the way, and I think we need to be clear on what we
>>> expect 2.4 to be. Until then, we have no clear defining line on when
>>> 2.4 is "done."
>>
>> Is there anything additional that we want v2.4 to do over and above what
>> it does now?
>>
> 
> Well, that's the question, isn't it? I can't align the idea
> of trunk being a candidate for 2.4 and trunk being a place for
> people to experiment...
> 
> What do we want 2.4 to be and do. And how.
> 
> Once we define (and agree) to that, we know how close (or far)
> trunk is. It sounds like we have some set that wants to break
> trunk apart and totally refactor a lot of it, and that's a big +1.
> It's also not a 3-4 month effort :) It also sounds like there
> are people who want 2.4 to be an upgrade to 2.2 as 2.2 was compared
> to 2.0, and a big +1 to that as well. But BOTH of these are using
> the exact same dev branch, and there's no general agreement on which
> we want... if you get my point ;)
> 
> If we branch off 2.4 right now from trunk, and say that this becomes
> our next main release, and the idea is to clean up what is there,
> and, for new experimental stuff, develop on trunk 1st and then
> backport to 2.4, I'll jump right on in, since it means code will
> be out and released and *used* sooner!

Again a full +1 to all this.

Regards

Rüdiger



Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Jim Jagielski


On May 6, 2009, at 4:20 PM, Graham Leggett wrote:


Jim Jagielski wrote:

I'll stop worrying about 2.2 when 2.4 comes closer to being a reality.

Not saying that releasing 2.4 isn't worth it, but there have been stops and
starts all along the way, and I think we need to be clear on what we
expect 2.4 to be. Until then, we have no clear defining line on when
2.4 is "done."


Is there anything additional that we want v2.4 to do over and above what
it does now?



Well, that's the question, isn't it? I can't align the idea
of trunk being a candidate for 2.4 and trunk being a place for
people to experiment...

What do we want 2.4 to be and do. And how.

Once we define (and agree) to that, we know how close (or far)
trunk is. It sounds like we have some set that wants to break
trunk apart and totally refactor a lot of it, and that's a big +1.
It's also not a 3-4 month effort :) It also sounds like there
are people who want 2.4 to be an upgrade to 2.2 as 2.2 was compared
to 2.0, and a big +1 to that as well. But BOTH of these are using
the exact same dev branch, and there's no general agreement on which
we want... if you get my point ;)

If we branch off 2.4 right now from trunk, and say that this becomes
our next main release, and the idea is to clean up what is there,
and, for new experimental stuff, develop on trunk 1st and then
backport to 2.4, I'll jump right on in, since it means code will
be out and released and *used* sooner!



Re: Calling usage() from the rewrite args hook?

2009-05-06 Thread Ruediger Pluem


On 05/06/2009 10:13 PM, Rainer Jung wrote:
> On 06.05.2009 21:33, Rainer Jung wrote:
>> While working on additional Windows command-line options I noticed that
>> there is no consistent validity checking of the "-k" arguments.
>>
>> Those arguments are handled by the rewrite args hook, and the MPMs handle
>> invalid or duplicate "-k" arguments inconsistently (e.g. Unix outputs a
>> somewhat misleading error message about -k being unknown, while Windows
>> seems to start the service when an unknown command is given).
>>
>> I would like to output a better error message for unknown or duplicate
>> commands (easy) and then call usage(). At the moment usage() is static
>> in main.c, and there is not a single header file directly in the server/
>> directory.
>>
>> So I would like to add a private header file in the server/ directory,
>> for now only containing usage(), and switch usage() from static to
>> AP_DECLARE. Would that be right? I guess we have no reason to include it
>> in the public include files contained in include/.
>>
>> Another possibility would be to let the rewrite args hook return a
>> value, stop processing as soon as one module doesn't return OK, and call
>> usage() whenever the rewrite args hook of a module does not return OK. I
>> think that's a bit fragile, because in the long run more return values
>> might show up, as well as cases where usage() is not the right way to
>> react, so making usage() available in the MPM-specific argument handling
>> seems the better way.
>>
>> Of course the usage message at the moment doesn't really reflect the MPM
>> architecture. The command-line specialities are covered by ifdefs in the
>> usage text, but that's another topic. For the hardening of the -k parsing
>> described above, the usage message as it exists today is enough.
>>
>> Comments?
> 
> While experimenting with that: on Windows, mpm_winnt.c can't use
> something from main.c, only vice versa. So usage() would have to be moved
> from main.c into some other file included in libhttpd (or a new one).
> Still the question is: is this the right way to go? If so, should I add
> a small new file containing usage(), since the names of the existing files
> don't really fit?

How about the existing util.c?

Regards

Rüdiger



Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Ruediger Pluem


On 05/06/2009 10:09 PM, Jim Jagielski wrote:
> 
> On May 6, 2009, at 3:32 PM, William A. Rowe, Jr. wrote:
> 
>>
>> We should experiment freely on trunk/ to come up with the right
>> solutions,
>> and also freely discard those solutions from the next release branch. 
>> But
>> we shouldn't throw changes willy nilly over to 2.2, but as Paul says,
>> let's
>> focus on making 2.4 "the best available version of Apache" with all of
>> the
>> new facilities that come with it.
>>
> 
> What new facilities? Are we moving to serf, for example?
> 
> It's for these reasons that I suggested that before we start breaking
> trunk by adding stuff willy-nilly, we branch off a 2.4 tree... If we
> said "this is 2.4... clean it up and fix it", then we could get
> 2.4 out soon. Instead, trunk is both an experimental
> sandbox for new stuff and, via backports to 2.2, the *only* way that
> stuff will see the light anytime soon.
> 
> If we are serious about 2.4, we branch now. trunk remains the dev branch
> and we clean up the 2.4 branch and backport from trunk to 2.4...
> We cannot "experiment freely" on trunk and at the same time try
> to focus down a 2.4 release...
> 
> 

+1

Regards

Rüdiger


Re: svn commit: r771998

2009-05-06 Thread Ruediger Pluem


On 05/06/2009 09:54 PM, William A. Rowe, Jr. wrote:
> Plüm, Rüdiger, VF-Group wrote:
>> This causes trunk to fail compilation with:
>>
>> make[1]: *** No rule to make target `modules/mappers/libmod_so.la', needed by `httpd'.  Stop.
>> make: *** [all-recursive] Error 1
> 
Please don't do that; you have everyone chasing down whether they own r771998.
At least include the log message in your reply, to spare most folks' attention.

Sorry for that; I didn't have the original svn mail at hand at that point in
time. I will communicate this better next time in the same situation.

> 
> ITMT; I'm reviewing.  I didn't see this on retesting linux, will try clean
> on Solaris just for fun (and for the fact that I think I have this VM ready
> for some action).
> 
> You did re-./buildconf, make clean and make, right?

Sure. To be more precise, I did:

make extraclean
./buildconf
./config.nice
make

The issue happened on RHEL 4 64 Bit.
The same happens on SuSE 10.2 32 Bit.

Regards

Rüdiger




Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread William A. Rowe, Jr.
Graham Leggett wrote:
> Jim Jagielski wrote:
> 
>> I'll stop worrying about 2.2 when 2.4 comes closer to being a reality.
>>
>> Not saying that releasing 2.4 isn't worth it, but there have been stops and
>> starts all along the way, and I think we need to be clear on what we
>> expect 2.4 to be. Until then, we have no clear defining line on when
>> 2.4 is "done."
> 
> Is there anything additional that we want v2.4 to do over and above what
> it does now?

Conversely, is there anything that isn't done correctly yet in v2.4 and must
be refactored, or anything that is done in v2.4 that we simply wouldn't want
to release with 2.4.0?




Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Graham Leggett
Jim Jagielski wrote:

> I'll stop worrying about 2.2 when 2.4 comes closer to being a reality.
> 
> Not saying that releasing 2.4 isn't worth it, but there have been stops and
> starts all along the way, and I think we need to be clear on what we
> expect 2.4 to be. Until then, we have no clear defining line on when
> 2.4 is "done."

Is there anything additional that we want v2.4 to do over and above what
it does now?

Regards,
Graham
--




Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Graham Leggett
Paul Querna wrote:

> Stop worrying about 2.2, and just focus on doing it right -- then ship
> 2.4 in 3-4 months imo. trunk really isn't that far off from being a
> decent 2.4, it just needs some cleanup in a few areas. It has already
> been 3.5 years since 2.2.0 came out, it's time to move on in my
> opinion.

+1.

Regards,
Graham
--




Re: Calling usage() from the rewrite args hook?

2009-05-06 Thread Rainer Jung
On 06.05.2009 21:33, Rainer Jung wrote:
> While working on additional Windows command-line options I noticed that
> there is no consistent validity checking of the "-k" arguments.
> 
> Those arguments are handled by the rewrite args hook, and the MPMs handle
> invalid or duplicate "-k" arguments inconsistently (e.g. Unix outputs a
> somewhat misleading error message about -k being unknown, while Windows
> seems to start the service when an unknown command is given).
> 
> I would like to output a better error message for unknown or duplicate
> commands (easy) and then call usage(). At the moment usage() is static
> in main.c, and there is not a single header file directly in the server/
> directory.
> 
> So I would like to add a private header file in the server/ directory,
> for now only containing usage(), and switch usage() from static to
> AP_DECLARE. Would that be right? I guess we have no reason to include it
> in the public include files contained in include/.
> 
> Another possibility would be to let the rewrite args hook return a
> value, stop processing as soon as one module doesn't return OK, and call
> usage() whenever the rewrite args hook of a module does not return OK. I
> think that's a bit fragile, because in the long run more return values
> might show up, as well as cases where usage() is not the right way to
> react, so making usage() available in the MPM-specific argument handling
> seems the better way.
> 
> Of course the usage message at the moment doesn't really reflect the MPM
> architecture. The command-line specialities are covered by ifdefs in the
> usage text, but that's another topic. For the hardening of the -k parsing
> described above, the usage message as it exists today is enough.
> 
> Comments?

While experimenting with that: on Windows, mpm_winnt.c can't use
something from main.c, only vice versa. So usage() would have to be moved
from main.c into some other file included in libhttpd (or a new one).
Still the question is: is this the right way to go? If so, should I add
a small new file containing usage(), since the names of the existing files
don't really fit?

Regards,

Rainer


Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Jim Jagielski


On May 6, 2009, at 3:32 PM, William A. Rowe, Jr. wrote:



We should experiment freely on trunk/ to come up with the right solutions,
and also freely discard those solutions from the next release branch.  But
we shouldn't throw changes willy nilly over to 2.2, but as Paul says, let's
focus on making 2.4 "the best available version of Apache" with all of the
new facilities that come with it.



What new facilities? Are we moving to serf, for example?

It's for these reasons that I suggested that before we start breaking
trunk by adding stuff willy-nilly, we branch off a 2.4 tree... If we
said "this is 2.4... clean it up and fix it", then we could get
2.4 out soon. Instead, trunk is both an experimental
sandbox for new stuff and, via backports to 2.2, the *only* way that
stuff will see the light anytime soon.

If we are serious about 2.4, we branch now. trunk remains the dev branch
and we clean up the 2.4 branch and backport from trunk to 2.4...
We cannot "experiment freely" on trunk and at the same time try
to focus down a 2.4 release...



Re: svn commit: r771998

2009-05-06 Thread William A. Rowe, Jr.
Plüm, Rüdiger, VF-Group wrote:
> 
> This causes trunk to fail compilation with:
> 
> make[1]: *** No rule to make target `modules/mappers/libmod_so.la', needed by `httpd'.  Stop.
> make: *** [all-recursive] Error 1

Please don't do that; you have everyone chasing down whether they own r771998.
At least include the log message in your reply, to spare most folks' attention.

ITMT; I'm reviewing.  I didn't see this on retesting linux, will try clean
on Solaris just for fun (and for the fact that I think I have this VM ready
for some action).

You did re-./buildconf, make clean and make, right?




Calling usage() from the rewrite args hook?

2009-05-06 Thread Rainer Jung
While working on additional Windows command-line options I noticed that
there is no consistent validity checking of the "-k" arguments.

Those arguments are handled by the rewrite args hook, and the MPMs handle
invalid or duplicate "-k" arguments inconsistently (e.g. Unix outputs a
somewhat misleading error message about -k being unknown, while Windows
seems to start the service when an unknown command is given).

I would like to output a better error message for unknown or duplicate
commands (easy) and then call usage(). At the moment usage() is static
in main.c, and there is not a single header file directly in the server/
directory.

So I would like to add a private header file in the server/ directory,
for now only containing usage(), and switch usage() from static to
AP_DECLARE. Would that be right? I guess we have no reason to include it
in the public include files contained in include/.
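
A minimal sketch of what such a private header could look like (the file
name is made up, and the prototype assumes usage() keeps its current
process_rec argument):

    /* server/main_usage.h -- hypothetical private header, not installed */
    #ifndef MAIN_USAGE_H
    #define MAIN_USAGE_H

    #include "ap_config.h"   /* AP_DECLARE */
    #include "httpd.h"       /* process_rec */

    /* Print the command line help and terminate; exported from libhttpd
     * so that MPM rewrite-args handlers (e.g. mpm_winnt) can call it. */
    AP_DECLARE(void) usage(process_rec *process);

    #endif /* MAIN_USAGE_H */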

Another possibility would be to let the rewrite args hook return a
value, stop processing as soon as one module doesn't return OK, and call
usage() whenever the rewrite args hook of a module does not return OK. I
think that's a bit fragile, because in the long run more return values
might show up, as well as cases where usage() is not the right way to
react, so making usage() available in the MPM-specific argument handling
seems the better way.

Of course the usage message at the moment doesn't really reflect the MPM
architecture. The command-line specialities are covered by ifdefs in the
usage text, but that's another topic. For the hardening of the -k parsing
described above, the usage message as it exists today is enough.

Comments?

Regards,

Rainer



Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread William A. Rowe, Jr.
Jim Jagielski wrote:
> 
> I'll stop worrying about 2.2 when 2.4 comes closer to being a reality.
> 
> Not saying that releasing 2.4 isn't worth it, but there have been stops and
> starts all along the way, and I think we need to be clear on what we
> expect 2.4 to be. Until then, we have no clear defining line on when
> 2.4 is "done."

Nice of the mod_proxy pot calling the httpd kettle black :-)


Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread William A. Rowe, Jr.
Paul Querna wrote:
> On Wed, May 6, 2009 at 11:46 AM, Jim Jagielski  wrote:
>>
>> The incremental changes are just so we can keep 2.2's proxy somewhat
>> useful and flexible enough to survive until the next revamp.
> 
> Stop worrying about 2.2, and just focus on doing it right -- then ship
> 2.4 in 3-4 months imo. trunk really isn't that far off from being a
> decent 2.4, it just needs some cleanup in a few areas. It has already
> been 3.5 years since 2.2.0 came out, it's time to move on in my
> opinion.

+1.  If there is a 2.2 /bug/ let's fix it.  If the mod_proxy crew wants to
offer a proof-of-concept 2.4 preview as its own download, terrific!  But
I'm quite -1 to fundamentally modifying the structure or feature set that
ships for 2.2-stable.  The recent history in 2.2 of de-stable-izing changes
to the released branch, and incomplete features, leaves me pretty frustrated
with the lack of review.

Before /not/ voting -1 to bringing in these major changes, I'd need to see
issues.apache.org seriously purged of its 95 incidents, many of them new and
most of them actively triaged by our users@ community and some dedicated
d...@s.  I'd need to see balancer /actually documented/ in a way that users
can read and that is partitioned by the module that is loaded.

We should experiment freely on trunk/ to come up with the right solutions,
and also freely discard those solutions from the next release branch.  But
we shouldn't throw changes willy nilly over to 2.2, but as Paul says, let's
focus on making 2.4 "the best available version of Apache" with all of the
new facilities that come with it.

What is especially scary is that the providers are still all using mod_proxy
for their entire config schema, for directives which make no sense to any
other proxy provider.  It would be great to see some serious cleanup, in
addition to all the enthusiasm to expand mod_proxy.



Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Rainer Jung
On 06.05.2009 20:26, Paul Querna wrote:
> There is lots of discussion about fixing mod_proxy and
> mod_proxy_balancer, to try to make it do things that the APIs are just
> broken for, and right now, it seems from the outside to be turning
> into a ball of mud.
> 
> I think the right way to frame the discussion is, how should the API
> optimally be structured -- then change the existing one to be closer
> to it, rather than the barrage of incremental changes that seem to be
> creating lots of cruft, and ending up with something that still
> doesn't do what we want.
> 
> I think mod_proxy's decisions on what to proxy to, and where, should
> be designed as a series of hooks/providers, specifically:
> 
> 1) Provider for a list of backends -- This provider does nothing with
> balancing, just provides a list of Backend Definition (preferably just
> keep it apr_sockaddr_t?) that a Connection is able to use.  -- Backend
> status via multicast or other methods go here.
> 2) Provider that _sorts_ the list of backends.  Input is a list,
> output is a new ordered list.  -- Sticky sessions go here, along with
> any load based balancing.
> 3) Provider that given a Backend Definition, returns a connection.
> (pools connections, or open new one, whatever)  -- Some of the
> proxy_util and massive worker objects go here.
> 
> Using this structure, you can implement a dynamic load balancer
> without having to modify the core.  I think the key is to _stop_
> passing around the gigantic monolithic proxy_worker structures, and go
> to having providers that do simple operations: get a list, sort the
> list, get me a connection.
> 
> Thoughts?

Sounds good. The provider in 2) needs a second functionality/API to feed
back the results of a request:

- whether the backend was detected as being broken

- when using piggybacked load data or (as we do today) locally generated
load data, the updates must be made against provider 2) after the
response has been received (or, in the case of busyness, once before
forwarding the request and once after receiving the response)

Those providers look similar to what I called the "topology manager" and
"state manager", and you want to include the balancing/stickiness
decision in the state manager. My remark above indicates that
provider 2) needs to make the decision *and* to update the data on
which the decision is based. This data update could happen behind the
scenes, but in most cases it will need an API driven by the
request-handling component.
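
As a rough illustration of that feedback API (all names and types here are
invented, not an existing mod_proxy interface), the sorting provider could
expose a callback that the request-handling component drives:

    #include "httpd.h"
    #include "apr_network_io.h"   /* apr_sockaddr_t */
    #include "apr_time.h"         /* apr_interval_time_t */

    typedef enum {
        BACKEND_RESULT_OK,        /* response received normally      */
        BACKEND_RESULT_BROKEN,    /* backend was detected as broken  */
        BACKEND_RESULT_BUSY       /* busyness bump before forwarding */
    } backend_result;

    /* Called once before forwarding (busyness) and once after the
     * response; 'elapsed' could carry the response time, 'load' a
     * piggybacked or locally generated load value. */
    typedef void (*backend_feedback_fn)(request_rec *r,
                                        const apr_sockaddr_t *backend,
                                        backend_result result,
                                        apr_interval_time_t elapsed,
                                        int load);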

Regards,

Rainer



Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Paul Querna
On Wed, May 6, 2009 at 12:04 PM, Jim Jagielski  wrote:
>
> On May 6, 2009, at 2:53 PM, Paul Querna wrote:
>
>>
>> Stop worrying about 2.2, and just focus on doing it right -- then ship
>> 2.4 in 3-4 months imo. trunk really isn't that far off from being a
>> decent 2.4, it just needs some cleanup in a few areas. It has already
>> been 3.5 years since 2.2.0 came out, it's time to move on in my
>> opinion.
>>
>
> I'll stop worrying about 2.2 when 2.4 comes closer to being a reality.
>
> Not saying that releasing 2.4 isn't worth it, but there have been stops and
> starts all along the way, and I think we need to be clear on what we
> expect 2.4 to be. Until then, we have no clear defining line on when
> 2.4 is "done."

It can be done today, just start cutting alpha releases :)

I'm pretty close to ENOTIME though, so if someone wants to step up and
start RMing alphas, that would be very nice...


Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Jim Jagielski


On May 6, 2009, at 2:53 PM, Paul Querna wrote:



Stop worrying about 2.2, and just focus on doing it right -- then ship
2.4 in 3-4 months imo. trunk really isn't that far off from being a
decent 2.4, it just needs some cleanup in a few areas. It has already
been 3.5 years since 2.2.0 came out, it's time to move on in my
opinion.



I'll stop worrying about 2.2 when 2.4 comes closer to being a reality.

Not saying that releasing 2.4 isn't worth it, but there have been stops and
starts all along the way, and I think we need to be clear on what we
expect 2.4 to be. Until then, we have no clear defining line on when
2.4 is "done."


Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Paul Querna
On Wed, May 6, 2009 at 11:46 AM, Jim Jagielski  wrote:
>
> On May 6, 2009, at 2:26 PM, Paul Querna wrote:
>
>> Hi,
>>
>> I think the right way to frame the discussion is, how should the API
>> optimally be structured -- then change the existing one to be closer
>> to it, rather than the barrage of incremental changes that seem to be
>> creating lots of cruft, and ending up with something that still
>> doesn't do what we want.
>>
>> I think mod_proxy's decisions on what to proxy to, and where, should
>> be designed as a series of hooks/providers, specifically:
>>
>> 1) Provider for a list of backends -- This provider does nothing with
>> balancing, just provides a list of Backend Definition (preferably just
>> keep it apr_sockaddr_t?) that a Connection is able to use.  -- Backend
>> status via multicast or other methods go here.
>> 2) Provider that _sorts_ the list of backends.  Input is a list,
>> output is a new ordered list.  -- Sticky sessions go here, along with
>> any load based balancing.
>> 3) Provider that given a Backend Definition, returns a connection.
>> (pools connections, or open new one, whatever)  -- Some of the
>> proxy_util and massive worker objects go here.
>>
>
> I recall at one of the hackathons this being proposed and I
> think it's the right one... It's a clear separation of functions
> similar to the changes we've done in authn/authz, moving from
> monolithic to more structured and defined concerns.
>
> The incremental changes are just so we can keep 2.2's proxy somewhat
> useful and flexible enough to survive until the next revamp.

Stop worrying about 2.2, and just focus on doing it right -- then ship
2.4 in 3-4 months imo. trunk really isn't that far off from being a
decent 2.4, it just needs some cleanup in a few areas. It has already
been 3.5 years since 2.2.0 came out, it's time to move on in my
opinion.


Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Jim Jagielski


On May 6, 2009, at 2:26 PM, Paul Querna wrote:


Hi,

I think the right way to frame the discussion is, how should the API
optimally be structured -- then change the existing one to be closer
to it, rather than the barrage of incremental changes that seem to be
creating lots of cruft, and ending up with something that still
doesn't do what we want.

I think mod_proxy's decisions on what to proxy to, and where, should
be designed as a series of hooks/providers, specifically:

1) Provider for a list of backends -- This provider does nothing with
balancing, just provides a list of Backend Definition (preferably just
keep it apr_sockaddr_t?) that a Connection is able to use.  -- Backend
status via multicast or other methods go here.
2) Provider that _sorts_ the list of backends.  Input is a list,
output is a new ordered list.  -- Sticky sessions go here, along with
any load based balancing.
3) Provider that given a Backend Definition, returns a connection.
(pools connections, or open new one, whatever)  -- Some of the
proxy_util and massive worker objects go here.



I recall at one of the hackathons this being proposed and I
think it's the right one... It's a clear separation of functions
similar to the changes we've done in authn/authz, moving from
monolithic to more structured and defined concerns.

The incremental changes are just so we can keep 2.2's proxy somewhat
useful and flexible enough to survive until the next revamp.



Re: mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Graham Leggett
Paul Querna wrote:

> Using this structure, you can implement a dynamic load balancer
> without having to modify the core.  I think the key is to _stop_
> passing around the gigantic monolithic proxy_worker structures, and go
> to having providers that do simple operations: get a list, sort the
> list, get me a connection.
> 
> Thoughts?

+1 on all of it.

Regards,
Graham
--




mod_proxy hooks for clustering and load balancing

2009-05-06 Thread Paul Querna
Hi,

There is lots of discussion about fixing mod_proxy and
mod_proxy_balancer, to try to make it do things that the APIs are just
broken for, and right now, it seems from the outside to be turning
into a ball of mud.

I think the right way to frame the discussion is, how should the API
optimally be structured -- then change the existing one to be closer
to it, rather than the barrage of incremental changes that seem to be
creating lots of cruft, and ending up with something that still
doesn't do what we want.

I think mod_proxy's decisions on what to proxy to, and where, should
be designed as a series of hooks/providers, specifically:

1) Provider for a list of backends -- This provider does nothing with
balancing, just provides a list of Backend Definition (preferably just
keep it apr_sockaddr_t?) that a Connection is able to use.  -- Backend
status via multicast or other methods go here.
2) Provider that _sorts_ the list of backends.  Input is a list,
output is a new ordered list.  -- Sticky sessions go here, along with
any load based balancing.
3) Provider that given a Backend Definition, returns a connection.
(pools connections, or open new one, whatever)  -- Some of the
proxy_util and massive worker objects go here.

Using this structure, you can implement a dynamic load balancer
without having to modify the core.  I think the key is to _stop_
passing around the gigantic monolithic proxy_worker structures, and go
to having providers that do simple operations: get a list, sort the
list, get me a connection.
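
For illustration only, a minimal sketch of the three providers as C
callbacks (all names and types below are invented, not the current
mod_proxy API):

    #include "httpd.h"
    #include "apr_tables.h"       /* apr_array_header_t */
    #include "apr_network_io.h"   /* apr_sockaddr_t, apr_socket_t */

    /* 1) enumerate backends: append apr_sockaddr_t entries to 'backends' */
    typedef apr_status_t (*backend_list_fn)(request_rec *r,
                                            apr_array_header_t *backends);

    /* 2) order backends: sticky sessions and load-based balancing happen
     *    here; returns a new, ordered copy of the input list */
    typedef apr_status_t (*backend_sort_fn)(request_rec *r,
                                            const apr_array_header_t *in,
                                            apr_array_header_t **out);

    /* 3) obtain a connection to one backend (pooled or freshly opened) */
    typedef apr_status_t (*backend_connect_fn)(request_rec *r,
                                               const apr_sockaddr_t *backend,
                                               apr_socket_t **conn);

    typedef struct {
        backend_list_fn    list;
        backend_sort_fn    sort;
        backend_connect_fn connect;
    } proxy_backend_provider;

A dynamic balancer would then only need to register a different sort
callback, instead of reaching into proxy_worker internals.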

Thoughts?

Thanks,

Paul


Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Jim Jagielski


On May 6, 2009, at 1:00 PM, William A. Rowe, Jr. wrote:


Jim Jagielski wrote:


That's why oob-like health-and-status chatter is nice, because
it doesn't interfere with the normal reverse-proxy/host logic.


+1, for a backend of unknown status (let's just say it's a few minutes
old, effectively useless information now) ping/pong is the right first
approach.  But...


An idea: Instead of asking for this info before sending the
request, what about the backend sending it as part of the response,
as a response header. You don't know the status of the machine
"now", but you do know the status of it right after it handled the last
request (the last time you saw it) and, assuming nothing else touched
it, that status is likely still "good".


Yup; that seems like the only sane approach, add an X-Backend-Status or
whatnot to report the load or other health data.


For example, how long it took me (the backend server) to handle
this request... it would be useful to know *that* in addition to
the typical "round-trip" time :)



Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread William A. Rowe, Jr.
Jim Jagielski wrote:
> 
> That's why oob-like health-and-status chatter is nice, because
> it doesn't interfere with the normal reverse-proxy/host logic.

+1, for a backend of unknown status (let's just say it's a few minutes
old, effectively useless information now) ping/pong is the right first
approach.  But...

> An idea: Instead of asking for this info before sending the
> request, what about the backend sending it as part of the response,
> as a response header. You don't know the status of the machine
> "now", but you do know the status of it right after it handled the last
> request (the last time you saw it) and, assuming nothing else touched
> it, that status is likely still "good".

Yup; that seems like the only sane approach, add an X-Backend-Status or
whatnot to report the load or other health data.  It's easily consumed
(erased) from the front end response.  If done correctly in a backend
server, it can convey information from the ultimate back end resources
that actually cause the congestion (DB servers or whatnot) rather than
the default response (CPU or whatnot) at the middle tier.
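
A rough sketch of the "consumed (erased)" part (the header name is the one
suggested above; the function and the load update are hypothetical, not
existing mod_proxy code):

    #include <stdlib.h>
    #include "httpd.h"
    #include "apr_tables.h"

    /* Read a backend-reported load value from the proxied response and
     * strip the header before the response goes back to the client. */
    static void consume_backend_status(request_rec *r, int *reported_load)
    {
        const char *val = apr_table_get(r->headers_out, "X-Backend-Status");
        if (val) {
            *reported_load = atoi(val);  /* feed into the balancer's state */
            apr_table_unset(r->headers_out, "X-Backend-Status");
        }
    }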




Re: Backports from trunk to 2.2 proxy-balancers

2009-05-06 Thread Jim Jagielski


On May 6, 2009, at 11:15 AM, jean-frederic clere wrote:


Jim Jagielski wrote:




In that case, we could keep the trunk dir structure for any
"extra" balancers we may add in the 2.2 tree and move the old
balancer code back into mod_proxy_balancer.c (or, even better,
as sep files that aren't sub-modules :) )


+1



This is now done... if we add any other balancers, we can put
them in the ./balancers/ subdir; the current 3 are in separate files
for ease of backporting, but they are not sub-modules, rather
linked support files ;)



Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread jean-frederic clere

Jess Holle wrote:

jean-frederic clere wrote:
Should general support for a query URL be provided in 
mod_proxy_balancer?  Or should this be left to mod_cluster?

Can you explain more? I don't get the question.

What I mean is

   1. Should mod_proxy_balancer be extended to provide a balancer
  algorithm in which one specifies a backend URL that will provide a
  single numeric health metric, throttle the number of such requests
  via a time-to-live associated with this information, and balance
  on this basis or
   2. Should mod_cluster handle this issue?
   3. Or both?
  * For instance, mod_cluster might leverage special nuances in
AJP, JBoss, and Tomcat, whereas mod_proxy_balancer might
provide more generic support for health checks on any backend
server that can expose a health metric URL.

From your response below, it sounds like you're saying it's #2, which
is /largely/ fine and good -- but this raises questions:


   1. How general is the health check metric in mod_cluster?
  * I only care about Tomcat backends myself, but control over
the metric would be good.
   2. Does this require special JBoss nuggets in Tomcat?
  * I'd hope not, i.e. that this is a simple matter of a
pre-designated URL or a very simple standalone socket protocol.
   3. When will mod_cluster support health metric based balancing of Tomcat?
   4. How "disruptive" to an existing configuration using
  mod_proxy_balancer/mod_proxy_ajp is mod_cluster?
  * How much needs to be changed?
   5. How portable is the mod_cluster code?
  * Does it build on Windows?  HPUX?  AIX?


Please ask the mod_cluster questions in the 
mod_cluster-...@lists.jboss.org list. I will answer there.


Cheers

Jean-Frederic



I say this is largely fine and good as I'd like to see just the 
health-metric based balancing algorithm in Apache 2.2.x itself.
Does mod_cluster provide yet another approach top to bottom (separate
from mod_jk and mod_proxy/mod_proxy_ajp)?
Mod_cluster is just a balancer for mod_proxy, but due to the dynamic
creation of balancers and workers it can't go into the httpd-trunk code
right now.
 It would seem nice to me if mod_jk and/or mod_proxy_balancer could
do health checks, but you have to draw the line somewhere on growing
any given module, and if mod_jk and mod_proxy_balancer are not going
in that direction, at some point mod_cluster may be in my future.

--
Jess Holle





Re: Backports from trunk to 2.2 proxy-balancers

2009-05-06 Thread jean-frederic clere

Jim Jagielski wrote:


On May 6, 2009, at 11:04 AM, Plüm, Rüdiger, VF-Group wrote:





-Original Message-
From: Jim Jagielski
Sent: Wednesday, May 6, 2009 16:59
To: dev@httpd.apache.org
Subject: Re: Backports from trunk to 2.2 proxy-balancers


On May 6, 2009, at 9:54 AM, Plüm, Rüdiger, VF-Group wrote:



The problem is that this breaks existing configurations for 2.2.x
as the balancers are now in separate modules.


How so (the breakage, that is)?? You mean the requirement for
them to LoadModule them?


Exactly. This is an unpleasant surprise for someone updating an
existing installation from 2.2.a to 2.2.b.



In that case, we could keep the trunk dir structure for any
"extra" balancers we may add in the 2.2 tree and move the old
balancer code back into mod_proxy_balancer.c (or, even better,
as sep files that aren't sub-modules :) )


+1

Cheers

Jean-Frederic


Re: Backports from trunk to 2.2 proxy-balancers

2009-05-06 Thread Jim Jagielski

Just an update: Currently httpd-2.2-proxy contains the
latest trunk proxy code and the new sub-module layout and
passes all framework tests... I'll start, maybe after lunch,
movement to sub-files, not sub-modules, for the balancers.

On May 6, 2009, at 11:10 AM, Jim Jagielski wrote:


In that case, we could keep the trunk dir structure for any
"extra" balancers we may add in the 2.2 tree and move the old
balancer code back into mod_proxy_balancer.c (or, even better,
as sep files that aren't sub-modules :) )





Re: Backports from trunk to 2.2 proxy-balancers

2009-05-06 Thread Plüm, Rüdiger, VF-Group
 

> -Original Message-
> From: Jim Jagielski
> Sent: Wednesday, May 6, 2009 17:10
> To: dev@httpd.apache.org
> Subject: Re: Backports from trunk to 2.2 proxy-balancers
> 
> 
> On May 6, 2009, at 11:04 AM, Plüm, Rüdiger, VF-Group wrote:
> 
> >
> >
> >> -Original Message-
> >> From: Jim Jagielski
> >> Sent: Wednesday, May 6, 2009 16:59
> >> To: dev@httpd.apache.org
> >> Subject: Re: Backports from trunk to 2.2 proxy-balancers
> >>
> >>
> >> On May 6, 2009, at 9:54 AM, Plüm, Rüdiger, VF-Group wrote:
> >>
> >>>
> >>> The problem is that this breaks existing configurations for 2.2.x
> >>> as the balancers are now in separate modules.
> >>
> >> How so (the breakage, that is)?? You mean the requirement for
> >> them to LoadModule them?
> >
> > Exactly. This is an unpleasant surprise for someone updating an
> > existing installation from 2.2.a to 2.2.b.
> >
> 
> In that case, we could keep the trunk dir structure for any
> "extra" balancers we may add in the 2.2 tree and move the old
> balancer code back into mod_proxy_balancer.c (or, even better,
> as sep files that aren't sub-modules :) )
 
Sounds fine to me. I am not opposed to the directory structure and
separate files per se. Maybe we just link them into mod_proxy_balancer
on 2.2.x whereas we keep them as separate modules on trunk.

Regards

Rüdiger


Re: Backports from trunk to 2.2 proxy-balancers

2009-05-06 Thread Jim Jagielski


On May 6, 2009, at 11:04 AM, Plüm, Rüdiger, VF-Group wrote:





-Original Message-
From: Jim Jagielski
Sent: Wednesday, May 6, 2009 16:59
To: dev@httpd.apache.org
Subject: Re: Backports from trunk to 2.2 proxy-balancers


On May 6, 2009, at 9:54 AM, Plüm, Rüdiger, VF-Group wrote:



The problem is that this breaks existing configurations for 2.2.x
as the balancers are now in separate modules.


How so (the breakage, that is)?? You mean the requirement for
them to LoadModule them?


Exactly. This is an unpleasant surprise for someone updating an
existing installation from 2.2.a to 2.2.b.



In that case, we could keep the trunk dir structure for any
"extra" balancers we may add in the 2.2 tree and move the old
balancer code back into mod_proxy_balancer.c (or, even better,
as sep files that aren't sub-modules :) )



Re: Backports from trunk to 2.2 proxy-balancers

2009-05-06 Thread Plüm, Rüdiger, VF-Group
 

> -Original Message-
> From: Jim Jagielski
> Sent: Wednesday, May 6, 2009 16:59
> To: dev@httpd.apache.org
> Subject: Re: Backports from trunk to 2.2 proxy-balancers
> 
> 
> On May 6, 2009, at 9:54 AM, Plüm, Rüdiger, VF-Group wrote:
> 
> >
> > The problem is that this breaks existing configurations for 2.2.x
> > as the balancers are now in separate modules.
> 
> How so (the breakage, that is)?? You mean the requirement for
> them to LoadModule them?

Exactly. This is an unpleasant surprise for someone updating an
existing installation from 2.2.a to 2.2.b.

Regards

Rüdiger


Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread jean-frederic clere

Jim Jagielski wrote:


On May 6, 2009, at 9:07 AM, Jess Holle wrote:


jean-frederic clere wrote:


Should general support for a query URL be provided in 
mod_proxy_balancer?  Or should this be left to mod_cluster?

Can you explain more? I don't get the question.

What I mean is
• Should mod_proxy_balancer be extended to provide a balancer 
algorithm in which one specifies a backend URL that will provide a 
single numeric health metric, throttle the number of such requests via 
a time-to-live associated with this information, and balance on this 
basis or

• Should mod_cluster handle this issue?
• Or both?


Please recall that, afaik, mod_cluster is not AL nor is it part
of Apache. So asking for direction for what is basically an external
project on the Apache httpd dev list is kinda weird :)


Yep there is a JBoss list for that: mod_cluster-...@lists.jboss.org



In any case, I think the hope of the ASF is that this capability is
part of httpd, and you can see, with mod_heartbeat and the like,
efforts in the direction.


Yes I am experimenting there too.

Cheers

Jean-Frederic


Re: Backports from trunk to 2.2 proxy-balancers

2009-05-06 Thread Plüm, Rüdiger, VF-Group
 

> -Original Message-
> From: jean-frederic clere
> Sent: Wednesday, May 6, 2009 16:40
> To: dev@httpd.apache.org
> Subject: Re: Backports from trunk to 2.2 proxy-balancers
> 
> Plüm, Rüdiger, VF-Group wrote:
> >  
> > 
> >> -Original Message-
> >> From: Rainer Jung
> >> Sent: Wednesday, May 6, 2009 15:10
> >> To: dev@httpd.apache.org
> >> Subject: Re: Backports from trunk to 2.2 proxy-balancers
> >>
> >> On 06.05.2009 14:39, Jim Jagielski wrote:
> >>> It would certainly be easier to maintain a 2.2-proxy branch, with the
> >>> intent of it actually being folded *into* 2.2, if the branch used the
> >>> same dir structure as trunk, that is, a separate directory that includes
> >>> the balancer methods (as well as the config magic associated with it).
> >>> However, if that will be an impediment to actually *getting* these
> >>> backports into 2.2, then I'm willing to keep the old structure...
> >>>
> >>> So my question is: if to be able to easily backport the various trunk
> >>> proxy improvements into 2.2, we also need to backport the dir
> >>> structure as well, is that OK? I don't want to work down that
> >>> path only to have it wasted work because people think that such a
> >>> directory restructure doesn't make sense within a 2.2.x release.
> >>>
> >>> PS: NO, I am not considering this for 2.2.12! :)
> >> I guess at the heart of this is the question, how likely we break some
> >> part of the users' build process for 2.2.x. My feeling is that the
> >> additional sub directory for the balancing method implementations is a
> >> small change and users' build processes should not break due to this
> >> one additional directory.
> >>
> >> On the positive side apart from easier backports: the new subdirectory
> >> might make people more curious on how to add a custom balancing method,
> >> so we get slightly better visibility for the existing provider interface.
> > 
> > The problem is that this breaks existing configurations for 2.2.x
> > as the balancers are now in separate modules. Thus I am -0.5 on
> > backporting this directory structure to 2.2.x.
> 
> Maybe we could keep the file structure but change the logic to the new one.
> For the external proxy_balancer_method we could detect old and new ones,
> no? (We have the NULL for that.)

How so?
The new structure makes them separate modules which require a separate
LoadModule line for each of them. Thus existing configurations simply
get broken. IMHO the logical structure (them being providers) is not
different between 2.2.x and trunk.

Regards

Rüdiger



Re: Backports from trunk to 2.2 proxy-balancers

2009-05-06 Thread Jim Jagielski


On May 6, 2009, at 9:54 AM, Plüm, Rüdiger, VF-Group wrote:



The problem is that this breaks existing configurations for 2.2.x
as the balancers are now in separate modules.


How so (the breakage, that is)?? You mean the requirement for
them to LoadModule them?




Re: Backports from trunk to 2.2 proxy-balancers

2009-05-06 Thread jean-frederic clere

Plüm, Rüdiger, VF-Group wrote:
 


-Original Message-
From: Rainer Jung
Sent: Wednesday, May 6, 2009 15:10
To: dev@httpd.apache.org
Subject: Re: Backports from trunk to 2.2 proxy-balancers

On 06.05.2009 14:39, Jim Jagielski wrote:
It would certainly be easier to maintain a 2.2-proxy branch, with the
intent of it actually being folded *into* 2.2, if the branch used the
same dir structure as trunk, that is, a separate directory that includes
the balancer methods (as well as the config magic associated with it).

However, if that will be an impediment to actually *getting* these
backports into 2.2, then I'm willing to keep the old structure...

So my question is: if to be able to easily backport the various trunk
proxy improvements into 2.2, we also need to backport the dir
structure as well, is that OK? I don't want to work down that
path only to have it wasted work because people think that such a
directory restructure doesn't make sense within a 2.2.x release.

PS: NO, I am not considering this for 2.2.12! :)

I guess at the heart of this is the question, how likely we break some
part of the users' build process for 2.2.x. My feeling is that the
additional sub directory for the balancing method implementations is a
small change and users' build processes should not break due to this
one additional directory.

On the positive side apart from easier backports: the new subdirectory
might make people more curious on how to add a custom balancing method,
so we get slightly better visibility for the existing provider interface.


The problem is that this breaks existing configurations for 2.2.x
as the balancers are now in separate modules. Thus I am -0.5 on
backporting this directory structure to 2.2.x.


Maybe we could keep the file structure but change the logic to the new one.
For the external proxy_balancer_method we could detect old and new ones,
no? (We have the NULL for that.)


Cheers

Jean-Frederic



Regards

Rüdiger





Re: Backports from trunk to 2.2 proxy-balancers

2009-05-06 Thread Plüm, Rüdiger, VF-Group
 

> -Original Message-
> From: Rainer Jung
> Sent: Wednesday, May 6, 2009 15:10
> To: dev@httpd.apache.org
> Subject: Re: Backports from trunk to 2.2 proxy-balancers
> 
> On 06.05.2009 14:39, Jim Jagielski wrote:
> > It would certainly be easier to maintain a 2.2-proxy branch, with the
> > intent of it actually being folded *into* 2.2, if the branch used the
> > same dir structure as trunk, that is, a separate directory that includes
> > the balancer methods (as well as the config magic associated with it).
> > 
> > However, if that will be an impediment to actually *getting* these
> > backports into 2.2, then I'm willing to keep the old structure...
> > 
> > So my question is: if to be able to easily backport the various trunk
> > proxy improvements into 2.2, we also need to backport the dir
> > structure as well, is that OK? I don't want to work down that
> > path only to have it wasted work because people think that such a
> > directory restructure doesn't make sense within a 2.2.x release.
> > 
> > PS: NO, I am not considering this for 2.2.12! :)
> 
> I guess at the heart of this is the question, how likely we break some
> part of the users' build process for 2.2.x. My feeling is that the
> additional sub directory for the balancing method implementations is a
> small change and users' build processes should not break due to this
> one additional directory.
> 
> On the positive side apart from easier backports: the new subdirectory
> might make people more curious on how to add a custom balancing method,
> so we get slightly better visibility for the existing provider interface.

The problem is that this breaks existing configurations for 2.2.x
as the balancers are now in separate modules. Thus I am -0.5 on
backporting this directory structure to 2.2.x.

Regards

Rüdiger


Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Jim Jagielski


On May 6, 2009, at 9:23 AM, Jess Holle wrote:


You're right -- I was being weird.  Sorry.



No apology needed :)

I guess part of the reason for my asking was whether the ASF was  
basically saying "we're not chasing this problem, see mod_cluster  
folk if you need it solved" -- and, if so, hoping to get a little  
starting info as to what I'd be getting into chasing mod_cluster.


I'd like to see this capability in httpd itself -- or at least have  
it very easy to add in a very seamless fashion via a pluggable  
custom balancer algorithm (without other larger configuration side  
effects) -- and thus would hope the ASF sees this as within the  
scope of httpd's core suite of modules.




I think it's safe to say that there is enough interest here in the
httpd dev team for this capability to be part of httpd itself...

What we like to do is provide the basic implementation and capability
and allow others to build on top of that if needed.



Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Jess Holle

Jim Jagielski wrote:

On May 6, 2009, at 9:07 AM, Jess Holle wrote:

jean-frederic clere wrote:
Should general support for a query URL be provided in 
mod_proxy_balancer?  Or should this be left to mod_cluster?

Can you explain more? I don't get the question.

What I mean is
• Should mod_proxy_balancer be extended to provide a balancer 
algorithm in which one specifies a backend URL that will provide a 
single numeric health metric, throttle the number of such requests 
via a time-to-live associated with this information, and balance on 
this basis or

• Should mod_cluster handle this issue?
• Or both?

Please recall that, afaik, mod_cluster is not AL nor is it part
of Apache. So asking for direction for what is basically an external
project on the Apache httpd dev list is kinda weird :)

In any case, I think the hope of the ASF is that this capability is
part of httpd, and you can see, with mod_heartbeat and the like,
efforts in the direction.

But the world is big enough for different implementations...

You're right -- I was being weird.  Sorry.

I guess part of the reason for my asking was whether the ASF was 
basically saying "we're not chasing this problem, see mod_cluster folk 
if you need it solved" -- and, if so, hoping to get a little starting 
info as to what I'd be getting into chasing mod_cluster.


I'd like to see this capability in httpd itself -- or at least have it 
very easy to add in a very seamless fashion via a pluggable custom 
balancer algorithm (without other larger configuration side effects) -- 
and thus would hope the ASF sees this as within the scope of httpd's 
core suite of modules.


--
Jess Holle



Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Rainer Jung
On 06.05.2009 15:08, Jim Jagielski wrote:
> 
> On May 6, 2009, at 4:35 AM, Jess Holle wrote:
> 
>>
>> Of course that redoes what a servlet engine would be doing and does so
>> with lower fidelity.  An ability to ask a backend for its current
>> session count and load balance new requests on that basis would be
>> really helpful.  Whether this ability is buried into AJP, for
>> instance, or is simply a separate request to a designated URL is
>> another question, but the latter approach seems fairly general and the
>> number of such requests could be throttled by a time-to-live setting
>> on the last such count obtained.
>>
>> Actually this could and should be generalized beyond active sessions
>> to a back-end health metric.  Each backend could compute and respond
>> with a relative measure of busyness/health, and the load
>> balancer could then balance new (session-less) requests to the least
>> busy / most healthy backend.  This would seem to be a *huge* step
>> forward in load balancing capability/fidelity.
>>
> 
> The trick, of course, at least with HTTP, is that the querying of
> the backend is, of course, a request, and so one needs to worry about
> such things as keepalives and persistent connections, and how long
> do we wait for responses, etc...
> 
> That's why oob-like health-and-status chatter is nice, because
> it doesn't interfere with the normal reverse-proxy/host logic.
> 
> An idea: Instead of asking for this info before sending the
> request, what about the backend sending it as part of the response,
> as a response header. You don't know the status of the machine
> "now", but you do know the status of it right after it handled the last
> request (the last time you saw it) and, assuming nothing else touched
> it, that status is likely still "good". Latency will be an issue,
> of course... Overlapping requests where you don't have the response
> from req1 before you send req2 means that both requests think the
> server is at the same state, whereas of course, they aren't, but it
> may even out since req3, for example, (which happens after req1 is done)
> thinks that the backend has 2 concurrent requests, instead of the 1
> (req2) and so maybe isn't selected... The hysteresis would be interesting
> to model :)

I think asking each time before sending data is too much overhead in
general. Of course it depends on how accurately you try to distribute
load. I would expect that in most situations the overhead for a
per-request accurate decision does not pay off, especially since under
high load there is always a time window between getting the data and
handling the request, and a lot of concurrent requests will already
have changed the data again.

I expect in most cases a granularity of status data between once per
second and once per minute will be appropriate (still a factor of 60 to
decide or configure).

When sending the data back as part of the response: some load numbers
might be too expensive to retrieve, say, 500 times a second. Other load
numbers might not really make sense as a snapshot (per request), only as
an average value (for example: what does CPU load as a snapshot mean?
Since your load-data collecting code is on the CPU, a one-CPU system will
be 100% busy at that point in time. So CPU measurement mostly makes sense
as an average value over relatively short intervals).

So we should already expect the backend to send data which is not
necessarily up-to-date w.r.t. each request. I would assume that when
data comes with each response, one would use some sort of floating average.
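
As a rough sketch of that floating average, assuming the backend piggybacks
one load sample per response (the smoothing factor here is an arbitrary
choice, not a recommendation):

#include <stdio.h>

/* Exponentially weighted moving average of load samples piggybacked on
 * responses.  ALPHA near 1.0 reacts quickly but is noisy; near 0.0 it is
 * smooth but slow to follow real changes. */
#define ALPHA 0.3

static double update_load(double current_avg, double new_sample)
{
    return ALPHA * new_sample + (1.0 - ALPHA) * current_avg;
}

int main(void)
{
    /* Samples as they might arrive with successive responses. */
    double samples[] = { 0.10, 0.90, 0.85, 0.20, 0.15 };
    double avg = samples[0];
    int i;
    for (i = 1; i < 5; i++) {
        avg = update_load(avg, samples[i]);
        printf("sample %.2f -> smoothed %.2f\n", samples[i], avg);
    }
    return 0;
}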

Piggybacking will be easier to implement (no real protocol needed etc.),
out of band communication will be more flexible.

Regards,

Rainer


Re: Backports from trunk to 2.2 proxy-balancers

2009-05-06 Thread Jim Jagielski


On May 6, 2009, at 9:09 AM, Rainer Jung wrote:


On 06.05.2009 14:39, Jim Jagielski wrote:

It would certainly be easier to maintain a 2.2-proxy branch, with the
intent of it actually being folded *into* 2.2, if the branch used the
same dir structure as trunk, that is, a separate directory that  
includes
the balancer methods (as well as the config magic associated with  
it).


However, if that will be an impediment to actually *getting* these
backports into 2.2, then I'm willing to keep the old structure...

So my question is: if to be able to easily backport the various trunk
proxy improvements into 2.2, we also need to backport the dir
structure as well, is that OK? I don't want to work down that
path only to have it wasted work because people think that such a
directory restructure doesn't make sense within a 2.2.x release.

PS: NO, I am not considering this for 2.2.12! :)


I guess at the heart of this is the question of how likely we are to
break some part of the users' build process for 2.2.x. My feeling is
that the additional subdirectory for the balancing method
implementations is a small change and users' build processes should not
break due to this one additional directory.

On the positive side, apart from easier backports: the new subdirectory
might make people more curious about how to add a custom balancing
method, so we get slightly better visibility for the existing provider
interface.




My thoughts as well... :)



Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Jess Holle

Jim Jagielski wrote:

On May 6, 2009, at 4:35 AM, Jess Holle wrote:
Of course that redoes what a servlet engine would be doing and does 
so with lower fidelity.  An ability to ask a backend for its current 
session count and load balance new requests on that basis would be 
really helpful.  Whether this ability is buried into AJP, for 
instance, or is simply a separate request to a designated URL is 
another question, but the latter approach seems fairly general and 
the number of such requests could be throttled by a time-to-live 
setting on the last such count obtained.


Actually this could and should be generalized beyond active sessions 
to a back-end health metric.  Each backend could compute and respond 
with a relative measure of busyness/health and respond and the load 
balancer could then balance new (session-less) requests to the least 
busy / most healthy backend.  This would seem to be *huge* step 
forward in load balancing capability/fidelity.

The trick, of course, at least with HTTP, is that the querying of
the backend is, of course, a request, and so one needs to worry about
such things as keepalives and persistent connections, and how long
do we wait for responses, etc...

That's why oob-like health-and-status chatter is nice, because
it doesn't interfere with the normal reverse-proxy/host logic.

An idea: Instead of asking for this info before sending the
request, what about the backend sending it as part of the response,
as a response header. You don't know that status of the machine
"now", but you do know the status of it right after it handled the last
request (the last time you saw it) and, assuming nothing else touched
it, that status is likely still "good". Latency will be an issue,
of course... Overlapping requests where you don't have the response
from req1 before you send req2 means that both requests think the
server is at the same state, whereas of course, they aren't, but it
may even out since req3, for example, (which happens after req1 is done)
thinks that the backend has 2 concurrent requests, instead of the 1
(req2) and so maybe isn't selected... The hysteresis would be interesting
to model :)

There's inherent hysteresis in this sort of thing.

Including health information (e.g. via a custom response header) on all 
responses is an interesting notion.
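
Assuming a hypothetical header such as X-Backend-Health carrying a single
number in a 0.0-1.0 range (name, format and range are made up here and
would have to be agreed with the backend), consuming it could be as simple
as:

#include <stdio.h>
#include <stdlib.h>

/* Parse one numeric health value from a (hypothetical) response header
 * value, e.g. "X-Backend-Health: 0.42".  Returns 0 on success. */
static int parse_health(const char *header_value, double *out)
{
    char *end = NULL;
    double v;

    if (header_value == NULL)
        return -1;
    v = strtod(header_value, &end);
    if (end == header_value)        /* no digits at all */
        return -1;
    if (v < 0.0 || v > 1.0)         /* assumed range: 0.0 .. 1.0 */
        return -1;
    *out = v;
    return 0;
}

int main(void)
{
    double h;
    if (parse_health("0.42", &h) == 0)
        printf("backend health: %.2f\n", h);
    if (parse_health("garbage", &h) != 0)
        printf("ignored malformed header\n");
    return 0;
}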


Exposing a URL on Apache through which the backend can push its health 
information (e.g. upon starting a new session or invalidating a session 
or detecting a low memory condition) also makes sense.


If these do not suffice, a watchdog thread (as in mod_jk) could do
periodic health checks on the backends in a separate thread, or requests
could pre-request health information for a backend if that backend's
health information is sufficiently old.
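
The "sufficiently old" check itself is trivial; a sketch, assuming each
backend keeps its last reading plus a configured time-to-live (all names
invented for the example):

#include <stdio.h>
#include <time.h>

/* Cached health information for one backend (illustrative only). */
typedef struct {
    double health;        /* last reported metric    */
    time_t fetched_at;    /* when it was obtained    */
    int    ttl_seconds;   /* configured time-to-live */
} health_cache_t;

/* Decide whether the cached value is stale and a new query (or a
 * watchdog probe) should be issued before balancing on it. */
static int needs_refresh(const health_cache_t *c, time_t now)
{
    return (now - c->fetched_at) > c->ttl_seconds;
}

int main(void)
{
    time_t now = time(NULL);
    health_cache_t c = { 0.25, now - 10, 5 };  /* fetched 10s ago, TTL 5s */

    if (needs_refresh(&c, now))
        printf("health info is stale, re-query the backend\n");
    else
        printf("health info is fresh enough, use the cached value\n");
    return 0;
}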


There are lots of possibilities here.

--
Jess Holle



Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Jim Jagielski


On May 6, 2009, at 9:07 AM, Jess Holle wrote:


jean-frederic clere wrote:


Should general support for a query URL be provided in  
mod_proxy_balancer?  Or should this be left to mod_cluster?

Can you explain more? I don't get the question.

What I mean is
	• Should mod_proxy_balancer be extended to provide a balancer  
algorithm in which one specifies a backend URL that will provide a  
single numeric health metric, throttle the number of such requests  
via a time-to-live associated with this information, and balance on  
this basis or

• Should mod_cluster handle this issue?
• Or both?


Please recall that, afaik, mod_cluster is not AL nor is it part
of Apache. So asking for direction for what is basically an external
project on the Apache httpd dev list is kinda weird :)

In any case, I think the hope of the ASF is that this capability is
part of httpd, and you can see, with mod_heartbeat and the like,
efforts in the direction.

But the world is big enough for different implementations...

Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Jess Holle

Rainer Jung wrote:

On 06.05.2009 14:35, jean-frederic clere wrote:
  

Jess Holle wrote:


Rainer Jung wrote:
  

Yes, I think the counter/aging discussion is for the baseline, i.e. when
we do not have any information channel to or from the backend nodes.

As soon as mod_cluster comes into play, we can use more up-to-date real
data and only need to decide how to interpret it and how to interpolate
during the update interval.
  


Should general support for a query URL be provided in
mod_proxy_balancer?  Or should this be left to mod_cluster?
  

Can you explain more? I don't get the question.



 Does mod_cluster provide yet another approach top to bottom (separate
from mod_jk and mod_proxy/mod_proxy_ajp)?
  

Mod_cluster is just a balancer for mod_proxy but due to the dynamic
creation of balancers and workers it can't get in the httpd-trunk code
right now.



 It would seem nice to me if mod_jk and/or mod_proxy_balancer could do
health checks, but you have to draw the line somewhere on growing any
given module, and if mod_jk and mod_proxy_balancer are not going in
that direction, at some point mod_cluster may be in my future.
  

Cool :-)



There are several different subsystems, and as I understood
mod_cluster, it already carefully separates them:

1) Dynamic topology detection (optional)

What are our backend nodes? If you do not want to statically configure
them, you need some mechanism based on either

- registration: backend nodes register at one or multiple topology
management nodes; the addresses of those are either configured, or they
announce themselves on the network via broad- or multicast.

- detection: topology manager receives broad- or multicast packets of
the backend nodes. They do not need to know the topology manager, only
the multicast address

More enhanced would be to already learn the forwarding rules (e.g. URLs
to map) from the backend nodes.

In the simpler case, the topology would be configured statically.

2) Dynamic state detection

a) Liveness
b) Load numbers

Both could be either polled (maybe scalability issues) or pushed to a
state manager. Push could be done via TCP (the address could be sent to
the backend once it was detected in 1), or defined statically. Maybe
one would use both ways, e.g. push for active state changes, like when
an admin stops a node, and poll for state-manager-driven things. Not sure.

3) Balancing

Would be done based on the data collected by the state manager.

It's not clear at all, whether those three should be glued together
tightly, or kept in different pieces. I had the impression the general
direction is more about separating them and to allow multiple
experiments, like mod_cluster and mod_heartbeat.

The interaction would be done via some common data container, e.g.
slotmem or in a distributed (multiple Apaches) situation memcache or
similar.

Does this make sense?
  

Yes.

I've been working around #1 by using pre-designated port ranges for 
backends, e.g. configuring for balancing over a port range of 10 and 
only having a couple of servers running in this range at most given 
times.  That's fine as long as one quiets Apache's error logging so that 
it only complains about backends that are *newly* unreachable rather 
than complaining each time a backend is retried.  I supplied a patch for 
this some time back.


#2 and #3 are huge, however, and it would be good to see something firm 
rather than experimental in these areas sooner than later.


--
Jess Holle



Re: Backports from trunk to 2.2 proxy-balancers

2009-05-06 Thread Rainer Jung
On 06.05.2009 14:39, Jim Jagielski wrote:
> It would certainly be easier to maintain a 2.2-proxy branch, with the
> intent of it actually being folded *into* 2.2, if the branch used the
> same dir structure as trunk, that is, a separate directory that includes
> the balancer methods (as well as the config magic associated with it).
> 
> However, if that will be an impediment to actually *getting* these
> backports into 2.2, then I'm willing to keep the old structure...
> 
> So my question is: if to be able to easily backport the various trunk
> proxy improvements into 2.2, we also need to backport the dir
> structure as well, is that OK? I don't want to work down that
> path only to have it wasted work because people think that such a
> directory restructure doesn't make sense within a 2.2.x release.
> 
> PS: NO, I am not considering this for 2.2.12! :)

I guess at the heart of this is the question of how likely we are to
break some part of the users' build process for 2.2.x. My feeling is
that the additional subdirectory for the balancing method
implementations is a small change and users' build processes should not
break due to this one additional directory.

On the positive side, apart from easier backports: the new subdirectory
might make people more curious about how to add a custom balancing method,
so we get slightly better visibility for the existing provider interface.

Regards,

Rainer



Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Jim Jagielski

FWIW, I've been looking into using Tribes for httpd.


Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Jim Jagielski


On May 6, 2009, at 4:35 AM, Jess Holle wrote:



Of course that redoes what a servlet engine would be doing and does  
so with lower fidelity.  An ability to ask a backend for its current  
session count and load balance new requests on that basis would be  
really helpful.  Whether this ability is buried into AJP, for  
instance, or is simply a separate request to a designated URL is  
another question, but the latter approach seems fairly general and  
the number of such requests could be throttled by a time-to-live  
setting on the last such count obtained.


Actually this could and should be generalized beyond active sessions  
to a back-end health metric.  Each backend could compute and respond  
with a relative measure of busyness/health and respond and the load  
balancer could then balance new (session-less) requests to the least  
busy / most healthy backend.  This would seem to be *huge* step  
forward in load balancing capability/fidelity.




The trick, of course, at least with HTTP, is that the querying of
the backend is, of course, a request, and so one needs to worry about
such things as keepalives and persistent connections, and how long
do we wait for responses, etc...

That's why oob-like health-and-status chatter is nice, because
it doesn't interfere with the normal reverse-proxy/host logic.

An idea: Instead of asking for this info before sending the
request, what about the backend sending it as part of the response,
as a response header. You don't know that status of the machine
"now", but you do know the status of it right after it handled the last
request (the last time you saw it) and, assuming nothing else touched
it, that status is likely still "good". Latency will be an issue,
of course... Overlapping requests where you don't have the response
from req1 before you send req2 means that both requests think the
server is at the same state, whereas of course, they aren't, but it
may even out since req3, for example, (which happens after req1 is done)
thinks that the backend has 2 concurrent requests, instead of the 1
(req2) and so maybe isn't selected... The hysteresis would be interesting
to model :)
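
A toy model of that effect, where the balancer's view of a backend is only
refreshed when a response comes back (purely illustrative, nothing here is
mod_proxy code):

#include <stdio.h>

/* The balancer only learns about backend state from responses, so
 * overlapping requests are dispatched with the same (stale) view. */
typedef struct {
    int reported_busy;   /* busy count as last seen in a response   */
    int actual_busy;     /* what is really happening on the backend */
} view_t;

static void dispatch(view_t *v, const char *req)
{
    printf("%s dispatched, balancer thinks busy=%d (actual %d)\n",
           req, v->reported_busy, v->actual_busy);
    v->actual_busy++;                    /* the backend really gets busier */
}

static void response(view_t *v, const char *req)
{
    v->actual_busy--;
    v->reported_busy = v->actual_busy;   /* piggybacked in the response */
    printf("%s done, balancer now thinks busy=%d\n", req, v->reported_busy);
}

int main(void)
{
    view_t v = { 0, 0 };
    dispatch(&v, "req1");
    dispatch(&v, "req2");   /* overlaps req1: still sees busy=0 */
    response(&v, "req1");
    dispatch(&v, "req3");   /* sees busy=1, so may prefer another node */
    response(&v, "req2");
    response(&v, "req3");
    return 0;
}

req2 is dispatched with the same stale view as req1; only after req1's
response does the picture start to correct itself.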


Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Jess Holle

jean-frederic clere wrote:
Should general support for a query URL be provided in 
mod_proxy_balancer?  Or should this be left to mod_cluster?

Can you explain more? I don't get the question.

What I mean is

  1. Should mod_proxy_balancer be extended to provide a balancer
 algorithm in which one specifies a backend URL that will provide a
 single numeric health metric, throttle the number of such requests
 via a time-to-live associated with this information, and balance
 on this basis or
  2. Should mod_cluster handle this issue?
  3. Or both?
  * For instance, mod_cluster might leverage special nuances in
    AJP, JBoss, and Tomcat, whereas mod_proxy_balancer might
    provide more generic support for health checks on any
    backend server that can expose a health metric URL.

From your response below, it sounds like you're saying it's #2, which
is /largely/ fine and good -- but this raises questions:


  1. How general is the health check metric in mod_cluster?
 * I only care about Tomcat backends myself, but control over
   the metric would be good.
  2. Does this require special JBoss nuggets in Tomcat?
 * I'd hope not, i.e. that this is a simple matter of a
   pre-designated URL or a very simple standalone socket protocol.
  3. When will mod_cluster support health metric based balancing of Tomcat?
  4. How "disruptive" to an existing configuration using
 mod_proxy_balancer/mod_proxy_ajp is mod_cluster?
 * How much needs to be changed?
  5. How portable is the mod_cluster code?
 * Does it build on Windows?  HPUX?  AIX?

I say this is largely fine and good as I'd like to see just the 
health-metric based balancing algorithm in Apache 2.2.x itself.
Does mod_cluster provide yet another approach top to bottom (separate 
than mod_jk and mod_proxy/mod_proxy_ajp)?
Mod_cluster is just a balancer for mod_proxy but due to the dynamic 
creation of balancers and workers it can't get in the httpd-trunk code 
right now.
 It would seem nice to me if mod_jk and/or mod_proxy_balancer could 
do health checks, but you have to draw the line somewhere on growing 
any given module and if mod_jk and mod_proxy_balancer are not going 
in that direction at some point mod_cluster may be in my future. 

--
Jess Holle



Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Rainer Jung
On 06.05.2009 14:35, jean-frederic clere wrote:
> Jess Holle wrote:
>> Rainer Jung wrote:
>>> Yes, I think the counter/aging discussion is for the baseline, i.e. when
>>> we do not have any information channel to or from the backend nodes.
>>>
>>> As soon as mod_cluster comes into play, we can use more up-to-date real
>>> data and only need to decide how to interpret it and how to interpolate
>>> during the update interval.
>>>   
>> Should general support for a query URL be provided in
>> mod_proxy_balancer?  Or should this be left to mod_cluster?
> 
> Can you explain more? I don't get the question.
> 
>>  Does mod_cluster provide yet another approach top to bottom (separate
>> than mod_jk and mod_proxy/mod_proxy_ajp)?
> 
> Mod_cluster is just a balancer for mod_proxy but due to the dynamic
> creation of balancers and workers it can't get in the httpd-trunk code
> right now.
> 
>>  It would seem nice to me if mod_jk and/or mod_proxy_balancer could do
>> health checks, but you have to draw the line somewhere on growing any
>> given module and if mod_jk and mod_proxy_balancer are not going in
>> that direction at some point mod_cluster may be in my future.
> 
> Cool :-)

There are several different subsystems, and as I understood
mod_cluster, it already carefully separates them:

1) Dynamic topology detection (optional)

What are our backend nodes? If you do not want to statically configure
them, you need some mechanism based on either

- registration: backend nodes register at one or multiple topology
management nodes; the addresses of those are either configured, or they
announce themselves on the network via broad- or multicast (see the
sketch below).

- detection: topology manager receives broad- or multicast packets of
the backend nodes. They do not need to know the topology manager, only
the multicast address

More enhanced would be to already learn the forwarding rules (e.g. URLs
to map) from the backend nodes.

In the simpler case, the topology would be configured statically.
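
For the registration/detection variant, a backend node only has to send a
small announcement to a well-known multicast group. A rough standalone
sketch of the announcing side (group, port and payload format are invented
for illustration):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical multicast group/port the topology manager listens on. */
    const char *group = "239.255.0.1";
    const int   port  = 23364;
    /* Payload: where this backend can be reached (format is made up). */
    const char *announce = "node=tomcat1 addr=10.0.0.5:8009 proto=ajp";

    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst;

    if (s < 0) {
        perror("socket");
        return 1;
    }
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(port);
    inet_pton(AF_INET, group, &dst.sin_addr);

    if (sendto(s, announce, strlen(announce), 0,
               (struct sockaddr *)&dst, sizeof(dst)) < 0)
        perror("sendto");
    close(s);
    return 0;
}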

2) Dynamic state detection

a) Liveness
b) Load numbers

Both could be either polled (maybe scalability issues) or pushed to a
state manager. Push could be done via TCP (the address could be sent to
the backend once it was detected in 1), or defined statically. Maybe
one would use both ways, e.g. push for active state changes, like when
an admin stops a node, and poll for state-manager-driven things. Not sure.

3) Balancing

Would be done based on the data collected by the state manager.

It's not clear at all, whether those three should be glued together
tightly, or kept in different pieces. I had the impression the general
direction is more about separating them and to allow multiple
experiments, like mod_cluster and mod_heartbeat.

The interaction would be done via some common data container, e.g.
slotmem or in a distributed (multiple Apaches) situation memcache or
similar.
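
Such a container could be as simple as one fixed-size slot per node that
the state manager writes and the balancer reads. A toy single-process
sketch of that layout (the real thing would live in shared memory, e.g.
slotmem, and would need locking; all names are invented):

#include <stdio.h>
#include <time.h>

#define MAX_NODES 8

/* One slot per backend node.  In httpd this would sit in a shared memory
 * segment; here it is a plain static array for illustration. */
typedef struct {
    char   name[32];
    double load;          /* last load number pushed by the node  */
    time_t updated;       /* when the state manager last wrote it */
    int    alive;
} node_slot_t;

static node_slot_t slots[MAX_NODES];

/* State-manager side: record fresh data for slot i. */
static void update_slot(int i, double load, int alive)
{
    slots[i].load    = load;
    slots[i].alive   = alive;
    slots[i].updated = time(NULL);
}

/* Balancer side: pick the alive slot with the lowest load. */
static int pick_slot(int n)
{
    int i, best = -1;
    for (i = 0; i < n; i++) {
        if (!slots[i].alive)
            continue;
        if (best < 0 || slots[i].load < slots[best].load)
            best = i;
    }
    return best;
}

int main(void)
{
    snprintf(slots[0].name, sizeof(slots[0].name), "node1");
    snprintf(slots[1].name, sizeof(slots[1].name), "node2");
    update_slot(0, 0.7, 1);
    update_slot(1, 0.2, 1);
    printf("balancer picks: %s\n", slots[pick_slot(2)].name);
    return 0;
}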

Does this make sense?

Regards,

Rainer


Backports from trunk to 2.2 proxy-balancers

2009-05-06 Thread Jim Jagielski

It would certainly be easier to maintain a 2.2-proxy branch, with the
intent of it actually being folded *into* 2.2, if the branch used the
same dir structure as trunk, that is, a separate directory that includes
the balancer methods (as well as the config magic associated with it).

However, if that will be an impediment to actually *getting* these
backports into 2.2, then I'm willing to keep the old structure...

So my question is: if, to be able to easily backport the various trunk
proxy improvements into 2.2, we also need to backport the dir
structure as well, is that OK? I don't want to work down that
path only to have it wasted work because people think that such a
directory restructure doesn't make sense within a 2.2.x release.

PS: NO, I am not considering this for 2.2.12! :)


Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread jean-frederic clere

Jess Holle wrote:

Rainer Jung wrote:

An ability to balance based on new sessions with an idle time out on
such sessions would be close enough to reality in cases where sessions
expire rather than being explicitly invalidated (e.g. by a logout).


But then we end up in a stateful situation. This is a serious design
decision. If we want to track idleness for sessions, we need to track a
list of sessions (session ids) the balancer has seen. This makes things
much more complex. Combined with the non-ability to track logouts and
the errors coming in from a global situation (more than one Apache
instance), I think it will be more of a problem than a solution.
  

The more I think about this the more I agree.

From the start I preferred the session/health query to the back-end
with a time-to-live; on further consideration I *greatly* prefer this
approach.

Of course that redoes what a servlet engine would be doing and does so
with lower fidelity.  An ability to ask a backend for its current
session count and load balance new requests on that basis would be
really helpful.


Seems much nicer.
  

Agreed.

Actually this could and should be generalized beyond active sessions to
a back-end health metric.  Each backend could compute and respond with a
relative measure of busyness/health and respond and the load balancer
could then balance new (session-less) requests to the least busy / most
healthy backend.  This would seem to be *huge* step forward in load
balancing capability/fidelity.

It's my understanding that mod_cluster is pursuing just this sort of
thing to some degree -- but currently only works for JBoss backends.


Yes, I think the counter/aging discussion is for the baseline, i.e. when
we do not have any information channel to or from the backend nodes.

As soon as mod_cluster comes into play, we can use more up-to-date real
data and only need to decide how to interpret it and how to interpolate
during the update interval.
  
Should general support for a query URL be provided in 
mod_proxy_balancer?  Or should this be left to mod_cluster?


Can you explain more? I don't get the question.

 Does 
mod_cluster provide yet another approach top to bottom (separate than 
mod_jk and mod_proxy/mod_proxy_ajp)?


Mod_cluster is just a balancer for mod_proxy but due to the dynamic 
creation of balancers and workers it can't get in the httpd-trunk code 
right now.


 It would seem nice to me if mod_jk 
and/or mod_proxy_balancer could do health checks, but you have to draw 
the line somewhere on growing any given module and if mod_jk and 
mod_proxy_balancer are not going in that direction at some point 
mod_cluster may be in my future.


Cool :-)

Cheers

Jean-Frederic


Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread jean-frederic clere

Jess Holle wrote:

jean-frederic clere wrote:

Jess Holle wrote:
An ability to balance based on new sessions with an idle time out on 
such sessions would be close enough to reality in cases where 
sessions expire rather than being explicitly invalidated (e.g. by a 
logout).
Storing the sessionid to share the load depending on the number of 
active sessions, brings a problem of security, no?
To the degree that you consider Apache vulnerable to attack to retrieve 
these, yes.


I prefer the health check request approach below for this and other 
reasons (amount of required bookkeeping, etc).
Of course that redoes what a servlet engine would be doing and does 
so with lower fidelity.  An ability to ask a backend for its current 
session count and load balance new requests on that basis would be 
really helpful.  Whether this ability is buried into AJP, for 
instance, or is simply a separate request to a designated URL is 
another question, but the latter approach seems fairly general and 
the number of such requests could be throttled by a time-to-live 
setting on the last such count obtained.


Actually this could and should be generalized beyond active sessions 
to a back-end health metric.  Each backend could compute and respond 
with a relative measure of busyness/health and respond and the load 
balancer could then balance new (session-less) requests to the least 
busy / most healthy backend.  This would seem to be *huge* step 
forward in load balancing capability/fidelity.


It's my understanding that mod_cluster is pursuing just this sort of 
thing to some degree -- but currently only works for JBoss backends.

This is wrong; it works with Tomcat too.
mod_cluster works with Tomcat, but according to the docs I've seen the 
dynamic (health/session metric based rather than static) load balancing 
only worked with JBoss backends.


Or has this changed?


No, it is still like that; the singleton logic used in JBossAS
requires JBoss clustering logic, but it should be available in the next
version.


Cheers

Jean-Frederic


Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Jess Holle

jean-frederic clere wrote:

Jess Holle wrote:
An ability to balance based on new sessions with an idle time out on 
such sessions would be close enough to reality in cases where 
sessions expire rather than being explicitly invalidated (e.g. by a 
logout).
Storing the sessionid to share the load depending on the number of 
active sessions, brings a problem of security, no?
To the degree that you consider Apache vulnerable to attack to retrieve 
these, yes.


I prefer the health check request approach below for this and other 
reasons (amount of required bookkeeping, etc).
Of course that redoes what a servlet engine would be doing and does 
so with lower fidelity.  An ability to ask a backend for its current 
session count and load balance new requests on that basis would be 
really helpful.  Whether this ability is buried into AJP, for 
instance, or is simply a separate request to a designated URL is 
another question, but the latter approach seems fairly general and 
the number of such requests could be throttled by a time-to-live 
setting on the last such count obtained.


Actually this could and should be generalized beyond active sessions 
to a back-end health metric.  Each backend could compute and respond 
with a relative measure of busyness/health and respond and the load 
balancer could then balance new (session-less) requests to the least 
busy / most healthy backend.  This would seem to be *huge* step 
forward in load balancing capability/fidelity.


It's my understanding that mod_cluster is pursuing just this sort of 
thing to some degree -- but currently only works for JBoss backends.

This is wrong; it works with Tomcat too.
mod_cluster works with Tomcat, but according to the docs I've seen the 
dynamic (health/session metric based rather than static) load balancing 
only worked with JBoss backends.


Or has this changed?

--
Jess Holle



Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Jess Holle

Rainer Jung wrote:

An ability to balance based on new sessions with an idle time out on
such sessions would be close enough to reality in cases where sessions
expire rather than being explicitly invalidated (e.g. by a logout).


But then we end up in a stateful situation. This is a serious design
decision. If we want to track idleness for sessions, we need to track a
list of sessions (session ids) the balancer has seen. This makes things
much more complex. Combined with the non-ability to track logouts and
the errors coming in from a global situation (more than one Apache
instance), I think it will be more of a problem than a solution.
  

The more I think about this the more I agree.

From the start I preferred the session/health query to the back-end
with a time-to-live; on further consideration I *greatly* prefer this
approach.

Of course that redoes what a servlet engine would be doing and does so
with lower fidelity.  An ability to ask a backend for its current
session count and load balance new requests on that basis would be
really helpful.


Seems much nicer.
  

Agreed.

Actually this could and should be generalized beyond active sessions to
a back-end health metric.  Each backend could compute and respond with a
relative measure of busyness/health and respond and the load balancer
could then balance new (session-less) requests to the least busy / most
healthy backend.  This would seem to be *huge* step forward in load
balancing capability/fidelity.

It's my understanding that mod_cluster is pursuing just this sort of
thing to some degree -- but currently only works for JBoss backends.


Yes, I think the counter/aging discussion is for the baseline, i.e. when
we do not have any information channel to or from the backend nodes.

As soon as mod_cluster comes into play, we can use more up-to-date real
data and only need to decide how to interpret it and how to interpolate
during the update interval.
  
Should general support for a query URL be provided in 
mod_proxy_balancer?  Or should this be left to mod_cluster?  Does 
mod_cluster provide yet another approach top to bottom (separate from
mod_jk and mod_proxy/mod_proxy_ajp)?  It would seem nice to me if mod_jk
and/or mod_proxy_balancer could do health checks, but you have to draw
the line somewhere on growing any given module, and if mod_jk and
mod_proxy_balancer are not going in that direction, at some point
mod_cluster may be in my future.


--
Jess Holle



Re: mod_dav WebDAVFS/1.7 and Transfer-Encoding

2009-05-06 Thread Jeff Trawick
On Wed, May 6, 2009 at 4:08 AM, paul  wrote:

>
> But reading mod_dav.c around line 2393:
>  if (tenc) {
>if (strcasecmp(tenc, "chunked")) {
>/* Use this instead of Apache's default error string */
>ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
>  "Unknown Transfer-Encoding %s", tenc);
>return HTTP_NOT_IMPLEMENTED;
>}
>
> uses strcasecmp() so the above theory seems wrong. We're using 2.2.9 and
> the above snippet is from 2.2.11. Any chance this has been fixed after
> 2.2.9?


no; it has used strcasecmp() for eons
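
A two-line test shows why the upper-case 'C' theory can't explain a failure
in this code path -- strcasecmp() compares case-insensitively, so "Chunked"
passes the check just like "chunked":

#include <stdio.h>
#include <strings.h>   /* strcasecmp */

int main(void)
{
    /* Prints 0: equal, ignoring case, so the mod_dav check above
     * accepts "Chunked" just as well as "chunked". */
    printf("%d\n", strcasecmp("Chunked", "chunked"));
    return 0;
}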


> Has anyone encountered the issue or is this a known problem?
>

did you check bugzilla ?  (issues.apache.org/bugzilla)


Re: svn commit: r771998

2009-05-06 Thread Plüm, Rüdiger, VF-Group


This causes trunk to fail compilation with:

make[1]: *** No rule to make target `modules/mappers/libmod_so.la', needed by 
`httpd'.  Stop.
make: *** [all-recursive] Error 1

Reverting it fixes the problem.

Regards

Rüdiger


Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Rainer Jung
On 06.05.2009 10:35, Jess Holle wrote:
> Rainer Jung wrote:
>> In most situations applications need stickyness. So balancing will not
>> happen in an ideal situation, instead it tries to keep load equal
>> although most requests are sticky.
>>
>> Because of the influence of sticky requests it can happen that
>> accumulated load distributes very uneven between the nodes. Should the
>> balancer try to correct such accumulated differences?
>>   Other applications are memory bound. Memory is needed by request
>> handling but also by session handling. Data accumulation is more
>> important here, because of the sessions. Again, we can not be perfect,
>> because we don't get a notification, when a session expires or a user
>> logs out. So we can only count the "new" sessions. This counter in my
>> opinion also needs some aging, so that we won't compensate historic
>> inequality without bounds. I must confess, that I don't have an example
>> here, how this inequality can happen for sessions when balancing new
>> session requests (stickyness doesn't influence this), but I think
>> balancing based on old data is the wrong model here too.
>>   
> An ability to balance based on new sessions with an idle time out on
> such sessions would be close enough to reality in cases where sessions
> expire rather than being explicitly invalidated (e.g. by a logout).

But then we end up in a stateful situation. This is a serious design
decision. If we want to track idleness for sessions, we need to track a
list of sessions (session ids) the balancer has seen. This makes things
much more complex. Combined with the inability to track logouts and
the errors coming in from a global situation (more than one Apache
instance), I think it will be more of a problem than a solution.

> Of course that redoes what a servlet engine would be doing and does so
> with lower fidelity.  An ability to ask a backend for its current
> session count and load balance new requests on that basis would be
> really helpful.

Seems much nicer.

> Actually this could and should be generalized beyond active sessions to
> a back-end health metric.  Each backend could compute and respond with a
> relative measure of busyness/health and respond and the load balancer
> could then balance new (session-less) requests to the least busy / most
> healthy backend.  This would seem to be *huge* step forward in load
> balancing capability/fidelity.
> 
> It's my understanding that mod_cluster is pursuing just this sort of
> thing to some degree -- but currently only works for JBoss backends.

Yes, I think the counter/aging discussion is for the baseline, i.e. when
we do not have any information channel to or from the backend nodes.

As soon as mod_cluster comes into play, we can use more up-to-date real
data and only need to decide how to interpret it and how to interpolate
during the update interval.

Regards,

Rainer


Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread jean-frederic clere

Jess Holle wrote:

Rainer Jung wrote:

In most situations applications need stickyness. So balancing will not
happen in an ideal situation, instead it tries to keep load equal
although most requests are sticky.

Because of the influence of sticky requests it can happen that
accumulated load distributes very uneven between the nodes. Should the
balancer try to correct such accumulated differences?
  Other applications are memory bound. Memory is needed by request
handling but also by session handling. Data accumulation is more
important here, because of the sessions. Again, we can not be perfect,
because we don't get a notification, when a session expires or a user
logs out. So we can only count the "new" sessions. This counter in my
opinion also needs some aging, so that we won't compensate historic
inequality without bounds. I must confess, that I don't have an example
here, how this inequality can happen for sessions when balancing new
session requests (stickyness doesn't influence this), but I think
balancing based on old data is the wrong model here too.
  
An ability to balance based on new sessions with an idle time out on 
such sessions would be close enough to reality in cases where sessions 
expire rather than being explicitly invalidated (e.g. by a logout).


Storing the session id to share the load depending on the number of
active sessions brings a security problem, no?




Of course that redoes what a servlet engine would be doing and does so 
with lower fidelity.  An ability to ask a backend for its current 
session count and load balance new requests on that basis would be 
really helpful.  Whether this ability is buried into AJP, for instance, 
or is simply a separate request to a designated URL is another question, 
but the latter approach seems fairly general and the number of such 
requests could be throttled by a time-to-live setting on the last such 
count obtained.


Actually this could and should be generalized beyond active sessions to 
a back-end health metric.  Each backend could compute and respond with a 
relative measure of busyness/health and respond and the load balancer 
could then balance new (session-less) requests to the least busy / most 
healthy backend.  This would seem to be *huge* step forward in load 
balancing capability/fidelity.


It's my understanding that mod_cluster is pursuing just this sort of 
thing to some degree -- but currently only works for JBoss backends.


This is wrong; it works with Tomcat too.

Cheers

Jean-Frederic


mod_dav WebDAVFS/1.7 and Transfer-Encoding

2009-05-06 Thread paul

Hi folks,

I hope this is the correct list to ask, please redirect if not.

Recent versions of OS X (WebDAVFS/1.7) have problems accessing WebDAV
(mod_dav). Uploads create zero-byte files and the client gets
disconnected. According to
http://discussions.apple.com/thread.jspa?messageID=8101932 the
issue is:



We too ran into this issue with our software (Jungle Disk) and 
customers. As far as we've been able to tell Apple changed how Finder 
does PUT requests -- they now do them as Transfer-Encoding: chunked, 
which apparently a number of WebDAV implementations don't support. By 
adding support for this we have been able to interoperate with Finder again.



Fix was simple. Finder is sending the Transfer-Encoding header as 
Chunked (note the upper case 'C'). Apache is sensitive to case.


I think mod_dav would have the same issue. Minor patch to mod_dav 
required : it already supports chunking i think.




But reading mod_dav.c around line 2393:
 if (tenc) {
if (strcasecmp(tenc, "chunked")) {
/* Use this instead of Apache's default error string */
ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
  "Unknown Transfer-Encoding %s", tenc);
return HTTP_NOT_IMPLEMENTED;
}

uses strcasecmp(), so the above theory seems wrong. We're using 2.2.9 and
the above snippet is from 2.2.11. Any chance this has been fixed after
2.2.9? Has anyone encountered the issue or is this a known problem?


thanks
 Paul



Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Jess Holle

Rainer Jung wrote:

In most situations applications need stickyness. So balancing will not
happen in an ideal situation, instead it tries to keep load equal
although most requests are sticky.

Because of the influence of sticky requests it can happen that
accumulated load distributes very uneven between the nodes. Should the
balancer try to correct such accumulated differences?
  
Other applications are memory bound. Memory is needed by request
handling but also by session handling. Data accumulation is more
important here, because of the sessions. Again, we can not be perfect,
because we don't get a notification, when a session expires or a user
logs out. So we can only count the "new" sessions. This counter in my
opinion also needs some aging, so that we won't compensate historic
inequality without bounds. I must confess, that I don't have an example
here, how this inequality can happen for sessions when balancing new
session requests (stickyness doesn't influence this), but I think
balancing based on old data is the wrong model here too.
  
An ability to balance based on new sessions with an idle time out on 
such sessions would be close enough to reality in cases where sessions 
expire rather than being explicitly invalidated (e.g. by a logout).


Of course that redoes what a servlet engine would be doing and does so 
with lower fidelity.  An ability to ask a backend for its current 
session count and load balance new requests on that basis would be 
really helpful.  Whether this ability is buried into AJP, for instance, 
or is simply a separate request to a designated URL is another question, 
but the latter approach seems fairly general and the number of such 
requests could be throttled by a time-to-live setting on the last such 
count obtained.


Actually this could and should be generalized beyond active sessions to 
a back-end health metric.  Each backend could compute and respond with a 
relative measure of busyness/health and respond and the load balancer 
could then balance new (session-less) requests to the least busy / most 
healthy backend.  This would seem to be a *huge* step forward in load
balancing capability/fidelity.


It's my understanding that mod_cluster is pursuing just this sort of 
thing to some degree -- but currently only works for JBoss backends.


--
Jess Holle



Re: mod_proxy / mod_proxy_balancer

2009-05-06 Thread Rainer Jung
Caution: long response!

On 05.05.2009 22:41, jean-frederic clere wrote:
> Jim Jagielski wrote:
>>
>> On May 5, 2009, at 3:02 PM, jean-frederic clere wrote:
>>
>>> Jim Jagielski wrote:
 On May 5, 2009, at 1:18 PM, jean-frederic clere wrote:
> Jim Jagielski wrote:
>> On May 5, 2009, at 12:07 PM, jean-frederic clere wrote:
>>> Jim Jagielski wrote:
 On May 5, 2009, at 11:13 AM, jean-frederic clere wrote:
>
> I am trying to get the worker->id and the scoreboard associated
> logic moved in the reset() when using a balancer, those workers
> need a different handling if we want to have a shared
> information area for them.
>
 The thing is that those workers are not really handled
by the balancer itself (nor should be), so the reset() shouldn't
 apply. IMO, mod_proxy inits the generic forward/reverse workers
 and m_p_b should handle the balancer-related ones.
>>>
>>> Ok by running first the m_p_b child_init() the worker is
>>> initialised by the m_p_b logic and mod_proxy won't change it later.
>>>

>> Yeah... a quick test indicates, at least as far as the perl
>> framework is considered, changing to that m_p_b runs 1st in
>> child_init
>> results in normal and expected behavior Need to do some more
>> tracing to see if we can copy the pointer instead of the whole
>> data set with this ordering.
>
> I have committed the code... It works for my tests.
>
 Beat me to it :)
 BTW: I did create a proxy-sandbox from 2.2.x in hopes that a
 lot of what we do in trunk we can backport to 2.2.x
>>>
>>> Yep but I think we should first have the reset()/age() stuff working
>>> in trunk before backporting to httpd-2.2-proxy :-)
>>>
>>
>> For sure!!
>>
>> BTW: it seems to me that aging is only really needed when the
>> environment changes,
>> mostly when a worker comes back, or when the actual limits are changed
>> in real-time during runtime. Except for these, aging doesn't seem to
>> really add much... long-term steady state only gets invalid when the
>> steady-state changes, after all :)
>>
>> Comments?
>>
>>
> 
> I think we need it for few reasons:
> - When a worker is idle the information about its load is irrelevant.
> - Being able to calculate throughput and load balance using that
> information is only valid if you have a kind of ticker.
> - In some tests I have made with a mixture of long sessions and single
> request "sessions" you need to "forget" the load caused by the long
> sessions.

Balancing and stickiness are conflicting goals. Stickiness dictates the
node once a session is created; balancing tries to distribute load
equally, so it needs to choose the least loaded node.

In most situations applications need stickiness. So balancing will not
happen in an ideal situation; instead it tries to keep load equal
although most requests are sticky.

Because of the influence of sticky requests it can happen that
accumulated load distributes very unevenly between the nodes. Should the
balancer try to correct such accumulated differences?

It depends (yeah, as always): what we actually mean by "load" varies
with the application. Abstractly we are talking about resource usage. The
backend nodes have limited resources and we want to make optimal use of
them by distributing the resource usage equally.

For some applications CPU is the limiting resource. This resource is
typically coupled to actual requests in flight and not to longer-living
objects like sessions. Of course not all requests need an equal amount
of CPU, but as long as we can't actually measure the CPU load, balancing
the number of requests in the sense of "busyness" (parallel requests)
should be best for CPU. Because CPU monitoring is often done on the
basis of averages (and not the maximum short-term use per interval),
some request count accumulation as a basis for the balancing will result
in better measured numbers (not necessarily in a better "smallest maximum
load"). If we do not age, then a strongly unequal historic distribution
(caused by stickiness) will result in an opposite unequal distribution
as soon as a lot of non-sticky requests come in. I think that's not optimal.

Other applications are memory bound. Memory is needed by request
handling but also by session handling. Data accumulation is more
important here, because of the sessions. Again, we cannot be perfect,
because we don't get a notification when a session expires or a user
logs out. So we can only count the "new" sessions. This counter in my
opinion also needs some aging, so that we won't compensate historic
inequality without bounds. I must confess that I don't have an example
here of how this inequality can happen for sessions when balancing
new-session requests (stickiness doesn't influence this), but I think
balancing based on old data is the wrong model here too.
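
The kind of aging I have in mind is simply multiplying the accumulated
counters by a decay factor at a regular interval, so that historic
imbalance fades instead of being compensated without bounds. A tiny sketch
(interval handling omitted, factor arbitrary):

#include <stdio.h>

#define NODES        2
#define DECAY_FACTOR 0.5   /* arbitrary: halve the history each interval */

/* Accumulated "new session" counters per node, starting out very uneven. */
static double sessions[NODES] = { 1000.0, 100.0 };

/* Called once per aging interval (e.g. from a maintenance hook). */
static void age_counters(void)
{
    int i;
    for (i = 0; i < NODES; i++)
        sessions[i] *= DECAY_FACTOR;
}

int main(void)
{
    int interval;
    for (interval = 0; interval < 4; interval++) {
        printf("interval %d: node0=%.0f node1=%.0f (diff %.0f)\n",
               interval, sessions[0], sessions[1],
               sessions[0] - sessions[1]);
        age_counters();
    }
    return 0;
}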

Then another important resource is bandwidth. Her