Re: mod_jk local error states vs. global error states

2009-03-07 Thread Mladen Turk

Rainer Jung wrote:

On 06.03.2009 13:32, Mladen Turk wrote:

For the rest it's simply too much to cope in a single email ;)


I put the force recovery fix and the "else" suggestion in a patch at:

http://people.apache.org/~rjung/mod_jk-dev/patches/local_states.patch



I've added 'in_error' for the ajp worker.
We already have an 'errors' counter, but this is
a statistical one. The new 'in_error' holds the number of connections
that are currently in error state.

If that number gets higher than busy/2 (more than half are in
error state), then the entire worker is marked as invalid.

I think this is much better than having an additional retry for
local workers. The value busy/2 can probably be configured
via some directive similar to max_reply_timeouts,
e.g., max_reply_errors.
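
A minimal sketch of the check described above, assuming plain integer
counters. The function name and signature are hypothetical and not taken
from the mod_jk sources; only 'busy', 'in_error' and the busy/2 threshold
come from the message.

```c
#include <assert.h>

/* Hypothetical helper, not the real mod_jk code.
 * busy:     connections currently processing a request
 * in_error: connections currently in error state
 * Returns 1 when in_error is higher than busy/2 (integer division),
 * which is the condition for marking the whole worker invalid. */
static int worker_is_invalid(int busy, int in_error)
{
    assert(busy >= 0 && in_error >= 0);
    return in_error > busy / 2;
}
```

With busy = 4, two connections in error keep the worker valid and a third
one marks it invalid. A configurable threshold such as the suggested
max_reply_errors would replace the fixed busy / 2 expression.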

Regards
--
^(TM)

-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: mod_jk local error states vs. global error states

2009-03-06 Thread Mladen Turk

Rainer Jung wrote:

On 06.03.2009 13:32, Mladen Turk wrote:

For the rest it's simply too much to cope in a single email ;)


I put the force recovery fix and the "else" suggestion in a patch at:

http://people.apache.org/~rjung/mod_jk-dev/patches/local_states.patch

Everything apart from Hunk number 3 and the small change to
jk_lb_worker.h is for forced recovery, Hunk 3 and jk_lb_worker.h are for
"else".




I've added the new JK_AJP_PROTOCOL_ERROR, so IMO you can
now apply your proposed patch to 'else'.

'else' now means that we've sent the request but Tomcat
immediately dropped the connection (first read after
send failed). However, I'm not sure why you've chosen the
10 second recovery delay. Can't we stay with the configured
recovery_timeout here as well?

Regards
--
^(TM)




Re: mod_jk local error states vs. global error states

2009-03-06 Thread Rainer Jung

On 06.03.2009 13:32, Mladen Turk wrote:

For the rest it's simply too much to cope in a single email ;)


I put the force recovery fix and the "else" suggestion in a patch at:

http://people.apache.org/~rjung/mod_jk-dev/patches/local_states.patch

Everything apart from Hunk number 3 and the small change to
jk_lb_worker.h is for forced recovery, Hunk 3 and jk_lb_worker.h are for
"else".


Regards,

Rainer




Re: mod_jk local error states vs. global error states

2009-03-06 Thread Mladen Turk

Rainer Jung wrote:

On 06.03.2009 14:19, Mladen Turk wrote:


JkMount /foo aw
JkMount /bar aw

Now, if /bar is slow and gets timeout it would mean that
/foo will be banned as well (although it might work perfectly)

But I see your point. Since configured it should be banned
immediately. However this requires that admins behave 'smart'
and deploy their applications to different instances and use
different workers.

Ideal would be for us to have per JkMount status in shared
memory. This is something for the future definitely.


Yeah, there's such a dependency between mounts and workers. But exactly 
for this case we now have reply timeouts which can be set per mount. 
Because here one size doesn't fit all. By default all reply timeouts are 
off though.




Right, but although the reply timeouts are per mount, the
consequences are per worker, because if one mount triggers
a timeout all others will be affected.
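
The coupling described here can be sketched as follows. This is only an
illustration under stated assumptions: the struct and function names are
hypothetical, the 5000 ms value is made up, and none of it is the real
mod_jk data model. It only shows why a timeout on /bar also takes /foo
out of service when both mounts share the worker aw.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified types; not the real mod_jk structures. */
struct worker {
    const char *name;
    int in_error;          /* one state shared by every mount of this worker */
};

struct mount {
    const char *uri;
    struct worker *w;
    int reply_timeout_ms;  /* 0 means no reply timeout for this mount */
};

static struct worker aw = { "aw", 0 };

/* Both mounts point at the same worker, as in the JkMount example. */
static struct mount mounts[] = {
    { "/foo", &aw, 0 },
    { "/bar", &aw, 5000 },
};

/* Look up a mount by exact URI; returns NULL when there is none. */
static struct mount *find_mount(const char *uri)
{
    size_t i;
    for (i = 0; i < sizeof(mounts) / sizeof(mounts[0]); i++) {
        if (strcmp(mounts[i].uri, uri) == 0)
            return &mounts[i];
    }
    return NULL;
}

/* A reply timeout is detected per mount, but recorded on the worker. */
static void on_reply_timeout(struct mount *m)
{
    assert(m != NULL);
    m->w->in_error = 1;
}

/* A mount is only usable while its worker is not in error. */
static int mount_available(const struct mount *m)
{
    assert(m != NULL);
    return !m->w->in_error;
}
```

Keeping the state per mount instead (the shared memory idea mentioned in
this thread) would amount to moving in_error from struct worker into
struct mount.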


So I think going into global error here is safe.



For now yes, until we have shared memory for each mount.
With that we'll be able to completely decouple
connection from application logic.


Regards
--
^(TM)




Re: mod_jk local error states vs. global error states

2009-03-06 Thread Rainer Jung

On 06.03.2009 14:19, Mladen Turk wrote:

Rainer Jung wrote:

On 06.03.2009 13:32, Mladen Turk wrote:

Rainer Jung wrote:



All this should never touch the global state
if there are live connections.
Let the live connection decide for itself when it gets serviced.
Anything else is just plain 'guessing'

That was my general rule of thumb,
because the point is to be as robust as possible


JK_CLIENT_ERROR: it does not touch the global state (well it sets it
to OK), but you do touch the local state. I argued, why I would set
the local state also to OK. Any answer?



Well you said:
"It doesn't matter for the logic, but makes keeping track of the
differences easier."

I think the opposite. Marking local as error and global as ok makes
things easier to track. Local error means that this worker won't
be tested in the next lb loop; global error means it won't be tested
until the retry timeout.
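
The difference between the two states can be sketched as below. This is a
simplified model under stated assumptions: the type, field and function
names are hypothetical, times are plain seconds, and the real lb code
tracks more than this.

```c
#include <assert.h>

enum member_state { STATE_OK, STATE_ERROR };

/* Hypothetical view of one lb member; not the real mod_jk structures. */
struct lb_member {
    enum member_state local;   /* state seen by the current request only */
    enum member_state global;  /* state shared between all requests */
    long error_time;           /* seconds; when global was set to ERROR */
};

/* Returns 1 if the member may be tried in the current lb loop.
 * A local error skips the member for this loop; a global error skips it
 * until retry_timeout seconds have passed since error_time. */
static int member_usable(const struct lb_member *m, long now, long retry_timeout)
{
    if (m->local == STATE_ERROR)
        return 0;
    if (m->global == STATE_ERROR)
        return (now - m->error_time) >= retry_timeout;
    return 1;
}
```

In this model a local error never outlives the request that set it, while
a global error affects every request until the timeout expires.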


Since it is no functional change, I can stick to your definition of 
"easier", so we keep it as is.



Explicitly setting global to OK reads as
'It wasn't our fault, it was client's fault, we are still in
contact with backend'


But that's actually what we do, we do set global explicitly to OK, and
we always did. I was talking about the local part here. But I think it's
OK to leave it as it is.



But I agree, setting anything here is irrelevant, but like
you said "It makes things easier to track and read"


OK.


JK_STATUS_FATAL_ERROR: The whole purpose of the fail_on_status
configuration item is to tell something else. E.g. there is a status
if the context is not available, or the app could have a filter
returning a special status. For me it does not make sense to simply
ignore, what the admin configured using fail_on_status.



OK, this should probably be set to global as well if configured
explicitly. It will mean that all the sessions will be lost however.


OK. Yes, they'll be lost, but it depends on the admin to choose (or even 
set by a filter) good status codes. But I think the most popular case 
is, where the app is not deployed for some time.



JK_REPLY_TIMEOUT: Again I'm talking about the situation we have more
timeouts than max_reply_timeouts. By default we do not have any reply
timeout set, so the admin instructed us to react on reply timeouts.



JkMount /foo aw
JkMount /bar aw

Now, if /bar is slow and gets timeout it would mean that
/foo will be banned as well (although it might work perfectly)

But I see your point. Since configured it should be banned
immediately. However this requires that admins behave 'smart'
and deploy their applications to different instances and use
different workers.

Ideal would be for us to have per JkMount status in shared
memory. This is something for the future definitely.


Yeah, there's such a dependency between mounts and workers. But exactly 
for this case we now have reply timeouts which can be set per mount. 
Because here one size doesn't fit all. By default all reply timeouts are 
off though.


So I think going into global error here is safe.

Note: it's only in the first half of the timeout handling, when we are 
already above max_reply_timeouts, so we are not talking about isolated 
timeouts.



(iv) JK_SERVER_ERROR

We only get this, if a memory allocation fails.

I'm fine with what you decided, although actually I see no reason why
allocation should work better or worse for one of the lb members.
We can leave it, to keep track of the differences it would be easier
to set local state to OK too.



Well, again I disagree. Actually in worker or prefork the child
will simply die without setting the global error in shm.
If we set here the global error it will again kill all the
sessions if one child had some memory issues.


I didn't suggest setting global to ERROR, I suggested setting local to
OK, because I don't get what local ERROR helps here, and keeping both
equal is easier to understand.



Hmm, right, but setting local to OK probably won't help.
It might end up in some garbage being sent to Tomcat, so it's
better to mark it as local error.


OK. Agreed.
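
The outcome of the four cases discussed in this message can be summarised
in a small sketch. It is one reading of the thread, not the committed
code: the enum values are stand-ins for the real JK_* return codes, the
struct and function names are hypothetical, and leaving the global state
unchanged for an isolated reply timeout and for a server error is an
assumption.

```c
#include <assert.h>

/* Stand-ins for the return codes named in the thread; the numeric values
 * are arbitrary and not those of the real mod_jk headers. */
enum rc {
    RC_CLIENT_ERROR,
    RC_STATUS_FATAL_ERROR,
    RC_REPLY_TIMEOUT,
    RC_SERVER_ERROR
};

enum st { ST_KEEP, ST_OK, ST_ERROR };  /* ST_KEEP: leave state unchanged */

struct outcome {
    enum st local;
    enum st global;
};

/* Maps a failed request to the local and global state changes.
 * Every case sets the local state to ERROR; only the global part differs. */
static struct outcome classify(enum rc rc, int reply_timeouts,
                               int max_reply_timeouts)
{
    struct outcome o = { ST_ERROR, ST_KEEP };

    switch (rc) {
    case RC_CLIENT_ERROR:
        o.global = ST_OK;        /* client's fault, backend still reachable */
        break;
    case RC_STATUS_FATAL_ERROR:
        o.global = ST_ERROR;     /* the admin configured fail_on_status */
        break;
    case RC_REPLY_TIMEOUT:
        if (reply_timeouts > max_reply_timeouts)
            o.global = ST_ERROR; /* no longer an isolated timeout */
        break;
    case RC_SERVER_ERROR:
        break;                   /* local error only */
    }
    return o;
}
```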

Regards,

Rainer




Re: mod_jk local error states vs. global error states

2009-03-06 Thread Rainer Jung

On 06.03.2009 14:19, Mladen Turk wrote:

Rainer Jung wrote:

On 06.03.2009 13:32, Mladen Turk wrote:

Huge one Rainer ;)


I know, but I went through it in depth.


Rainer Jung wrote:


We have three busy counters:

a) one for the lb in total
b) one for each lb sub
c) one for each ajp worker

In status worker we use only a) and c). In lb we use a) and b). Your
comment to BZ 46808 seems to indicate that using c) instead of b) in
lb would be better. We could then again remove b).

Right?

So we could drop the rec->s->busy++ and --, because that's done for
the ajp busyness already in jk_ajp_common.c and we would test against
aw->s->busy instead of rec->s->busy.

OK?



No, because rec->s->busy is per-lb info,
and aw->s->busy >= rec->s->busy if aw is a member
of multiple lb's


But isn't the whole purpose of your changes to give the backend still
a chance, if it is processing requests? Why does it then matter,
whether those requests come from the same lb??


Any answer to that one? You are using this busy only to
differentiate between a busy backend (do not put it into global error
under certain circumstances) and an idle backend (take most errors more
seriously, because they are not simply triggered by overload).


So the real load should be the best differentiator, and aw->s->busy is
closer to the real load than rec->s->busy.
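
The relation between counters b) and c) can be made concrete with a small
model. The names below are hypothetical and the layout is not that of the
real shared memory; the sketch only shows why the per-ajp-worker counter
is at least as large as any per-lb counter when the worker is a member of
several lbs.

```c
#include <assert.h>

#define NUM_LBS 2  /* hypothetical: the ajp worker is a member of two lbs */

/* Hypothetical model of one ajp worker; not the real mod_jk structures. */
struct ajp_worker_model {
    int busy;              /* counter c): all requests on this ajp worker */
    int sub_busy[NUM_LBS]; /* counter b): requests routed through each lb */
};

/* A request routed through lb number 'lb' starts on the worker. */
static void request_begin(struct ajp_worker_model *aw, int lb)
{
    assert(lb >= 0 && lb < NUM_LBS);
    aw->busy++;
    aw->sub_busy[lb]++;
}

/* The request routed through lb number 'lb' has finished. */
static void request_end(struct ajp_worker_model *aw, int lb)
{
    assert(lb >= 0 && lb < NUM_LBS);
    assert(aw->sub_busy[lb] > 0);
    aw->busy--;
    aw->sub_busy[lb]--;
}
```

With one request coming through the first lb and two through the second,
busy is 3 while the per-lb counters are 1 and 2, so busy reflects the
load the backend actually sees.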


Regards,

Rainer




Re: mod_jk local error states vs. global error states

2009-03-06 Thread Mladen Turk

Rainer Jung wrote:

On 06.03.2009 13:32, Mladen Turk wrote:

Huge one Rainer ;)


I know, but I went through it in depth.


Rainer Jung wrote:


We have three busy counters:

a) one for the lb in total
b) one for each lb sub
c) one for each ajp worker

In status worker we use only a) and c). In lb we use a) and b). Your
comment to BZ 46808 seems to indicate that using c) instead of b) in
lb would be better. We could then again remove b).

Right?

So we could drop the rec->s->busy++ and --, because that's done for
the ajp busyness already in jk_ajp_common.c and we would test against
aw->s->busy instead of rec->s->busy.

OK?



No, because rec->s->busy is per-lb info,
and aw->s->busy >= rec->s->busy if aw is a member
of multiple lb's


But isn't the whole purpose of your changes to give the backend still a 
chance, if it is processing requests? Why does it then matter, whether 
those requests come from the same lb??



2) Local states vs. global states
=================================

I went through all places where we use and set states in lb. I have
some comments:

a) Setting local states and global states to different values
-------------------------------------------------------------


First the easier cases (I think)

(i) JK_CLIENT_ERROR

(ii) JK_STATUS_FATAL_ERROR

(iii) JK_REPLY_TIMEOUT



All this should never touch the global state
if there are live connections.
Let the live connection decide for itself when it gets serviced.
Anything else is just plain 'guessing'

That was my general rule of thumb,
because the point is to be as robust as possible


JK_CLIENT_ERROR: it does not touch the global state (well it sets it to 
OK), but you do touch the local state. I argued, why I would set the 
local state also to OK. Any answer?




Well you said:
"It doesn't matter for the logic, but makes keeping track of the 
differences easier."


I think the opposite. Marking local as error and global as ok makes
things easier to track. Local error means that this worker won't
be tested in the next lb loop; global error means it won't be tested
until the retry timeout. Explicitly setting global to OK reads as
'It wasn't our fault, it was client's fault, we are still in
 contact with backend'
But I agree, setting anything here is irrelevant, but like
you said "It makes things easier to track and read"


JK_STATUS_FATAL_ERROR: The whole purpose of the fail_on_status 
configuration item is to tell something else. E.g. there is a status if 
the context is not available, or the app could have a filter returning a 
special status. For me it does not make sense to simply ignore, what the 
admin configured using fail_on_status.




OK, this should probably be set to global as well if configured
explicitly. It will mean that all the sessions will be lost however.

JK_REPLY_TIMEOUT: Again I'm talking about the situation we have more 
timeouts than max_reply_timeouts. By default we do not have any reply 
timeout set, so the admin instructed us to react on reply timeouts.




JkMount /foo aw
JkMount /bar aw

Now, if /bar is slow and gets timeout it would mean that
/foo will be banned as well (although it might work perfectly)

But I see your point. Since configured it should be banned
immediately. However this requires that admins behave 'smart'
and deploy their applications to different instances and use
different workers.

Ideal would be for us to have per JkMount status in shared
memory. This is something for the future definitely.




Now the more difficult cases:

(iv) JK_SERVER_ERROR

We only get this, if a memory allocation fails.

I'm fine with what you decided, although actually I see no reason why
allocation should work better or worse for one of the lb members.
We can leave it, to keep track of the differences it would be easier
to set local state to OK too.



Well, again I disagree. Actually in worker or prefork the child
will simply die without setting the global error in shm.
If we set here the global error it will again kill all the
sessions if one child had some memory issues.


I didn't suggest setting global to ERROR, I suggested setting local to
OK, because I don't get what local ERROR helps here, and keeping both
equal is easier to understand.




Hmm, right, but setting local to OK probably won't help.
It might end up in some garbage being sent to Tomcat, so it's
better to mark it as local error.


Regards
--
^(TM)




Re: mod_jk local error states vs. global error states

2009-03-06 Thread Rainer Jung

On 06.03.2009 13:32, Mladen Turk wrote:

Huge one Rainer ;)


I know, but I went through it in depth.


Rainer Jung wrote:


We have three busy counters:

a) one for the lb in total
b) one for each lb sub
c) one for each ajp worker

In status worker we use only a) and c). In lb we use a) and b). Your
comment to BZ 46808 seems to indicate that using c) instead of b) in
lb would be better. We could then again remove b).

Right?

So we could drop the rec->s->busy++ and --, because that's done for
the ajp busyness already in jk_ajp_common.c and we would test against
aw->s->busy instead of rec->s->busy.

OK?



No, because rec->s->busy is per-lb info,
and aw->s->busy >= rec->s->busy if aw is a member
of multiple lb's


But isn't the whole purpose of your changes to give the backend still a 
chance, if it is processing requests? Why does it then matter, whether 
those requests come from the same lb??



2) Local states vs. global states
=================================

I went through all places where we use and set states in lb. I have
some comments:

a) Setting local states and global states to different values
-------------------------------------------------------------


First the easier cases (I think)

(i) JK_CLIENT_ERROR

(ii) JK_STATUS_FATAL_ERROR

(iii) JK_REPLY_TIMEOUT



All this should never touch the global state
if there are live connections.
Let the live connection decide for itself when it gets serviced.
Anything else is just plain 'guessing'

That was my general rule of thumb,
because the point is to be as robust as possible


JK_CLIENT_ERROR: it does not touch the global state (well it sets it to 
OK), but you do touch the local state. I argued, why I would set the 
local state also to OK. Any answer?


JK_STATUS_FATAL_ERROR: The whole purpose of the fail_on_status 
configuration item is to tell something else. E.g. there is a status if 
the context is not available, or the app could have a filter returning a 
special status. For me it does not make sense to simply ignore, what the 
admin configured using fail_on_status.


JK_REPLY_TIMEOUT: Again I'm talking about the situation we have more 
timeouts than max_reply_timeouts. By default we do not have any reply 
timeout set, so the admin instructed us to react on reply timeouts.




Now the more difficult cases:

(iv) JK_SERVER_ERROR

We only get this, if a memory allocation fails.

I'm fine with what you decided, although actually I see no reason why
allocation should work better or worse for one of the lb members.
We can leave it, to keep track of the differences it would be easier
to set local state to OK too.



Well, again I disagree. Actually in worker or prefork the child
will simply die without setting the global error in shm.
If we set here the global error it will again kill all the
sessions if one child had some memory issues.


I didn't suggest setting global to ERROR, I suggested setting local to
OK, because I don't get what local ERROR helps here, and keeping both
equal is easier to understand.



For the rest it's simply too much to cope in a single email ;)


Right!

Regards,

Rainer




Re: mod_jk local error states vs. global error states

2009-03-06 Thread Mladen Turk

Huge one Rainer ;)

Rainer Jung wrote:


We have three busy counters:

a) one for the lb in total
b) one for each lb sub
c) one for each ajp worker

In status worker we use only a) and c). In lb we use a) and b). Your
comment to BZ 46808 seems to indicate that using c) instead of b) in lb
would be better. We could then again remove b).


Right?

So we could drop the rec->s->busy++ and --, because that's done for the
ajp busyness already in jk_ajp_common.c and we would test against
aw->s->busy instead of rec->s->busy.


OK?



No, because rec->s->busy is per-lb info,
and aw->s->busy >= rec->s->busy if aw is a member
of multiple lb's



2) Local states vs. global states
=================================

I went through all places where we use and set states in lb. I have some
comments:


a) Setting local states and global states to different values
-------------------------------------------------------------


First the easier cases (I think)

(i) JK_CLIENT_ERROR

(ii) JK_STATUS_FATAL_ERROR

(iii) JK_REPLY_TIMEOUT



All this should never touch the global state
if there are live connections.
Let the live connection decide for itself when it gets serviced.
Anything else is just plain 'guessing'

That was my general rule of thumb,
because the point is to be as robust as possible




Now the more difficult cases:

(iv) JK_SERVER_ERROR

We only get this, if a memory allocation fails.

I'm fine with what you decided, although actually I see no reason why 
allocation should work better or worse for one of the lb members.
We can leave it, to keep track of the differences it would be easier to 
set local state to OK too.




Well, again I disagree. Actually in worker or prefork the child
will simply die without setting the global error in shm.
If we set here the global error it will again kill all the
sessions if one child had some memory issues.

For the rest it's simply too much to cope in a single email ;)

Cheers
--
^(TM)
