Re: mod_jk local error states vs. global error states
Rainer Jung wrote: On 06.03.2009 13:32, Mladen Turk wrote: For the rest it's simply too much to cope with in a single email ;) I put the force recovery fix and the "else" suggestion in a patch at: http://people.apache.org/~rjung/mod_jk-dev/patches/local_states.patch I've added 'in_error' for the ajp worker. We already have the 'errors' counter, but that one is statistical. The new 'in_error' holds the number of connections that are currently in an error state. If its number gets higher than busy/2 (more than half are in an error state), then the entire worker is marked as invalid. I think this is much better than having an additional retry for local workers. The busy/2 value can probably be made configurable via some directive similar to max_reply_timeouts, e.g. max_reply_errors. Regards -- ^(TM) - To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org For additional commands, e-mail: dev-h...@tomcat.apache.org
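A minimal sketch of the proposed 'in_error' check, assuming a worker tracks how many of its connections are busy and how many are currently in error. The struct, the function name and the max_errors parameter (standing in for the proposed, not yet existing max_reply_errors directive) are hypothetical and not taken from the mod_jk sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-worker counters, loosely modeled on the description
 * above; these are NOT the real mod_jk shared-memory structures. */
typedef struct {
    int busy;      /* connections currently serving requests */
    int in_error;  /* connections currently in an error state */
} worker_counters;

/* Returns true when the whole worker should be marked invalid.
 * max_errors <= 0 selects the default threshold of busy/2, i.e. the
 * worker goes bad once more than half of its busy connections are in
 * error. max_errors models the proposed max_reply_errors (hypothetical). */
static bool worker_should_be_invalid(const worker_counters *w, int max_errors)
{
    assert(w != NULL);
    int threshold = (max_errors > 0) ? max_errors : w->busy / 2;
    return w->in_error > threshold;
}
```

With 10 busy connections the default threshold is 5, so 5 connections in error keep the worker valid and a 6th one invalidates it. Integer division also does the right thing for odd counts: with 5 busy connections, 3 in error is the first value above busy/2 == 2.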
Re: mod_jk local error states vs. global error states
Rainer Jung wrote: On 06.03.2009 13:32, Mladen Turk wrote: For the rest it's simply too much to cope with in a single email ;) I put the force recovery fix and the "else" suggestion in a patch at: http://people.apache.org/~rjung/mod_jk-dev/patches/local_states.patch Everything apart from hunk number 3 and the small change to jk_lb_worker.h is for forced recovery; hunk 3 and jk_lb_worker.h are for "else". I've added the new JK_AJP_PROTOCOL_ERROR, so IMO you can now apply your proposed patch to 'else'. 'else' now means that we've sent the request but Tomcat immediately dropped the connection (the first read after the send failed). Although I'm not sure why you've chosen the 10 second recovery delay. Can't we stay with the configured recovery_timeout here as well? Regards -- ^(TM)
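To make the new meaning of 'else' concrete, here is a small classification sketch, assuming the caller knows whether the request was sent, whether the first read after the send succeeded, and whether the rest of the reply was read. The enum and function names are hypothetical; they only mirror the distinction described above and are not mod_jk's JK_* constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome codes; mod_jk's real JK_* constants live in its
 * own headers and are not reproduced here. */
typedef enum {
    OUTCOME_OK,
    OUTCOME_SEND_ERROR,      /* request could not be sent */
    OUTCOME_PROTOCOL_ERROR,  /* sent, but the first read failed */
    OUTCOME_READ_ERROR       /* failed later while reading the reply */
} ajp_outcome;

static ajp_outcome classify_ajp_failure(bool sent, bool first_read_ok, bool rest_ok)
{
    /* Nothing can have been read if nothing was sent. */
    assert(sent || !first_read_ok);
    if (!sent)
        return OUTCOME_SEND_ERROR;
    if (!first_read_ok)
        return OUTCOME_PROTOCOL_ERROR; /* backend dropped the connection right away */
    if (!rest_ok)
        return OUTCOME_READ_ERROR;
    return OUTCOME_OK;
}
```

Keeping the "sent, then the first read failed" case separate is what lets the lb code treat it differently from failures that happen later in the reply.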
Re: mod_jk local error states vs. global error states
On 06.03.2009 13:32, Mladen Turk wrote: For the rest it's simply too much to cope with in a single email ;) I put the force recovery fix and the "else" suggestion in a patch at: http://people.apache.org/~rjung/mod_jk-dev/patches/local_states.patch Everything apart from hunk number 3 and the small change to jk_lb_worker.h is for forced recovery; hunk 3 and jk_lb_worker.h are for "else". Regards, Rainer
Re: mod_jk local error states vs. global error states
Rainer Jung wrote: On 06.03.2009 14:19, Mladen Turk wrote: JkMount /foo aw JkMount /bar aw Now, if /bar is slow and gets a timeout, it would mean that /foo will be banned as well (although it might work perfectly). But I see your point. Since it is configured, it should be banned immediately. However, this requires that admins behave 'smart', deploy their applications to different instances and use different workers. Ideally we would have a per-JkMount status in shared memory. This is definitely something for the future. Yeah, there's such a dependency between mounts and workers. But exactly for this case we now have reply timeouts, which can be set per mount, because here one size doesn't fit all. By default all reply timeouts are off, though. Right, but although the reply timeouts are per mount, the consequences are per worker, because if one mount triggers a timeout, all others will be affected. So I think going into global error here is safe. For now, yes, until we have shared memory for each mount. With that we'll be able to completely decouple the connection from the application logic. Regards -- ^(TM)
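The point that reply timeouts are configured per mount while their consequences are per worker can be shown with a small model, assuming each mount keeps its own timeout but counts timeouts on the worker it shares with other mounts. All type and field names here are hypothetical, and the model ignores the shared-memory locking a real implementation would need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: timeouts are counted on the shared worker. */
typedef struct {
    int reply_timeouts;      /* timeouts seen on this worker, from any mount */
    int max_reply_timeouts;  /* more timeouts than this puts the worker in error */
    bool in_error;
} ajp_worker_state;

typedef struct {
    const char *uri;
    int reply_timeout_ms;    /* 0 = reply timeout disabled (the default) */
    ajp_worker_state *worker;
} mount_entry;

/* Called when a request on mount m took elapsed_ms to get a reply. */
static void on_reply(mount_entry *m, int elapsed_ms)
{
    assert(m && m->worker);
    if (m->reply_timeout_ms > 0 && elapsed_ms > m->reply_timeout_ms) {
        ajp_worker_state *w = m->worker;
        w->reply_timeouts++;
        if (w->reply_timeouts > w->max_reply_timeouts)
            w->in_error = true;  /* affects every mount using this worker */
    }
}
```

A slow /bar on a shared worker will then take /foo down with it, which is exactly the trade-off discussed above until per-mount status exists in shared memory.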
Re: mod_jk local error states vs. global error states
On 06.03.2009 14:19, Mladen Turk wrote: Rainer Jung wrote: On 06.03.2009 13:32, Mladen Turk wrote: Rainer Jung wrote: All this should never touch the global state if there are live connections. Let the live connection decide for itself when it gets serviced. Anything else is just plain 'guessing'. That was my general rule of thumb, because the point is to be as robust as possible. JK_CLIENT_ERROR: it does not touch the global state (well, it sets it to OK), but you do touch the local state. I argued why I would set the local state to OK as well. Any answer? Well, you said: "It doesn't matter for the logic, but makes keeping track of the differences easier." I think the opposite. Marking local as error and global as OK makes things easier to track. A local error means that this worker won't be tested in the next lb loop; a global error means it won't be tested until the retry timeout. Since it is no functional change, I can stick to your definition of "easier", so we keep it as is. Explicitly setting global to OK reads as 'It wasn't our fault, it was the client's fault, we are still in contact with the backend'. But that's actually what we do: we do set global explicitly to OK, and we always did. I was talking about the local part here. But I think it's OK to leave it as it is. But I agree, setting anything here is irrelevant, but like you said "It makes things easier to track and read". OK. JK_STATUS_FATAL_ERROR: The whole purpose of the fail_on_status configuration item is to tell us something else. E.g. there is a status if the context is not available, or the app could have a filter returning a special status. For me it does not make sense to simply ignore what the admin configured using fail_on_status. OK, this should probably be set to global as well if configured explicitly. It will mean that all the sessions will be lost, however. OK. Yes, they'll be lost, but it is up to the admin to choose (or even set by a filter) good status codes.
But I think the most popular case is where the app is not deployed for some time. JK_REPLY_TIMEOUT: Again, I'm talking about the situation where we have more timeouts than max_reply_timeouts. By default we do not have any reply timeout set, so if one is set, the admin has explicitly instructed us to react to reply timeouts. JkMount /foo aw JkMount /bar aw Now, if /bar is slow and gets a timeout, it would mean that /foo will be banned as well (although it might work perfectly). But I see your point. Since it is configured, it should be banned immediately. However, this requires that admins behave 'smart', deploy their applications to different instances and use different workers. Ideally we would have a per-JkMount status in shared memory. This is definitely something for the future. Yeah, there's such a dependency between mounts and workers. But exactly for this case we now have reply timeouts, which can be set per mount, because here one size doesn't fit all. By default all reply timeouts are off, though. So I think going into global error here is safe. Note: it's only in the first half of the timeout handling, when we are already above max_reply_timeouts, so we are not talking about isolated timeouts. (iv) JK_SERVER_ERROR: We only get this if a memory allocation fails. I'm fine with what you decided, although actually I see no reason why allocation should work better or worse for one of the lb members. We can leave it; to keep track of the differences it would be easier to set the local state to OK too. Well, again I disagree. Actually, in worker or prefork the child will simply die without setting the global error in shm. If we set the global error here, it will again kill all the sessions if one child had some memory issues. I didn't suggest setting global to ERROR; I suggested setting local to OK, because I don't see how a local ERROR helps here, and keeping both equal is easier to understand. Hmm, right, but setting local to OK probably won't help.
It might end up in some garbage being sent to Tomcat, so it's better to mark it as a local error. OK. Agreed. Regards, Rainer
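The outcome of this exchange, as I read it, can be summarized as a mapping from error code to the local and global state the lb worker sets. This is a sketch under that reading only: the enum names are hypothetical stand-ins for the JK_* codes, and the real lb code may differ in detail.

```c
#include <assert.h>

/* Hypothetical stand-ins for the JK_* codes and worker states. */
typedef enum { ST_OK, ST_ERROR, ST_UNCHANGED } lb_state;
typedef enum { E_CLIENT, E_STATUS_FATAL, E_REPLY_TIMEOUT, E_SERVER } lb_err;

typedef struct { lb_state local; lb_state global; } state_update;

static state_update decide_states(lb_err e)
{
    state_update u = { ST_UNCHANGED, ST_UNCHANGED };
    switch (e) {
    case E_CLIENT:        /* client's fault: backend still reachable */
        u.local = ST_ERROR; u.global = ST_OK;        break;
    case E_STATUS_FATAL:  /* admin asked for it via fail_on_status */
        u.local = ST_ERROR; u.global = ST_ERROR;     break;
    case E_REPLY_TIMEOUT: /* only reached above max_reply_timeouts */
        u.local = ST_ERROR; u.global = ST_ERROR;     break;
    case E_SERVER:        /* allocation failure in this child only */
        u.local = ST_ERROR; u.global = ST_UNCHANGED; break;
    default:
        assert(0 && "unknown error code");
    }
    return u;
}
```

Leaving the global state untouched for the allocation failure reflects the argument above: one child's memory problem should not kill all sessions on the backend.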
Re: mod_jk local error states vs. global error states
On 06.03.2009 14:19, Mladen Turk wrote: Rainer Jung wrote: On 06.03.2009 13:32, Mladen Turk wrote: Huge one, Rainer ;) I know, but I went through it in depth. Rainer Jung wrote: We have three busy counters: a) one for the lb in total, b) one for each lb sub, c) one for each ajp worker. In the status worker we use only a) and c). In lb we use a) and b). Your comment to BZ 46808 seems to indicate that using c) instead of b) in lb would be better. We could then again remove b). Right? So we could drop the rec->s->busy++ and --, because that's already done for the ajp busyness in jk_ajp_common.c, and we would test against aw->s->busy instead of rec->s->busy. OK? No, because rec->s->busy is per-lb info, and aw->s->busy >= rec->s->busy if aw is a member of multiple lbs. But isn't the whole purpose of your changes to still give the backend a chance if it is processing requests? Why does it then matter whether those requests come from the same lb? Any answer to that one? You are using this busy only to differentiate between a busy backend (do not put it into global error under certain circumstances) and an idle backend (take most errors more seriously, because they are not simply triggered by overload). So the real load should be the best differentiator, and aw->s->busy is closer to the real load than rec->s->busy. Regards, Rainer
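Rainer's argument can be illustrated by feeding both counters to the same "busy backend vs. idle backend" decision. The struct and function are hypothetical, and treating "busy == 0" as idle is a simplifying assumption: the thread does not state the exact threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the two per-member counters discussed:
 * lb_member_busy ~ rec->s->busy (requests sent by this lb only)
 * ajp_busy       ~ aw->s->busy  (requests on the backend from every lb) */
typedef struct {
    int lb_member_busy;
    int ajp_busy;
} busy_view;

/* A backend still processing requests gets the benefit of the doubt
 * (error stays local); an idle one goes into global error.
 * "Idle" is simplified here to busy == 0 (an assumption). */
static bool error_goes_global(const busy_view *v, bool use_ajp_busy)
{
    assert(v);
    int busy = use_ajp_busy ? v->ajp_busy : v->lb_member_busy;
    return busy == 0;
}
```

If the backend is serving 3 requests through another lb, the per-lb counter reads 0 and would wrongly classify it as idle, while the per-ajp-worker counter sees the real load.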
Re: mod_jk local error states vs. global error states
Rainer Jung wrote: On 06.03.2009 13:32, Mladen Turk wrote: Huge one, Rainer ;) I know, but I went through it in depth. Rainer Jung wrote: We have three busy counters: a) one for the lb in total, b) one for each lb sub, c) one for each ajp worker. In the status worker we use only a) and c). In lb we use a) and b). Your comment to BZ 46808 seems to indicate that using c) instead of b) in lb would be better. We could then again remove b). Right? So we could drop the rec->s->busy++ and --, because that's already done for the ajp busyness in jk_ajp_common.c, and we would test against aw->s->busy instead of rec->s->busy. OK? No, because rec->s->busy is per-lb info, and aw->s->busy >= rec->s->busy if aw is a member of multiple lbs. But isn't the whole purpose of your changes to still give the backend a chance if it is processing requests? Why does it then matter whether those requests come from the same lb? 2) Local states vs. global states: I went through all places where we use and set states in lb. I have some comments: a) Setting local states and global states to different values. First the easier cases (I think): (i) JK_CLIENT_ERROR (ii) JK_STATUS_FATAL_ERROR (iii) JK_REPLY_TIMEOUT All this should never touch the global state if there are live connections. Let the live connection decide for itself when it gets serviced. Anything else is just plain 'guessing'. That was my general rule of thumb, because the point is to be as robust as possible. JK_CLIENT_ERROR: it does not touch the global state (well, it sets it to OK), but you do touch the local state. I argued why I would set the local state to OK as well. Any answer? Well, you said: "It doesn't matter for the logic, but makes keeping track of the differences easier." I think the opposite. Marking local as error and global as OK makes things easier to track. A local error means that this worker won't be tested in the next lb loop; a global error means it won't be tested until the retry timeout.
Explicitly setting global to OK reads as 'It wasn't our fault, it was the client's fault, we are still in contact with the backend'. But I agree, setting anything here is irrelevant, but like you said "It makes things easier to track and read". JK_STATUS_FATAL_ERROR: The whole purpose of the fail_on_status configuration item is to tell us something else. E.g. there is a status if the context is not available, or the app could have a filter returning a special status. For me it does not make sense to simply ignore what the admin configured using fail_on_status. OK, this should probably be set to global as well if configured explicitly. It will mean that all the sessions will be lost, however. JK_REPLY_TIMEOUT: Again, I'm talking about the situation where we have more timeouts than max_reply_timeouts. By default we do not have any reply timeout set, so if one is set, the admin has explicitly instructed us to react to reply timeouts. JkMount /foo aw JkMount /bar aw Now, if /bar is slow and gets a timeout, it would mean that /foo will be banned as well (although it might work perfectly). But I see your point. Since it is configured, it should be banned immediately. However, this requires that admins behave 'smart', deploy their applications to different instances and use different workers. Ideally we would have a per-JkMount status in shared memory. This is definitely something for the future. Now the more difficult cases: (iv) JK_SERVER_ERROR: We only get this if a memory allocation fails. I'm fine with what you decided, although actually I see no reason why allocation should work better or worse for one of the lb members. We can leave it; to keep track of the differences it would be easier to set the local state to OK too. Well, again I disagree. Actually, in worker or prefork the child will simply die without setting the global error in shm. If we set the global error here, it will again kill all the sessions if one child had some memory issues.
I didn't suggest setting global to ERROR; I suggested setting local to OK, because I don't see how a local ERROR helps here, and keeping both equal is easier to understand. Hmm, right, but setting local to OK probably won't help. It might end up in some garbage being sent to Tomcat, so it's better to mark it as a local error. Regards -- ^(TM)
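The distinction Mladen draws between the two states can be sketched as a candidate filter in the lb selection loop, assuming a local error only excludes a member for the rest of the current request and a global error excludes it until a retry timeout has elapsed. The struct, field names and the retry_timeout_s parameter are hypothetical, and the sketch picks the first eligible member rather than using any real lb method.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical lb member state, not the mod_jk structures. */
typedef struct {
    bool local_error;   /* skip for the rest of this request's lb loop */
    bool global_error;  /* skip until the retry timeout has elapsed */
    time_t error_time;  /* when global_error was set */
} member;

static bool is_candidate(const member *m, time_t now, int retry_timeout_s)
{
    assert(m);
    if (m->local_error)
        return false;
    if (m->global_error && difftime(now, m->error_time) < retry_timeout_s)
        return false;
    return true;
}

/* Returns the index of the first eligible member, or -1 if none. */
static int pick_member(const member *ms, int n, time_t now, int retry_timeout_s)
{
    for (int i = 0; i < n; i++)
        if (is_candidate(&ms[i], now, retry_timeout_s))
            return i;
    return -1;
}
```

Because the local flag lives only for one request while the global flag is shared, a local error is cheap to set, whereas a global error affects every child and every session routed to that member.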
Re: mod_jk local error states vs. global error states
On 06.03.2009 13:32, Mladen Turk wrote: Huge one, Rainer ;) I know, but I went through it in depth. Rainer Jung wrote: We have three busy counters: a) one for the lb in total, b) one for each lb sub, c) one for each ajp worker. In the status worker we use only a) and c). In lb we use a) and b). Your comment to BZ 46808 seems to indicate that using c) instead of b) in lb would be better. We could then again remove b). Right? So we could drop the rec->s->busy++ and --, because that's already done for the ajp busyness in jk_ajp_common.c, and we would test against aw->s->busy instead of rec->s->busy. OK? No, because rec->s->busy is per-lb info, and aw->s->busy >= rec->s->busy if aw is a member of multiple lbs. But isn't the whole purpose of your changes to still give the backend a chance if it is processing requests? Why does it then matter whether those requests come from the same lb? 2) Local states vs. global states: I went through all places where we use and set states in lb. I have some comments: a) Setting local states and global states to different values. First the easier cases (I think): (i) JK_CLIENT_ERROR (ii) JK_STATUS_FATAL_ERROR (iii) JK_REPLY_TIMEOUT All this should never touch the global state if there are live connections. Let the live connection decide for itself when it gets serviced. Anything else is just plain 'guessing'. That was my general rule of thumb, because the point is to be as robust as possible. JK_CLIENT_ERROR: it does not touch the global state (well, it sets it to OK), but you do touch the local state. I argued why I would set the local state to OK as well. Any answer? JK_STATUS_FATAL_ERROR: The whole purpose of the fail_on_status configuration item is to tell us something else. E.g. there is a status if the context is not available, or the app could have a filter returning a special status. For me it does not make sense to simply ignore what the admin configured using fail_on_status.
JK_REPLY_TIMEOUT: Again, I'm talking about the situation where we have more timeouts than max_reply_timeouts. By default we do not have any reply timeout set, so if one is set, the admin has explicitly instructed us to react to reply timeouts. Now the more difficult cases: (iv) JK_SERVER_ERROR: We only get this if a memory allocation fails. I'm fine with what you decided, although actually I see no reason why allocation should work better or worse for one of the lb members. We can leave it; to keep track of the differences it would be easier to set the local state to OK too. Well, again I disagree. Actually, in worker or prefork the child will simply die without setting the global error in shm. If we set the global error here, it will again kill all the sessions if one child had some memory issues. I didn't suggest setting global to ERROR; I suggested setting local to OK, because I don't see how a local ERROR helps here, and keeping both equal is easier to understand. For the rest it's simply too much to cope with in a single email ;) Right! Regards, Rainer
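Rainer's point about fail_on_status is that the admin has explicitly listed which HTTP statuses mean "this backend is broken". A minimal sketch of that membership check, assuming the configured codes have already been parsed into an array; the function name and the array representation are hypothetical and not mod_jk's parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if http_status is one of the configured fail_on_status
 * codes (hypothetical representation: a plain array of ints). */
static bool status_is_fatal(int http_status, const int *fail_on_status, size_t n)
{
    assert(fail_on_status || n == 0);
    for (size_t i = 0; i < n; i++)
        if (fail_on_status[i] == http_status)
            return true;
    return false;
}
```

A reply matching such a code is what turns into the status-fatal case discussed above, so ignoring it in the state handling would ignore an explicit admin decision.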
Re: mod_jk local error states vs. global error states
Huge one, Rainer ;) Rainer Jung wrote: We have three busy counters: a) one for the lb in total, b) one for each lb sub, c) one for each ajp worker. In the status worker we use only a) and c). In lb we use a) and b). Your comment to BZ 46808 seems to indicate that using c) instead of b) in lb would be better. We could then again remove b). Right? So we could drop the rec->s->busy++ and --, because that's already done for the ajp busyness in jk_ajp_common.c, and we would test against aw->s->busy instead of rec->s->busy. OK? No, because rec->s->busy is per-lb info, and aw->s->busy >= rec->s->busy if aw is a member of multiple lbs. 2) Local states vs. global states: I went through all places where we use and set states in lb. I have some comments: a) Setting local states and global states to different values. First the easier cases (I think): (i) JK_CLIENT_ERROR (ii) JK_STATUS_FATAL_ERROR (iii) JK_REPLY_TIMEOUT All this should never touch the global state if there are live connections. Let the live connection decide for itself when it gets serviced. Anything else is just plain 'guessing'. That was my general rule of thumb, because the point is to be as robust as possible. Now the more difficult cases: (iv) JK_SERVER_ERROR: We only get this if a memory allocation fails. I'm fine with what you decided, although actually I see no reason why allocation should work better or worse for one of the lb members. We can leave it; to keep track of the differences it would be easier to set the local state to OK too. Well, again I disagree. Actually, in worker or prefork the child will simply die without setting the global error in shm. If we set the global error here, it will again kill all the sessions if one child had some memory issues. For the rest it's simply too much to cope with in a single email ;) Cheers -- ^(TM)
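Why aw->s->busy >= rec->s->busy when one ajp worker belongs to several lbs can be seen with a toy version of the three counters a), b) and c). The types and functions are hypothetical and single-threaded; the real counters live in shared memory and are updated under locking, which this sketch ignores.

```c
#include <assert.h>

/* Hypothetical counters mirroring the thread's a), b), c). */
typedef struct { int busy; } counter;

typedef struct {
    counter *lb_total;  /* a) one per lb */
    counter *lb_sub;    /* b) one per lb member (like rec->s->busy) */
    counter *ajp;       /* c) one per ajp worker (like aw->s->busy) */
} request_path;

static void request_begin(request_path *p)
{
    assert(p && p->lb_total && p->lb_sub && p->ajp);
    p->lb_total->busy++;
    p->lb_sub->busy++;
    p->ajp->busy++;
}

static void request_end(request_path *p)
{
    assert(p && p->lb_total && p->lb_sub && p->ajp);
    p->lb_total->busy--;
    p->lb_sub->busy--;
    p->ajp->busy--;
}
```

With the same ajp worker behind two lbs, one request through each lb leaves each per-lb member counter at 1 while the ajp worker counter reaches 2, which is the inequality Mladen mentions.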