Re: Counting of I/O bytes
On Tue, 2002-10-15 at 23:07, Bojan Smojver wrote:

>> The MMN bump in this case is no problem. Your patch just adds fields at the end of the structure. It's backward compatible, so it only needs an increase in the MMN minor number, not the major number. If the patch added fields in the middle of the struct, thus breaking binary compatibility, then it would be a problem.
>
> What if someone creates an array of conn_rec structures in their module? Isn't the code that fetches the next one going to be incorrect? For instance:
>
>   conn_rec c[5];
>
>   c[1].not_the_first_field = some_value; /* OOPS, we'll probably get garbage */

Yes, you're right, anything that tries to allocate conn_rec objects will have problems.

> Basically, what would be happening with the patch is that the numbers would be logged with the next request, rather than with this one. However, if the data was SENT with the next request, then it should be LOGGED with the next request. This seems like a question of semantics to me. The problem I can see with this is the following:
>
> - connection opened (keepalive)
> - request for aaa.domain
> - not sent (i.e. the server is buffering it)
> - logged zero output
> - request for bbb.domain (on the same connection)
> - sent
> - logged bytes out for bbb.domain PLUS aaa.domain
>
> Is something like this possible? If not, I think we should be pretty much OK as the whole point of mod_logio is to log the traffic, most likely per virtual host.

Yes, that scenario is possible. And even without multiple virtual hosts, attributing bytes to the wrong request will make the logging a lot less useful to anyone who's trying to use the bytes_sent data for debugging or planning purposes. If we can come up with a way to delay the logging until after the response is actually sent, though, then the problem will disappear and your patch will produce much more accurate numbers.

Brian
Re: Counting of I/O bytes
Quoting Brian Pane [EMAIL PROTECTED]:

> Yes, that scenario is possible. And even without multiple virtual hosts, attributing bytes to the wrong request will make the logging a lot less useful to anyone who's trying to use the bytes_sent data for debugging or planning purposes. If we can come up with a way to delay the logging until after the response is actually sent, though, then the problem will disappear and your patch will produce much more accurate numbers.

Bummer! Oh, well, life isn't perfect...

Coming back to what William is proposing, the three-pool approach... I'm guessing that with Response put in between Connection and Request, the Response will store responses for multiple requests, right? Then, once the core output filter is done with it, the Response will be notified, it will finalise the processing (e.g. log all the requests) and destroy itself. Sounds good, but maybe we should just keep the requests around until they have been dealt with?

BTW, is there a way in 2.0 to force every request to be written to the network in order? Maybe I should focus on making an option like that in order to make it work in 2.0?

Bojan

PS. So, when's 2.1 going to be released? Just kidding... ;-)
Re: Counting of I/O bytes
On Wed, 2002-10-16 at 00:10, Bojan Smojver wrote:

> Coming back to what William is proposing, the three-pool approach... I'm guessing that with Response put in between Connection and Request, the Response will store responses for multiple requests, right? Then, once the core output filter is done with it, the Response will be notified, it will finalise the processing (e.g. log all the requests) and destroy itself. Sounds good, but maybe we should just keep the requests around until they have been dealt with?

There's a small benefit to cleaning up the request instead of keeping it around: you can free up the memory that was used for the request and recycle it for some other request. This can help to reduce the total memory footprint of the server.

On the other hand, creating separate pools for the request and the response could end up increasing the total memory usage, because we currently use at least 8KB per pool. We'll probably need to do some research to figure out which approach works best in practice.

> BTW, is there a way in 2.0 to force every request to be written to the network in order? Maybe I should focus on making an option like that in order to make it work in 2.0?

I think you could achieve this result by appending a bucket of type apr_bucket_flush to the output brigade. I think that would be a reasonable workaround for 2.0.

Brian
Re: Counting of I/O bytes
I know that I'm ultimately wanting to log bytes per request pretty accurately, to add up and tell whether the whole file was transferred or not, even if it was broken up into several byte-range requests. I have given up on the possibility of this happening in 2.0 for a long time to come, due to the architecture changes required to make it happen. We're back on 1.3 with three times as many servers because of the lack of threading... :-(

Dave

- Original Message - From: Bojan Smojver [EMAIL PROTECTED]

> Is something like this possible? If not, I think we should be pretty much OK as the whole point of mod_logio is to log the traffic, most likely per virtual host.
Re: Counting of I/O bytes
Quoting Brian Pane [EMAIL PROTECTED]:

> I think you could achieve this result by appending a bucket of type apr_bucket_flush to the output brigade. I think that would be a reasonable workaround for 2.0.

Cool, thanks. This should be easy - I can simply do it in mod_logio's output filter, which will run before the core_output_filter for sure. The whole thing becomes sequential, although probably a bit slower... I'll let you know when the patch is ready.

I'll be going for the patch in which there is an MMN bump and mod_logio keeps its filters, but with the changed logic of logio_output_filter, which inserts the flush bucket at the end of the brigade.

Bojan
Re: Counting of I/O bytes
Quoting David Burry [EMAIL PROTECTED]:

> I know that I'm ultimately wanting to log bytes per request pretty accurately, to add up and tell whether the whole file was transferred or not, even if it was broken up into several byte-range requests. I have given up on the possibility of this happening in 2.0 for a long time to come, due to the architecture changes required to make it happen. We're back on 1.3 with three times as many servers because of the lack of threading... :-(

Don't give up hope just yet. Brian had some last minute aces up his sleeve...

Bojan
Re: Counting of I/O bytes
On Wed, 2002-10-16 at 17:54, Brian Pane wrote:

> I think you could achieve this result by appending a bucket of type apr_bucket_flush to the output brigade. I think that would be a reasonable workaround for 2.0.

This is what I wanted to do initially:

---
static apr_status_t logio_out_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    apr_bucket *e = APR_BRIGADE_LAST(bb);

    /* End of data, make sure we flush */
    if (APR_BUCKET_IS_EOS(e)) {
        APR_BRIGADE_INSERT_TAIL(bb,
            apr_bucket_flush_create(f->c->bucket_alloc));
        APR_BUCKET_REMOVE(e);
        apr_bucket_destroy(e);
    }

    return ap_pass_brigade(f->next, bb);
}
---

But then I thought, if the last bucket is EOS, we'll be sending the stuff out anyway. Or maybe I'm wrong there? If so, then this might be better:

---
static apr_status_t logio_out_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    APR_BRIGADE_INSERT_TAIL(bb,
        apr_bucket_flush_create(f->c->bucket_alloc));

    return ap_pass_brigade(f->next, bb);
}
---

Even tried this (insert FLUSH before EOS):

---
static apr_status_t logio_out_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    apr_bucket *e = APR_BRIGADE_LAST(bb);

    /* End of data, make sure we flush */
    if (APR_BUCKET_IS_EOS(e)) {
        APR_BUCKET_INSERT_BEFORE(
            apr_bucket_flush_create(f->c->bucket_alloc), e);
    }

    return ap_pass_brigade(f->next, bb);
}
---

But this last one makes Apache go into 100% CPU utilisation after the first ever request has been served. So, this is bad for sure!

The first one works fine, but I have to admit that I don't know how to simulate requests that would normally be buffered, which makes my testing completely useless. Any input is appreciated...

Bojan
Re: Counting of I/O bytes
At 02:10 AM 10/16/2002, Bojan Smojver wrote:

> Coming back to what William is proposing, the three pool approach... I'm guessing that with Response put in between Connection and Request, the Response will store responses for multiple requests, right?

No, I'm suggesting a single response pool for each request. As soon as the request is processed, the request pool disappears (although some body may still be sitting in brigades, or set aside within filters). As soon as the body for -that- request has been flushed completely, that response pool will disappear.

Since this isn't too clear by the names, perhaps 'connection', 'request' and an outer 'handler' pool make things more clear? In that case, the handler pool would disappear at once when the response body had been constructed, and the 'request' pool would disappear once it was flushed.

> Then, once the core-filter is done with it, the Response will be notified, it will finalise the processing (e.g. log all the requests) and destroy itself. Sounds good, but maybe we should just keep the requests around until they have been dealt with?

Consider pipelining. You can't start keeping 25 requests hanging around for 1 page + 24 images now, can you? The memory footprint would soar through the roof.

> BTW, is there a way in 2.0 to force every request to be written to the network in order? Maybe I should focus on making an option like that in order to make it work in 2.0?

How do you mean? Everything to the network is serialized. What is out of order today, date stamps in the access logs? That is because it takes different amounts of time to handle different sorts of requests, availability from kernel disk cache of the pages, etc.

Bill
Re: Counting of I/O bytes
On Thu, 2002-10-17 at 00:36, William A. Rowe, Jr. wrote:

> At 02:10 AM 10/16/2002, Bojan Smojver wrote:
>> Coming back to what William is proposing, the three pool approach... I'm guessing that with Response put in between Connection and Request, the Response will store responses for multiple requests, right?
>
> No, I'm suggesting a single response pool for each request. As soon as the request is processed, the request pool disappears (although some body may still be sitting in brigades, or set aside within filters). As soon as the body for -that- request has been flushed completely, that response pool will disappear. Since this isn't too clear by the names, perhaps 'connection', 'request' and an outer 'handler' pool make things more clear? In that case, the handler pool would disappear at once when the response body had been constructed, and the 'request' pool would disappear once it was flushed.

I was associating this way too much with Request/Response in Servlet architecture. I guess I understand what you mean now.

>> Then, once the core-filter is done with it, the Response will be notified, it will finalise the processing (e.g. log all the requests) and destroy itself. Sounds good, but maybe we should just keep the requests around until they have been dealt with?
>
> Consider pipelining. You can't start keeping 25 requests hanging around for 1 page + 24 images now, can you? The memory footprint would soar through the roof.

Yep, get it.

>> BTW, is there a way in 2.0 to force every request to be written to the network in order? Maybe I should focus on making an option like that in order to make it work in 2.0?
>
> How do you mean? Everything to the network is serialized. What is out of order today, date stamps in the access logs? That is because it takes different amounts of time to handle different sorts of requests, availability from kernel disk cache of the pages, etc.

Wrong choice of words here maybe. What I meant here was in order in the time line, like this:

- request received
- request processed
- request flushed to network
- request received
- request processed
- request flushed to network
...

rather than what can happen now:

- request received
- request processed
- request received
- request processed
- request received
- request processed
- requests flushed to network
...

That's what the code that I sent in my other e-mail is all about. Basically, I put the output filter in mod_logio (AP_FTYPE_NETWORK - 1), which has the only purpose of replacing the EOS bucket with a FLUSH bucket (the actual counting is done in the core output filter). Here it is again:

-
static apr_status_t logio_out_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    apr_bucket *b = APR_BRIGADE_LAST(bb);

    /* End of data, make sure we flush */
    if (APR_BUCKET_IS_EOS(b)) {
        APR_BRIGADE_INSERT_TAIL(bb,
            apr_bucket_flush_create(f->c->bucket_alloc));
        APR_BUCKET_REMOVE(b);
        apr_bucket_destroy(b);
    }

    return ap_pass_brigade(f->next, bb);
}
-

When the core output filter receives the brigade, it will flush because of the FLUSH bucket at the end.

My main problem is - how do I test this sucker? What do I have to do to make sure two or more requests are in fact pipelined?

Bojan
Counting of I/O bytes
I have submitted the patch along those lines a few days back, which also includes an MMN bump. In the light of the latest discussions on the list, it seems that a patch like that is not being looked at favourably by many people :-(

Do you want me to rework that patch so it uses a private data structure (to avoid the MMN bump) rather than two new fields in conn_rec? Or is the quality of the patch unacceptable in general?

Just to refresh everyone's memory, this is about the incorrect number of output bytes reported by mod_logio through %O. It does not include the incorrect number of bytes reported in r->bytes_sent, which is a completely separate issue.

Any ideas welcome...

Bojan
Re: Counting of I/O bytes
On Tue, 2002-10-15 at 22:09, Bojan Smojver wrote:

> I have submitted the patch along those lines a few days back, which also includes an MMN bump. In the light of the latest discussions on the list, it seems that a patch like that is not being looked at favourably by many people :-(

The MMN bump in this case is no problem. Your patch just adds fields at the end of the structure. It's backward compatible, so it only needs an increase in the MMN minor number, not the major number. If the patch added fields in the middle of the struct, thus breaking binary compatibility, then it would be a problem.

> Do you want me to rework that patch so it uses a private data structure (to avoid the MMN bump) rather than two new fields in conn_rec? Or is the quality of the patch unacceptable in general?

I've only had a couple of minutes to look at the patch (too busy with my day job this week), but I think the basic problem with the patch is that you're updating the bytes-sent field in the conn_rec after the writes complete in the core_output_filter, but you're reading that field in a logger function. But the logger function can be called *before* the socket write happens.

When handling keep-alive requests, for example, the server may buffer up a small response to one request but not send it until the next response is generated. The first request gets logged as soon as all the response data is sent to the core_output_filter, so the mod_logio logging code might find that the bytes-sent value in the conn_rec is zero because the socket write hasn't actually happened yet.

The best solution is to delay the logging until after the response has actually been sent. This would require more significant changes to the server, though. Most of the data that the logger needs is in the request's pool. Currently, this pool is cleared as soon as we send the response to the core_output_filter and log it. In order to accurately log the bytes sent, we'd have to keep the request pool around until the last of that request's response data was written. That's not impossible, but it would require some design changes to the httpd core and the output filters. IMHO, that's a good example of a worthwhile design change to make for 2.1.

Brian
Re: Counting of I/O bytes
On 15 Oct 2002, Brian Pane wrote:

> major number. If the patch added fields in the middle of the struct, thus breaking binary compatibility, then it would be a problem.

Even adding at the end can break binary compat, since sizeof(conn_rec) changes, so you might have 3rd party code that allocates too little space. Assuming there is ever a case where 3rd party code allocates conn_rec's, that is.

--Cliff
Re: Counting of I/O bytes
Quoting Brian Pane [EMAIL PROTECTED]:

> On Tue, 2002-10-15 at 22:09, Bojan Smojver wrote:
>> I have submitted the patch along those lines a few days back, which also includes an MMN bump. In the light of the latest discussions on the list, it seems that a patch like that is not being looked at favourably by many people :-(
>
> The MMN bump in this case is no problem. Your patch just adds fields at the end of the structure. It's backward compatible, so it only needs an increase in the MMN minor number, not the major number. If the patch added fields in the middle of the struct, thus breaking binary compatibility, then it would be a problem.

What if someone creates an array of conn_rec structures in their module? Isn't the code that fetches the next one going to be incorrect? For instance:

    conn_rec c[5];

    c[1].not_the_first_field = some_value; /* OOPS, we'll probably get garbage */

>> Do you want me to rework that patch so it uses a private data structure (to avoid the MMN bump) rather than two new fields in conn_rec? Or is the quality of the patch unacceptable in general?
>
> I've only had a couple of minutes to look at the patch (too busy with my day job this week), but I think the basic problem with the patch is that you're updating the bytes-sent field in the conn_rec after the writes complete in the core_output_filter, but you're reading that field in a logger function. But the logger function can be called *before* the socket write happens. When handling keep-alive requests, for example, the server may buffer up a small response to one request but not send it until the next response is generated. The first request gets logged as soon as all the response data is sent to the core_output_filter, so the mod_logio logging code might find that the bytes-sent value in the conn_rec is zero because the socket write hasn't actually happened yet. The best solution is to delay the logging until after the response has actually been sent. This would require more significant changes to the server, though. Most of the data that the logger needs is in the request's pool. Currently, this pool is cleared as soon as we send the response to the core_output_filter and log it. In order to accurately log the bytes sent, we'd have to keep the request pool around until the last of that request's response data was written. That's not impossible, but it would require some design changes to the httpd core and the output filters. IMHO, that's a good example of a worthwhile design change to make for 2.1.

OK. I think I finally understand what the problem is here. That's what William was trying to explain to me before, but because I don't know the guts of Apache 2 all that well, I didn't understand :-(

Basically, what would be happening with the patch is that the numbers would be logged with the next request, rather than with this one. However, if the data was SENT with the next request, then it should be LOGGED with the next request. This seems like a question of semantics to me. The problem I can see with this is the following:

- connection opened (keepalive)
- request for aaa.domain
- not sent (i.e. the server is buffering it)
- logged zero output
- request for bbb.domain (on the same connection)
- sent
- logged bytes out for bbb.domain PLUS aaa.domain

Is something like this possible? If not, I think we should be pretty much OK as the whole point of mod_logio is to log the traffic, most likely per virtual host.

Bojan