Re: Counting of I/O bytes

2002-10-16 Thread Brian Pane

On Tue, 2002-10-15 at 23:07, Bojan Smojver wrote:

  The MMN bump in this case is no problem.  Your patch just adds
  fields at the end of the structure.  It's backward compatible,
  so it only needs an increase in the MMN minor number, not the
  major number.  If the patch added fields in the middle of the
  struct, thus breaking binary compatibility, then it would be
  a problem.
 
 What if someone creates an array of conn_rec structures in their module? Isn't
 the code that fetches the next one going to be incorrect? For instance:

Yes, you're right, anything that tries to allocate conn_rec
objects will have problems.

 Basically, what would be happening with the patch is that the numbers would be
 logged with the next request, rather than this one. However, if the data was
 SENT with the next request, then it should be LOGGED with the next request. This
 seems like a question of semantics to me.
 
 The problem I can see with this is the following:
 
 - connection opened (keepalive)
 - request for aaa.domain
 - not sent (i.e. the server is buffering it)
 - logged zero output
 - request for bbb.domain (on the same connection)
 - sent
 - logged bytes out for bbb.domain PLUS aaa.domain
 
 Is something like this possible? If not, I think we should be pretty much OK as
 the whole point of mod_logio is to log the traffic, most likely per virtual host.

Yes, that scenario is possible.  And even without multiple virtual
hosts, attributing bytes to the wrong request will make the logging
a lot less useful to anyone who's trying to use the bytes_sent data
for debugging or planning purposes.  If we can come up with a way to
delay the logging until after the response is actually sent, though,
then the problem will disappear and your patch will produce much more
accurate numbers.

Brian





Re: Counting of I/O bytes

2002-10-16 Thread Bojan Smojver

Quoting Brian Pane [EMAIL PROTECTED]:

 Yes, that scenario is possible.  And even without multiple virtual
 hosts, attributing bytes to the wrong request will make the logging
 a lot less useful to anyone who's trying to use the bytes_sent data
 for debugging or planning purposes.  If we can come up with a way to
 delay the logging until after the response is actually sent, though,
 then the problem will disappear and your patch will produce much more
 accurate numbers.

Bummer! Oh, well, life isn't perfect...

Coming back to what William is proposing, the three pool approach... I'm
guessing that with Response put in-between Connection and Request, the Response
will store responses for multiple requests, right? Then, once the core-filter is
done with it, the Response will be notified, it will finalise the processing
(e.g. log all the requests) and destroy itself. Sounds good, but maybe we should
just keep the requests around until they have been dealt with?

BTW, is there a way in 2.0 to force every request to be written to the network
in order? Maybe I should focus on making an option like that in order to make it
work in 2.0?

Bojan

PS. So, when's 2.1 going to be released? Just kidding... ;-)



Re: Counting of I/O bytes

2002-10-16 Thread Brian Pane

On Wed, 2002-10-16 at 00:10, Bojan Smojver wrote:

 Coming back to what William is proposing, the three pool approach... I'm
 guessing that with Response put in-between Connection and Request, the Response
 will store responses for multiple requests, right? Then, once the core-filter is
 done with it, the Response will be notified, it will finalise the processing
 (e.g. log all the requests) and destroy itself. Sounds good, but maybe we should
 just keep the requests around until they have been dealt with?

There's a small benefit to cleaning up the request instead of
keeping it around: you can free up the memory that was used for
the request and recycle it for some other request.  This can
help to reduce the total memory footprint of the server.  On
the other hand, creating separate pools for the request and
the response could end up increasing the total memory usage,
because we currently use at least 8KB per pool.  We'll probably
need to do some research to figure out which approach works best
in practice.

 BTW, is there a way in 2.0 to force every request to be written to the network
 in order? Maybe I should focus on making an option like that in order to make it
 work in 2.0?

I think you could achieve this result by appending a bucket of type
apr_bucket_flush to the output brigade.  I think that would be a
reasonable workaround for 2.0.

Brian





Re: Counting of I/O bytes

2002-10-16 Thread David Burry

I know that I'm ultimately wanting to log bytes per request accurately enough
to add them up and tell whether the whole file was transferred, even if it was
broken up into several byte-range requests.  I have given up on this happening
in 2.0 for a long time to come due to the architecture changes required; we're
back on 1.3 with three times as many servers because of the lack of
threading... :-(

Dave

- Original Message -
From: Bojan Smojver [EMAIL PROTECTED]

 Is something like this possible? If not, I think we should be pretty much OK
 as the whole point of mod_logio is to log the traffic, most likely per
 virtual host.





Re: Counting of I/O bytes

2002-10-16 Thread Bojan Smojver

Quoting Brian Pane [EMAIL PROTECTED]:

 I think you could achieve this result by appending a bucket of type
 apr_bucket_flush to the output brigade.  I think that would be a
 reasonable workaround for 2.0.

Cool, thanks. This should be easy - I can simply do it in mod_logio's output
filter, which will run before the core_output_filter for sure. The whole thing
becomes sequential, although probably a bit slower...

I'll let you know when the patch is ready. I'll be going for the patch in which
there is an MMN bump and mod_logio keeps its filters, but with the changed
logic of logio_output_filter, which inserts the flush bucket at the end of the
brigade.

Bojan



Re: Counting of I/O bytes

2002-10-16 Thread Bojan Smojver

Quoting David Burry [EMAIL PROTECTED]:

 I know that I'm ultimately wanting to log bytes per request accurately enough
 to add them up and tell whether the whole file was transferred, even if it was
 broken up into several byte-range requests.  I have given up on this happening
 in 2.0 for a long time to come due to the architecture changes required; we're
 back on 1.3 with three times as many servers because of the lack of
 threading... :-(

Don't give up hope just yet. Brian had some last minute aces up his sleeve...

Bojan



Re: Counting of I/O bytes

2002-10-16 Thread Bojan Smojver

On Wed, 2002-10-16 at 17:54, Brian Pane wrote:

 I think you could achieve this result by appending a bucket of type
 apr_bucket_flush to the output brigade.  I think that would be a
 reasonable workaround for 2.0.

This is what I wanted to do initially:

---
static apr_status_t logio_out_filter(ap_filter_t *f,
                                     apr_bucket_brigade *bb) {
    apr_bucket *e = APR_BRIGADE_LAST(bb);

    /* End of data, make sure we flush */
    if (APR_BUCKET_IS_EOS(e)) {
        APR_BRIGADE_INSERT_TAIL(bb,
                                apr_bucket_flush_create(f->c->bucket_alloc));
        APR_BUCKET_REMOVE(e);
        apr_bucket_destroy(e);
    }

    return ap_pass_brigade(f->next, bb);
}
---

But then I thought, if the last bucket is EOS, we'll be sending the
stuff out anyway. Or maybe I'm wrong there? If so, then this might be
better:

---
static apr_status_t logio_out_filter(ap_filter_t *f,
                                     apr_bucket_brigade *bb) {

    APR_BRIGADE_INSERT_TAIL(bb,
                            apr_bucket_flush_create(f->c->bucket_alloc));

    return ap_pass_brigade(f->next, bb);
}
---

Even tried this (insert FLUSH before EOS):

---
static apr_status_t logio_out_filter(ap_filter_t *f,
                                     apr_bucket_brigade *bb) {
    apr_bucket *e = APR_BRIGADE_LAST(bb);

    /* End of data, make sure we flush */
    if (APR_BUCKET_IS_EOS(e)) {
        APR_BUCKET_INSERT_BEFORE(
            apr_bucket_flush_create(f->c->bucket_alloc), e);
    }

    return ap_pass_brigade(f->next, bb);
}
---

But this last one makes Apache go into 100% CPU utilisation after the
first ever request has been served. So, this is bad for sure!

The first one works fine, but I have to admit that I don't know how to
simulate requests that would normally be buffered, which makes my
testing completely useless. Any input is appreciated...

Bojan




Re: Counting of I/O bytes

2002-10-16 Thread William A. Rowe, Jr.

At 02:10 AM 10/16/2002, Bojan Smojver wrote:

Coming back to what William is proposing, the three pool approach... I'm
guessing that with Response put in-between Connection and Request, the Response
will store responses for multiple requests, right?

No, I'm suggesting a single response pool for each request.  As soon as
the request is processed, the request pool disappears (although some
body may still be sitting in brigades, or set aside within filters).  As soon
as the body for -that- request has been flushed completely, that response
pool will disappear.

Since this isn't too clear from the names, perhaps 'connection', 'request'
and an outer 'handler' pool would make things clearer?  In that case, the
handler pool would disappear at once when the response body had been
constructed, and the 'request' pool would disappear once it was flushed.

 Then, once the core-filter is done with it, the Response will be notified,
it will finalise the processing (e.g. log all the requests) and destroy itself.
Sounds good, but maybe we should just keep the requests around until they have
been dealt with?

Consider pipelining.  You can't start keeping 25 requests hanging around
for 1 page + 24 images now, can you?  The memory footprint would soar
through the roof.

BTW, is there a way in 2.0 to force every request to be written to the network
in order? Maybe I should focus on making an option like that in order to make it
work in 2.0?

How do you mean?  Everything to the network is serialized.  What is out
of order today, date stamps in the access logs? That is because it takes
different amounts of time to handle different sorts of requests, availability
from kernel disk cache of the pages, etc.

Bill




Re: Counting of I/O bytes

2002-10-16 Thread Bojan Smojver

On Thu, 2002-10-17 at 00:36, William A. Rowe, Jr. wrote:
 At 02:10 AM 10/16/2002, Bojan Smojver wrote:
 
 Coming back to what William is proposing, the three pool approach... I'm
 guessing that with Response put in-between Connection and Request, the
 Response will store responses for multiple requests, right?
 
 No, I'm suggesting a single response pool for each request.  As soon as
 the request is processed, the request pool disappears (although some
 body may still be sitting in brigades, or set aside within filters).  As soon
 as the body for -that- request has been flushed completely, that response
 pool will disappear.
 
 Since this isn't too clear from the names, perhaps 'connection', 'request'
 and an outer 'handler' pool would make things clearer?  In that case, the
 handler pool would disappear at once when the response body had been
 constructed, and the 'request' pool would disappear once it was flushed.

I was associating this way too much with Request/Response in Servlet
architecture. I guess I understand what you mean now.

  Then, once the core-filter is done with it, the Response will be notified,
 it will finalise the processing (e.g. log all the requests) and destroy
 itself. Sounds good, but maybe we should just keep the requests around until
 they have been dealt with?
 
 Consider pipelining.  You can't start keeping 25 requests hanging around
 for 1 page + 24 images now, can you?  The memory footprint would soar
 through the roof.

Yep, get it.

 BTW, is there a way in 2.0 to force every request to be written to the network
 in order? Maybe I should focus on making an option like that in order to make it
 work in 2.0?
 
 How do you mean?  Everything to the network is serialized.  What is out
 of order today, date stamps in the access logs? That is because it takes
 different amounts of time to handle different sorts of requests, availability
 from kernel disk cache of the pages, etc.

Wrong choice of words here, maybe. What I meant was in order in the
timeline, like this:

- request received
- request processed
- request flushed to network
- request received
- request processed
- request flushed to network
...

rather than what can happen now:

- request received
- request processed
- request received
- request processed
- request received
- request processed
- requests flushed to network
...

That's what the code I sent in my other e-mail is all about.
Basically, I put the output filter in mod_logio (AP_FTYPE_NETWORK - 1),
whose only purpose is to replace the EOS bucket with a FLUSH bucket
(the actual counting is done in the core output filter). Here it is again:

-
static apr_status_t logio_out_filter(ap_filter_t *f,
                                     apr_bucket_brigade *bb) {
    apr_bucket *b = APR_BRIGADE_LAST(bb);

    /* End of data, make sure we flush */
    if (APR_BUCKET_IS_EOS(b)) {
        APR_BRIGADE_INSERT_TAIL(bb,
                                apr_bucket_flush_create(f->c->bucket_alloc));
        APR_BUCKET_REMOVE(b);
        apr_bucket_destroy(b);
    }

    return ap_pass_brigade(f->next, bb);
}
-

When core output filter receives the brigade, it will flush because of
the FLUSH bucket at the end. My main problem is - how do I test this
sucker? What do I have to do to make sure two or more requests are in
fact pipelined?

Bojan




Counting of I/O bytes

2002-10-15 Thread Bojan Smojver

I have submitted the patch along those lines a few days back, which also
includes an MMN bump. In the light of the latest discussions on the list, it
seems that a patch like that is not being looked at favourably by many people :-(

Do you want me to rework that patch so it uses a private data structure (to
avoid the MMN bump) rather than two new fields in conn_rec? Or is the quality
of the patch unacceptable in general?

Just to refresh everyone's memory, this is about the incorrect number of output
bytes reported by mod_logio through %O. It does not include the incorrect number
of bytes reported in r->bytes_sent, which is a completely separate issue.

Any ideas welcome...

Bojan



Re: Counting of I/O bytes

2002-10-15 Thread Brian Pane

On Tue, 2002-10-15 at 22:09, Bojan Smojver wrote:
 I have submitted the patch along those lines a few days back, which also
 includes an MMN bump. In the light of the latest discussions on the list, it
 seems that a patch like that is not being looked at favourably by many people :-(

The MMN bump in this case is no problem.  Your patch just adds
fields at the end of the structure.  It's backward compatible,
so it only needs an increase in the MMN minor number, not the
major number.  If the patch added fields in the middle of the
struct, thus breaking binary compatibility, then it would be
a problem.

 Do you want me to rework that patch so it uses a private data structure (to
 avoid the MMN bump) rather than two new fields in conn_rec? Or is the quality of the
 patch unacceptable in general?

I've only had a couple of minutes to look at the patch (too busy
with my day job this week), but I think the basic problem with the
patch is that you're updating the bytes-sent field in the conn_rec
after the writes complete in the core_output_filter, but you're
reading that field in a logger function.  But the logger function
can be called *before* the socket write happens.  When handling
keep-alive requests, for example, the server may buffer up a
small response to one request but not send it until the next
response is generated.  The first request gets logged as soon as
all the response data is sent to core_output_filter, so the
mod_logio logging code might find that the bytes-sent value
in the conn_rec is zero because the socket write hasn't actually
happened yet.

The best solution is to delay the logging until after the response
has actually been sent.  This would require more significant
changes to the server, though.  Most of the data that the logger
needs is in the request's pool.  Currently, this pool is cleared
as soon as we send the response to the core_output_filter and log
it.  In order to accurately log the bytes sent, we'd have to keep
the request pool around until the last of that request's
response data was written.  That's not impossible, but it would
require some design changes to the httpd core and the output
filters.  IMHO, that's a good example of a worthwhile design
change to make for 2.1.

Brian





Re: Counting of I/O bytes

2002-10-15 Thread Cliff Woolley

On 15 Oct 2002, Brian Pane wrote:

 major number.  If the patch added fields in the middle of the
 struct, thus breaking binary compatibility, then it would be
 a problem.

Even adding at the end can break binary compat, since sizeof(conn_rec)
changes, so you might have 3rd party code that allocates too little space.
Assuming there is ever a case where 3rd party code allocates conn_rec's,
that is.

--Cliff




Re: Counting of I/O bytes

2002-10-15 Thread Bojan Smojver

Quoting Brian Pane [EMAIL PROTECTED]:

 On Tue, 2002-10-15 at 22:09, Bojan Smojver wrote:
  I have submitted the patch along those lines a few days back, which also
  includes an MMN bump. In the light of the latest discussions on the list, it
  seems that a patch like that is not being looked at favourably by many
  people :-(
 
 The MMN bump in this case is no problem.  Your patch just adds
 fields at the end of the structure.  It's backward compatible,
 so it only needs an increase in the MMN minor number, not the
 major number.  If the patch added fields in the middle of the
 struct, thus breaking binary compatibility, then it would be
 a problem.

What if someone creates an array of conn_rec structures in their module? Isn't
the code that fetches the next one going to be incorrect? For instance:

conn_rec c[5];
c[1].not_the_first_field = some_value; /* OOPS, we'll probably get garbage */

  Do you want me to rework that patch so it uses a private data structure (to
  avoid the MMN bump) rather than two new fields in conn_rec? Or is the quality
  of the patch unacceptable in general?
 
 I've only had a couple of minutes to look at the patch (too busy
 with my day job this week), but I think the basic problem with the
 patch is that you're updating the bytes-sent field in the conn_rec
 after the writes complete in the core_output_filter, but you're
 reading that field in a logger function.  But the logger function
 can be called *before* the socket write happens.  When handling
 keep-alive requests, for example, the server may buffer up a
 small response to one request but not send it until the next
 response is generated.  The first request gets logged as soon as
 all the response data is sent to core_output_filter, so the
 mod_logio logging code might find that the bytes-sent value
 in the conn_rec is zero because the socket write hasn't actually
 happened yet.
 
 The best solution is to delay the logging until after the response
 has actually been sent.  This would require more significant
 changes to the server, though.  Most of the data that the logger
 needs is in the request's pool.  Currently, this pool is cleared
 as soon as we send the response to the core_output_filter and log
 it.  In order to accurately log the bytes sent, we'd have to keep
 the request pool around until the last of that request's
 response data was written.  That's not impossible, but it would
 require some design changes to the httpd core and the output
 filters.  IMHO, that's a good example of a worthwhile design
 change to make for 2.1.

OK. I think I finally understand what the problem is here. That's what William
was trying to explain to me before, but because I don't know the guts of Apache
2 all that well, I didn't understand :-(

Basically, what would be happening with the patch is that the numbers would be
logged with the next request, rather than this one. However, if the data was
SENT with the next request, then it should be LOGGED with the next request. This
seems like a question of semantics to me.

The problem I can see with this is the following:

- connection opened (keepalive)
- request for aaa.domain
- not sent (i.e. the server is buffering it)
- logged zero output
- request for bbb.domain (on the same connection)
- sent
- logged bytes out for bbb.domain PLUS aaa.domain

Is something like this possible? If not, I think we should be pretty much OK as
the whole point of mod_logio is to log the traffic, most likely per virtual host.

Bojan