Re: should input filter return the exact amount of bytes asked for?

2003-11-14 Justin Erenkrantz
--On Thursday, November 13, 2003 11:01 AM -0800 Stas Bekman [EMAIL PROTECTED] 
wrote:

> Should we add an explicit explanation to AP_MODE_READBYTES: return at most
> readbytes data. Can't return 0 with APR_BLOCK_READ. Can't return more than
> readbytes data.

I'd say the first and last ones are equivalent statements.  And the 
APR_BLOCK_READ description belongs with the definition of APR_BLOCK_READ, not 
AP_MODE_READBYTES.

> Also, while we are at it, I have a few more questions:
>
>  /** The filter should return at most one line of CRLF data.
>   *  (If a potential line is too long or no CRLF is found, the
>   *   filter may return partial data).
>   */
>  AP_MODE_GETLINE,
>
> Does it mean that the filter should ignore the readbytes argument in this
> mode?

I think so, yes.
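A minimal sketch of the AP_MODE_GETLINE semantics discussed here, working on a plain in-memory buffer rather than a bucket brigade. `getline_length` and its `max_line` cut-off are hypothetical (not httpd APIs), and the policy for over-long lines is an assumption; the point is that the line boundary, not a readbytes count, decides how much comes back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of httpd: returns how many bytes of buf a
 * GETLINE-style read would hand back. There is no readbytes parameter; the
 * line boundary decides the length. */
size_t getline_length(const char *buf, size_t len, size_t max_line)
{
    assert(buf != NULL);

    /* Never look further than the (assumed) maximum line length. */
    size_t scan = len < max_line ? len : max_line;

    /* Stop just after the first LF, so a CRLF terminator is included. */
    const char *lf = memchr(buf, '\n', scan);
    if (lf != NULL)
        return (size_t)(lf - buf) + 1;

    /* No line end found (or the line is too long): partial data. */
    return scan;
}
```

For "GET / HTTP/1.1\r\nHost: x\r\n" this returns 16, covering only the request line and its CRLF.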

>  /** The filter should implicitly eat any CRLF pairs that it sees. */
>  AP_MODE_EATCRLF,
>
> Does it mean that it should do the same as AP_MODE_GETLINE but kill CRLF? If
> not, how much data is it supposed to read? Or is it a mode that never goes on
> its own and should be OR'ed with some definitive mode, e.g.:
> AP_MODE_GETLINE|AP_MODE_EATCRLF and AP_MODE_READBYTES|AP_MODE_EATCRLF?

It's meant to be called right before we read the next pipelined request on the 
connection.  Old (really old) Netscape clients added spurious CRLFs between 
requests.  I don't see a clear rationale why it'd have to be 'combined' with 
other ap_get_brigade() modes.  The only one that'd make sense (to me) is 
AP_MODE_GETLINE.  Note that AP_MODE_EATCRLF doesn't necessarily return 
anything.  It's wildly HTTP-specific...
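To make the pipelining case concrete, here is a minimal sketch of "eating" spurious CRLF pairs before the next request line, again on a plain buffer. `eat_leading_crlf` is a hypothetical name, not httpd's core input filter; treating only complete CR LF pairs as junk is an assumption. It returns a count of skipped bytes rather than any data, matching the remark that EATCRLF doesn't necessarily return anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not httpd code: returns how many leading bytes of
 * buf are CRLF pairs left between pipelined requests. The caller skips
 * them; nothing is passed on as data. */
size_t eat_leading_crlf(const char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL);
    while (i + 1 < len && buf[i] == '\r' && buf[i + 1] == '\n')
        i += 2;     /* skip one complete CRLF pair */

    return i;       /* a trailing lone CR is left for the next read */
}
```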

> Though it'd be nice to add a note re: APR_BLOCK_READ in the
> AP_MODE_READBYTES doc above. Or I guess maybe it belongs in some filters
> tutorial...

I'll note that I recently wrote an article describing httpd-2.x's filters for 
some Linux magazine.  I bet you can find back issues.  As an aside, I never 
actually saw the final copy or the printed copy.  So, don't blame me if it 
doesn't help.  ;-)  -- justin


Re: should input filter return the exact amount of bytes asked for?

2003-11-14 Stas Bekman
Justin Erenkrantz wrote:

Thanks for the explanations, Justin. Once I get some free time, I'll need to 
revamp the filters chapter [1] to address the read mode issue. So far I've been 
completely ignoring it :(

[1] http://perl.apache.org/docs/2.0/user/handlers/filters.html

>> Though it'd be nice to add a note re: APR_BLOCK_READ in the
>> AP_MODE_READBYTES doc above. Or I guess maybe it belongs in some filters
>> tutorial...
>
> I'll note that I recently wrote an article describing httpd-2.x's filters 
> for some Linux magazine.  I bet you can find back issues.  As an 
> aside, I never actually saw the final copy or the printed copy.  So, 
> don't blame me if it doesn't help.  ;-)  -- justin

Is that the one you are talking about?
http://www.linux-mag.com/2003-08/apache_01.html

rbb wrote a bunch of filtering articles some 2 years ago or so, too. It'd 
probably be nice to ask those magazines if we can dump them somewhere under 
the docs-2.0 project instead of linking to them, as ezines tend to move things 
around a lot and even kill them.

__
Stas Bekman            JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide --- http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: should input filter return the exact amount of bytes asked for?

2003-11-13 Stas Bekman
Justin Erenkrantz wrote:
> On Tue, Nov 04, 2003 at 01:41:46AM -0800, Stas Bekman wrote:
>
>> filter. What happens if the filter returns fewer bytes (while there is still 
>> more data coming)? What happens if the filter returns more bytes than 
>> requested (e.g. because it uncompressed some data)? After all, the incoming 
>
> Less bytes = OK.
> Same bytes = OK.
> More bytes = Not OK.  (Theoretically possible though with bad filters.)

Great. Where should this be documented? In the ap_get_brigade .h?

Also,

  0 bytes = Not OK

right? Or how else would you explain the assertion:

  AP_DEBUG_ASSERT(!APR_BRIGADE_EMPTY(bb));

in consumers like ap_get_client_block? Or are you saying that a filter can 
return a non-empty brigade with a single empty bucket?

Thanks Justin.



Re: should input filter return the exact amount of bytes asked for?

2003-11-13 Justin Erenkrantz
--On Thursday, November 13, 2003 12:38 AM -0800 Stas Bekman [EMAIL PROTECTED] 
wrote:

> Great. Where should this be documented? In the ap_get_brigade .h?

It's already in util_filter.h.  Read the documentation for ap_input_mode_t:

   /** The filter should return at most readbytes data. */
   AP_MODE_READBYTES,
   ...

> right? Or how else would you explain the assertion:
>
>    AP_DEBUG_ASSERT(!APR_BRIGADE_EMPTY(bb));

If using APR_BLOCK_READ, it's illegal to return 0 bytes with AP_MODE_READBYTES 
- that is what this assert is checking for in maintainer mode (this was a 
troublesome assert at one point).  It's the same expectation as doing a 
blocking socket read() - blocking reads shouldn't return until some data is 
available.  -- justin
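The rules in this exchange (fewer or the same number of bytes is fine, more is not, and a blocking AP_MODE_READBYTES read must not come back empty) can be written down as a small predicate. This is only a sketch: `sketch_read_type` stands in for apr_read_type_e, and the `at_eos` flag is an assumption modelling end of body, where the brigade would carry a zero-length EOS bucket instead of data and so would still not be empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for APR_BLOCK_READ / APR_NONBLOCK_READ (illustrative only). */
typedef enum { SKETCH_BLOCK_READ, SKETCH_NONBLOCK_READ } sketch_read_type;

/* Hypothetical check: does a READBYTES result of `returned` data bytes
 * honour a request for `readbytes` bytes? */
bool readbytes_result_ok(size_t readbytes, size_t returned,
                         sketch_read_type block, bool at_eos)
{
    assert(block == SKETCH_BLOCK_READ || block == SKETCH_NONBLOCK_READ);

    if (returned > readbytes)
        return false;   /* more bytes than asked for: broken filter */

    if (returned == 0 && block == SKETCH_BLOCK_READ && !at_eos)
        return false;   /* blocking read came back empty: roughly the
                           case the AP_DEBUG_ASSERT guards against */

    return true;        /* fewer than or exactly readbytes: OK */
}
```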


Re: should input filter return the exact amount of bytes asked for?

2003-11-13 Stas Bekman
Justin Erenkrantz wrote:
> --On Thursday, November 13, 2003 12:38 AM -0800 Stas Bekman 
> [EMAIL PROTECTED] wrote:
>
>> Great. Where should this be documented? In the ap_get_brigade .h?
>
> It's already in util_filter.h.  Read the documentation for 
> ap_input_mode_t:
>
>    /** The filter should return at most readbytes data. */
>    AP_MODE_READBYTES,
>    ...

Aha! I was looking in the wrong place then. Thanks Justin.

Should we add an explicit explanation to AP_MODE_READBYTES: return at most 
readbytes data. Can't return 0 with APR_BLOCK_READ. Can't return more than 
readbytes data.

Also, while we are at it, I have a few more questions:

/** The filter should return at most one line of CRLF data.
 *  (If a potential line is too long or no CRLF is found, the
 *   filter may return partial data).
 */
AP_MODE_GETLINE,

Does it mean that the filter should ignore the readbytes argument in this mode?

/** The filter should implicitly eat any CRLF pairs that it sees. */
AP_MODE_EATCRLF,

Does it mean that it should do the same as AP_MODE_GETLINE but kill CRLF? If 
not, how much data is it supposed to read? Or is it a mode that never goes on 
its own and should be OR'ed with some definitive mode, e.g.:
AP_MODE_GETLINE|AP_MODE_EATCRLF and AP_MODE_READBYTES|AP_MODE_EATCRLF?

>> right? Or how else would you explain the assertion:
>>
>>    AP_DEBUG_ASSERT(!APR_BRIGADE_EMPTY(bb));
>
> If using APR_BLOCK_READ, it's illegal to return 0 bytes with 
> AP_MODE_READBYTES - that is what this assert is checking for in 
> maintainer mode (this was a troublesome assert at one point).  It's the 
> same expectation as doing a blocking socket read() - blocking reads 
> shouldn't return until some data is available.  -- justin

Cool:

/** Determines how a bucket or brigade should be read */
typedef enum {
    APR_BLOCK_READ,   /** block until data becomes available */
    APR_NONBLOCK_READ /** return immediately if no data is available */
} apr_read_type_e;

Though it'd be nice to add a note re: APR_BLOCK_READ in the AP_MODE_READBYTES 
doc above. Or I guess maybe it belongs in some filters tutorial...



Re: should input filter return the exact amount of bytes asked for?

2003-11-11 Justin Erenkrantz
On Tue, Nov 04, 2003 at 01:41:46AM -0800, Stas Bekman wrote:
> filter. What happens if the filter returns fewer bytes (while there is still 
> more data coming)? What happens if the filter returns more bytes than 
> requested (e.g. because it uncompressed some data)? After all, the incoming 

Less bytes = OK.
Same bytes = OK.
More bytes = Not OK.  (Theoretically possible though with bad filters.)

HTH.  -- justin


Re: should input filter return the exact amount of bytes asked for?

2003-11-11 William A. Rowe, Jr.
At 03:31 AM 11/11/2003, Justin Erenkrantz wrote:
> On Tue, Nov 04, 2003 at 01:41:46AM -0800, Stas Bekman wrote:
>> filter. What happens if the filter returns fewer bytes (while there is still 
>> more data coming)? What happens if the filter returns more bytes than 
>> requested (e.g. because it uncompressed some data)? After all, the incoming 
>
> Less bytes = OK.

But not great if there is more incoming data available (consider that one
can call with NONBLOCK and dig up some more).  There is a balance to be
found here: one doesn't want to slurp 15MB of a file at once, but one doesn't
want bytes to trickle up one at a time either.

> Same bytes = OK.

Of course.

> More bytes = Not OK.  (Theoretically possible though with bad filters.)

Wrong.  This is OK across the board; please consider:

module requests 1000 arbitrary bytes;

  codepage module requests 1000

http reads one 'chunk' available, 8000 bytes
and will return that page

  codepage can translate 7998 bytes and comes to
  a screeching halt for a 3-byte sequence.  Returns
  our Now Translated 4000 bytes

module sees a 4000-byte heap bucket.

What can you do?  Instead of treating that bucket as a singleton
when you want 1000 bytes, consume the first 1000 bytes from that
bucket (or the brigade).

Please review the archives for this discussion (the brigades on the
apr list, the filter API on httpd).  This was a very long thread, but the
net result of filters is that you get what is available/handy, not any
specific number of bytes.

Bill



Re: should input filter return the exact amount of bytes asked for?

2003-11-11 Justin Erenkrantz
--On Tuesday, November 11, 2003 11:24 AM -0600 William A. Rowe, Jr. 
[EMAIL PROTECTED] wrote:

>> More bytes = Not OK.  (Theoretically possible though with bad filters.)
> Wrong.  This is OK across the board; please consider:

Uh, no.  We changed the filter semantics some time ago to stop this insanity. 
It was inefficient to call AP_MODE_READBYTES and have it return more than 
asked for.  Check out the CVS log for util_filter.h, specifically around 
revision 1.62.

> module requests 1000 arbitrary bytes;
>
>   codepage module requests 1000
>
> http reads one 'chunk' available, 8000 bytes
> and will return that page
>   codepage can translate 7998 bytes and comes to
>   a screeching halt for a 3-byte sequence.  Returns
>   our Now Translated 4000 bytes
> module sees a 4000-byte heap bucket.
>
> What can you do?  Instead of treating that bucket as a singleton
> when you want 1000 bytes, consume the first 1000 bytes from that
> bucket (or the brigade).

No.  That means you have 3k more bytes that you have to consume but didn't ask 
for.  The filter wouldn't return them again.  Writing code that used input 
filters and had to cope with getting more than it asked for was just 
confusing and led to lots of error-prone code.

If it asks for 1k in AP_MODE_READBYTES, it gets at most 1k.  Anything else is 
broken.  (util_filter.h AP_MODE_READBYTES says as much, but that's not fair, 
because I wrote that comment.)
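Under the rule stated here, the job of holding on to surplus data moves from the caller into the filter that produced it. A minimal sketch of that idea for Bill's codepage example, assuming a fixed-size stash: `stash_ctx`, `stash_add` and `stash_read` are hypothetical names (a real httpd filter would keep similar state behind its f->ctx pointer and work with buckets rather than a flat array).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-filter state: translated bytes not yet handed on. */
typedef struct {
    char   pending[8192];
    size_t pending_len;
} stash_ctx;

/* Append freshly produced (e.g. translated) bytes to the stash. */
void stash_add(stash_ctx *ctx, const char *data, size_t len)
{
    assert(ctx->pending_len + len <= sizeof ctx->pending);
    memcpy(ctx->pending + ctx->pending_len, data, len);
    ctx->pending_len += len;
}

/* Hand back at most readbytes bytes and keep the rest for the next call,
 * so the caller never receives more than it asked for. */
size_t stash_read(stash_ctx *ctx, char *out, size_t readbytes)
{
    size_t n = ctx->pending_len < readbytes ? ctx->pending_len : readbytes;

    memcpy(out, ctx->pending, n);
    memmove(ctx->pending, ctx->pending + n, ctx->pending_len - n);
    ctx->pending_len -= n;
    return n;
}
```

With 4000 translated bytes stashed, four reads of 1000 bytes drain it, and no caller ever sees more than it requested.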

> Please review the archives for this discussion (the brigades on the
> apr list, the filter API on httpd).  This was a very long thread, but the
> net result of filters is that you get what is available/handy, not any
> specific number of bytes.

That *was* indeed the position at one time, but when I redid the input filters 
(which was about rewrite #14 of input filters), we corrected this because 
returning more than was asked for was causing lots of problems.  That is when 
we added the mode argument to ap_get_brigade.  mod_ssl's input filtering code 
was just broken under that old API.

And the big boys even reviewed the code and semantic changes before they went 
in.  So, it was definitely RTC.  ;-)  -- justin


Re: should input filter return the exact amount of bytes asked for?

2003-11-06 Stas Bekman
Stas Bekman wrote:
> I'm trying to get rid of ap_get_client_block(), but I don't understand a 
> few things. ap_get_client_block() asks for readbytes from the upstream 
> filter. What happens if the filter returns fewer bytes (while there is 
> still more data coming)? What happens if the filter returns more bytes 
> than requested (e.g. because it uncompressed some data)? After all, the 
> incoming filters all propagate a request for N bytes to the core_in 
> filter, which returns exactly that number if it can. Now, as the data flows 
> up the filter chain, its length may change. Does it mean that if the 
> filter didn't return the exact amount asked for, it's broken? Is that the 
> case when it returns less data than requested? Or when it returns more 
> data?
>
> I'm trying to deal with the case where a user call wants N bytes and 
> I have to give exactly that number in a single call. I'm not sure whether I 
> should buffer things if I've got too much data or, conversely, ask 
> for more bbs if I don't have enough data. Are there any modules I can 
> look at to learn from?
>
> The doc for ap_get_brigade doesn't say anything about ap_get_brigade 
> satisfying the 'readbytes' argument.
>
> /**
>  * Get the current bucket brigade from the next filter on the filter
>  * stack.  The filter returns an apr_status_t value.  If the bottom-most
>  * filter doesn't read from the network, then ::AP_NOBODY_READ is returned.
>  * The bucket brigade will be empty when there is nothing left to get.
>  * @param filter The next filter in the chain
>  * @param bucket The current bucket brigade.  The original brigade passed
>  *   to ap_get_brigade() must be empty.
>  * @param mode   The way in which the data should be read
>  * @param block  How the operations should be performed
>  *   ::APR_BLOCK_READ, ::APR_NONBLOCK_READ
>  * @param readbytes How many bytes to read from the next filter.
>  */
> AP_DECLARE(apr_status_t) ap_get_brigade(ap_filter_t *filter,
>                                         apr_bucket_brigade *bucket,
>                                         ap_input_mode_t mode,
>                                         apr_read_type_e block,
>                                         apr_off_t readbytes);


What bothers me most is the case where a filter may return more data than it 
has been asked for in AP_MODE_READBYTES mode. ap_get_client_block() 
doesn't buffer such data; it just drops it on the floor. So either it 
has to be fixed to do the buffering, or the filter spec (ap_get_brigade) needs 
to clearly state that no more than the requested amount of data may be returned 
in AP_MODE_READBYTES mode. And ap_get_client_block needs to assert if it gets more.



should input filter return the exact amount of bytes asked for?

2003-11-04 Stas Bekman
I'm trying to get rid of ap_get_client_block(), but I don't understand a few 
things. ap_get_client_block() asks for readbytes from the upstream filter. 
What happens if the filter returns fewer bytes (while there is still more data 
coming)? What happens if the filter returns more bytes than requested (e.g. 
because it uncompressed some data)? After all, the incoming filters all 
propagate a request for N bytes to the core_in filter, which returns 
exactly that number if it can. Now, as the data flows up the filter chain, its 
length may change. Does it mean that if the filter didn't return the exact 
amount asked for, it's broken? Is that the case when it returns less data than 
requested? Or when it returns more data?

I'm trying to deal with the case where a user call wants N bytes and I have to 
give exactly that number in a single call. I'm not sure whether I should buffer 
things if I've got too much data or, conversely, ask for more bbs if I 
don't have enough data. Are there any modules I can look at to learn from?
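If every upstream filter honours "at most readbytes", a caller that needs exactly N bytes never has to buffer surplus; it only has to loop and ask for the remainder. A minimal sketch on plain memory, where `sketch_source` is a hypothetical upstream that returns at most `chunk` bytes per call (standing in for a filter that returns fewer bytes than requested) and `read_exactly` is a hypothetical helper, not ap_get_client_block().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upstream: like an AP_MODE_READBYTES filter it returns at
 * most the requested number of bytes, here never more than `chunk` per
 * call, and 0 only when the data is exhausted. */
typedef struct {
    const char *data;
    size_t      len;
    size_t      pos;
    size_t      chunk;
} sketch_source;

size_t source_read_at_most(sketch_source *src, char *out, size_t readbytes)
{
    size_t n = readbytes;

    if (n > src->chunk)
        n = src->chunk;
    if (n > src->len - src->pos)
        n = src->len - src->pos;
    memcpy(out, src->data + src->pos, n);
    src->pos += n;
    return n;
}

/* Fill `out` with exactly `want` bytes unless the data ends first. */
size_t read_exactly(sketch_source *src, char *out, size_t want)
{
    size_t got = 0;

    while (got < want) {
        size_t n = source_read_at_most(src, out + got, want - got);

        assert(n <= want - got);    /* upstream must never over-deliver */
        if (n == 0)
            break;                  /* end of data: short result */
        got += n;
    }
    return got;
}
```

With 3-byte chunks over "hello world", asking for 5 bytes takes two upstream calls ("hel", then "lo"); a following request for 10 bytes returns only the 6 bytes that remain.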

The doc for ap_get_brigade doesn't say anything about ap_get_brigade 
satisfying the 'readbytes' argument.

/**
 * Get the current bucket brigade from the next filter on the filter
 * stack.  The filter returns an apr_status_t value.  If the bottom-most
 * filter doesn't read from the network, then ::AP_NOBODY_READ is returned.
 * The bucket brigade will be empty when there is nothing left to get.
 * @param filter The next filter in the chain
 * @param bucket The current bucket brigade.  The original brigade passed
 *   to ap_get_brigade() must be empty.
 * @param mode   The way in which the data should be read
 * @param block  How the operations should be performed
 *   ::APR_BLOCK_READ, ::APR_NONBLOCK_READ
 * @param readbytes How many bytes to read from the next filter.
 */
AP_DECLARE(apr_status_t) ap_get_brigade(ap_filter_t *filter,
                                        apr_bucket_brigade *bucket,
                                        ap_input_mode_t mode,
                                        apr_read_type_e block,
                                        apr_off_t readbytes);