Re: [Patch] non blocking writes in core

2013-11-21 Thread Joe Orton
On Tue, Nov 19, 2013 at 07:44:07PM +0200, Graham Leggett wrote:
 On 18 Nov 2013, at 1:24 PM, Plüm, Rüdiger, Vodafone Group 
 ruediger.pl...@vodafone.com wrote:
 
  +rv = send_brigade_nonblocking(net-client_socket, bb,
  +  (ctx-bytes_written), c);
  +if (APR_STATUS_IS_EAGAIN(rv)) {
  +setaside_remaining_output(f, ctx, bb, c);
  +}
  +else if (rv != APR_EAGAIN) {
  
  What if rv is APR_SUCCESS?
 
 This is indeed broken, fixed.
 
 Some more testing has revealed that mod_ssl's output filter breaks 
 rules 2 and 5 of the 10 output filter rules published here: 
 http://httpd.apache.org/docs/trunk/da/developer/output-filters.html#rules

Those rules are written (explicitly) for resource-level filters.  They 
would have to be a little different for CONNECTION level, e.g. EOS 
handling should probably be different... though I'm not sure how we'd 
write the rule.  Probably connection filters should delete or pass EOS, 
since they don't (shouldn't) care about request boundaries.  Dunno.

mod_ssl's output filter goes to some effort to apply (10) consistently, 
and is agnostic to data bucket types (5), but yeah, the metadata 
handling is looks wrong as you say.

Regards, Joe


Re: [Patch] non blocking writes in core

2013-11-21 Thread Graham Leggett
On 21 Nov 2013, at 10:43 AM, Joe Orton jor...@redhat.com wrote:

 Those rules are written (explicitly) for resource-level filters.  They 
 would have to be a little different for CONNECTION level, e.g. EOS 
 handling should probably be different... though I'm not sure how we'd 
 write the rule.  Probably connection filters should delete or pass EOS, 
 since they don't (shouldn't) care about request boundaries.  Dunno.
 
 mod_ssl's output filter goes to some effort to apply (10) consistently, 
 and is agnostic to data bucket types (5), but yeah, the metadata 
 handling is looks wrong as you say.

Having looked at this is a little deeper, having implemented the initial pass 
at nonblocking support for the core, and then trying to implement how mod_ssl 
might use this, it has uncovered another snag - filters right now assume that 
APR_EAGAIN is a fatal error and leave immediately, when in reality we might be 
half way through transforming and writing a brigade.

Right now, it seems that filters have the property that they are obliged to 
consume the brigade they have been given. What this means practically is that 
if mod_ssl is given a 4GB file bucket, it is obliged to consume the entire 4GB 
file bucket and ensure it is written downstream before returning, and that 
breaks write completion. Ideally, the core filter should be able to return 
APR_EAGAIN and mod_ssl should do the right thing to set aside the remainder 
before passing back the error to the caller, but it doesn't do that, and 
neither does mod_deflate and a host of other external filters.

Ideally we need a new type of output filter, an asynchronous output filter, 
which in addition to the ten rules have these rules:

- Empty brigades must be passed upstream and not ignored like now. This is so 
that an upstream filter with setaside data has the chance to keep writing it, 
without messing about with metadata buckets or other performance sapping stuff.

- Async filters must understand and react to APR_EAGAIN from upstream, 
APR_EAGAIN is not an error, instead it is a request to call us again when we 
are writable, possibly with an empty brigade. We must setaside what we have so 
far unwritten (probably using functionality buried today in the core output 
filter, suitably reference counting the c-data_in_output_filters).

- The core output filter becomes non blocking by default, if we want blocking 
behaviour we must explicitly send a flush bucket. While it would be nice to 
pass a block/nonblock flag like we do with the input filter stack this 
competes with the flush buckets, which effectively mean block please. 
Defaulting to nonblock until explicitly told otherwise is nice and simple.

Given that the world must be given an opportunity to migrate to async filters, 
we would need to support both at the same time.

What I am thinking is that a filter registers itself either as a legacy 
synchronous filter (you'll be given a brigade, don't return until all the 
brigade is consumed), or an asynchronous filter (feel free to return APR_EAGAIN 
at any time, and setaside data if you receive APR_EAGAIN from upstream), you 
state explicitly which.

The key bit will be the transition from an upstream async filter to an earlier 
sync filter, APR_EAGAIN will need to be trapped and responded to by sending a 
flush bucket up the stack, effectively allowing us to convert APR_EAGAIN into 
APR_SUCCESS which can then be safely received by the earlier sync filter. This 
is a simple task that could be accomplished inside ap_pass_brigade().

The event MPM would simply need to pass empty brigades to c-output_filters 
until c-data_in_output_filters returns to zero.

The addition of a sync filter to the c-output_filters would cause 
write_completion to stop working, but it will still be functional until the 
sync filter became async.

We currently have this situation where we keep track of the start of the 
protocol filter stack in c-output_filters instead of the real start of the 
stack, we could for now insert a helper protocol filter that runs first and 
waits until EOS is received meaning synchronous processing is done, before 
setting aside the data and returning APR_SUCCESS, inviting event to enter write 
completion in the hope that everything from this filter onwards is async and 
therefore eligible for write completion. This filter can also be responsible 
for flow control, vastly simplifying the core_output_filter we have now.

All of this is still backportable to v2.4.

Regards,
Graham
--



RE: [Patch] non blocking writes in core

2013-11-21 Thread Plüm , Rüdiger , Vodafone Group


 -Original Message-
 From: Graham Leggett [mailto:minf...@sharp.fm]
 Sent: Donnerstag, 21. November 2013 14:58
 To: dev@httpd.apache.org
 Subject: Re: [Patch] non blocking writes in core
 
 On 21 Nov 2013, at 10:43 AM, Joe Orton jor...@redhat.com wrote:
 
  Those rules are written (explicitly) for resource-level filters.  They
  would have to be a little different for CONNECTION level, e.g. EOS
  handling should probably be different... though I'm not sure how we'd
  write the rule.  Probably connection filters should delete or pass EOS,
  since they don't (shouldn't) care about request boundaries.  Dunno.
 
  mod_ssl's output filter goes to some effort to apply (10) consistently,
  and is agnostic to data bucket types (5), but yeah, the metadata
  handling is looks wrong as you say.
 
 Having looked at this is a little deeper, having implemented the initial
 pass at nonblocking support for the core, and then trying to implement how
 mod_ssl might use this, it has uncovered another snag - filters right now
 assume that APR_EAGAIN is a fatal error and leave immediately, when in
 reality we might be half way through transforming and writing a brigade.
 
 Right now, it seems that filters have the property that they are obliged
 to consume the brigade they have been given. What this means practically
 is that if mod_ssl is given a 4GB file bucket, it is obliged to consume
 the entire 4GB file bucket and ensure it is written downstream before
 returning, and that breaks write completion. Ideally, the core filter
 should be able to return APR_EAGAIN and mod_ssl should do the right
 thing to set aside the remainder before passing back the error to the
 caller, but it doesn't do that, and neither does mod_deflate and a host of
 other external filters.
 
 Ideally we need a new type of output filter, an asynchronous output
 filter, which in addition to the ten rules have these rules:
 
 - Empty brigades must be passed upstream and not ignored like now. This is
 so that an upstream filter with setaside data has the chance to keep
 writing it, without messing about with metadata buckets or other
 performance sapping stuff.
 
 - Async filters must understand and react to APR_EAGAIN from upstream,
 APR_EAGAIN is not an error, instead it is a request to call us again when
 we are writable, possibly with an empty brigade. We must setaside what we
 have so far unwritten (probably using functionality buried today in the
 core output filter, suitably reference counting the c-
 data_in_output_filters).
 
 - The core output filter becomes non blocking by default, if we want
 blocking behaviour we must explicitly send a flush bucket. While it would
 be nice to pass a block/nonblock flag like we do with the input filter
 stack this competes with the flush buckets, which effectively mean block
 please. Defaulting to nonblock until explicitly told otherwise is nice
 and simple.
 
 Given that the world must be given an opportunity to migrate to async
 filters, we would need to support both at the same time.
 
 What I am thinking is that a filter registers itself either as a legacy
 synchronous filter (you'll be given a brigade, don't return until all the
 brigade is consumed), or an asynchronous filter (feel free to return
 APR_EAGAIN at any time, and setaside data if you receive APR_EAGAIN from
 upstream), you state explicitly which.
 
 The key bit will be the transition from an upstream async filter to an
 earlier sync filter, APR_EAGAIN will need to be trapped and responded to
 by sending a flush bucket up the stack, effectively allowing us to convert
 APR_EAGAIN into APR_SUCCESS which can then be safely received by the
 earlier sync filter. This is a simple task that could be accomplished
 inside ap_pass_brigade().
 
 The event MPM would simply need to pass empty brigades to c-
 output_filters until c-data_in_output_filters returns to zero.
 
 The addition of a sync filter to the c-output_filters would cause
 write_completion to stop working, but it will still be functional until
 the sync filter became async.
 
 We currently have this situation where we keep track of the start of the
 protocol filter stack in c-output_filters instead of the real start of
 the stack, we could for now insert a helper protocol filter that runs
 first and waits until EOS is received meaning synchronous processing is

If this filter sets aside all stuff until eos (or a flush bucket) is seen it 
might
cause a huge amount of memory consumption. So we possibly need some limit for 
memory consuming
buckets which when passed causes us to send the stuff we have down the chain 
even without an eos
or flush bucket. As far as I know the current core output filter also only 
buffers a limited
amount of memory buckets until it decides to go for blocking write.

Otherwise sounds sensible.

Regards

Rüdiger



Re: [Patch] non blocking writes in core

2013-11-21 Thread Graham Leggett
On 21 Nov 2013, at 4:50 PM, Plüm, Rüdiger, Vodafone Group 
ruediger.pl...@vodafone.com wrote:

 If this filter sets aside all stuff until eos (or a flush bucket) is seen it 
 might
 cause a huge amount of memory consumption. So we possibly need some limit for 
 memory consuming
 buckets which when passed causes us to send the stuff we have down the chain 
 even without an eos
 or flush bucket. As far as I know the current core output filter also only 
 buffers a limited
 amount of memory buckets until it decides to go for blocking write.

In theory it would be ideal if flow control could be handled at the start of 
the protocol stack rather than at the end. In other words, the protocol used 
for flow control in core be moved into a dedicated filter. We will then be able 
to pass file buckets through without a problem, and properly handle morphing 
and memory buckets up to the sensible limits we have now. Life then becomes 
simple, memory constrained (and hopefully exclusively async) downstream from 
this filter.

Regards,
Graham
--



Re: [Patch] non blocking writes in core

2013-11-21 Thread Jim Jagielski
/me like :)

On Nov 21, 2013, at 8:57 AM, Graham Leggett minf...@sharp.fm wrote:

 Given that the world must be given an opportunity to migrate to async 
 filters, we would need to support both at the same time.
 
 What I am thinking is that a filter registers itself either as a legacy 
 synchronous filter (you'll be given a brigade, don't return until all the 
 brigade is consumed), or an asynchronous filter (feel free to return 
 APR_EAGAIN at any time, and setaside data if you receive APR_EAGAIN from 
 upstream), you state explicitly which.
 
 The key bit will be the transition from an upstream async filter to an 
 earlier sync filter, APR_EAGAIN will need to be trapped and responded to by 
 sending a flush bucket up the stack, effectively allowing us to convert 
 APR_EAGAIN into APR_SUCCESS which can then be safely received by the earlier 
 sync filter. This is a simple task that could be accomplished inside 
 ap_pass_brigade().
 
 The event MPM would simply need to pass empty brigades to c-output_filters 
 until c-data_in_output_filters returns to zero.
 
 The addition of a sync filter to the c-output_filters would cause 
 write_completion to stop working, but it will still be functional until the 
 sync filter became async.
 
 We currently have this situation where we keep track of the start of the 
 protocol filter stack in c-output_filters instead of the real start of the 
 stack, we could for now insert a helper protocol filter that runs first and 
 waits until EOS is received meaning synchronous processing is done, before 
 setting aside the data and returning APR_SUCCESS, inviting event to enter 
 write completion in the hope that everything from this filter onwards is 
 async and therefore eligible for write completion. This filter can also be 
 responsible for flow control, vastly simplifying the core_output_filter we 
 have now.
 
 All of this is still backportable to v2.4.
 
 Regards,
 Graham
 --
 



Re: [Patch] non blocking writes in core

2013-11-19 Thread Graham Leggett
On 18 Nov 2013, at 1:24 PM, Plüm, Rüdiger, Vodafone Group 
ruediger.pl...@vodafone.com wrote:

 +rv = send_brigade_nonblocking(net-client_socket, bb,
 +  (ctx-bytes_written), c);
 +if (APR_STATUS_IS_EAGAIN(rv)) {
 +setaside_remaining_output(f, ctx, bb, c);
 +}
 +else if (rv != APR_EAGAIN) {
 
 What if rv is APR_SUCCESS?

This is indeed broken, fixed.

Some more testing has revealed that mod_ssl's output filter breaks rules 2 and 
5 of the 10 output filter rules published here: 
http://httpd.apache.org/docs/trunk/da/developer/output-filters.html#rules

Instead of passing on all metadata buckets as required, mod_ssl only takes into 
account EOS, EOC and FLUSH, and tries to consume metadata buckets as normal 
data buckets. SSL_write() then sees the zero length write and treats it as a 
noop. The core filter is never called, the event mpm interprets a non zero 
c-data_in_output_filters as an invitation to try again, and we go into a spin. 
If mod_ssl does this, we can expect other modules in the wild to be a problem.

A possible solution to this dilemma is to try and detect buggy filters and 
compensate for the problem. We might have a protocol helper filter that run 
early on and passes the brigade upstream. If we started with 
c-data_in_output_filters as non zero and we are in WRITE_COMPLETION, and on 
return the filter sees APR_SUCCESS from upstream, but c-data_in_output_filters 
is still not zero, we may not have reached the core filter, so call the core 
filter directly once with a nonblock bucket simulating the behaviour we have 
now. This will ensure the setaside data in the core output filter will always 
be written at least once on each call, preventing the spin. This path in theory 
should only be followed if the buggy filter is present, because when the buggy 
filter is not present and the core is successfully reached the return code will 
be APR_EAGAIN, or APR_SUCCESS with c-data_in_output_filters going to zero.

Regards,
Graham
--



Re: [Patch] non blocking writes in core

2013-11-19 Thread Graham Leggett
On 19 Nov 2013, at 7:44 PM, Graham Leggett minf...@sharp.fm wrote:

 This is indeed broken, fixed.
 
 Some more testing has revealed that mod_ssl's output filter breaks rules 2 
 and 5 of the 10 output filter rules published here: 
 http://httpd.apache.org/docs/trunk/da/developer/output-filters.html#rules
 
 Instead of passing on all metadata buckets as required, mod_ssl only takes 
 into account EOS, EOC and FLUSH, and tries to consume metadata buckets as 
 normal data buckets. SSL_write() then sees the zero length write and treats 
 it as a noop. The core filter is never called, the event mpm interprets a non 
 zero c-data_in_output_filters as an invitation to try again, and we go into 
 a spin. If mod_ssl does this, we can expect other modules in the wild to be a 
 problem.
 
 A possible solution to this dilemma is to try and detect buggy filters and 
 compensate for the problem. We might have a protocol helper filter that run 
 early on and passes the brigade upstream. If we started with 
 c-data_in_output_filters as non zero and we are in WRITE_COMPLETION, and on 
 return the filter sees APR_SUCCESS from upstream, but 
 c-data_in_output_filters is still not zero, we may not have reached the core 
 filter, so call the core filter directly once with a nonblock bucket 
 simulating the behaviour we have now. This will ensure the setaside data in 
 the core output filter will always be written at least once on each call, 
 preventing the spin. This path in theory should only be followed if the buggy 
 filter is present, because when the buggy filter is not present and the core 
 is successfully reached the return code will be APR_EAGAIN, or APR_SUCCESS 
 with c-data_in_output_filters going to zero.

Something like the attached patch, which was a lot simpler than the above.

Regards,
Graham
--


httpd-core-nonblocking2.patch
Description: Binary data


RE: [Patch] non blocking writes in core

2013-11-19 Thread Bert Huijben
 -Original Message-
 From: Graham Leggett [mailto:minf...@sharp.fm]
 Sent: dinsdag 19 november 2013 18:44
 To: dev@httpd.apache.org
 Subject: Re: [Patch] non blocking writes in core
 
 On 18 Nov 2013, at 1:24 PM, Plüm, Rüdiger, Vodafone Group
 ruediger.pl...@vodafone.com wrote:
 
  +rv = send_brigade_nonblocking(net-client_socket, bb,
  +  (ctx-bytes_written), c);
  +if (APR_STATUS_IS_EAGAIN(rv)) {
  +setaside_remaining_output(f, ctx, bb, c);
  +}
  +else if (rv != APR_EAGAIN) {
 
  What if rv is APR_SUCCESS?
 
 This is indeed broken, fixed.

This also breaks unsafe for platforms where there are multiple EAGAIN
values, like on Windows where APR_STATUS_IS_EAGAIN() returns true for quite
a few error codes.

Bert




RE: [Patch] non blocking writes in core

2013-11-18 Thread Plüm , Rüdiger , Vodafone Group
Some comments:

@@ -581,6 +564,33 @@
   eor_buckets_in_brigade, morphing_bucket_in_brigade);
 }
 
+/* Handle non blocking writes. If we saw a non blocking bucket, attempt
+ * a non blocking write. If the non blocking write would have returned
+ * APR_EGAIN, set aside the remainder and return APR_EAGAIN.
+ */
+if (nonblock_bucket_in_brigade) {
+if (loglevel = APLOG_TRACE6) {
+ap_log_cerror(APLOG_MARK, APLOG_TRACE8, APR_SUCCESS, c,
+  core_output_filter: non blocking write 
+  to the network);
+}
+rv = send_brigade_nonblocking(net-client_socket, bb,
+  (ctx-bytes_written), c);
+if (APR_STATUS_IS_EAGAIN(rv)) {
+setaside_remaining_output(f, ctx, bb, c);
+}
+else if (rv != APR_EAGAIN) {

What if rv is APR_SUCCESS?

+/* The client has aborted the connection */
+ap_log_cerror(APLOG_MARK, APLOG_TRACE1, rv, c,
+  core_output_filter: writing data to the network);
+c-aborted = 1;
+}
+return rv;
+}
+
+/* Otherwise handle a normal write. A non blocking write will be attempted,
+ * but we don't return APR_EAGAIN, we return APR_SUCCESS instead.
+ */
 if (bytes_in_brigade = THRESHOLD_MIN_WRITE) {
 rv = send_brigade_nonblocking(net-client_socket, bb,
   (ctx-bytes_written), c);



+AP_DECLARE_DATA const apr_bucket_type_t ap_bucket_type_nonblock = {
+EOC, 5, APR_BUCKET_METADATA,
+apr_bucket_destroy_noop,

I guess this should be something like NBB instead of EOC (copy and paste 
error?)

Otherwise looks sensible from a brief look.

Regards

Rüdiger

 -Original Message-
 From: Graham Leggett 
 Sent: Sonntag, 17. November 2013 20:41
 To: dev@httpd.apache.org
 Subject: [Patch] non blocking writes in core
 
 Hi all,
 
 Continuing on from the discussion about how we might support write
 completion in mod_ssl, I have come up with the following patch below.
 
 I started by changing the event MPM to call all protocol filters instead
 of just the hard coded write filter:
 
 rv = ap_pass_brigade(c-output_filters, cs-bb);
 
 The first thing that this revealed was that we need to pass a non-empty
 brigade down the stack. I needed a metadata bucket to pass down the stack
 to make the brigade not-empty, and so chose to create a metadata bucket
 for this purpose (more on this below).
 
 The second thing that we needed to support for mod_ssl to have any hope of
 taking advantage of write completion was non-blocking writes to the core
 output filter. This would give mod_ssl the option to setaside any large
 buckets (example: file buckets) and allow the core to enter write
 completion as soon as it saw EAGAIN.
 
 We can't change the API for ap_pass_brigade() to add a nonblock flag like
 an input filter, but we can pass a NONBLOCK metadata bucket down the
 stack, and this dovetails nicely with our need for a metadata bucket
 above.
 
 So, this patch does the following:
 
 - Takes out the pass NULL to the core output filter hack that supported
 write completion previously in the event MPM (other MPMs to follow).
 - Add the nonblock bucket, and pass this bucket down the protocol stack
 during write completion.
 - Teach the core output filter how to react to the NONBLOCK bucket,
 returning APR_EAGAIN appropriately when the NONBLOCK bucket is present
 (and APR_SUCCESS when not, as per previous behaviour).
 
 In theory, this opens the door for us to support asynchronous filters that
 know about NONBLOCK buckets and nonblocking behaviour. In the case of
 naive filters, we may end up with existing blocking behaviour during write
 completion, but this is no worse than we have now.
 
 Also in theory, all of this can be backported to v2.4.
 
 Regards,
 Graham
 --