Reviewing the Filtering API

2004-09-22 Thread Nick Kew

The 2.0 filter chain is a great tool: for me it's _the_ major innovation
that turns httpd-2.0 from a (mere) webserver to a powerful applications
platform.  But extensive working with it highlights weaknesses.
The introduction of AddOutputFilterByType sought to address one of the
weaknesses, but it's a bolt-on that doesn't really fit, and is
problematic.  And even if fully successful, it's limited.

As you know, I'm proposing a new filtering framework, and have
implemented (modulo bugs) the main functionality in mod_filter.

Until yesterday, this was implemented purely as a module, suitable for
use with httpd-2.0 and its filters.  That meant some inevitable
duplication of data structures and inefficiency.  It now has several
users running the module with 2.0, and I propose to maintain a version
that can be used with 2.0 without patching or recompiling anything.
But the main thrust is towards tighter integration for 2.2.

Yesterday I made the first move towards integration, by merging the
most important data structs with util_filter and adding a couple of
new API calls (on which more below).

I got some useful feedback on this list when I first mooted the idea
that is now mod_filter, and more recently from users of mod_filter
(my filter_init is badly broken - fix to come).

But I'd like to broaden that into a wider review of filtering.


*** A few issues with util_filter in 2.0:

ap_filter_type
==

Making this an enum and then using values like AP_FTYPE_[anything] + 5
(as is done in, for example, mod_ssl) makes no sense.  An int with
a set of #defined values makes more sense.

ap_filter_t
===

This inclues both request_rec and conn_rec fields, but the request_rec
is invalid in content-level filters, while the conn_rec is of course
available from the request_rec where valid.  So, shouldn't that be a
union?


Documentation
=

I recently fixed PR:19688, but there are other less critical issues
outstanding, such as
 * @param ftype The type of filter function, either ::AP_FTYPE_CONTENT or
 *  ::AP_FTYPE_CONNECTION




* Simplifying Filtering

Yesterday I introduced two new API functions in util_filter:

ap_register_output_filter_protocol
ap_filter_protocol

together with a set of associated #defines

The first function is ap_register_output_filter with an additional
argument proto_flags.  The second sets proto_flags during a request.
The purpose of these is to enable filters to 'opt out' of concerning
themselves with the lower-level details of supporting HTTP.

Example: mod_include

mod_include is a typical output content filter, in that it changes the
data passing through, including changing the byte count.  It's almost
certainly the most widely known and used such filter.

As it stands, it correctly unsets content length, and it deals with
cacheing/Last-Modified in its own way based on configuration (XBitHack).
But it also has some bugs: for example:
* if a Content-MD5 is set, it doesn't unset it.  Likewise an ETag.
* it won't work correctly if served partial contents, but it
  does nothing to prevent that happening (vide discussion on
  handling ranges a couple of months ago).
For mod_include to deal fully with these is a significant burden on the
modules authors.

The new API calls offers mod_include the opportunity to be simplified
at the same time as fixing edge-case bugs such as those I've discussed.
A simple way is to replace the existing

ap_register_output_filter("INCLUDES", includes_filter, includes_setup,
  AP_FTYPE_RESOURCE);

with the new variant

ap_register_output_filter_protocol("INCLUDES",
includes_filter, includes_setup, AP_FTYPE_RESOURCE,
AP_FILTER_PROTO_CHANGE | AP_FILTER_PROTO_CHANGE_LENGTH
| AP_FILTER_PROTO_NO_BYTERANGE );

This causes mod_filter to unset all headers that are invalidated by
the module's content transformation, and prevent it getting byteranges
from the backend.  With this, mod_include still has to process XBitHack
and cacheing headers itself - these are very specific to SSI and don't
generalise to other filters - but mod_filter does everything else.

As with any other filter, mod_include will run unchanged within the
new framework by simply ignoring the additional API calls.

I need review on this, and I need to fix my existing code.  But looking
ahead, any problems with a wider-ranging review of util-filter, including
but not limited to fixing the problems identified above?

-- 
Nick Kew


Re: Reviewing the Filtering API

2004-09-22 Thread Justin Erenkrantz
--On Wednesday, September 22, 2004 5:13 PM +0100 Nick Kew 
<[EMAIL PROTECTED]> wrote:

*** A few issues with util_filter in 2.0:
ap_filter_type
==
Making this an enum and then using values like AP_FTYPE_[anything] + 5
(as is done in, for example, mod_ssl) makes no sense.  An int with
a set of #defined values makes more sense.
Why?  A C enum is operationally equivalent to an int with a set of 
#defines.  So, I'm not seeing the benefit here.  With an enum, we can get 
some range of type safety.

ap_filter_t
===
This inclues both request_rec and conn_rec fields, but the request_rec
is invalid in content-level filters, while the conn_rec is of course
available from the request_rec where valid.  So, shouldn't that be a
union?
Um, request_rec and conn_rec is certainly valid in the content-level 
filters.  So, I'm not sure what you mean.  As an aside, unions lead to 
confusion (i.e. when is member 'a' valid?); so I wouldn't want us to use 
them here: especially for something that third-parties have to deal with 
directly.

Documentation
=
I recently fixed PR:19688, but there are other less critical issues
outstanding, such as
 * @param ftype The type of filter function, either ::AP_FTYPE_CONTENT or
 *  ::AP_FTYPE_CONNECTION
I'm not sure what you mean.
Yesterday I introduced two new API functions in util_filter:
ap_register_output_filter_protocol
ap_filter_protocol
together with a set of associated #defines
The first function is ap_register_output_filter with an additional
argument proto_flags.  The second sets proto_flags during a request.
The purpose of these is to enable filters to 'opt out' of concerning
themselves with the lower-level details of supporting HTTP.
For the record, I sort of like this idea.  But, I'm concerned about how 
these 'flags' are enforced.

Example: mod_include
mod_include is a typical output content filter, in that it changes the
data passing through, including changing the byte count.  It's almost
certainly the most widely known and used such filter.
As it stands, it correctly unsets content length, and it deals with
cacheing/Last-Modified in its own way based on configuration (XBitHack).
But it also has some bugs: for example:
* if a Content-MD5 is set, it doesn't unset it.  Likewise an ETag.
* it won't work correctly if served partial contents, but it
  does nothing to prevent that happening (vide discussion on
  handling ranges a couple of months ago).
For mod_include to deal fully with these is a significant burden on the
modules authors.
The new API calls offers mod_include the opportunity to be simplified
at the same time as fixing edge-case bugs such as those I've discussed.
A simple way is to replace the existing
ap_register_output_filter("INCLUDES", includes_filter, includes_setup,
  AP_FTYPE_RESOURCE);
with the new variant
ap_register_output_filter_protocol("INCLUDES",
includes_filter, includes_setup, AP_FTYPE_RESOURCE,
AP_FILTER_PROTO_CHANGE | AP_FILTER_PROTO_CHANGE_LENGTH
| AP_FILTER_PROTO_NO_BYTERANGE );
This causes mod_filter to unset all headers that are invalidated by
the module's content transformation, and prevent it getting byteranges
from the backend.  With this, mod_include still has to process XBitHack
and cacheing headers itself - these are very specific to SSI and don't
generalise to other filters - but mod_filter does everything else.
As with any other filter, mod_include will run unchanged within the
new framework by simply ignoring the additional API calls.
This sounds good.  -- justin