Re: Notice of Intent: T&R 1.3.34

2005-10-11 Thread Glenn Strauss
On Mon, Oct 10, 2005 at 10:54:36AM -0400, Jim Jagielski wrote:
> I will be T&R'ing 1.3.34 On Tues or Weds

May I humbly request inclusion of a patch I wrote almost a year ago?
http://issues.apache.org/bugzilla/show_bug.cgi?id=31858
|31858|New|Maj|2004-10-22|regular expression matching broken on amd64
It is not a feature request; it fixes a crashing bug on AMD64.
This has been discussed, has been validated, and is included in
Gentoo (http://bugs.gentoo.org/show_bug.cgi?id=70177) and probably
other distributions.

BTW, if the ASF is looking for help better maintaining the 1.3 tree,
I am sure that there are a few people on this list, myself included,
who would volunteer.

Cheers,
Glenn


Re: [PATCH] fixing broken gnu ld (mis)detection problem

2004-12-14 Thread Glenn Strauss
On Tue, Dec 14, 2004 at 03:58:42AM -0600, William A. Rowe, Jr. wrote:
> I would like to see ALOT of feedback to current-testers or dev
> or even apache-modules of the alpha before declaring first beta.
> Once beta - we should be very adverse to API changes - our module
> authors will want to fix once and be done.  Or should we just
> trash the idea of alphas since not enough people are testing?
> Heck, while we are at it, lets just declare it GA.
> 
> The alpha better work for a good number of folks before we go
> to beta.  Then ya - as you say, the oddball kernel/distro issues
> will start popping up and be fixed pretty easily before GA.

A brief reminder of what Paul brought up, and I agree with:

Corporate project managers need a better sense of the release
schedule before they build test-time into their schedules.
I am not asking for hard-and-fast release dates, because httpd
is released when it is ready and not on artificial deadlines.

A completely open-ended release date -- as is currently the case --
is all but ignored by project managers.  Why spend time testing
something that is not going to be released for maybe another year
and will probably change immensely between now and then?

However, if releases were aimed for every, say, 6 months, with
a tag and semi-freeze two months prior, then I think we would see
a lot more testing by corporate users (who aren't already very
involved in this list) on those tags.

Cheers,
Glenn


Re: 2.1.1 tarballs posted...

2004-11-23 Thread Glenn Strauss
On Mon, Nov 22, 2004 at 10:06:02PM +, Matthieu Estrade wrote:
> Justin Erenkrantz wrote:
> 
> >
> >
> >Grab the 2.1.1 tarballs while they're fresh.  Please start testing 
> >these releases - they should have the intent of becoming the beginning 
> >of the 2.2.x series modulo all of the cleanup work we'll have to do 
> >after we branch.  For now, 2.1.1 includes APR/APR-util 1.0.1 - we can 
> >adjust this later, if need be.
> >
> >2.1.1 is currently at alpha level - if we get enough +1s (i.e. 3 or 
> >more), it can then be promoted to beta.  -- justin
> >
> +1 for beta

One of the goals for releasing tar balls of the tree is to encourage
feedback from *outside* the ASF.  Before going beta, I suggest putting
out a call for feedback from large implementers and deployers.  I have
a feeling that many lurk on this list and their experiences with the
tarballs leading up to a release are very, very important.

Cheers,
Glenn


Re: Fwd: [PROPOSAL-VOTE] Adopt lazy consensus for backports...

2004-11-17 Thread Glenn Strauss
On Wed, Nov 17, 2004 at 10:45:14PM +, Matthieu Estrade wrote:
[...]
> I think 2.1 is not public enough
> Actually, i think people don't take time to use cvs + cvs on apr and 
> apr-util, or don't take time to find snapshot to use 2.1. They also 
> don't telnet apache.org to look what we are running. They go on 
> httpd.apache.org, follow download link, and take 2.0.
> 
> A good solution would be to schedule some snapshot based on time (time 
> is impartial and we will not fall in unfinished discussion) and some 
> release based on dev and stability, and communicate more on these new 
> features and enhancement

+1  There have been plenty of requests for tarballs of 2.1
so that more people could more easily play with it and report
issues, even if it is clearly marked httpd-2.1.xx-dev.tar.gz
or httpd-2.1.xx-unstable.tar.gz.

Cheers,
Glenn


Re: Branching and release scheduling

2004-11-16 Thread Glenn Strauss
On Tue, Nov 16, 2004 at 06:34:36PM -0700, Brad Nicholes wrote:
> I have to agree with Jim.  Well put!

And I concur, too.

> >>> [EMAIL PROTECTED] Tuesday, November 16, 2004 5:55:04 PM >>>
[...]
> I find it somewhat hard to get "excited" by 2.x development because
> 2.1 almost seems like a black hole at times... You put the energy
> into the development but have to wait for it to be folded into
> an actual version that gets released and used. Now waiting is
> good, but sometimes it seems that the backports are slow in getting
> approved. I think most developers like, well, if not *immediate*
> satisfaction, something more quick than the current 
> develop-2.1-and-wait-
> for-a-long-time-before-folded-into-2.0.
> 
> Stability is great, but we should be careful that we don't "unduly"
> sacrifice developer energy. This may not be so clear, so feel free
> to grab me :)

Having "stable" releases too often is a burden on admins who need
to test, qualify, and roll-out new versions.  

At the same time, having HEAD languish in svn too long without
critical mass of testers takes the steam out of HEAD because
people will invest more energy in the stable branch.

Now, I think even (stable) and odd (development) sequences are goodness.

I would like to additionally suggest a release schedule for "stable"
along the lines of OpenBSD's semi-annual release.  That way, people
can develop towards the next release instead of working on stable
and playing catch-up each major release.

About two months before the next "stable" release is estimated,
feature-freeze is set and any new features that have not established
themselves as stable will not make it into this stable branch.
Having such a release schedule will hopefully expand the testing
community because large companies will now be invested in testing
a known quantity with a known (estimated) release.

None of this precludes a 2.2 branch which may include an incompatible
API (hopefully with limited impact), or a 3.0 branch with wider API/ABI
consequences.

Cheers,
Glenn


Re: [RFC] Patch for mod_log_config to allow conditioning on status code

2004-10-16 Thread Glenn Strauss
On Fri, Oct 15, 2004 at 08:14:03PM -0600, Paul Querna wrote:
> Glenn Strauss wrote:
> >>On Fri, 15 Oct 2004, Luc Pardon wrote:
> > What are the general requirements to getting a patch or module
> >included in the contrib/ directory?  And on that topic, what
> >happened to http://www.apache.org/dist/httpd/contrib/ ?
> >
> 
> I think /contrib/ is dead.
> 
> I would rather look at committing your patch to httpd CVS.
> 
> I don't have any objections to it on a concept level, but I have not 
> reviewed your code in detail.

Thanks, Paul.

I don't want to discourage Luc, but there's a steep uphill battle
to getting anything into Apache 1.3.  I've tried unsuccessfully
in the past to get what amounts to a two-line addition to
mod_log_config.c to add vhost information to the error log.
Here's some of the history (all for a patch whose meat is 2 lines!):

2003-06-24
http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=105649710514527&w=2
2003-07-08
http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=105763930020467&w=2
2003-09-04
http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=106268054016100&w=2
2003-10-13
http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=106608066626342&w=2
http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=106609152403979&w=2

I have been using the current patch, attached below, in production for
a very long time now.  Luc, do you think the patch below might have a
place in your modifications, possibly wrapped in an #ifdef so that
it can be enabled or disabled at compile time?

Cheers,
Glenn

#
# http://www.gluelogic.com/code/
#
diff -ruN apache_1.3.31/src/main/http_log.c apache_1.3.31-new/src/main/http_log.c
--- apache_1.3.31/src/main/http_log.c   2004-02-16 17:29:33.0 -0500
+++ apache_1.3.31-new/src/main/http_log.c   2004-05-24 12:26:06.0 -0400
@@ -349,7 +349,8 @@
 * first. -djg
 */
len += ap_snprintf(errstr + len, sizeof(errstr) - len,
-   "[client %s] ", r->connection->remote_ip);
+   "[client %s] [%s:%u] ", r->connection->remote_ip,
+   ap_get_server_name(r), ap_get_server_port(r));
 }
 if (!(level & APLOG_NOERRNO)
&& (save_errno != 0)


Re: [RFC] Patch for mod_log_config to allow conditioning on status code

2004-10-15 Thread Glenn Strauss
On Sat, Oct 16, 2004 at 12:58:48AM +0100, Nick Kew wrote:
> On Fri, 15 Oct 2004, Luc Pardon wrote:
>
> The usual fate of patches in bugzilla is that, even if they are
> appropriate for inclusion, they need a committer to take sufficient
> interest to review and incorporate them.  A chronic shortage of round
> tuits means this is rather hit-and-miss.

Quite true and quite the cause of consternation to those of us
lurkers out here eager to help if offered the opportunity ...

> >One more thing: I became aware that the "flexible interface for
> > mod_log_config" patch (# 25014) also allows conditioning on status
> > code(s), and there are three other contributed patches against
> > mod_log_config waiting for a decision (# 28037, 29449 and 31311). I am
> > willing to ensure compatibility with any or all of them if desired.
> 
> If you can fix a whole bunch of related bugs on bugzilla without your
> patch becoming big and complex, that adds value but still doesn't
> guarantee anything.
> 
> Ask yourself: is your code sufficiently different to anything we already
> have to merit releasing separately as a third-party module?  If yes, then
> do that.  If no, then it's probably appropriate to offer a patch.
> My guess would be no.

What are the general requirements to getting a patch or module
included in the contrib/ directory?  And on that topic, what
happened to http://www.apache.org/dist/httpd/contrib/ ?

Thanks!
Glenn


Re: buffered logs + piped logs

2004-10-14 Thread Glenn Strauss
On Thu, Oct 14, 2004 at 02:03:06PM -0400, Brian Akins wrote:
> Joe Orton wrote:
> 
> >Speculation: the buffer is being flushed, but the piped logger is
> >SIGTERMed already just like the restart case, per PR 26467?
> 
> Perhaps.  Any progress on that?
> 
> My performance drops when I disable buffered logs, and I would like to 
> be able to do piped logs.  I modified rotatelogs to use a buffer similar 
> to buffered logs in mod_log_config.c.  It registers a sigterm handler to 
> flush the log.
> 
> I would like to be able to have "external" logging, so I can rotate with 
> out a restart.  I would also like to still be able to do graceful 
> restarts.  I thought about using unix domain sockets.  I didn't want to 
> replicate someone else's work.  I couldn't find anything on google.  Any 
> suggestions (doen's have to use unix sockets)?

http://cr.yp.to/daemontools/multilog.html

multilog automatically rotates logs when they reach
a certain size or when SIGALRM is received.
Upon receiving SIGTERM, multilog reads until end of
current line and then exits.  You might wish to disable
that behavior in the code if using it with Apache logging,
especially since multilog cleanly exits automatically
after reading EOF.  (YMMV -- I haven't tried blocking SIGTERM)

http://cronolog.org/

Cronolog automatically rotates logs based on a template,
so you can, say, rotate logs hourly or even more often if
desired.

If you do frequent graceful restarts on a busy server, then
multilog with SIGTERM blocked might be a better choice.
Check the cronolog mailing list for more on graceful restarts.
However, if your only reason for frequent graceful restarts
was to rotate logs, then either of the above are good choices.

I happen to use cronolog for rotating web logs, and I use
multilog for logging of a variety of other deamons.

Cheers,
Glenn


Re: mod_disk_cache directives naming convention

2004-10-07 Thread Glenn Strauss
On Thu, Oct 07, 2004 at 12:12:57PM -0700, Justin Erenkrantz wrote:
> --On Thursday, October 7, 2004 12:13 PM -0600 Jean-Jacques Clar 
> <[EMAIL PROTECTED]> wrote:
> 
> >I won't probably agree if we use 'Waboozle', and I suggest that the
> >description
> >should with the name of the module like MemCache* and DiskCache* to
> >make it easier for related directives to be grouped together (sorted).
> 
> I'd really prefer for all caching directives to be under Cache* so that the 
> alphabetical ordering of the directives that we generate in our docs group 
> them together.  Using MemCache* and DiskCache* instead of CacheMem* and 
> CacheDisk* sort of blow that away.

+1  directives should be grouped into module namespace prefixes
whenever possible

Not only will things sort together in web pages, but this is more
intuitive for many non-English speakers since many languages put
the adjectives after the noun, i.e. the subdescription after the
description.

Cheers,
Glenn


Re: custom vhost module

2004-09-28 Thread Glenn Strauss
On Tue, Sep 28, 2004 at 04:14:03PM -0400, Brian Akins wrote:
> Glenn Strauss wrote:
> >On Tue, Sep 28, 2004 at 09:29:15AM -0400, Brian Akins wrote:
> >
> >>
> >Have you tried giving a list of addresses and ports?
> >
> >
> >...
> >
> 
> I need to be able to use *:8080 *:8081, but that's not allowed...

It seems to be allowed when _not_ using 'NameVirtualHost *',
at least on Apache 1.3.

Would you explain a bit more what you're trying to do?

If you're writing a C module in Apache2, look in ap_read_request(c)
and ap_update_vhost_from_headers(r) if you want to programmatically
change the vhost, but this needs to be done very early in the
request.

Cheers,
Glenn


Re: custom vhost module

2004-09-28 Thread Glenn Strauss
On Tue, Sep 28, 2004 at 09:29:15AM -0400, Brian Akins wrote:
> This may be wrong list to ask, but, is there a supported way to have a 
> module decide which vhost a request belongs to?  I know of some of the 
> mass virtual hosting modules, but they seem to not be fully integrated. 
>  (Not sure if that's the correct way to say that).
> 
> What I'm trying to accomplish:
> 
> I need to be able to have port based virtual hosts that matches on 
> multiple ports.  Ie, ports 8080-8083 match one VirtualHost, while ports 
> 9090-9093 match another, etc.  This does not seem to be supported by 
> current vhost "chooser."

Have you tried giving a list of addresses and ports?


...


Admittedly that is a bit cumbersome for a large range of ports.
For a large range, a reverse proxy using either mod_proxy or
mod_rewrite should work.

Cheers,
Glenn


Re: new config organization for 2.1

2004-09-26 Thread Glenn Strauss
On Sat, Sep 25, 2004 at 10:23:40AM -0600, Brad Nicholes wrote:
> I am in total agreement in fact my guess would be that most
> organizations will eventually split up the httpd.conf file to fit
> whatever needs they have.  But I would imagine that no two organizations
> split up the .conf file in the same way.  So splitting it up and
> installing it in the way that the ASF likes it, just seems more
> confusing than helpful.  But like Joshua said, "If you can suggest a
> better compromise solution, I am very interested",  I don't have a
> better suggestion yet so I will have to think about it some more.

Well, from a more basic HCI level, the monolithic httpd.conf is
extremely difficult to grok for those just being introduced, even
for otherwise seasoned unix admins.  It's way too long and with way
too many directives for which it is unclear how they are related.

(I've taken to writing modules, e.g. mod_foo with directives that
 are all a part of the same namespace, e.g. FooSomething, FooOther)

I support Joshua in that httpd.conf should be as simple as possible
to minimally obtain the same functionality offered by simple web
servers: the ability to server files from disk and handle mime-types;
the barrier to entry to set up a basic, simple, and secure Apache
should be no more difficult than setting up thttpd, boa, mathopd, etc.
And by secure, I mean that the admin should understand the features
that he or she is enabling, not just throwing in a magic incantation
of line-noise directives, a la sendmail.cf.

Of course, commented examples on how to add available features should
be present in other files.   Still, most people should have a config
file that is less than 100 lines, not in excess of 1000 lines.

Cheers,
Glenn


Re: Windows IA64 builds

2004-09-12 Thread Glenn Strauss
On Mon, Sep 13, 2004 at 12:09:20AM -0500, William A. Rowe, Jr. wrote:
> At 05:05 PM 9/12/2004, you wrote:
> >-AP_DECLARE(long) ap_get_client_block(request_rec *r, char *buffer, apr_size_t 
> >bufsiz);
> >+AP_DECLARE(apr_ssize_t) ap_get_client_block(request_rec *r, char *buffer, 
> >apr_size_t bufsiz);
> >
> >Don't know why long was used here, but it causes a warning
> >for Windows IA64 build since long is 32 bits. I'm wondering
> >however if there are any other 64 bit platforms that also have
> >32 bit longs that have ignored this warning. Hopefully not
> >and therefore this would be safe for backport to 2.0. Anyone
> >have information to the contrary?
> 
> On (most/all?) unixes - sizeof(long) == sizeof(void*).

Yes, AFAIK all contemporary unices are LP32, ILP32 (most common),
or LP64 ABIs.  Standard and very predictable and largely compatible
(with the exception of (long double) which is implemented by some
vendors as IEEE Quad and others as IEEE Double-Extended.)

The Open Group has a nice paper out on the subject:
  http://www.opengroup.org/public/tech/aspen/lp64_wp.htm
and compares LP64 to other 64-bit models, including ILP64
and LLP64, the one which Win64 follows.

> On Win64, we have sizeof(long) < sizeof(void*).  That's the
> discrepancy you are observing, and why it wasn't recognized.

The WINE dev list had this to contribute, including some GCC compiler
switches to help ease the pain:
  http://www.mail-archive.com/[EMAIL PROTECTED]/msg14882.html
Thread begins here:
  http://www.mail-archive.com/[EMAIL PROTECTED]/msg14810.html

Cheers,
Glenn


Re: RFE: ap_input_mode_t as bitflags

2004-08-13 Thread Glenn Strauss
On Fri, Aug 13, 2004 at 03:08:17PM -0400, Joe Schaefer wrote:
> Glenn Strauss <[EMAIL PROTECTED]> writes:
> 
> [...]
> 
> > I'm not sure the answer to this one:
> > Are protocol filters attached to the request (I think so)
> > or to the connection?  If attached to the request, then 
> > wouldn't they need to push-back excess data before the request
> > finishes if the data is to be used by later requests on the
> > same connection?
> > 
> > The HTTP_IN filter allocates its ctx from r->pool, so it won't
> > survive multiple requests on the same connection.
> 
> I don't think HTTP_IN would normally read more than one 
> request at a time,

You suggested always using READBYTES.  If a POST used chunked T-E to
send input, then when HTTP_IN is reading chunk trailers, too much data
might be read, i.e. the next request line might be read.  What should
ap_get_mime_headers() do with the excess data at that point?  Should
it arbitrarily push an ephemeral filter onto the bottom of the
connection stack?

> but if it did, it could add an
> ephemeral connection filter and put the excess brigade
> in there.

Messy.  That makes the assumption that the connection-level
filters are right above HTTP_IN; that f->next is a connection-level
filter.  Otherwise, what about other protocol filters which might
be doing translation?  You're bypassing them and throwing data
above them at the connection level.  I don't think this occurs in
current practice, but it is still a violation of what is intended
for filtering, unless you're going to require that there only be
a single protocol filter, or that HTTP_IN always be at the top of
the protocol filter stack.


This just occurred to me: why not promote HTTP_IN to be a
connection level filter, and add a buffer brigade to http_ctx_t?
Not only would that handle the case above, but it would also
obviate the need for the ugly AP_MODE_EATCRLF because the
HTTP_IN filter would handle what is now done in check_pipeline_flush().

And if we remove folding ability from ap_rgetline_core(), (and
replace the one usage of folding in proxy_util.c), then we can
also get rid of AP_MODE_SPECULATIVE.

Getting rid of AP_MODE_SPECULATIVE and AP_MODE_EATCRLF sounds to me
like moves in the right direction.  What do you think?

Cheers,
Glenn



Re: RFE: ap_input_mode_t as bitflags

2004-08-13 Thread Glenn Strauss
On Fri, Aug 13, 2004 at 02:36:38PM -0400, Joe Schaefer wrote:
> Glenn Strauss <[EMAIL PROTECTED]> writes:
> > If you are suggesting that there be no line-mode to read from filters,
> 
> I am.
> 
> > then we might also need some sort of way to push excess data back up
> > the filter chain if we pulled it, parsed out lines, and decided that
> > we did not need all of it, e.g. pulling the next HTTP request line
> > into the HTTP_IN filter for the current request.  
> 
> If a protocol filter reads more data than it needs, then that filter
> needs to set-aside the excess data for downstream filters to collect.
> I see no reason why an imput filter needs to push the excess data 
> back upstream.

I'm not sure the answer to this one:
Are protocol filters attached to the request (I think so)
or to the connection?  If attached to the request, then 
wouldn't they need to push-back excess data before the request
finishes if the data is to be used by later requests on the
same connection?

The HTTP_IN filter allocates its ctx from r->pool, so it won't
survive multiple requests on the same connection.

Cheers,
Glenn


Re: RFE: ap_input_mode_t as bitflags

2004-08-13 Thread Glenn Strauss
On Fri, Aug 13, 2004 at 01:21:15PM -0400, Greg Ames wrote:
> Glenn Strauss wrote:
> >On Wed, Aug 11, 2004 at 03:51:13PM -0700, Justin Erenkrantz wrote:
> 
> >>Please back up a bit.
> >>
> >>Why do you think the modes should be combined?  -- justin
> >
> 
> >More details:
> >-
> >Why bitflags, you ask?
> 
> >AP_MODE_MIME_FOLDING = 16
> 
> yuck.
> 
> getline would be more efficient and easier to understand if we removed the 
> folding code and created a separate function whose only job is to do 
> folding, i.e., get single lines one at a time and combine them if 
> appropriate. ap_get_mime_headers_core() works pretty much like that now, 
> but it has optimizations and additional functions that won't work for a 
> more general ap_get_folded_line() or whatever.

ap_rgetline and ap_get_mime_headers are too specific to be useful
except in a few places.  They both pull from r->input_filters, which
makes them unusable by any routine that wants to pull a line from a
brigade and is located in the filter chain anywhere other than at the
end.

I'm aiming more at a generic API to aid routines that have a brigade
(that they can stick in an ap_filter_ra_t) and have these routines
do the dirty work, whether it be to pull a certain number of bytes or
mime folding or speculative reads.

> If the seldom used folding code is removed from the basic getline, there 
> will be no need to test a folding flag/parameter in the usual non-folding 
> case, and the instruction cache hit rate will improve.

If you're concerned with icache performance, then there are plenty
of other places where Apache could benefit from a little code reorg.
(We all know that increasing code reuse and reducing code duplication
often improves icache performance.)

> As far as I know, only mod_proxy calls ap_getline today with the folding 
> flag set.  Third party modules could do the same, but if we changed this on 
> a release boundary it shouldn't be too disruptive.

That's because ap_get_mime_headers() implements its own folding.
The hoop-jumping behavior of ap_rgetline_core() to not remove all
whitespace from a line containing only whitespace is not needed
if the API that detects MIME folding is at a lower lever and can
return consistent values for completed lines.

(An alternative is not having ap_rgetline_core() remove whitespace
 and having each routine that calls it do it itself.)



Any comments on the ap_rgetline() and ap_get_mime_headers() examples
using ap_brigade_ra* that I posted to the list this morning?

One of the reasons I wrote this API is to be able to make the core
routines ap_rgetline_core(), ap_get_mime_headers(), and others
fully non-blocking if APR_NONBLOCKING_READ is requested.  (Instead
of using ap_regetline_core() and ap_get_mime_headers(), I'd make
other routines that took an (ap_filter_ra_t *) as an arg so that
state could be kept when nonblocking was desired)

Do you have other suggestions how to accomplish this without having 
something keeping state (such as the readahead API I proposed)?

Thanks.
Glenn


Re: RFE: ap_input_mode_t as bitflags

2004-08-13 Thread Glenn Strauss
On Fri, Aug 13, 2004 at 12:37:30PM -0400, Joe Schaefer wrote:
> Justin Erenkrantz <[EMAIL PROTECTED]> writes:
> 
> [...]
> 
> > Therefore, folding might only be possible to do in ap_http_filter, but
> > it can't go down further as into core_input_filter (which is where we
> > now call apr_brigade_split_line). A new getline_folding filter right
> > in front of ap_http_filter would do the trick,
> 
> I really don't like the very confusing AP_MODE_* semantics

Would they be less confusing if the behavior was more consistent?
That's what I'm trying to accomplish, too.

> as a fundamental component of the input filter API.  Why not just 
> replace ap_get_mime_headers_core with a "request_parser" filter 
> (or add this functionality to ap_http_filter) that parses both 
> the request line and the incoming headers?  If that were done, 
> there would be no need for an AP_MODE_GETLINE invocation (hence 
> no apr_brigade_split_line in core_input_filter).

The problem is that many protocols are based around line-by-line
or MIME headers delimited by a blank CRLF line.  And even when
not in filters, I find myself having to parse lines out of brigades.

If you are suggesting that there be no line-mode to read from filters,
then we might also need some sort of way to push excess data back up
the filter chain if we pulled it, parsed out lines, and decided that
we did not need all of it, e.g. pulling the next HTTP request line
into the HTTP_IN filter for the current request.  IFF the HTTP_IN
filter was right before the connection level filters, it could insert
a filter at the bottom of the connection filters and push back the
excess data.  However, I think this is more complicated and more fragile
than having an API -- such as the readahead API I proposed -- that
handles the AP_MODE_* semantics for filters that use it.

Cheers,
Glenn


Re: RFE: ap_input_mode_t as bitflags

2004-08-13 Thread Glenn Strauss
On Thu, Aug 12, 2004 at 10:20:14AM -0700, Justin Erenkrantz wrote:
> --On Thursday, August 12, 2004 2:51 AM -0400 Glenn Strauss 
> <[EMAIL PROTECTED]> wrote:

> >of code duplication between modules.  For example, the behavior of
> >line-mode is vauge and requires that callers re-parse the brigade
> >to check for complete lines.  I've got a whole litany of things,
> 
> It's for the reason listed below...
> 
> >including bad code examples of brigade parsing in the Apache core,
> >but that's another post.
> 
> Well, pointers as to where things aren't right are certainly appreciated.

Certain design decisions result in situations that are annoying
and difficult to work with.

(Some moreso than others, and I am in no way suggesting that all
 of these can easily be fixed or even that they should be fixed)

- The way that all line-mode reads work, the caller must re-parse
  the returned brigade, repeating exactly what upstream filters did.
  The caller must read every bucket, check the length, and check that
  it is LF terminated, when the upstream filters already knew those
  answers.  And because partial lines are returned, the caller must
  do this parsing even when upstream knows that the line is not
  complete.  Also, because partial lines are returned, upstream
  filters do not know when a line length limit has been reached
  if portions of the line had been returned previously.

- ap_http_filter() blocks on read chunk lines and chunk trailers,
  even when APR_NONBLOCK_READ is requested.  Actually, anything that
  uses ap_rgetline() or ap_get_mime_headers() will block.

- apr_brigade_split_line() might return more than readbytes
  The entire bucket is returned when readbytes is exceeded, instead
  of bucket splitting at the limit.

- apr_brigade_split_line() in core_input_filter() uses HUGE_STRING_LEN
  as maximum line length; regardless of what caller specified in
  readbytes.

- Every filter that buffers buckets must reinvent the wheel to protect
  against the buildup of many, many buckets each containing 1-byte,
  or other bucket fragmentation fun.

- AP_MODE_SPECULATIVE only seems to work for reading bytes from connection
  filters.  If it were used during chunking (it is not in practice),
  then it might return portions of chunk lines or chunk trailers
  along with the data, and downstream caller would not know the
  difference.  In other words, AP_MODE_SPECULATIVE is not useful
  to too many filters unless the filter knows certain things about
  where it is in the filter stack.  That means that ap_rgetline()
  can not be used except by those same filters that are hyper-aware
  of their location in the filter stack and ramifications of pulling
  data speculatively.  As it is, AP_MODE_SPECULATIVE is a noticeable
  amount of extra work for the SSL and HTTP_IN filters.

- apr_bucket_read() does not bother to look at the value in &len
  that was passed in.  It always reads APR_BUCKET_BUFF_SIZE
  (8000 bytes) in the case of sockets and pipes.  That's good for
  efficiency, but there is no way for a caller to specify
  "only read X bytes" when it really, really means "only read X bytes"

- the close() routines for pipes and sockets burn a system call if
  the fd is already closed and marked as such.  This could easily
  be avoided by checking if the fd < 0 in the pipe or socket bucket
  before attempting the close() on the pipe or socket bucket fd


That's all I can think of at the moment.

Cheers,
Glenn


Re: RFE: ap_input_mode_t as bitflags

2004-08-13 Thread Glenn Strauss
On Thu, Aug 12, 2004 at 02:06:39PM -0700, Justin Erenkrantz wrote:
> --On Thursday, August 12, 2004 3:52 PM -0400 Glenn Strauss 
> <[EMAIL PROTECTED]> wrote:
> 
> >I saw so much repeated code for parsing brigades, that I created a
> >"readahead" API: ap_brigade_ra().  It is passed similar arguments as
> >those to input filters, and additionally is passed a readahead struct
> >and a readahead limit.  This abstraction acts as a buffer and parses
> >out bytes and lines, reading from upstream filters when it needs to.
> 
> Okay, I have no idea what this ap_brigade_ra() API is or what its intention 
> is based upon your description.

Code reuse and consistent input filter behavior. :)

> You keep referring to the 'readahead API' 
> as the magic solution for all of the issues I raised.  So, I respectfully 
> ask that you please provide this API with examples to this list if you want 
> to continue the discussion.

I'll try.  Apologies for the length.
The following is an attempt at some documentation.

The code below has _not_ been tested, and in fact compilation has not
even been attempted.  It's been cut-n-pasted from various places in
my code.  Still I would be happy to fix any mistakes that are pointed out.

Thanks!
Glenn

(If there are any questions of license, the code that is mine below
 is released under the revised BSD license.  Use it in good health.)



typedef struct filter_readahead_ctx_t {

apr_bucket_brigade *readahead;
apr_bucket_brigade *partial;
apr_bucket_brigade *lookahead;

/* (internal use) bucket following last in-memory data bucket */
apr_bucket *b;

/* length of data read ahead */
/* (up to end of brigade or to first bucket that is not in-memory) */
/* (includes length of data in both readahead and 'partial' brigades) */
/* caller MUST update this if it modifies readahead brigade */
apr_off_t len_readahead;

/* length of data in 'partial' brigade (all buckets are in-memory) */
/* caller MUST update this if it modifies 'partial' brigade */
apr_off_t len_partial;

/* (private) running count of data set aside from ra->readahead
 * when performing speculative reads (AP_MODE_PEEK) */
apr_off_t len_lookahead;

/* (private) running count of data from upstream (appended to readahead)
 * in excess of caller-specified max length to readahead when reading
 * from a non-in-memory bucket.  (On subsequent calls to read routines,
 * it is added to len_upstream to simulate reads from upstream) */
apr_off_t len_pending;

/* running count of data read from upstream (appended to readahead) */
/* (includes data that was subsequently accounted in len_downstream)*/
/* caller may reset this at any time */
apr_off_t len_upstream;

/* running count of data sent downstream (appended to caller bb) */
/* caller may reset this at any time */
apr_off_t len_downstream;

} ap_filter_ra_t;


AP_DECLARE(void)
ap_filter_ra_init(ap_filter_ra_t * const restrict ra,
  apr_pool_t * const restrict pool,
  apr_bucket_alloc_t * const restrict bucket_alloc)
{
memset(ra, '\0', sizeof(ap_filter_ra_t));
ra->readahead = apr_brigade_create(pool, bucket_alloc);

/* (created when needed)
ra->partial   = apr_brigade_create(pool, bucket_alloc);
ra->lookahead = apr_brigade_create(pool, bucket_alloc);
 */

ra->b = APR_BRIGADE_SENTINEL(ra->readahead);
}

/* (not strictly necessary, but frees brigades sooner than connection cleanup)*/
AP_DECLARE(void)
ap_filter_ra_destroy(ap_filter_ra_t * const restrict ra)
{
if (ra->readahead != NULL) {
apr_brigade_destroy(ra->readahead);
}
if (ra->lookahead != NULL) {
apr_brigade_destroy(ra->lookahead);
}
if (ra->partial   != NULL) {
apr_brigade_destroy(ra->partial);
}
}



/**
 * Read bytes from upstream.
 * @param ra readahead structure to keep readahead state
 * @param f current filter (not f->next)
 *(e.g. filter with which ra is associated)
 * @param bb brigade in which to return results
 * @param r_mode AP_MODE_BYTES (and not AP_MODE_LINE), plus optional flags
 *(neither or one of AP_MODE_PASS_SPECIALS or AP_MODE_PEEK, not both)
 *(@see ap_input_mode_t)
 * @param r_type APR_BLOCK_READ or APR_NONBLOCK_READ
 *(@see apr_read_type_e)
 * @param r_limit max number bytes to return in bb
 * @param ra_limit max number of bytes to readahead
 *(includes data buffered in ra and data read from upstream)
 *(if ra_limit <= 0, r_limit is used as ra_limit)
 * @return APR_SUCCESS and buckets appended to bb if successful
 * APR_EINVAL if AP_MODE_PASS_

Re: BRIGADE_NORMALIZE is bad coding example

2004-08-12 Thread Glenn Strauss
On Thu, Aug 12, 2004 at 02:27:03PM -0700, Justin Erenkrantz wrote:
> --On Thursday, August 12, 2004 5:03 PM -0400 Glenn Strauss 
> <[EMAIL PROTECTED]> wrote:
> 
> > BRIGADE_NORMALIZE skips the bucket after a 0-length bucket.
> >In doing so, it might skip a 0-length bucket following it.  If the last
> 
> IMHO, the cleanest correct fix is to just add an else clause.  At the cost 
> of more variables, you could always use d and forgo the else clause, but I 
> think that'll just confuse readers of the macro even more.  We also can't 
> accept C99 extensions.  *shrug*
> 
> We probably never hit this odd corner case in practice though because 
> core_input_filter only has one of the following: one empty bucket, one 
> socket bucket, one heap bucket, or a heap and a socket bucket: two empty 
> buckets won't happen due to the design.  FWIW, BRIGADE_NORMALIZE was thrown 
> out of apr-util because it was really bad to begin with: hence the comment 
> ("This is bad.") and the localization of it to only server/core.c.  It was 
> never meant to be an example - yet, this is a minor bug more than anything 
> serious.  -- justin

Right.  I didn't say it was a problem in practice.
I did say that it was a terrible piece of code, and since this list
often refers people to "look at the code", it should be fixed, IMHO.
It is a _bad_ and _broken_ example of how to loop through a brigade.

As for C99 extensions, I understand that it is not available on all
platforms, but why can't new code checked in include the 'restrict'
keyword?  Just like there is an APR_INLINE macro, why isn't there
an APR_RESTRICT macro indirection?  Would a patch implementing such
in APR be accepted for APR 1.0?

Cheers,
Glenn


BRIGADE_NORMALIZE is bad coding example

2004-08-12 Thread Glenn Strauss
The BRIGADE_NORMALIZE macro in server/core.c is a
terrible coding example of brigade use.

It took me a little while to wrap my head around brigades
because I had assumed this code to be correct.  It is not.


BRIGADE_NORMALIZE checks e before checking e != sentinel on first time
through the loop.  In core_input_filter(), the brigade is known not to
be empty when this macro is called, but the macro has no such knowledge.
This macro would be better integrated into core_input_filter() and not
abstracted as a macro with undocumented assumptions.

BRIGADE_NORMALIZE skips the bucket after a 0-length bucket.
In doing so, it might skip a 0-length bucket following it.  If the last
bucket is 0-length, it will skip the sentinel and loop through the
brigade twice, thereby catching the 0-length buckets that were skipped
the first time through, unless there had been 4 zero-length buckets in
a row.  Repeat loop again if last bucket is a zero-length bucket ...
(Quick fix is to change line with 'd' to: d = APR_BUCKET_PREV(e); )

With better coding, it is redundant to check
  (!APR_BRIGADE_EMPTY(b) && e != APR_BRIGADE_SENTINEL(b))
If BRIGADE_NORMALIZE was not skipping buckets improperly, simply checking
  e != APR_BRIGADE_SENTINEL(b)
would be sufficient.


#if 0

/*** BROKEN ***/  /* this is from server/core.c and is BROKEN */

/**
 * Remove all zero length buckets from the brigade.
 */
#define BRIGADE_NORMALIZE(b) \
do { \
apr_bucket *e = APR_BRIGADE_FIRST(b); \
do {  \
if (e->length == 0 && !APR_BUCKET_IS_METADATA(e)) { \
apr_bucket *d; \
d = APR_BUCKET_NEXT(e); \
apr_bucket_delete(e); \
e = d; \
} \
e = APR_BUCKET_NEXT(e); \
} while (!APR_BRIGADE_EMPTY(b) && (e != APR_BRIGADE_SENTINEL(b))); \
} while (0)

#endif



/*** CORRECT ***/

/**
 * Remove all zero-length buckets from a brigade.
 * (IMPL: In tight loops, there is some benefit to storing sentinel in constant)
 * (IMPL: 'restrict' keyword is part of C99 standard
 *Compile w/ "gcc -std=c99" (or "gcc -Drestrict= -Wall ..." to disable))
 */
#define BRIGADE_NORMALIZE(bb)   \
do {\
register apr_bucket * restrict ap__b;   \
apr_bucket *ap__btmp = APR_BRIGADE_FIRST(bb);   \
apr_bucket * const apr__sentinel = APR_BRIGADE_SENTINEL(bb);\
while ((ap__b = ap__btmp) != apr__sentinel) {   \
ap__btmp = APR_BUCKET_NEXT(ap__b);  \
if (ap__b->length == 0 && !APR_BUCKET_IS_METADATA(ap__b)) { \
apr_bucket_delete(ap__b);   \
}   \
}   \
} while (0)


Cheers,
Glenn


Re: RFE: ap_input_mode_t as bitflags

2004-08-12 Thread Glenn Strauss
I think your response demonstrates fairly well how obtuse input
filtering has become. ;)

I saw so much repeated code for parsing brigades, that I created a
"readahead" API: ap_brigade_ra().  It is passed similar arguments as
those to input filters, and additionally is passed a readahead struct
and a readahead limit.  This abstraction acts as a buffer and parses
out bytes and lines, reading from upstream filters when it needs to.

Filters can store their readahead struct in their ctx and can use
the readahead API instead of calling ap_get_brigade() themselves.
(There's a lot more to say here, but I'll have to put together a
more coherent description)

In short, the readahead API attempts to address all of your concerns
listed below, and more.  (I originally started this when I wanted to
make HTTP_IN filter fully non-blocking and saw just how deep the
blocking went.)


On Thu, Aug 12, 2004 at 10:20:14AM -0700, Justin Erenkrantz wrote:
>--On Thursday, August 12, 2004 2:51 AM -0400 Glenn Strauss 
><[EMAIL PROTECTED]> wrote:
>
>> /** The filter should pass any special buckets (not in-memory) as long
>> as it  *  does not need to perform any processing on them (translation
>> or protocol  *  delimiting) (augments AP_MODE_BYTES; mutually exclusive
>> w/ AP_MODE_PEEK)  */
>> AP_MODE_PASS_SPECIALS = 4,
>
>This is a violation of the bucket brigade API (i.e. trying to force a type on 
>the returned bucket when it is supposed to be abstract).  I definitely would 
>like to see a *lot* more justification as to why this is mandatory.

Unlike AP_MODE_EXHAUSTIVE which is an imperative, AP_MODE_PASS_SPECIALS
is an optional request that can be made with AP_MODE_BYTES.

AP_MODE_BYTES | AP_MODE_PASS_SPECIALS is intended to be the safe equivalent
to AP_MODE_EXHAUSTIVE.  It says "pass me everything that you've got, that
you can safely pass.  I, the caller, can handle non-in-memory buckets".
For example, the core input filter could pass everything.  SSL would NOT
pass the socket because it must decrypt the data.  The HTTP_IN filter
would NOT pass everything because it would need to keep track of C-L or
chunking.  However, in the proxy response case where not C-L or chunked
(reading until connection close), then HTTP_IN could pass the socket.
The readahead API handles the special buckets for any filter that uses it.

Among other things, AP_MODE_BYTES | AP_MODE_PASS_SPECIALS allows the
input filter chain to collapse itself where it can.

>> /** The filter read should be treated as speculative and any returned
>>  *  data should be stored for later retrieval in another mode.
>>  *  (augments AP_MODE_BYTES or AP_MODE_LINE)
>>  */
>> AP_MODE_PEEK = 8,
>
>PEEK (aka SPECULATIVE: PEEK was a poor name for a lot of reasons which is why 
>we tossed it).  It's not possible to know whether it is going to be a line or 
>a byte, so it doesn't make sense for that to be its own independent mode. 

It is not a separate mode in my API.  It is either
(AP_MODE_BYTES | AP_MODE_PEEK) or (AP_MODE_LINE | AP_MODE_PEEK)

Input filtering is (should be) a read API which reads a set of bytes.
Sometimes the caller wants to treat these bytes as organized in a line.
Sometimes the caller wants to read a (possibly) MIME-folded set of lines.
Sometimes the caller wants to speculatively read some bytes or speculative
read a line. ...

These "sometimes" are why I need ap_input_mode_t to be a set of bitflags.

("peek" vs. "speculative" name doesn't matter to me)

>Doing a speculative read on a line isn't possible and adds a lot of overhead.

Yes it's possible and it is built into the readahead API.  Other than
copying buckets (small, but non-trivial overhead), it does not add any
additional overhead.  At the end of the readahead routines, the buckets
to be returned are either removed from the readahead struct, or copied
from them.  A simple conditional decides.

>You should only be calling SPECULATIVE when your protocol doesn't state what 
>to do next.  If you are expecting a line, then just call AP_MODE_GETLINE.  So, 
>I don't believe this option isn't combinable with anything else.

I think speculative mode is severly broken.  We should not call "across"
filters; we should not have to know which filters are above us.
AP_MODE_PEEK actually pulls data from upstream into the readahead struct.
We are actually peeking into data that we then have the opportunity to
consume again.  However, any translation from above filters will already
have been performed, and will not need to be performed again; it is 
buffered in the readahead struct.

If the only reason for SPECULATIVE mode is to peek into connection filters,
then get rid of SPECULATIVE mode and just insert a "peek" filter at
the bottom of the 

Re: RFE: ap_input_mode_t as bitflags

2004-08-11 Thread Glenn Strauss
On Wed, Aug 11, 2004 at 03:51:13PM -0700, Justin Erenkrantz wrote:
> --On Wednesday, August 11, 2004 5:16 PM -0400 Glenn Strauss 
> <[EMAIL PROTECTED]> wrote:
> 
> >I'm finding ap_input_mode_t very restrictive as a linear enum
> >and would like to make it an enum of bitflags.
> 
> Please back up a bit.
> 
> Why do you think the modes should be combined?  -- justin

Short answer:
-
I am refactoring most of the input filtering for more consistent
behavior and much greater code reuse.  However, two incompatible
changes are needed, and with Apache 2.2 in sight, I wanted to
introduce those changes now.

Those changes are: ap_input_mode_t becomes bitflags, and input
filters must check for bitflags instead of exact values
(mode & AP_MODE_GETLINE) instead of (mode == AP_MODE_GETLINE)
Also, AP_MODE_GETLINE should not return partial lines.


More details:
-
Why bitflags, you ask?

I had hoped to introduce this a bit more slowly,
but here is a brief peek without too much explanation:

I'd like any of the following to be legal values:

AP_MODE_BYTES
AP_MODE_BYTES | AP_MODE_PASS_SPECIALS
AP_MODE_BYTES | AP_MODE_PEEK
AP_MODE_LINE
AP_MODE_LINE | AP_MODE_PEEK
AP_MODE_LINE | AP_MODE_MIME_FOLDING
AP_MODE_LINE | AP_MODE_MIME_FOLDING | AP_MODE_PEEK

typedef enum {

/** The filter should return at most one line of CRLF data.
 *  (If a potential line is too long or no CRLF is found,
 *   the line is not returned, an error is.)
 */
AP_MODE_LINE = 1,

/** The filter should return at most readbytes data. */
AP_MODE_BYTES = 2,

/** The filter should pass any special buckets (not in-memory) as long as it
 *  does not need to perform any processing on them (translation or protocol
 *  delimiting) (augments AP_MODE_BYTES; mutually exclusive w/ AP_MODE_PEEK)
 */
AP_MODE_PASS_SPECIALS = 4,

/** The filter read should be treated as speculative and any returned
 *  data should be stored for later retrieval in another mode.
 *  (augments AP_MODE_BYTES or AP_MODE_LINE)
 */
AP_MODE_PEEK = 8,

/** When reading lines, look for MIME-folded continuations as well
 *  (augments AP_MODE_LINE)
 */
AP_MODE_MIME_FOLDING = 16

} ap_input_mode_t;


I think that input filtering is difficult to use and leads to a lot
of code duplication between modules.  For example, the behavior of
line-mode is vauge and requires that callers re-parse the brigade
to check for complete lines.  I've got a whole litany of things,
including bad code examples of brigade parsing in the Apache core,
but that's another post.

One advantage of the new API:
MIME continuation lines are used so often (HTTP headers/trailers,
email message headers, MIME encapsulation, ...) that they should be
part of the core API.  Instead of code duplication (witness
ap_rgetline_core() and ap_get_mime_headers_core() both implementing
continuation line unfolding), AP_MODE_LINE | AP_MODE_MIME_FOLDING
does not return a partial line.  It only returns a complete line
including continuations.  Simplifying things a bit for now, this
allows caller to know that if its brigade is returned non-empty,
that the line is complete, including continuations, without caller
having to re-parse the line to double-check.

(More posts to follow, but I do not wish to flood the list if no
 one wants to engage the conversation.)

Cheers,
Glenn


RFE: ap_input_mode_t as bitflags

2004-08-11 Thread Glenn Strauss
I'm finding ap_input_mode_t very restrictive as a linear enum
and would like to make it an enum of bitflags.

If I put together a patch, what are the chances it will be accepted?

It is for Apache 2.1/2.2 only, because it
a) breaks binary compatibility by changing the ap_input_mode_t values
b) breaks modules that check (mode == AP_MODE_GETLINE) and the like
   instead of (mode & AP_MODE_GETLINE)
My patch would include minimal fixes for the input filters
http, proxy, and ssl.

Looking for some encouragement.  Thanks.
Glenn


Re: POST without Content-Length

2004-08-07 Thread Glenn Strauss
On Sat, Aug 07, 2004 at 05:24:19PM -0700, Roy T. Fielding wrote:
> >Since the Apache server can not know if CGI requires C-L, I conclude
> >that CGI scripts are broken if they require C-L and do not return
> >411 Length Required when the CGI/1.1 CONTENT_LENGTH environment
> >variable is not present.  It's too bad that CGI.pm and cgi-lib.pl
> >are both broken in this respect.  Fixing them would be simple and
> >that would take care of the vast majority of legacy apps.

Here's my argument, notwithstanding current practice and the
extremely poor quality of many, many CGI programs out there:

[snip]
> CGI is supposed to be a simple interface for web programming.

You and your cohorts did a great job with it.  It's been quite
extensible and robust (which is not to say it is perfect :)).

[snip]
> CGI was defined in 1993.  HTTP/1.0 in 1993-95.  HTTP/1.1 in 1995-97.

OK, so any robust CGI written after 1995 that is intended to be
gatewayed from an HTTP web server should grok all standard HTTP/1.0
request methods and should know which ones it supports, and which
ones _require_ a message body (e.g. POST, PUT).

And, naturally, the CGI should know it is speaking to an HTTP/1.x 
server on the other end of the gateway because of the CGI/1.1
SERVER_PROTOCOL environment variable.


Now, I always read the CGI/1.1 spec as _requiring_ a CONTENT_LENGTH
if there was any message body.  (<-- Note the qualification).


http://hoohoo.ncsa.uiuc.edu/cgi/env.html

The following environment variables are specific to the request being fulfilled by the 
gateway program:

[...]

#  CONTENT_TYPE

For queries which have attached information, such as HTTP POST and PUT, this is the 
content type of the data.

# CONTENT_LENGTH

The length of the said content as given by the client. 


It is implied that CONTENT_LENGTH should be set to 0 by the HTTP side
of the gateway when a request method that requires a message body has
an empty body.  Therefore, on the other side of the gateway, any robust
little CGI should check for CONTENT_LENGTH, because it wants to be able
to detect premature end of content.  (It doesn't even have to know that
the HTTP server required a message body for that method)

If a script expects content on stdin and does not find CONTENT_LENGTH,
then even very old CGI scripts should abort with 400 Bad Request.
Better, the script should return 411 Length Required if it
requires CONTENT_LENGTH and one is not present.

Of course, if the script does not require that content be present,
and CONTENT_LENGTH is not present, scripts that do not support
HTTP/1.1 semantics (i.e. Transfer-Encoding) will assume no content
and will silently ignore the content that was submitted.  However,
since the content is not required, one must assume that no corruption
will occur because of its absense; only some missing information results

(RFC 2616 says that C-L should be ignored if T-E is present, and so
an HTTP/1.1 server might pass CONTENT_LENGTH=0 to such scripts along
with HTTP_TRANSFER_ENCODING=chunked, and still be proper.  Scripts
that require content, should then definitely return 400 Bad Request)

In any case, CGI scripts are welcome to support HTTP/1.1 and T-E
if they see appropriate SERVER_PROTOCOL and HTTP_TRANSFER_ENCODING
environment variables.

Cheers,
Glenn


Re: POST without Content-Length

2004-08-07 Thread Glenn Strauss
On Sat, Aug 07, 2004 at 03:36:33PM -0700, Roy T. Fielding wrote:
> >>CGI would happen after mod_deflate.  If mod_deflate changes the 
> >>request
> >>body without also (un)setting content-length, then it is broken.
> >
> >Huh? Input filters are pulled, so they run *after* the handler has been
> >started. And - CONTENT_LENGTH (if any - It's unset for chunked as 
> >well) still
> >reflects the Content-Length sent by the client. So the current 
> >behaviour is
> >correct in all cases.
> 
> No, it is broken in all cases.  CGI scripts cannot handle chunked input
> and they cannot handle bodies without content-length -- that is how the
> interface was designed.  You would have to define a CGI+ interface to
> get some other behavior.
> 
> >A CGI script therefore should never trust Content-Length, but just read
> >stdin until it meets an EOF.
> 
> We cannot redefine CGI.  It is a legacy crap interface.  Input filters
> either have to be disabled for CGI or replaced with a buffering system
> that takes HTTP/1.1 in and supplies CGI with the correct metadata and 
> body.

Actually, IMHO, RFC 2616 Section 4.4 clearly defines expected behavior:

   For compatibility with HTTP/1.0 applications, HTTP/1.1 requests
   containing a message-body MUST include a valid Content-Length header
   field unless the server is known to be HTTP/1.1 compliant. If a
   request contains a message-body and a Content-Length is not given,
   the server SHOULD respond with 400 (bad request) if it cannot
   determine the length of the message, or with 411 (length required) if
   it wishes to insist on receiving a valid Content-Length.

Since the Apache server can not know if CGI requires C-L, I conclude
that CGI scripts are broken if they require C-L and do not return
411 Length Required when the CGI/1.1 CONTENT_LENGTH environment
variable is not present.  It's too bad that CGI.pm and cgi-lib.pl
are both broken in this respect.  Fixing them would be simple and
that would take care of the vast majority of legacy apps.

For custom apps, supporting T-E chunked and C-L in CGI is trivial,
and only requires that the calling app use a get_input() or similar
abstraction instead of read()ing directly from stdin.

In Apache 1.3, ap_setup_client_block(r, REQUEST_CHUNKED_PASS);
just passes along chunks to target applications.  Adding support
to mod_cgi in Apache2 for passing chunks is also straightforward.

Cheers,
Glenn


[PATCH] add ap_send_http_header back to Apache2

2004-08-04 Thread Glenn Strauss
These's been a recent discussion over on the modperl list
(reference: http://marc.theaimsgroup.com/?t=10912298461&r=1&w=2)
with regards to C-L and HEAD requests.  Dynamic handlers that know the
length of the content, but do not want to generate the content must send
something (anything) down the filter chain before returned (before EOS),
otherwise the content-length filter strips ends up stripping the C-L header.

Since the body is discarded for HEAD requests, sending anything works
(because then the content-length filter steps out of the way).
The cleanest solution seems to be to send a flush bucket to trigger
the sending of headers.


Since this seems to me to be something that lots of dynamic handlers might
use, I'd like to ask that ap_send_http_header() be re-added to Apache2.
It's implementation currently is simply to send a flush bucket down the
filter chain.

Even though HTTP headers are sent when needed, there are occasions when
dynamic handlers want to say "send headers now".  ap_send_http_header()
should provide that for them.  This is instead of relying on the current
behavior of a flush bucket, or can this behavior be relied upon and can
it be documented as such?  Thanks!

Cheers,
Glenn


diff -ruN httpd-2.0.50/include/ap_compat.h httpd-2.0.50-new/include/ap_compat.h
--- httpd-2.0.50/include/ap_compat.h2004-02-09 15:54:33.0 -0500
+++ httpd-2.0.50-new/include/ap_compat.h2004-08-04 15:22:08.657596200 -0400
@@ -22,6 +22,5 @@
 /* redefine 1.3.x symbols to the new symbol names */
 
 #define MODULE_VAR_EXPORTAP_MODULE_DECLARE_DATA
-#define ap_send_http_header(r) ;
 
 #endif /* AP_COMPAT_H */
diff -ruN httpd-2.0.50/include/http_protocol.h httpd-2.0.50-new/include/http_protocol.h
--- httpd-2.0.50/include/http_protocol.h2004-06-11 16:46:41.0 -0400
+++ httpd-2.0.50-new/include/http_protocol.h2004-08-04 15:26:59.078445520 -0400
@@ -73,6 +73,12 @@
 /* Finish up stuff after a request */
 
 /**
+ * Triggers sending of response headers, if not already sent to client
+ * @param r The current request
+ */
+AP_DECLARE(apr_status_t) ap_send_http_header(request_rec *r);
+
+/**
  * Called at completion of sending the response.  It sends the terminating
  * protocol information.
  * @param r The current request
diff -ruN httpd-2.0.50/modules/http/http_protocol.c 
httpd-2.0.50-new/modules/http/http_protocol.c
--- httpd-2.0.50/modules/http/http_protocol.c   2004-02-09 15:53:18.0 -0500
+++ httpd-2.0.50-new/modules/http/http_protocol.c   2004-08-04 15:44:59.746159056 
-0400
@@ -1275,6 +1275,15 @@
 basic_http_header(r, bb, protocol);
 }
 
+AP_DECLARE(apr_status_t) ap_send_http_header(request_rec * const r)
+{
+/* (similar to ap_rflush() but returns (apr_status_t) instead of (int)) */
+apr_bucket_alloc_t * const bucket_alloc = r->connection->bucket_alloc;
+apr_bucket_brigade * const bb = apr_brigade_create(r->pool, bucket_alloc);
+APR_BRIGADE_INSERT_HEAD(bb, apr_bucket_flush_create(bucket_alloc));
+return ap_pass_brigade(r->output_filters, bb);
+}
+
 /* Navigator versions 2.x, 3.x and 4.0 betas up to and including 4.0b2
  * have a header parsing bug.  If the terminating \r\n occur starting
  * at offset 256, 257 or 258 of output then it will not properly parse