Re: Inclusion of mpm-itk into HEAD

2007-06-27 Thread Rici Lake


On 27-Jun-07, at 1:01 PM, Joshua Slive wrote:


On 6/27/07, Nick Kew [EMAIL PROTECTED] wrote:


This is a problem that could be solved by documentation.
Maybe not quite as simple, but preferable when the alternative is
accepting new connections whilst running as root.


Here's a start:
http://wiki.apache.org/httpd/Recipes/Privilege_Separation



I just added a section to:

http://wiki.apache.org/httpd/Recipes/Different_UserIDs_Using_Reverse_Proxy


which could also use a bit of polishing. It uses mod_rewrite to do the
proxying, rather than virtual hosts, which makes it possible to avoid
having to change the proxy server's configuration when adding or
deleting user servers.

If the user servers are listening on high ports, then they can be
started as the user/group rather than as root, and the owner could
have quite a bit of flexibility in configuring their server. It's
quite possible that less reliance on .htaccess files would actually
compensate for the additional cost of running multiple server
instances.
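For illustration, the front end can do the per-user proxying with a
rewrite map, so adding a user means adding one line to the map file
rather than touching the server configuration. This is my sketch of
the idea, not the wiki recipe verbatim; the map file location and the
port scheme are assumptions:

# Front-end server (binds port 80 as root). userports.txt maps
# usernames to backend ports, one "user port" pair per line.
RewriteEngine On
RewriteMap userport txt:/usr/local/etc/apache/userports.txt
RewriteRule ^/~([^/]+)(/.*)?$ http://127.0.0.1:${userport:$1}$2 [P]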



Re: Inclusion of mpm-itk into HEAD

2007-06-27 Thread Rici Lake


On 27-Jun-07, at 6:07 PM, Joshua Slive wrote:


Good point. I moved some of this discussion into its own section,
since it applies equally to the main example.


Yes, that's much better.


I also removed your comments about needing separate LockFile/etc
locations, since it's not true in recent versions. (These files are
created with the pid of the parent process appended to ensure they are
unique.)


I actually tested that config file, and one of the stupid mistakes
I made was not putting a PidFile directive into each config. As
far as I can see, the ScoreBoardFile also needs a unique name;
only the lock file has the pid appended. Also, I had to manually
set UseCanonicalName Off, even though the manual says that's the
2.2 default. (This is a 2.2.4 installation from the FreeBSD port,
although I didn't set up a jailed test.)

In any event, if you're not starting the user servers as root,
which seems like the most secure setup, then it's quite possible
that they will not have write access to the /var subdirectories.
If they were started in a (chroot) jail then none of that would
be an issue.

The other issues I ran into were fairly minor: I had to map the
various modules and the mime.types file into places where the
user servers could see them.
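For concreteness, the sort of minimal per-user configuration implied
above might look like this (a sketch only; the port, paths, and
username are invented for illustration):

# /home/jdoe/apache/httpd.conf -- started as user jdoe, not as root
Listen 127.0.0.1:8081
PidFile /home/jdoe/apache/httpd.pid
ScoreBoardFile /home/jdoe/apache/httpd.scoreboard
ErrorLog /home/jdoe/apache/error_log
UseCanonicalName Off
DocumentRoot /home/jdoe/public_html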

I personally think this is quite a practical solution for
mass virtual hosting, but it would take a bit more work to
document and test.

It might well be useful to have a sort of kiosk version of
apache httpd, in which particular settings (for example, the
listening interfaces and some of the mpm tuning directives)
were locked down in some fashion. (Not everyone is going to
want to use FreeBSD jails.) The goal would be to allow the
user to directly write their own httpd configuration, thus
avoiding the need to use .htaccess files.



Re: [mod_wombat] UserData Metatables

2007-05-06 Thread Rici Lake
Brian McCallister wrote:
 I am leaning towards changing all the userdata types (request_rec,
 server_rec, conn_rec, apr_table_t) in mod_wombat to have anonymous
 metatables rather than the current setup where they share a metatable.


+1

 The drawback is that we lose the ~type safety of being able to use
 luaL_checkudata(...) to pop userdata with a specific metatable
 out of the stack.

There are other ways of doing this. One good way is to keep
the metatable as the environment of the instance methods
(and, if necessary, of the metamethods). Then the check
is really simple (and quite a lot faster than luaL_checkudata).
Code (from memory, I don't have my computer with me):

static Foo* check_self (lua_State *L) {
  if (!lua_getmetatable(L, 1) || !lua_rawequal(L, -1, LUA_ENVIRONINDEX)) {
    luaL_typerror(L, 1, "Foo");
  }
  lua_pop(L, 1);  /* pop the metatable */
  return lua_touserdata(L, 1);
}
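
For context, registration might look something like this -- a sketch
assuming Lua 5.1, where lua_setfenv can set a C function's
environment; the type name and function names are invented:

static void register_foo (lua_State *L, const luaL_Reg *methods) {
  luaL_newmetatable(L, "Foo");            /* meta */
  lua_pushvalue(L, -1);
  lua_setfield(L, -2, "__index");         /* meta.__index = meta */
  for (; methods->name; methods++) {
    lua_pushcfunction(L, methods->func);  /* meta func */
    lua_pushvalue(L, -2);                 /* meta func meta */
    lua_setfenv(L, -2);                   /* env(func) = meta */
    lua_setfield(L, -2, methods->name);   /* meta[name] = func */
  }
  lua_pop(L, 1);  /* pop meta */
}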


The self-test is the most common, but it is often also
desirable to check individual parameters for specific
types, either prescriptively or polymorphically. This
can be done safely by keeping the type's name as
a field in the metatable, say __typename, and also
using the luaL_newmetatable mechanism in parallel.
luaL_newmetatable just stores the metatable in the
registry with the type name as a key, so we can
do the same thing while also putting the type name as
a key in the metatable:

int my_newmetatable (lua_State *L, const char* typename) {
  lua_pushstring(L, typename);
  lua_pushvalue(L, -1);  /* dup the name */
  lua_gettable(L, LUA_REGISTRYINDEX);
  /* S: ... name meta-or-nil */
  switch (lua_type(L, -1)) {
    default: {
      return luaL_error(L, "Invalid registration of %s: not a table",
                        typename);
    }
    case LUA_TTABLE: {
      /* Make sure it satisfies the constraint */
      lua_getfield(L, -1, "__typename");
      if (!lua_rawequal(L, -1, -3))
        luaL_error(L, "Invalid registration of %s: typename not set",
                   typename);
      /* Everything's ok, fix the stack up */
      lua_pop(L, 1);
      lua_replace(L, -2);
      return 0;
    }
    case LUA_TNIL: {
      /* Need a new table */
      lua_pop(L, 1);  /* S: ... name */
      lua_createtable(L, 0, 4);  /* S: ... name meta */
      lua_insert(L, -2);  /* S: ... meta name */
      lua_pushvalue(L, -1);  /* S: ... meta name name */
      lua_setfield(L, -3, "__typename");  /* S: ... meta name */
      lua_pushvalue(L, -2);  /* S: ... meta name meta */
      lua_settable(L, LUA_REGISTRYINDEX);  /* S: ... meta */
      return 1;
    }
  }
}


Now, we can verify that a metatable is correct by getting the
__typename field out of it, and making sure that the metatable
registered under that typename is this same metatable:

/* Only use this with arguments (positive index); see
 * lauxlib for how to normalize the index if necessary.
 *
 * Returns true if the indicated argument is some
 * registered type; in that case, leaves the typename on
 * top of the stack. Otherwise, returns 0 and does
 * not change the stack.
 */
int my_checksometype (lua_State *L, int argindex) {
  if (lua_getmetatable(L, argindex)) {
    lua_getfield(L, -1, "__typename");  /* ... meta name */
    lua_pushvalue(L, -1);  /* ... meta name name */
    lua_rawget(L, LUA_REGISTRYINDEX);  /* ... meta name registered-table */
    if (lua_rawequal(L, -1, -3)) {
      /* OK, fix the stack */
      lua_pop(L, 1);
      lua_replace(L, -2);  /* ... name */
      return 1;
    }
    lua_pop(L, 3);  /* restore the stack */
  }
  return 0;  /* Not ours */
}

This leaves the typename on top of the stack for possible
polymorphism. If we're just interested in a single type,
we can just check with strcmp:

static void* my_checktype (lua_State *L, int argindex,
                           const char *typename) {
  if (my_checksometype(L, argindex)
      && 0 == strcmp(lua_tostring(L, -1), typename)) {
    lua_pop(L, 1);  /* Ditch the typename */
    return lua_touserdata(L, argindex);
  }
  luaL_typerror(L, argindex, typename);
  return NULL;  /* Never reached */
}

A simple way to do polymorphism would be to just
run through the possibilities, but if there are a lot of
them, it's also possible to use a dispatch table (i.e.
a Lua table) kept as an upvalue, with the typename as a key.

/* See example below. */
int my_dispatch (lua_State *L, int argindex, int vindex, const char *msg) {
  const luaL_Reg *entry;
  if (!my_checksometype(L, argindex)) {
    /* In case we want to dispatch on lua types */
    if (lua_isnoneornil(L, argindex))
      lua_pushliteral(L, "nil");
    else
      lua_pushstring(L, lua_typename(L, lua_type(L, argindex)));
  }
  /* Look it up in the dispatch table */
  lua_rawget(L, vindex);
  entry = lua_touserdata(L, -1);
  if (entry) {
    /* Got it, just call it */
    lua_pop(L, 1);
    return entry->func(L);
  }
  return luaL_typerror(L, argindex, msg);
}

To use this, we need to set up a dispatch table, and
attach it to a method. A helper function comes in
handy. This function expects the stack to end with
the method table followed by 'nups' upvalues, and
puts the dispatch table as upvalue 
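[The message is truncated in the archive at this point. A hypothetical
reconstruction of such a helper, on the assumption that the dispatch
table becomes upvalue 1 -- so my_dispatch above would be called with
vindex == lua_upvalueindex(1); all names are invented:]

static void my_register_dispatcher (lua_State *L, const char *name,
                                    lua_CFunction func,
                                    const luaL_Reg *handlers, int nups) {
  /* Stack on entry: ... methodtable up1 ... upN (N == nups) */
  lua_createtable(L, 0, 8);               /* the dispatch table */
  for (; handlers->name; handlers++) {
    lua_pushlightuserdata(L, (void *)handlers);
    lua_setfield(L, -2, handlers->name);  /* dispatch[name] = entry */
  }
  lua_insert(L, -(nups + 1));             /* move it below the upvalues */
  lua_pushcclosure(L, func, nups + 1);    /* dispatch becomes upvalue 1 */
  lua_setfield(L, -2, name);              /* methodtable[name] = closure */
}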

Re: [PATCH] mod_wombat: add table_get and table_set

2007-05-04 Thread Rici Lake
A few comments intermingled into the patch:

Brian McCallister wrote:
 On Apr 30, 2007, at 2:02 PM, Akins, Brian wrote:
 --- apr_lua.c (revision 0)
 +++ apr_lua.c (revision 0)
 @@ -0,0 +1,55 @@
 +#include apr.h
 +#include apr_tables.h
 +
 +#include lua.h
 +#include lauxlib.h
 +#include lualib.h
 +
 +#define lua_unboxpointer(L,i)  (*(void **)(lua_touserdata(L, i)))
 +
 +static apr_table_t* check_apr_table(lua_State* L, int index) {
 +luaL_checkudata(L, index, "Apr.Table");
 +apr_table_t* t = (apr_table_t*)lua_unboxpointer(L, index);
 +return t;
 +}
 +
 +static int lua_table_set(lua_State* L) {
 +apr_table_t *t = check_apr_table(L, 1);
 +const char* key = luaL_checkstring(L, 2);
 +const char* val = luaL_checkstring(L, 3);
 +
 +apr_table_set(t, key, val);

In Lua, setting to nil is equivalent to deletion. So I think this should be:

  if (lua_isnoneornil(L, 3))
    apr_table_unset(t, key);
  else {
    const char *val = luaL_checkstring(L, 3);
    apr_table_set(t, key, val);
  }
 +return 0;
 +}
 +
 +static int lua_table_get(lua_State* L) {
 +apr_table_t *t = check_apr_table(L, 1);
 +const char* key = luaL_checkstring(L, 2);
 +const char *val = apr_table_get(t, key);

val might be NULL; the function should return nil in that case:

  if (val == NULL)
lua_pushnil(L);
  else
lua_pushstring(L, val);

 +lua_pushstring(L, val);
 +return 1;
 +}
 +
 +static const luaL_reg lua_table_methods[] = {
 +{"set", lua_table_set},
 +{"get", lua_table_get},
 +{0, 0}
 +};

 Even though these are static, we might want to be careful in naming
 as these are reaching into lua's namespace (lua_* and luaL_*).

Agreed. Also, it's misleading -- they are APR tables, not Lua tables.


 +
 +
 +int apr_lua_init(lua_State *L, apr_pool_t *p) {
 +luaL_newmetatable(L, "Apr.Table");
 +luaL_openlib(L, "apr_table", lua_table_methods, 0);
 +lua_pushstring(L, "__index");
 +lua_pushstring(L, "get");
 +lua_gettable(L, 2);
 +lua_settable(L, 1);
 +
 +lua_pushstring(L, "__newindex");
 +lua_pushstring(L, "set");
 +lua_gettable(L, 2);
 +lua_settable(L, 1);
 +
 +return 0;
 +}


This is poor Lua style. You shouldn't assume (or require) the stack to
be empty at the start of the function. Use top-relative (negative) indices
instead -- they are no slower.

Also use lua_{get,set}field for clarity, and luaL_register (luaL_openlib
is deprecated).

  luaL_register(L, "apr_table", lua_table_methods);
  luaL_newmetatable(L, "Apr.Table");
  lua_getfield(L, -2, "get");
  lua_setfield(L, -2, "__index");
  lua_getfield(L, -2, "set");
  lua_setfield(L, -2, "__newindex");






Re: SSL downloads faster than non SSL?

2005-08-03 Thread Rici Lake


On 3-Aug-05, at 8:11 AM, William A. Rowe, Jr. wrote:


In the APR library, yes, we translate 'apr_sendfile' to TransmitFile()
on win32.  Some other magic occurs to obtain a file handle which can
be passed to TransmitFile.  But there are enough flaws in the TF() api
that perhaps this would be better defaulted to 'off'.



+1



Re: How can shared modules parse the configuration tree(ap_conftree)???

2005-06-27 Thread Rici Lake


On 27-Jun-05, at 4:47 AM, luca regini wrote:

I see that the symbols ap_conftree and ap_top_module are not defined 
in any header but only in the source code. mod_info uses these symbols.
I need to do something similar to what mod_info does, but in a shared 
module. Is there a way to do it? Does the fact that ap_conftree and 
ap_top_module are not defined in any header mean that the API is not 
stable and will probably change?

 
I need to parse the configuration tree because I would like to 
programmatically change the value of a cookie in an input filter. 
However, the right value for the cookie is directory-dependent, and 
per-directory configuration is not available in an input filter. So my 
idea was to deduce the right value by parsing the per-directory 
directives I was interested in at module startup.


That's a really bad idea. The fact that it is difficult should be a 
good hint that it is contrary to the design of Apache.


I gather the problem you're running into is that the headers are read 
before the configuration merge is done (for reasons I think are pretty 
obvious). However, cookies are certainly not used until the 
configuration merge is done. So why not do whatever you want to do with 
the cookies in a hook which runs at an appropriate time? There are lots 
to choose from. The input filter could put whatever information it 
wanted to pass to the hook in the module's per-request structure.
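
For instance, a fixups hook runs after the per-directory configuration
merge, so the directory-dependent value is available there through
r->per_dir_config. A minimal sketch (the module name, config structure,
and cookie logic are invented for illustration):

#include "httpd.h"
#include "http_config.h"
#include "http_request.h"
#include "apr_strings.h"

module AP_MODULE_DECLARE_DATA my_cookie_module;  /* hypothetical module */

typedef struct {
    const char *cookie_value;  /* set by a per-directory directive */
} my_dir_cfg;

static int my_fixups(request_rec *r)
{
    /* Unlike an early input filter, this hook sees the merged
     * per-directory configuration. */
    my_dir_cfg *cfg = ap_get_module_config(r->per_dir_config,
                                           &my_cookie_module);
    if (cfg && cfg->cookie_value) {
        apr_table_set(r->headers_out, "Set-Cookie",
                      apr_pstrcat(r->pool, "site=", cfg->cookie_value,
                                  "; path=/", NULL));
    }
    return DECLINED;
}

static void my_register_hooks(apr_pool_t *p)
{
    ap_hook_fixups(my_fixups, NULL, NULL, APR_HOOK_MIDDLE);
}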




Re: 2.1.5 available for testing

2005-06-20 Thread Rici Lake

On 20-Jun-05, at 8:55 AM, jean-frederic clere wrote:


and GET with content-length.


I think that is not forbidden in the rfc...
But what about returning HTTP_BAD_REQUEST if Content-Length is not 0?


I don't believe rfc2616 forbids GET requests to have entity-bodies, so 
it could be argued that rejecting such a request a priori is 
non-conformant. Obviously, if the handler for the request does not 
expect an entity-body, which would be the case in all the handlers I've 
seen, the request ought to be rejected.


On the other hand, from a pragmatic point of view, proxying GETs with 
entity-bodies to unknown downstream servers may cause them to 
misbehave. So there is a good argument for mod_proxy to reject such 
requests unless specifically instructed otherwise.




Re: Multiple AAA providers

2005-05-27 Thread Rici Lake


On 27-May-05, at 10:53 AM, Jess Holle wrote:


 Russell Howe wrote:
 Jess Holle wrote:



Is there any remaining/ongoing interest in this development area?

The need to authenticate a single resource against multiple disparate
(non-failover/non-redundant) LDAP servers looms large and I'd like to
think that this would be part of Apache 2.2 soon...  [I'd rather not
have to hack this in in a narrow, special-cased, hackish way 
myself...]


I have a JAAS LoginModule which I wrote for Jetty that does exactly this
(if I understand what you mean, that is :).

At work, I have our website authentication first checking OpenLDAP, then
falling back to Win2k Active Directory.




 [By disparate/non-failover/non-redundant, I mean that each LDAP would 
be checked for a given user until that user entry was found (at which 
point no other LDAPs would be checked for the given user regardless of 
the success/failure of the bind).  This differs from strictly failover 
LDAPs wherein Apache keeps trying to contact LDAP URLs until it finds 
one that responds (is up) and then just uses that one as the LDAP -- 
we have that now but it does not help in these use cases.]

I want to be able to do the same from Apache, and am pretty tempted to
start coding up a module to do it.




 That would be a great grand unified theory (and I see it as useful) 
but what I care most about is multiple LDAPs.  If we could just have 
the existing mod_auth_ldap handle multiple LDAPs (beyond in a strict 
failover capacity) that would be *huge*.  If we can't get the grand 
unified approach, I'd at least like to see multiple LDAP handling.


I'm very interested in implementing this myself. To make what I'm doing 
more generally useful, I'd like to know what people expect from the 
implementation of Require after a multiple LDAP search. Should you be 
able to put the ldap server name in a Require? Or are you only 
concerned with require valid-user?




Re: [1.3 PATCH] rework suppress-error-charset feature

2005-05-02 Thread Rici Lake
On 1-May-05, at 7:04 PM, Jeff Trawick wrote:
I found a description of the problem:

When Apache issues a redirect in response to a client request, the 
response includes some actual text to be displayed in case the client 
can't (or doesn't) automatically follow the redirection. Apache 
ordinarily labels this text according to the character set which it 
uses, which is ISO-8859-1.

However, if the redirection is to a page that uses a different 
character set, some broken browser versions will try to use the 
character set from the redirection text rather than the actual page. 
This can result in Greek, for instance, being incorrectly rendered.
If the action is required to compensate for a browser bug, wouldn't it 
be better to leave it as an environment variable and set it with 
BrowserMatch?
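
Something along these lines (an illustration only -- the browser
pattern is invented; the variable name comes from the feature under
discussion):

# Set the variable only for browsers known to exhibit the bug
BrowserMatch "BuggyBrowser/1\.0" suppress-error-charset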



Re: Proposed patch: always cleanup brigades in ap_pass_brigade

2005-04-26 Thread Rici Lake
On 26-Apr-05, at 5:28 AM, Joe Orton wrote:
This is really just a "should we provide extra API guarantees to prevent
users shooting themselves in the foot" debate, right?  Let's agree to
disagree :)
Fair enough. I guess where I'm coming from is that I'm a user of the
API, and my feet have been shot at enough times already :)
In any event, a draft docstring. This does not reflect my view, and I
don't know if there is actually a consensus view, but it may reflect
the current state of affairs:
It got linewrapped (format=flowed?) but otherwise I think this looks
good - thanks a lot.
Sorry about that. The first time I saw RFC 2646, I checked the date to 
see whether it was a prank. Sadly, no -- it's standards track. And the 
problem with standards is that people *will* go and implement them, no 
matter how dumb they might be...

Final chance for any dissenter to object? Else
I'll commit this...
My final thoughts: We've agreed, I think, on the following:
1) The caller retains ownership of the brigade. The callee *must not* 
destroy it.

2) The caller has made a contract that every bucket in the brigade will 
remain valid until the callee returns. It does not guarantee anything 
beyond that, so the callee *must* explicitly setaside any buckets it 
wants to retain.

3) The callee may manipulate the contents of the brigade as it sees 
fit, during the span of the call. That is, it may insert and remove 
buckets, pass the brigade to another filter, read buckets (possibly 
morphing the brigade), etc. However, if it removes a bucket, that 
bucket becomes its responsibility.

I hope that's clear from the docstring I wrote.
So the only point of disagreement is the state of the brigade on return.
Number 3 above does not permit any semantics to be attached to the 
state of the brigade on return; any such semantics would have to be a 
private agreement between caller and callee, and the filter mechanism 
does not lend itself to such private agreements because it is not easy 
to guarantee that no other filter intervenes in the filter chain. If 
someone wanted to stack their own filters in this manner, they could 
just call the filter function directly rather than using 
ap_pass_brigade.

So the current state of play is: the callee MAY empty the brigade; the 
caller MUST cleanup the brigade if it wishes to reuse it. This leads to 
precisely the duplication of calls to apr_brigade_cleanup that people 
seem to be objecting to.

Better in that case would be that the callee MUST empty the brigade 
(and, in accordance with (2) and (3) above, either delete or setaside 
the buckets it removes.) In that case, the caller need not cleanup the 
brigade; it is permitted to assume that it is empty on return and need 
not waste cycles on a pointless apr_brigade_cleanup.
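
Concretely, under the current rules every caller that reuses its
brigade ends up writing something like this sketch:

    rv = ap_pass_brigade(f->next, bb);
    /* required before reuse, even if the callee already emptied bb */
    apr_brigade_cleanup(bb);
    if (rv != APR_SUCCESS) {
        return rv;
    }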

If there is any chance of agreement on this, I can produce a modified 
docstring on a moment's notice. :)

Rici


Moving on to input filters :)

2005-04-26 Thread Rici Lake
The docstring in util_filter.h for ap_get_brigade says:
 * Get the current bucket brigade from the next filter on the filter
 * stack.  The filter returns an apr_status_t value.  If the bottom-most
 * filter doesn't read from the network, then ::AP_NOBODY_READ is returned.
 * The bucket brigade will be empty when there is nothing left to get.

The best reference for the behaviour of ap_get_brigade would appear to 
be core_input_filter. Its behaviour differs from this docstring; in 
particular, end of input is (sometimes) signalled with an eos bucket 
being returned in the brigade.

I propose clarifying the last line of that docstring:
* If block is APR_NONBLOCK_READ and no data is currently available,
* the brigade will be empty on return, and the function will return
* APR_SUCCESS.
*
* If no more data is available, the brigade will contain an eos bucket
* on return, and the function will return APR_SUCCESS. A subsequent
* call to ap_get_brigade will return APR_EOF without altering the
* brigade.
*
* In no case will APR_EAGAIN be returned. Other error conditions may
* be returned, with or without data in the brigade.
*
* Input filters are expected to conform to the above interface.
The above is not quite the behaviour of core_input_filter, but I think
it could be with a few minor tweaks, none of which will cost cycles.
As I read the code, it behaves as follows:
In MODE_READBYTES:
With APR_BLOCK_READ, on EOF the filter will first return
  a brigade containing an EOS bucket, with status APR_SUCCESS.
  A subsequent call to the filter will return an empty
  brigade with status APR_EOF. It should not otherwise
  return an empty brigade.
With APR_NONBLOCK_READ, an empty brigade will be returned with
  status APR_SUCCESS if the socket has no data ready OR
  if an EOF was encountered. There appears to be no way to
  distinguish between these cases other than by doing a blocking
  read. (Note 1)
In MODE_EXHAUSTIVE:
  regardless of read blocking, the socket bucket is simply
  appended to the returned brigade, followed by an EOS bucket.
  APR_SUCCESS is always returned.
In MODE_GETLINE and MODE_SPECULATIVE:
With APR_BLOCK_READ, on EOF the filter will return an empty
  brigade with status APR_SUCCESS. A subsequent call to the filter
  will return an empty brigade with status APR_EOF. (Notes 2 and 3)
With APR_NONBLOCK_READ, the filter will behave as in MODE_READBYTES.
Note 1:
  I believe this behaviour to be incorrect. Line 285 of
  server/core_filters.c is:
else if (block == APR_BLOCK_READ && len == 0) {
  At this point, it has already checked for APR_EAGAIN and
  error status returns. Consequently, a zero-length read in
  either blocking or non-blocking mode ought to be an EOF.
  I think this should be changed to:
else if (len == 0) {
  which would insert the EOS bucket in the case of a non-blocking
  get_brigade. However, I don't know if it would confuse consumers
  who were not expecting it.
  It would actually probably be simpler to insert the eos bucket
  in the ctx->b brigade when it is built, rather than inserting
  it in response to various eof-like conditions later on in the
  code.
Note 2:
  In MODE_GETLINE, no attempt is made to insert an eos bucket in
  either blocking or non-blocking mode; in the case of EOF, I
  believe that the socket bucket will remove itself, leading to
  an eventual return of APR_EOF. This seems inconsistent with
  MODE_READBYTES.
Note 3:
  In MODE_SPECULATIVE, the eos bucket is deliberately not created.
  A comment says that the next call will return an EOS bucket, but
  as far as I can see, once ctx->b has been emptied, as it will
  have been by the call to apr_bucket_delete at line 294, the
  next call to the filter will return an empty brigade with
  status APR_EOF.
Finally:
  I think there is a possibility that the socket bucket's read
  function will lose data, if the underlying call to apr_socket_recv
  returns a non-zero-length buffer along with a non-APR_SUCCESS
  status. However, this may not actually ever happen. It would
  only really be serious in the case of an APR_EAGAIN return.
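
For reference, a consumer written against the proposed docstring would
look something like this inside a filter (a sketch, not code from the
tree; the 8k read size is arbitrary):

apr_bucket_brigade *bb = apr_brigade_create(f->r->pool,
                                            f->c->bucket_alloc);
for (;;) {
    apr_status_t rv = ap_get_brigade(f->next, bb, AP_MODE_READBYTES,
                                     APR_BLOCK_READ, 8192);
    if (rv == APR_EOF)
        break;            /* end of input was already signalled by eos */
    if (rv != APR_SUCCESS)
        return rv;        /* a real error, possibly with data */
    /* ... consume the buckets; an EOS bucket marks end of input,
     * and an empty brigade under APR_NONBLOCK_READ means no data yet */
    apr_brigade_cleanup(bb);
}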


Re: Moved the ldap connection timeout call (was: Re: svn commit: r160707 )

2005-04-26 Thread Rici Lake
On 26-Apr-05, at 6:58 PM, Brad Nicholes wrote:
  I just checked in a patch to move the ldap_set_option() call from 
trying to set the connection timeout globally to ldap connection 
specific.  I don't have a good way to test that it fixes the problem, 
but if somebody can verify the fix, I will propose it for backport.
You know that there is a bug in the OpenLDAP SDK up to about 2.2.21 
which causes OpenLDAP to crash when the global connection timeout is 
set (ITS #3487). It ends up doing a double free because it's copying 
pointers instead of contents in the global config structure. I've had 
to explicitly set LDAPConnectionTimeout -1 in order to prevent the 
default timeout being applied, on a couple of installations where I 
couldn't upgrade the SDK.




Re: RFC: Who owns a passed brigade?

2005-04-25 Thread Rici Lake
On 25-Apr-05, at 10:15 AM, Cliff Woolley wrote:
4. switch to allocate brigades out of the bucket allocator
Didn't we try #4 before and something bad happened?  I don't remember
what that would have been, I just feel certain #4 has been previously
discussed at least.
I think the "something bad" would be the buildup of brigades in the 
connection pool, if some filter continues to allocate a new brigade for 
every invocation.

Also, if a brigade was not emptied and the bucket resources needed to 
be freed (eg. heap buckets), those would build up as well.

Allocating the brigade out of the bucket allocator but continuing to 
register the cleanup with a pool (say, the request pool if that were 
appropriate) might work but it would be fragile.

The heated discussion at 
http://marc.theaimsgroup.com/?t=10403978553&r=2&w=2 is probably 
relevant.
 



Re: RFC: Who owns a passed brigade?

2005-04-25 Thread Rici Lake
On 25-Apr-05, at 8:29 AM, Joe Orton wrote:
I think it's fine to leave the *contents* of the brigade as undefined
after a call to ap_pass_brigade.  (that's essentially the case now 
since
it's not explicitly defined to be empty: I don't know of any
particular bugs which have been caused by this?)
Undefined contents make me nervous. It is very easy to define the
results: just do a cleanup in ap_pass_brigade. (This will also
protect against some resource leaks in failed error handling, for
example.)
I don't know of any particular bugs which have been caused by this
either, but it would be very easy to create one. Take a filter which
expects the brigade to be emptied and feed it into a filter which
does not empty the brigade. Then watch the output quadratically
expand.
I don't believe there are any such bugs in the distribution,
because the only output filter in the distribution that I can
find which does not empty its brigade is mod_case_filter.c
(see below). Furthermore, most of the interesting filters
in the distribution actually create a new brigade for every
invocation, which I think we can all agree is a bad idea.
Some filters, such as mod_deflate.c, do expect the brigade
to be cleaned. In this case, it is unlikely that a bug will
be noticed, because few third-party filters will get inserted
after mod_deflate.c.
If filters were modified to always use the same bucket brigade
for communication with downstream filters, you might start to
see this error with third-party filters which don't clean
their input brigade, unless every filter religiously calls
apr_brigade_cleanup after ap_pass_brigade. But if you were
going to do that, you might as well put the call to
apr_brigade_cleanup in ap_pass_brigade. The cost of an
unnecessary call is minimal, and it simplifies the api.
Note: I honestly believe mod_case_filter.c should be removed
from the distribution, unless someone is willing to rewrite
it as a proper model of how to write an output filter. It is not
useful as a module, and its only use in the distribution is to
serve as an example for module authors, which it definitely is not.


Re: RFC: Who owns a passed brigade?

2005-04-25 Thread Rici Lake
On 25-Apr-05, at 12:05 PM, Joe Orton wrote:
Fragile, why?  That's exactly the right approach as far as I can see.
Because the pool might have a lifetime *longer than* the bucket 
allocator. OK, that's not likely in the current architecture, so it 
would be a concern only for non-httpd applications. That was a 
different heated debate, I think :)




Proposed patch: always cleanup brigades in ap_pass_brigade

2005-04-25 Thread Rici Lake
Regardless of any other changes to the brigade API, this seems to me to 
be a good idea:

Index: util_filter.c
===
--- util_filter.c   (revision 158730)
+++ util_filter.c   (working copy)
@@ -500,6 +500,7 @@
 AP_DECLARE(apr_status_t) ap_pass_brigade(ap_filter_t *next,
  apr_bucket_brigade *bb)
 {
+apr_status_t result = AP_NOBODY_WROTE;
 if (next) {
 apr_bucket *e;
 if ((e = APR_BRIGADE_LAST(bb)) && APR_BUCKET_IS_EOS(e) && next->r) {
@@ -523,9 +524,10 @@
 }
 }
 }
-return next->frec->filter_func.out_func(next, bb);
+result = next->frec->filter_func.out_func(next, bb);
+apr_brigade_cleanup(bb);
 }
-return AP_NOBODY_WROTE;
+return result;
 }

 AP_DECLARE(apr_status_t) ap_save_brigade(ap_filter_t *f,

Index: util_filter.h
===
--- util_filter.h   (revision 158730)
+++ util_filter.h   (working copy)
@@ -299,7 +299,8 @@
  * Pass the current bucket brigade down to the next filter on the filter
  * stack.  The filter returns an apr_status_t value.  If the bottom-most
  * filter doesn't write to the network, then ::AP_NOBODY_WROTE is returned.
- * The caller relinquishes ownership of the brigade.
+ * The caller retains ownership of the brigade. The brigade will be
+ * empty on return.
  * @param filter The next filter in the chain
  * @param bucket The current bucket brigade
  */



For Review: mod_ext_filter.c

2005-04-25 Thread Rici Lake
Patch to use a single temporary brigade instead of creating a new one 
on each invocation:
(also does a little code reuse)

Index: mod_ext_filter.c
===
--- mod_ext_filter.c(revision 158730)
+++ mod_ext_filter.c(working copy)
@@ -70,6 +70,7 @@
 #if APR_FILES_AS_SOCKETS
 apr_pollset_t *pollset;
 #endif
+apr_bucket_brigade *bb_tmp;
 } ef_ctx_t;
 module AP_MODULE_DECLARE_DATA ext_filter_module;
@@ -614,6 +615,12 @@
 }
 }
+/*
+ * Create a temporary brigade for output filter. This is not always
+ * used, but it's easier to just have it.
+ */
+ctx->bb_tmp = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
+
 if (dc->debug >= DBGLVL_SHOWOPTIONS) {
 ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, f->r,
   "%sfiltering `%s' of type `%s' through `%s', cfg %s",
@@ -747,10 +754,8 @@
 apr_status_t rv;
 char buf[4096];
 apr_bucket *eos = NULL;
-apr_bucket_brigade *bb_tmp;

 dc = ctx->dc;
-bb_tmp = apr_brigade_create(r->pool, c->bucket_alloc);
 for (b = APR_BRIGADE_FIRST(bb);
  b != APR_BRIGADE_SENTINEL(bb);
@@ -769,15 +774,14 @@
 /* Good cast, we just tested len isn't negative */
 if (len > 0 &&
-(rv = pass_data_to_filter(f, data, (apr_size_t)len, bb_tmp))
+(rv = pass_data_to_filter(f, data, (apr_size_t)len, ctx->bb_tmp))
 != APR_SUCCESS) {
 return rv;
 }
 }

 apr_brigade_cleanup(bb);
-APR_BRIGADE_CONCAT(bb, bb_tmp);
-apr_brigade_destroy(bb_tmp);
+APR_BRIGADE_CONCAT(bb, ctx->bb_tmp);
 if (eos) {
 /* close the child's stdin to signal that no more data is coming;
@@ -800,30 +804,12 @@
 }
 }

-do {
-len = sizeof(buf);
-rv = apr_file_read(ctx->proc->out,
-  buf,
-  &len);
-if ((rv && !APR_STATUS_IS_EOF(rv) && !APR_STATUS_IS_EAGAIN(rv)) ||
-dc->debug >= DBGLVL_GORY) {
-ap_log_rerror(APLOG_MARK, APLOG_DEBUG, rv, r,
-  "apr_file_read(child output), len %" APR_SIZE_T_FMT,
-  !rv ? len : -1);
-}
-if (APR_STATUS_IS_EAGAIN(rv)) {
-if (eos) {
-/* should not occur, because we have an APR timeout in place */
-AP_DEBUG_ASSERT(1 != 1);
-}
-return APR_SUCCESS;
-}
-
-if (rv == APR_SUCCESS) {
-b = apr_bucket_heap_create(buf, len, NULL, c->bucket_alloc);
-APR_BRIGADE_INSERT_TAIL(bb, b);
-}
-} while (rv == APR_SUCCESS);
+rv = drain_available_output(f, bb);
+if (APR_STATUS_IS_EAGAIN(rv)) {
+AP_DEBUG_ASSERT(!eos);
+/* should not occur, because we have an APR timeout in place */
+return APR_SUCCESS;
+}

 if (!APR_STATUS_IS_EOF(rv)) {
 return rv;


For review: mod_include.c (only initialize once)

2005-04-25 Thread Rici Lake
I didn't reindent so that the patch would be clearer.
Index: mod_include.c
===
--- mod_include.c   (revision 158730)
+++ mod_include.c   (working copy)
@@ -3542,7 +3542,6 @@
 intern->end_seq_len = strlen(intern->end_seq);
 intern->undefined_echo = conf->undefined_echo;
 intern->undefined_echo_len = strlen(conf->undefined_echo);
-}
 if ((parent = ap_get_module_config(r->request_config, &include_module))) {
 /* Kludge --- for nested includes, we want to keep the subprocess
@@ -3599,6 +3598,7 @@
 apr_table_setn(r->subprocess_env, "QUERY_STRING_UNESCAPED",
   ap_escape_shell_cmd(r->pool, arg_copy));
 }
+}

 return send_parsed_content(f, b);
 }


Re: Proposed patch: always cleanup brigades in ap_pass_brigade

2005-04-25 Thread Rici Lake
On 25-Apr-05, at 1:30 PM, Justin Erenkrantz wrote:
--On Monday, April 25, 2005 12:47 PM -0500 Rici Lake 
[EMAIL PROTECTED] wrote:

Regardless of any other changes to the brigade API, this seems to me 
to
be a good idea:

Index: util_filter.c
===
--- util_filter.c   (revision 158730)
+++ util_filter.c   (working copy)
@@ -500,6 +500,7 @@
  AP_DECLARE(apr_status_t) ap_pass_brigade(ap_filter_t *next,
   apr_bucket_brigade *bb)
  {
+apr_status_t result = AP_NOBODY_WROTE;
  if (next) {
  apr_bucket *e;
   if ((e = APR_BRIGADE_LAST(bb)) && APR_BUCKET_IS_EOS(e) && next->r) {
@@ -523,9 +524,10 @@
   }
   }
   }
-return next->frec->filter_func.out_func(next, bb);
+result = next->frec->filter_func.out_func(next, bb);
+apr_brigade_cleanup(bb);
  }
-return AP_NOBODY_WROTE;
+return result;
  }
This introduces a really large overhead as we'd be calling the cleanup 
for *every* output filter we invoke.  So, -0..  -- justin
Can you quantify that large overhead? In the case of a 'compliant' 
filter, it consists of:
-- a function call
-- an if (!APR_BRIGADE_EMPTY) which fails
-- a return
-- and the loss of a possible tailcall optimization.

I can't imagine that all that amounts to as much as 100 cycles; if 
there were 20 filters in the chain and that were called every 4k of 
output (and both of those figures are wildly conservative), that would 
amount to half a cycle per byte. If you wanted to save the function 
call, you could replace

+apr_brigade_cleanup(bb);
with an open coded version of cleanup.
The point is that it's fine to say somewhere "filters must clean their 
input brigades", but what if one doesn't? That will create a bug which 
pops up occasionally in the interaction between two unrelated modules. 
Good interfaces don't have such easily eliminated gotchas, imho.

In any event, if everyone knows that ap_pass_brigade is going to 
cleanup the brigade, then other redundant calls to apr_brigade_cleanup 
can be removed.



Re: Proposed patch: always cleanup brigades in ap_pass_brigade

2005-04-25 Thread Rici Lake
On 25-Apr-05, at 3:36 PM, Justin Erenkrantz wrote:
--On Monday, April 25, 2005 3:06 PM -0500 Rici Lake 
[EMAIL PROTECTED] wrote:

Can you quantify that large overhead? In the case of a 'compliant'
filter, it consists of:
With Joe's suggestion for 'creator must destroy', this step is wholly 
unnecessary and only overcomplicates things.   -- justin
The creator must destroy the brigade when it's done with it -- if I 
understand Joe correctly.

Creating and destroying a brigade on every invocation of a filter is a 
lot more overhead than a few extraneous unfulfilled while loops -- even 
if the brigade is allocated in a bucket_allocator. Even without the 
buildup of pool memory for the brigade objects, creating a brigade 
requires the registration (and later deregistration, during 
brigade_destroy) of a pool cleanup function.

If we accept that the contents of a brigade are undefined when 
ap_pass_brigade returns, the caller has three options:
-- call cleanup and reuse the brigade
-- call destroy (which will first call cleanup)
-- drop it on the floor and let destroy be called by the cleanup 
function

In no case can it make any use of the brigade's contents (they're 
undefined), so cleanup must be called just after ap_pass_brigade 
returns.

If ap_pass_brigade does the call itself, it reduces the number of 
places in the code where cleanup is called, and possibly frees 
resources slightly earlier in the case where the brigade creator 
dropped the brigade on the floor. The cost is a few extraneous if 
(!APR_BRIGADE_EMPTY(bb)) statements (or moral equivalent). This is 
actually a macro which expands to a fairly simple pointer comparison.

That's hardly a complication -- it's a simplification (from the 
perspective of the caller of ap_pass_brigade). And it's hardly an 
enormous overhead.

/me continues the tradition of creating flames out of bucket brigades. 
You'd think it would be the reverse :)





For review: teach mod_include.c to recycle brigades

2005-04-25 Thread Rici Lake
Index: mod_include.c
===
--- mod_include.c   (revision 158730)
+++ mod_include.c   (working copy)
@@ -3060,7 +3060,7 @@
 struct ssi_internal_ctx *intern = ctx->intern;
 request_rec *r = f->r;
 apr_bucket *b = APR_BRIGADE_FIRST(bb);
-apr_bucket_brigade *pass_bb;
+apr_bucket_brigade *pass_bb = ctx->pass_bb;
 apr_status_t rv = APR_SUCCESS;
 char *magic; /* magic pointer for sentinel use */
@@ -3076,9 +3076,6 @@
 return ap_pass_brigade(f->next, bb);
 }
-/* All stuff passed along has to be put into that brigade */
-pass_bb = apr_brigade_create(ctx->pool, f->c->bucket_alloc);
-
 /* initialization for this loop */
 intern->bytes_read = 0;
 intern->error = 0;
@@ -3145,7 +3142,6 @@
 if (!APR_BRIGADE_EMPTY(pass_bb)) {
 rv = ap_pass_brigade(f->next, pass_bb);
 if (rv != APR_SUCCESS) {
-apr_brigade_destroy(pass_bb);
 return rv;
 }
 }
@@ -3170,7 +3166,7 @@
 }
 if (rv != APR_SUCCESS) {
-apr_brigade_destroy(pass_bb);
+apr_brigade_cleanup(pass_bb);
 return rv;
 }
@@ -3384,7 +3380,7 @@
 DEBUG_INIT(ctx, f, pass_bb);
 rv = handle_func(ctx, f, pass_bb);
 if (rv != APR_SUCCESS) {
-apr_brigade_destroy(pass_bb);
+apr_brigade_cleanup(pass_bb);
 return rv;
 }
 }
@@ -3450,13 +3446,11 @@
 /* if something's left over, pass it along */
 if (!APR_BRIGADE_EMPTY(pass_bb)) {
-rv = ap_pass_brigade(f->next, pass_bb);
+return ap_pass_brigade(f->next, pass_bb);
 }
 else {
-rv = APR_SUCCESS;
-apr_brigade_destroy(pass_bb);
+return APR_SUCCESS;
 }
-return rv;
 }
@@ -3542,8 +3536,11 @@
 intern->end_seq_len = strlen(intern->end_seq);
 intern->undefined_echo = conf->undefined_echo;
 intern->undefined_echo_len = strlen(conf->undefined_echo);
+
+/* Initialize the pass brigade */
+ctx->pass_bb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
 }

 if ((parent = ap_get_module_config(r->request_config, &include_module))) {
 /* Kludge --- for nested includes, we want to keep the subprocess
  * environment of the base document (for compatibility); that means
@@ -3599,6 +3596,6 @@
 apr_table_setn(r->subprocess_env, "QUERY_STRING_UNESCAPED",
   ap_escape_shell_cmd(r->pool, arg_copy));
 }

 return send_parsed_content(f, b);
 }
Index: mod_include.h
===
--- mod_include.h   (revision 158730)
+++ mod_include.h   (working copy)
@@ -85,6 +85,9 @@
 /* currently configured time format */
 const char  *time_str;
+/* bucket brigade for passing to next filter */
+apr_bucket_brigade *pass_bb;
+
 /* pointer to internal (non-public) data, don't touch */
 struct ssi_internal_ctx *intern;
 } include_ctx_t;


Pool buckets, transient buckets and bucket_setaside

2005-04-25 Thread Rici Lake
Why do pool buckets do an automatic setaside? Should they?
No other bucket type does this; consequently, if you want to hold on to 
a bucket beyond the scope of the call in which you received the 
brigade, you must explicitly call bucket_setaside.

Without examining the bucket type, a practice which should normally be 
discouraged, it is impossible to know whether a given bucket will 
auto-setaside or not; consequently, bucket_setaside should always be 
called (possibly via ap_save_brigade).

So it would seem that either all bucket types should do auto-setaside 
or that none of them should. It would undoubtedly be convenient if all 
buckets did this -- it would remove the necessity for ap_save_brigade 
-- but I believe there is an issue with the model.

Consider a pool bucket which is placed into an existing brigade. If 
this brigade is not explicitly cleaned up, which would defuse the pool 
bucket's setaside mechanism, then both the brigade and the pool 
bucket's cleanup functions will run when the pool is cleaned up. The 
bucket's cleanup function is likely to run first, however, since it is 
likely to be newer than the brigade it was put into. That will create 
an unnecessary heap copy of the bucket, which will then be free'd when 
the pool cleanup for the brigade runs, slightly later.

If brigades are (almost) always explicitly cleaned or destroyed, and 
buckets are always explicitly setaside, then what is the point of the 
overhead of registering auto-setaside functions for every pool bucket?

Conversely, is there a reasonable mechanism for avoiding the 
unnecessary setaside and allowing the removal of the requirement to 
explicitly setaside buckets? 



Re: For review: teach mod_include.c to recycle brigades

2005-04-25 Thread Rici Lake
On 25-Apr-05, at 4:27 PM, André Malo wrote:
* Rici Lake wrote:
+1 in general, but ...
...put it into the internal structure, please. That's what I've created 
it for (binary compatibility, and only making public what needs to be 
public).

nd

Oops, sorry. Revised patch, includes both the bug fix and the recycling 
of bucket brigades, and still insufficiently tested. Also includes an 
explicit cleanup, while we continue thrashing out that issue. (nd -- do 
you have a test suite, by any chance?)

Index: mod_include.c
===
--- mod_include.c   (revision 158730)
+++ mod_include.c   (working copy)
@@ -173,6 +173,7 @@
 apr_size_t bytes_read;
 apr_bucket_brigade *tmp_bb;
+apr_bucket_brigade *pass_bb;
 request_rec  *r;
 const char   *start_seq;
@@ -3060,7 +3061,7 @@
 struct ssi_internal_ctx *intern = ctx->intern;
 request_rec *r = f->r;
 apr_bucket *b = APR_BRIGADE_FIRST(bb);
-apr_bucket_brigade *pass_bb;
+apr_bucket_brigade *pass_bb = intern->pass_bb;
 apr_status_t rv = APR_SUCCESS;
 char *magic; /* magic pointer for sentinel use */
@@ -3076,9 +3077,6 @@
 return ap_pass_brigade(f->next, bb);
 }
-/* All stuff passed along has to be put into that brigade */
-pass_bb = apr_brigade_create(ctx->pool, f->c->bucket_alloc);
-
 /* initialization for this loop */
 intern->bytes_read = 0;
 intern->error = 0;
@@ -3145,7 +3143,6 @@
 if (!APR_BRIGADE_EMPTY(pass_bb)) {
 rv = ap_pass_brigade(f->next, pass_bb);
 if (rv != APR_SUCCESS) {
-apr_brigade_destroy(pass_bb);
 return rv;
 }
 }
@@ -3170,7 +3167,7 @@
 }
 if (rv != APR_SUCCESS) {
-apr_brigade_destroy(pass_bb);
+apr_brigade_cleanup(pass_bb);
 return rv;
 }
@@ -3384,7 +3381,7 @@
 DEBUG_INIT(ctx, f, pass_bb);
 rv = handle_func(ctx, f, pass_bb);
 if (rv != APR_SUCCESS) {
-apr_brigade_destroy(pass_bb);
+apr_brigade_cleanup(pass_bb);
 return rv;
 }
 }
@@ -3450,13 +3447,14 @@
 /* if something's left over, pass it along */
 if (!APR_BRIGADE_EMPTY(pass_bb)) {
+/* return ap_pass_brigade(f->next, pass_bb); */
 rv = ap_pass_brigade(f->next, pass_bb);
+apr_brigade_cleanup(pass_bb);
+return rv;
 }
 else {
-rv = APR_SUCCESS;
-apr_brigade_destroy(pass_bb);
+return APR_SUCCESS;
 }
-return rv;
 }
@@ -3542,8 +3540,10 @@
 intern->end_seq_len = strlen(intern->end_seq);
 intern->undefined_echo = conf->undefined_echo;
 intern->undefined_echo_len = strlen(conf->undefined_echo);
-}
+/* Initialize the pass brigade */
+intern->pass_bb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
+
 if ((parent = ap_get_module_config(r->request_config, &include_module))) {
 /* Kludge --- for nested includes, we want to keep the subprocess
  * environment of the base document (for compatibility); that means
@@ -3599,6 +3599,7 @@
 apr_table_setn(r->subprocess_env, "QUERY_STRING_UNESCAPED",
   ap_escape_shell_cmd(r->pool, arg_copy));
 }
+}

 return send_parsed_content(f, b);
 }


Re: Pool buckets, transient buckets and bucket_setaside

2005-04-25 Thread Rici Lake
On 25-Apr-05, at 5:04 PM, Cliff Woolley wrote:
Why do pool buckets do an automatic setaside? Should they?
They set themselves aside when the pool in which their data was 
allocated
is in the process of being destroyed.  This is necessary because no 
other
bucket type has a similarly unpredictable lifetime.
Surely this is equally true of any pool-related resource. For example, 
a file which has been opened and has a pool cleanup registered would 
have exactly the same lifetime as a pool-allocated string.

Even transient
buckets have a lifetime that is well-known: they last until the call 
chain
returns.  Pool buckets might not even live that long.  Don't think 
just in
terms of httpd's usage of buckets here -- this is an APR construct.
I hadn't considered the case of a pool being released within the call 
chain :) But that would still apply equally to a file, no?

Is that actually likely, though, even leaving httpd aside? Or, to put 
it another way, if you're not using bucket brigades in the httpd style, 
at *exactly what* point must you set aside a bucket.

If there were any way to have transient buckets automatically set
themselves aside, then I would argue that that should be done, too.  Of
course there isn't, since there is no such thing as a callback that says
"the call chain has returned now".  But at least as it stands now, we have
a minimum guaranteed time that all of the buckets' data will live -- as
long as the call chain lives.
Actually, you could achieve this very easily if a brigade had a cleanup 
list -- in other words, if it were a sort of degenerate pool. Then you 
could register the transient's auto-setaside function against the 
brigade's cleanup list, as long as you were prepared to ensure that 
brigade_cleanup were called on the brigade when the call chain 
returned. That's what got me started thinking about this in the first 
place.

I think that lifetime and datatype are probably independent aspects of 
a bucket, and the design really ought, in some idealistic long-term 
definition of 'ought', to be refactored accordingly. *Any* resource 
might be pool-'allocated', might be transient, might be immortal. The 
games played with file buckets and the pool parameter to 
bucket_setaside are a demonstration, I think. Taking that view, pool 
buckets are not particularly special, or maybe I'm missing something 
obvious.

If file and socket buckets (and other such bucket types) could be 
taught to do auto-setaside, then ap_save_brigade could disappear and 
you could actually retain a partial brigade between filter calls with a 
BRIGADE_CONCAT. That has a certain charm, but I have the niggling 
feeling that it wouldn't actually work out in practice, because of the 
timing of cleanup calls I referred to earlier.

If brigades are (almost) always explicitly cleaned or destroyed, and
buckets are always explicitly setaside, then what is the point of the
overhead of registering auto-setaside functions for every pool bucket?
If registering a pool cleanup that doesn't get called (on a bucket type
that is rarely used in httpd, no less) has a measurable performance
impact, I'd be extremely surprised.
Well, me too, but then I was surprised at how much the overhead of an 
"if empty" check seems to matter. :) However, if they were used a lot, 
the overhead might actually be measurable. It wouldn't be registering 
the cleanup, so much as cancelling it -- registering a cleanup only 
requires snapping a new link onto a chain, but cancelling it involves a 
linear scan through the cleanups, which could easily turn quadratic.



Re: Proposed patch: always cleanup brigades in ap_pass_brigade

2005-04-25 Thread Rici Lake
On 25-Apr-05, at 4:26 PM, Joe Orton wrote:
On Mon, Apr 25, 2005 at 03:58:59PM -0500, Rici Lake wrote:
If we accept that the contents of a brigade are undefined when
ap_pass_brigade returns, the caller has three options:
-- call cleanup and reuse the brigade
-- call destroy (which will first call cleanup)
-- drop it on the floor and let destroy be called by the cleanup 
function

In no case can it make any use of the brigade's contents (they're
undefined), so cleanup must be called just after ap_pass_brigade
returns.
But in the latter two cases, adding an apr_brigade_cleanup() call to
ap_pass_brigade() is completely redundant.  To me, you need to make the
argument that the former case is so prevalent that it's worth an
additional API guarantee, and the additional (small) overhead in
ap_pass_brigade.  I don't see it really...
I guess that depends on whether most filters use and recycle pass 
brigades. I suspect that most internal filters don't (and possibly have 
the model which Joe Schaefer describes, where the brigade is simply 
passed through), whereas most third-party filters probably either do or 
should. Having now looked at quite a few output filters, I've come to 
the conclusion that the normal model is to create and destroy brigades 
on every invocation, which I can't help but think is a lot less 
efficient than it could be.

But I actually disagree with the premise: the issue is not how 
prevalent the case is, it is how likely it is to create a bug which is 
triggered by an interaction between unrelated modules. Mitigating the 
creation of such bugs is, in my mind, an important part of component 
API design; particularly if the mitigation is low-cost. Of course, 
everyone has their own concept of beauty, and mine is probably the 
least important of anyone participating in this debate.

In any event, a draft docstring. This does not reflect my view, and I 
don't know if there is actually a consensus view, but it may reflect 
the current state of affairs:

Index: util_filter.h
===
--- util_filter.h   (revision 158730)
+++ util_filter.h   (working copy)
@@ -103,8 +103,8 @@
  * @name Filter callbacks
  *
  * This function type is used for filter callbacks. It will be passed a
- * pointer to this filter, and a bucket containing the content to be
- * filtered.
+ * pointer to this filter, and a bucket brigade containing the content
+ * to be filtered.
  *
  * In filter->ctx, the callback will find its context. This context is
  * provided here, so that a filter may be installed multiple times, each
@@ -120,10 +120,15 @@
  * or output filter chains and before any data is generated to allow the
  * filter to prepare for processing.
  *
- * The *bucket structure (and all those referenced by ->next and ->prev)
- * should be considered const. The filter is allowed to modify the
- * next/prev to insert/remove/replace elements in the bucket list, but
- * the types and values of the individual buckets should not be altered.
+ * The bucket brigade always belongs to the caller, but the filter
+ * is free to use the buckets within it as it sees fit. Normally,
+ * the brigade will be returned empty. Buckets *may not* be retained
+ * between successive calls to the filter unless they have been
+ * set aside with a call to apr_bucket_setaside. Typically this will
+ * be done with ap_save_brigade(). Buckets removed from the brigade
+ * become the responsibility of the filter, which must arrange for
+ * them to be deleted, either by doing so directly or by inserting
+ * them in a brigade which will subsequently be destroyed.
  *
  * For the input and output filters, the return value of a filter should be
  * an APR status value.  For the init function, the return value should
@@ -299,9 +304,13 @@
  * Pass the current bucket brigade down to the next filter on the filter
  * stack.  The filter returns an apr_status_t value.  If the bottom-most
  * filter doesn't write to the network, then ::AP_NOBODY_WROTE is returned.
- * The caller relinquishes ownership of the brigade.
  * @param filter The next filter in the chain
  * @param bucket The current bucket brigade
+ *
+ * @remark Ownership of the brigade is retained by the caller. On return,
+ *         the contents of the brigade are UNDEFINED, and the caller must
+ *         either call apr_brigade_cleanup or apr_brigade_destroy on
+ *         the brigade.
  */
 AP_DECLARE(apr_status_t) ap_pass_brigade(ap_filter_t *filter,
  apr_bucket_brigade *bucket);



Module code review (long). Was Re: RFC: Who owns a passed brigade?

2005-04-22 Thread Rici Lake
On 22-Apr-05, at 6:19 AM, Nick Kew wrote:

However, I think it is a fallacy that a cleaned-up brigade is not too
big to wait for pool cleanup. The brigade itself is about four  
pointers;
and the corresponding pool cleanup is another four pointers, so  
that's a
total of around 32 bytes, not a lot. But suppose that a brigade is
created for every 8k of filtered data,
Are there applications like that in the wild?  I'd have thought that
sounds like the kind of case where you definitely want to re-use
one brigade, with a brigade_cleanup after each 8Kb pass_brigade.
Indeed. But the filter does not know, in general, how much data it will  
be called upon to filter, nor how many times it will be invoked. Being  
invoked once every 8 kb should be pretty common, though, since that  
seems to be the recommendation on how often a filter should pass a  
brigade.

Are there applications like that in the wild? I don't know, but it  
seems likely that there are. None of the following examples are  
definitive, but they seem illustrative.

I cite these examples only because they are available; it was not an  
attempt to criticize anyone's code in particular, but rather to show  
the result of the lack of clear documentation on how to write an output  
filter.

In summary, of five modules examined, three of them create new brigades  
on every invocation without destroying the previous one. One of them  
fails to empty the passed brigade. One of them destroys the passed  
brigade. Two of them do one-time initialization on every invocation of  
the filter.

The only output filter I looked at which seems to conform to  
brigades-are-retained, buckets-are-passed is mod_deflate. This is the  
loop from mod_deflate.c, which seems to be a reasonable model:

  1   static apr_status_t deflate_out_filter(ap_filter_t *f,
  2  apr_bucket_brigade *bb)
  3   {
  // ... declarations
  4   if (APR_BRIGADE_EMPTY(bb)) {
  5   return APR_SUCCESS;
  6   }
  7   if (!ctx) {
  // ... determine whether we should filter the content
  8   /* We're cool with filtering this. */
  9   ctx = f->ctx = apr_pcalloc(r->pool, sizeof(*ctx));
 10   ctx->bb = apr_brigade_create(r->pool, f->c->bucket_alloc);
  // ... more initialization
 11   }
 12   while (!APR_BRIGADE_EMPTY(bb))
 13   {
 14   e = APR_BRIGADE_FIRST(bb);
  // ... check if e is metadata, process data in e
 15   apr_bucket_delete(e);
 16   }
 17   apr_brigade_cleanup(bb);
 18   return APR_SUCCESS;
 19   }
Notes:
lines 4-6
   these only serve to avoid initialization in the event that the filter
   only receives empty brigades. I don't know if this is necessary.
line 7
   all initialization is performed only once, if the ctx structure has
   not yet been created
line 12, 15
   all buckets are removed from the brigade and deleted
line 17
   the call to cleanup is presumably redundant since there is no break
   in the while block. But there could be.
mod_case_filter is supposedly an example of how to write filters, so it  
is likely that its code will be copied by naive filter writers. Here is  
its loop, in essence:

  1   static apr_status_t CaseFilterOutFilter(ap_filter_t *f,
  2   apr_bucket_brigade *pbbIn)
  3   {
  // ... declarations
  4   pbbOut = apr_brigade_create(r->pool, c->bucket_alloc);
  5   for (pbktIn = APR_BRIGADE_FIRST(pbbIn);
  6        pbktIn != APR_BRIGADE_SENTINEL(pbbIn);
  7        pbktIn = APR_BUCKET_NEXT(pbktIn))
  8   {
  // ... check for EOS
  9   apr_bucket_read(pbktIn, &data, &len, APR_BLOCK_READ);
  // ... do the transform
 10   pbktOut = apr_bucket_heap_create(buf, len, apr_bucket_free,
 11                                    c->bucket_alloc);
 12   APR_BRIGADE_INSERT_TAIL(pbbOut, pbktOut);
 13   }
 14   return ap_pass_brigade(f->next, pbbOut);
 15   }

Notes:
line 4:
  a new brigade is created on every invocation.
lines 5-7:
  the input brigade is iterated but the buckets are not removed
line 14:
  the input brigade is not cleaned, and the output brigade is not  
destroyed.

OK, here's mod_charset_lite, which is flagged as a learning experience  
(for the author), but I believe can actually be found in the wild. The  
filter function itself was a bit much to include here (it does seem to  
empty its input brigade), but the key is in the following function  
which is called everytime the filter decides it has output available:

  1   static apr_status_t send_downstream(ap_filter_t *f, const char *tmp,
  2   apr_size_t len)
  3   {
  // ... declarations
  4   bb = apr_brigade_create(r->pool, c->bucket_alloc);
  5   b = apr_bucket_transient_create(tmp, len, c->bucket_alloc);
  6   APR_BRIGADE_INSERT_TAIL(bb, b);
  7   rv = 

Re: RFC: Who owns a passed brigade?

2005-04-22 Thread Rici Lake
On 22-Apr-05, at 9:32 AM, Joe Orton wrote:
The issue here is really which party can *destroy* a brigade, right?
Or perhaps which party *must* destroy a brigade. This is much less of 
an issue if neither party creates a new brigade on every filter 
invocation. But some do.

In the current code where brigades are never really destroyed the fact
that apr_brigade_cleanup() is called everywhere is not really harmful,
is it?
No, what would be harmful would be apr_brigade_cleanup() not being 
called on a non-empty brigade.

It looked to me like it was going to much simpler to adapt filters
shipped in httpd to match the creator-must-destroy model, than to match
a consumer-must-destroy model, despite the ap_pass_brigade docstring.
That seems to be the case.
It's certainly easier to audit for.
Sure, but it applies equally to buckets. It seems that the consensus is 
creator-must-destroy-brigades; consumer-must-remove-buckets. That's 
not easy to audit for, which is why I think ap_pass_brigade should 
guarantee the latter condition.

In creator-must-destroy it is still fine for a consumer to call
apr_brigade_cleanup() on a brigade it is passed, of course; that's just
equivalent to consuming all the buckets, after all.
Sure. Consequently, it's ok for ap_pass_brigade to do the call. Then a 
consumer is free to either consume the buckets or not.



RFC: Who owns a passed brigade?

2005-04-21 Thread Rici Lake
After a filter calls ap_pass_brigade(f->next, bb), who owns bb?
The name of the function might lead one to believe that ownership of 
the brigade was transferred, but inspection of distributed filters 
shows that it has not been; both mod_deflate and mod_include expect the 
brigade to be usable (and apparently empty) after return from 
ap_pass_brigade.

This would imply that the rule is that ownership of the brigade is 
retained by the caller, but that ownership of the buckets in the 
brigade is transferred to the callee.

If this is the case, I believe that it should be clearly stated in the 
API documentation. Otherwise, it is inevitable that people writing 
filters will get it wrong. (And it would also suggest that the function 
should be renamed ap_pass_buckets())

It also leads to the question of what the expected behaviour is if the 
callee does not empty the passed brigade. Is it reasonable for a called 
filter to expect to be able to leave a bucket in the passed brigade so 
as to receive it on the next call? That would seem eccentric, although 
I believe it is what would happen in the case of mod_deflate and 
mod_include, at least.

If the brigade is required to have been emptied when the receiving 
filter returns, then would it not make more sense for ap_pass_brigade 
(or ap_pass_buckets) to call apr_brigade_cleanup just before returning? 
This would avoid the situation where a called filter religiously cleans 
up the brigade, in order to conform to the API expectations, but the 
calling filter also cleans up the brigade, in order to avoid worrying 
about misbehaving downstream filters.

If, on the other hand, buckets can be left in the brigade for retention 
between filter invocations, it must be clear that the passing filter 
must always use the same brigade for every call to ap_pass_brigade(), 
or otherwise arrange for buckets left in the brigade to be repassed.

I note that apr_brigade_split() seems to be oriented towards a model 
where ownership of the brigade is passed, since it creates a new 
brigade. For the model where brigade ownership is retained, it would be 
better to have a function which shifted buckets from one brigade to 
another one.
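
As far as I can tell the ring macros already provide that shift; a sketch,
assuming APR_BRIGADE_CONCAT does what apr_buckets.h says it does:

/* Move every bucket from src onto the tail of dst. Both brigades keep
 * their own pools and cleanups; src is left empty rather than destroyed,
 * which fits the brigade-retained model. */
static void shift_buckets(apr_bucket_brigade *dst, apr_bucket_brigade *src)
{
    APR_BRIGADE_CONCAT(dst, src);
}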

There are a number of possible bugs resulting from a lack of clarity on 
brigade ownership. For example:

-- if a filter believes that brigade ownership has been passed to it, 
it may call apr_brigade_destroy on the brigade. This will not cause any 
obvious misbehaviour, but the pool cleanup on the brigade will have 
been removed, and if the request terminates on an error, it is 
conceivable that heap buckets later added to the brigade will leak.

-- if a filter believes that it is passing ownership, it will not call 
apr_brigade_destroy on the brigade. This is only serious in the case 
where the filter creates a new brigade on every invocation (as, for 
example, mod_include does). In that case, the pool's cleanup list will 
grow dramatically for large requests. (There are a couple of issues in 
bugzilla which seem to be showing this symptom.)

-- as mentioned above, if a filter does not understand that it is being 
passed ownership of the buckets, it may leave them in the brigade and 
then receive them again on a subsequent invocation. However, the 
upstream filter may choose to send a different brigade on the 
subsequent invocation, so the filter cannot rely on this behaviour.

My vote would be for a clear statement that ownership of brigades is 
retained and ownership of buckets is transferred, along with a 
documented call to apr_brigade_cleanup() in ap_pass_brigade to precisely 
define the expected state of the brigade on return.
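
Concretely, the proposal amounts to something like this sketch (the real
function would also have to handle a NULL next filter, omitted here):

AP_DECLARE(apr_status_t) ap_pass_brigade(ap_filter_t *next,
                                         apr_bucket_brigade *bb)
{
    apr_status_t rv = next->frec->filter_func.out_func(next, bb);

    /* guarantee the documented postcondition: whatever the consumer
     * left behind is discarded, so bb always comes back empty */
    apr_brigade_cleanup(bb);
    return rv;
}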



Re: RFC: Who owns a passed brigade?

2005-04-21 Thread Rici Lake
On 21-Apr-05, at 3:58 PM, Cliff Woolley wrote:
FWIW, the documentation says:
"The caller relinquishes ownership of the brigade."
This obviously differs from what some of our own filters are doing -- 
and
from my memory of past history.  It makes sense that it should be this
way, though I think at some point in the past we just guaranteed that 
the
core_output_filter always emptied the brigades it was passed so it
wouldn't have mattered until other filters that didn't behave that way
started getting inserted in front of c_o_f.

I'll keep digging.
I had seen that note in the documentation; in fact, I wrote several 
filters on the basis of that statement but they misbehaved in odd ways 
which is why I started to dig into the question. It is clearly at 
variance from practice.

FWIW, I think the (apparent) practice, where the caller relinquishes 
ownership of the buckets but not the brigade itself, is more efficient 
since it avoids a lot of brigade construction and destruction. But it's 
a slightly odd convention.

I don't think it makes that much difference, provided that there is a 
clear statement and that everyone follows it.



Re: RFC: Who owns a passed brigade?

2005-04-21 Thread Rici Lake
On 21-Apr-05, at 5:51 PM, Nick Kew wrote:
Rici Lake wrote:
FWIW, I think the (apparent) practice, where the caller relinquishes
ownership of the buckets but not the brigade itself, is more efficient
since it avoids a lot of brigade construction and destruction.
Agreed.  And it works for any situation, as either party can do a
cleanup to keep resource-use down, and a cleaned up brigade is not
too big to wait for pool cleanup.
"Either party can do a cleanup" does not give either party the 
responsibility. If neither party did the cleanup, that would be a 
problem. Lack of clear responsibility is always a problem, and not just 
with computer programs.

I favour having ap_pass_brigade do it. If the brigade must be cleaned 
before ap_pass_brigade returns, then it might as well be done by 
ap_pass_brigade.

However, I think it is a fallacy that a cleaned-up brigade is not too 
big to wait for pool cleanup. The brigade itself is about four 
pointers; and the corresponding pool cleanup is another four pointers, 
so that's a total of around 32 bytes, not a lot. But suppose that a 
brigade is created for every 8k of filtered data, and you run 4 GB of 
data through that without destroying any brigades. That's half a 
million brigades and half a million registered cleanups (linearly 
searched on kill, so you better hope the cleanups are destroyed in the 
right order).

See http://issues.apache.org/bugzilla/show_bug.cgi?id=23567 and 
http://issues.apache.org/bugzilla/show_bug.cgi?id=33382; Paul 
Querna's patch to fix the latter bug clearly shows how you can drown in 
empty brigades (or rather, how to avoid drowning in them).

Logically it makes a lot of sense
to run brigade_cleanup once per pass_brigade, and it's harmless if
the brigade was already cleaned.
But it's
a slightly odd convention.
I don't think it makes that much difference, provided that there is a
clear statement and that everyone follows it.
Creator-is-responsible is simple logic.  Other options are more complex
and bug-prone, as we then have issues like whether to destroy on
every call or only on EOS.  Or maybe on FLUSH 
Also, creator-is-responsible is a logic that can apply symmetrically
to both input and output filters without mindbending.
I didn't mean to say that creator-is-responsible is odd. What I meant 
was that it is odd that ownership of the buckets gets transfered but 
not of the brigades. Creator-is-responsible is fine. Transferred 
ownership is fine. Part of one and part the other is odd. Not 
unworkable -- just odd. And thus in need of clear documentation.

Dropping destroy from the API altogether and leaving that to the pool
would also work, AFAICS.
-1, too fragile. See above bugzillas.


Re: AP_MODE_EATCRLF considered indigestible

2005-04-18 Thread Rici Lake
On 18-Apr-05, at 9:43 AM, Greg Ames wrote:
if the network data for pipelined request n+1 isn't already present 
during ap_read_request for request n, it is highly unlikely to arrive 
by the time we start writing the response for request n.
It may well arrive during the time we're writing the response, though, 
if it is based on information from the response.

  if a new request does arrive during that window it's no big deal.  
we might write one small packet to the network and nothing breaks.
I agree with this. I doubt whether the optimization actually buys 
enough to make it worth the trouble. Also, if we ever get it wrong and 
leave unflushed data at the end of a request, it will look like the 
request is stalled.



Re: mod_deflate setting Vary

2005-04-17 Thread Rici Lake
On 17-Apr-05, at 5:30 PM, Roy T. Fielding wrote:
On Apr 17, 2005, at 1:25 PM, [EMAIL PROTECTED] wrote:
+
+  *) mod_deflate: Merge the Vary header, instead of Setting it. Fixes
+ applications that send the Vary Header themselves, and also apply
+ mod_deflate as an output filter. [Paul Querna]

-apr_table_setn(r->headers_out, "Vary", "Accept-Encoding");
+apr_table_mergen(r->headers_out, "Vary", "Accept-Encoding");

Is deflate setting Vary here because it is changing the
Content-Encoding, or is this part also done for Transfer-Encoding?
It's setting the Content-Encoding, at this point based on its 
interpretation
of the Accept-Encoding header. mod_deflate does not implement 
Transfer-Encoding
as far as I can see.

It doesn't quite get the Accept-Encoding right. It treats:
  Accept-Encoding: gzip;q=0
as accepting gzip encoding, and it treats:
  Accept-Encoding: *
as not accepting gzip encoding.
I could not find any bug reports complaining about this, so I don't 
think that fixing it is high priority. But I've put it on my tuit list 
anyway. For what that's worth.

The use of setn rather than mergen does create caching issues, although 
it happens to work with mod_negotiation (because mod_negotiation puts 
Vary: headers in err_headers_out instead of headers_out). I couldn't 
find any bug reports complaining about the caching issue either, but 
the fix is trivial.

It should not be setting Vary for changes to the Transfer-Encoding.
Roy



Re: svn commit: r159941 - httpd/httpd/branches/simple-conf/Makefile.in

2005-04-17 Thread Rici Lake
This didn't quite get it right, either. It will overwrite the .conf 
file unless the .conf file is httpd.conf (rather than progname.conf)

--- Makefile.in Sun Apr 17 20:51:18 2005
+++ Makefile.in.new Sun Apr 17 20:51:50 2005
@@ -78,7 +78,7 @@
 	if [ "$$i" = "httpd.conf" ]; then \
 		file=`echo $$i|sed s/.*.conf/$(PROGRAM_NAME).conf/`; \
 	fi; \
-	if test ! -f $(DESTDIR)$(sysconfdir)/$$i; then \
+	if test ! -f $(DESTDIR)$(sysconfdir)/$$file; then \
 		$(INSTALL_DATA) $(DESTDIR)$(sysconfdir)/original/$$i $(DESTDIR)$(sysconfdir)/$$file; \
 	fi; \
done ; \



Re: AP_MODE_EATCRLF considered indigestible

2005-04-15 Thread Rici Lake
On 15-Apr-05, at 11:06 AM, Justin Erenkrantz wrote:
The problem with this code is that it'll only read 8 characters worth 
of
CRLFs.  We'd probably want to retry the reads until we don't get more 
CRLF
pairs.  It's not determinate how many extra CRLFs we will get.
Are there browsers which send more than four extraneous CRLF's but 
still support keepalive?

If this code reads four CRLF's without finding any data, it will assume 
that there is no immediate incoming request, and send a flush down the 
filter chain. That does no harm aside from wasting some cycles; on the 
assumption that the number of cycles wasted is not that enormous and 
the number of browsers which exhibit that particular behaviour is not 
huge, I'd say it's not really worth completely retrying the read.

I believe that the current code may exhibit the opposite behaviour: it 
can falsely report that data is present and thereby omit the flush with 
the result that a response is not terminated until the following 
request. If the connection is not being pipelined, the following 
request is unlikely to show up until the response is terminated, so 
that would lead to a request which stalls very near to the end. I 
believe I may have actually seen this symptom in real life, now that I 
think about it.

So, yes, I'd be fine with removing EATCRLF for 2.2 if you provide a 
tested
patch.  ;-)  -- justin
OK, I'll start by at least compiling and testing what I posted.


Re: AP_MODE_EATCRLF considered indigestible

2005-04-15 Thread Rici Lake
On 15-Apr-05, at 10:56 AM, Greg Ames wrote:
the reason that this and the corresponding 1.3 BUFF logic exists is to 
minimize tinygrams - ip packets that are less than a full mtu size.  
tinygrams definately degrade network performance and trigger things 
like Nagle's algorithm.  in this particular case we are trying to 
combine response data from two or more different pipelined requests 
into a single packet.
Yes, I understood that. My question was (a) whether an HTTP response is 
every tiny enough to truly qualify as a tinygram, and (b) whether the 
work done to avoid sending them is worth the cost of not avoiding them.

Clearly, some HTTP responses are less than the MSS for some 
connections. But they are hardly one-byte packets. It would certainly 
be possible to simply set NODELAY on the socket to disable Nagle, and 
it's not obvious to me that there would be a huge degradation of 
network throughput resulting from this.
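
Through APR that is a one-liner; a sketch, where csd stands for the
connection's apr_socket_t:

/* Disable Nagle so small response packets go out immediately,
 * instead of suppressing them with lookahead tricks like EATCRLF. */
rv = apr_socket_opt_set(csd, APR_TCP_NODELAY, 1);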

Still, if it truly is an issue, there has got to be a better way to 
accomplish it.



AP_MODE_EATCRLF considered indigestible

2005-04-14 Thread Rici Lake
I was taking a look at the implementation of the renamed (but still 
misleading) AP_MODE_EATCRLF, in an attempt to figure out what a 
conforming input filter ought to do if it's handed that mode.

Some comments:
1) The loop in server/core_filters.c:226
while (c < str + len) {
    if (*c == APR_ASCII_LF)
        c++;
    else if (*c == APR_ASCII_CR && *(c + 1) == APR_ASCII_LF)
        c += 2;
    else
        return APR_SUCCESS;
}
If the last character of the bucket being examined is an ASCII CR, this 
will first look beyond the end of the bucket's buffer to check for an 
LF; assuming it doesn't find one in the random byte it looks at, it 
will then return APR_SUCCESS even if the following bucket (if any) 
started with an LF. So that's presumably wrong on two counts.
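
A bounds-checked version would look something like the following sketch;
note that it still cannot see an LF beginning the next bucket, which would
require carrying a pending-CR flag from one bucket to the next:

while (c < str + len) {
    if (*c == APR_ASCII_LF)
        c++;
    else if (*c == APR_ASCII_CR
             && c + 1 < str + len          /* stay inside this bucket */
             && *(c + 1) == APR_ASCII_LF)
        c += 2;
    else
        return APR_SUCCESS;
}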

2) The only place I can see that this mode is actually used is in 
modules/http/http_request.c, where it is used to decide whether or not 
to flush the filter chain before logging. The comment at line 226 of 
that file seems reasonable:
/*
 * We want to flush the last packet if this isn't a pipelining connection
 * *before* we start into logging.  Suppose that the logging causes a DNS
 * lookup to occur, which may have a high latency.  If we hold off on
 * this packet, then it'll appear like the link is stalled when really
 * it's the application that's stalled.
 */
Presumably that's true whether or not the connection is pipelining; why
is it acceptable to make a link appear stalled in a pipelining
connection but not a keepalive connection?

3) In general, AP_MODE_EATCRLF seems to me to be a tight coupling 
between a particular function in http_request.c and a particular input 
filter (core_input_filter). The expense is added complexity for every 
input filter.

Removing the mode altogether would mean that either every request was 
flushed through the filter chain even in pipelining mode, or that 
ap_process_request would have to come up with some other way of 
detecting pending input. Unless there is good empirical evidence of the 
pipelined filter flush leading to performance problems, the first of 
these options would seem attractive.

I presume the main reason to avoid flushing between pipelined requests 
would be a sequence of WebDAV requests with short responses, although 
there is also the common case of browsers pipelining a request for 
favicon.ico before the page actually requested by the user. But are the 
responses so short that the flush would have an enormous impact? The 
favicon.ico response from a random iconless server comes out at 458 
bytes, for example.



mod_proxy handling of error returns from scheme handlers

2005-04-08 Thread Rici Lake
Just a query: I was looking at the code for mod_proxy.c and I wonder if 
the behaviour is really correct.

At line 632, it loops through the configured proxy hosts, trying 
proxy_run_scheme_handler on each one. If it receives any return code 
other than DECLINED or HTTP_BAD_GATEWAY, it accepts the result.

But then at line 656, having failed to find a proxy which will handle 
the request, it tries the request directly. In this case, it retries on 
any error return if max_attempts has not been reached. This would cause 
a retry, for example, on an authentication required status return, 
which seems odd.

Perhaps I am missing something.
Rici


ProxyAuthOverride

2005-04-08 Thread Rici Lake
This is a small patch to mod_proxy which implements a single directive:
ProxyAuthOverride 403|404|Off
It overrides 401 returns from the origin server and substitutes them 
with the indicated status code.

This may be an unusual use case, but it does come up once in a while 
when trying to proxy origin servers which insist on sending 401 to 
allow the user to upgrade to an authenticated session; in some cases it 
may be desirable to proxy only anonymous connections to such servers. 
(It would be important to scrub Authorisation: headers from such a 
request.)

A version for 2.0.52 has been lightly tested: 
http://primero.ricilake.net/proxy2.patch
(the patch applies cleanly to 2.0.53).

A version for TRUNK is also available: 
http://primero.ricilake.net/proxy21.patch but I am not completely 
certain about the interaction with balancers and the post_request hook.

Comments are welcome.


Re: mod_proxy handling of error returns from scheme handlers

2005-04-08 Thread Rici Lake
On 8-Apr-05, at 1:40 PM, Graham Leggett wrote:
In theory (if I understand it correctly) it wouldn't - because it 
would only try the direct connection on DECLINED or bad gateway - if 
authentication was denied, the result would be neither DECLINED nor 
bad gateway, and so the direct connection would never be tried in this 
case.
I guess it raises the issue of why the direct connection is attempted. 
Presumably it is possible that all the configured downstream proxies 
refused to pass on the request to the origin server, but that *this* 
server succeeds in passing the request to the origin server. In that 
case, the result could be anything that the origin server returned. No?



Re: simple-conf branch

2005-04-06 Thread Rici Lake
On 6-Apr-05, at 11:43 AM, Phil Lello wrote:
As part of this, it could be useful to generate a RunningConfig.cnf
file as part of httpd startup, which would be a merged config file with
comments indicating which file set the option (and possibly which
options have taken defaults).
You can easily get most of that information with the version of
mod_info in 2.1.



Re: Default Modules

2005-04-06 Thread Rici Lake
On 6-Apr-05, at 12:56 PM, Mads Toftum wrote:
mod_asis: yes -> no
I'd prefer -> most, as it is rarely used but not totally useless.

Others mentioned mod_ssl which I think is too much trouble to be worth
enabling other than when requested explicitly - there's the whole 
crypto
regs issue and it does link in another lib, which is something that I
prefer limiting to when it is actually needed.
So what are the criteria for "yes", "no", "most" and "all"?
I think it would be more productive to come up with common
criteria than to argue about individual modules.
Perhaps "all" should just go away. It obviously does not
really mean all and it is hard to come up with a good
description of what it does mean.
As I said earlier, a lot of people seem to be surprised at
what "all" does not include (ssl and proxy, for example).
And as no tool is provided to list what "all" and "most"
actually do, people are pretty well left to their own
intuitions.
The current list (as produced by the shell script I pasted
in an earlier message) is:
3. all  cern_meta
3. all  log_forensic
3. all  mime_magic
3. all  unique_id
3. all  usertrack
3. all  version
So what do those six modules have in common?


Re: Flushing lingering CGI processes

2005-03-31 Thread Rici Lake
On 31-Mar-05, at 6:16 AM, Nick wrote:
As a refinement;
The timer firstly sends a signal to the CGI program which may be 
caught by the CGI program to generate some form of output. This output 
will prove whether the http connection is still open. If the CGI 
program doesn't output anything, a TERM is sent. If still no output 
and process is still running, a KILL is sent.
This seems overly complex to me, and furthermore a bit difficult to 
actually implement in a CGI. How is a CGI to know whether it is hung? 
If it could figure that out, it could kill itself easily enough, but I 
believe it to be a computationally intractable problem related to the 
halting problem; certainly it is not something I'd like to try in a 
bash script. :)

It seems to me that a simple timer would be sufficient; however, it 
would be nice to be able to configure individual CGIs (or directories 
of CGIs) with longer or shorter timers; I have seen instances of 
webapps with CGIs which compute for several minutes, or even longer. 
(Presumably, the users of such webapps are very patient or really need 
the results.)

Rici


Re: Reformat httpd-win.conf

2005-03-30 Thread Rici Lake
On 30-Mar-05, at 12:52 PM, Sander Striker wrote:
#EnableSendfile off
I'd suggest uncommenting that for a win32 config file.


Re: svn commit: r125465 httpd/httpd/trunk/modules/ldap/util_ldap.c

2005-03-22 Thread Rici Lake
On 22-Mar-05, at 5:17 PM, Graham Leggett wrote:
This is also broken - the LDAPTrustedClientCert is supposed to have 
scope on a per directory basis.

To fix this, we would need to add a directory config creator, and a 
directory config merger, is that correct?

Sure, but then there is still a problem with .htaccess files. The 
client_certs array will have been allocated in the request pool during 
the merge operation for the .htaccess file, but it then may be copied 
into the global connection cache; currently, the code only copies the 
array header (at line 525). So you'd end up with dangling pointers 
after the request finished. If, on the other hand, the strings were 
copied into the server or config pools, then they would slowly consume 
memory. The most plausible solution might be to manually manage 
connection cache memory with malloc instead of using pool-allocated 
memory.

Although now that I look at the file again, I see that it never would 
have worked anyway because at line 1546, the function 
util_ldap_set_trusted_client_cert stores the certificate in 
st->global_certs instead of st->client_certs.

I also wonder about the two apr_array_append calls at line 1671 in 
util_ldap_merge_config. The second one would mean that the client certs 
specified in LDAPTrustedClientCerts would be appended to the list of 
client certs inherited from some containing section. This might be 
counter-intuitive if the certs are supposed to be directory scoped. I'm 
not sure what the use case for this directive would be, so it's hard to 
know for sure.



util_ldap.c redux

2005-03-22 Thread Rici Lake
While I've got util_ldap.c open in front of me, I also noticed that the 
following code occurs a number of times:

/*
 * If any LDAP operation fails due to LDAP_SERVER_DOWN, control returns here.
 */
start_over:
if (failures++ > 10) {
    return result;
}
if (LDAP_SUCCESS != (result = util_ldap_connection_open(r, ldc))) {
    return result;
}
// do something with ldc; goto start_over if it returns LDAP_SERVER_DOWN

Now, util_ldap_connection_open either returns immediately, if the 
connection appears to be bound, or it tries to bind the connection, 
looping 10 times on LDAP_SERVER_DOWN (line 337, I agree with the 
comment at that point in the code).

So in order for the start_over loop to occur more than twice, it would 
be necessary that:

  some operation returns LDAP_SERVER_DOWN;
  util_ldap_connection_open succeeds in reopening the connection
with less than 10 retries;
  but the operation returns LDAP_SERVER_DOWN again.
Not only does this seem unlikely, it also seems like a situation where 
it would be better to return a 503 than to keep retrying. The correct 
logic would be something like:

  1. if the connection is bound, try the operation
  2. if the operation returns LDAP_SERVER_DOWN or the connection was 
not bound,
 call util_ldap_connection_open and if it returns success, try the 
operation

I'm not sure under what circumstances various LDAP SDKs return 
LDAP_SERVER_DOWN, but I suppose it would generally either be because:
 1) an existing connection was dropped, perhaps because of an idle 
time-out
 2) there is no LDAP server listening on the interface at all and
the connection was refused
 3) the LDAP server failed to accept the connection within some timeout.

In the first case, one or two retries are in order. In the second case, 
retrying is unlikely to change the situation at all. And in the third 
case, the user is likely to have long since gone somewhere else. So I 
think that 10 retries within connection_open is excessive as well; 
three is probably more than enough.



Re: svn commit: r125465 httpd/httpd/trunk/modules/ldap/util_ldap.c

2005-03-21 Thread Rici Lake
The referenced commit added two OR_ALL directives to util_ldap:
  LDAPTrustedClientCert
  LDAPTrustedMode
However, that module has no per-directory config struct; all 
configuration is stored in per-server structs.

Consequently, as far as I can see, the OR_ALL directives are 
unworkable. In particular, LDAPTrustedClientCert, if invoked in an 
.htaccess file, will:
  a) store a pointer to request-pool-allocated strings in a 
config-pool-allocated array
  b) tromp on changes being made in another thread reading an .htaccess 
file in the same vhost.

Even if it were not for that, it seems odd to me that the configuration 
merge function appends the parent array of client certs to the current 
array of client certs; I think that is counter to the idea that a 
directory might have its own local collection of client certs. 
(Accumulating the global certs seems more reasonable.)

As it stands, if my analysis is correct, the use of 
LDAPTrustedClientCert in an .htaccess file would probably cause a child 
to segfault; in a threaded mpm, the consequences would be even more 
serious. Until a more thorough review is done, it would probably be a 
good idea to change the directives to RSRC_CONF, or disable them 
altogether.

Rici


Re: do we still want sendfile enabled with our default conf files?

2005-03-17 Thread Rici Lake
Bill,
What's the performance problem?
There was someone on #apache yesterday who claimed to experience the 
following problem: On a local switched network, Apache2 on Win2K3 
transferred at 800kbps, while IIS on the same box transferred at 
8-11mbps. When he replaced the cable to the server with a shorter one, 
Apache speeded up to 4 mbps but IIS still ran at full speed. He tried 
it with both Windows and Mac OS X clients, and experienced the same 
symptom.

Could this be related to sendfile?
Rici.
On 17-Mar-05, at 11:12 AM, Bill Stoddard wrote:
Joshua Slive wrote:
Jeff Trawick wrote:
IMHO, all default conf files except for highperformance[-std].conf
should have the EnableSendfile off directive uncommented, under the
theory that people who want the extra performance can take an action
to re-enable sendfile and then pay attention to the web server
behavior afterwards but people who just want the darn thing to work
shouldn't be bothered.  Some percentage of the latter group will have
problems they are ill-equipped to handle, because some feature that 
is
not important to them is enabled.

We've discussed this before and nothing ever happened.  I'm willing 
to
bet that the experiences of a few more unlucky users since the last
discussion may have shifted the balance.  Comments?
+1, as before.
 From the users' perspective, sendfile results in unexplained 
corruption or uninterpretable error messages pretty much any time a 
network filesystem is used to host content (and random other times on 
win32).
Hey Joshua,
Guess I've not been paying attention... what type of problems have you 
seen with sendfile (TransmitFile) on Win32?  We do have a performance 
problem (will leave out details unless you're interested) but I was not 
aware of corruption problems...

Bill



Re: do we still want sendfile enabled with our default conf files?

2005-03-17 Thread Rici Lake
OK, the person came back, I got him to disable sendfile, and apache 
started transferring at full speed.

I believe sendfile should be disabled by default at least on Windows.
Rici
On 17-Mar-05, at 11:48 AM, Rici Lake wrote:
Bill,
What's the performance problem?
There was someone on #apache yesterday who claimed to experience the 
following problem: On a local switched network, Apache2 on Win2K3 
transferred at 800kbps, while IIS on the same box transferred at 
8-11mbps. When he replaced the cable to the server with a shorter one, 
Apache speeded up to 4 mbps but IIS still ran at full speed. He tried 
it with both Windows and Mac OS X clients, and experienced the same 
symptom.

Could this be related to sendfile?
Rici.



ProxyRemoteMatch brokenness

2005-01-19 Thread Rici Lake
The ProxyRemoteMatch directive is supposed to use a regex to redirect 
certain proxy requests to a remote proxy server (as I understand the 
documentation). I actually needed that for a configuration (see below) 
and was puzzled to find that it doesn't work: (line numbers from 
APACHE_2_0_BRANCH, because I can't find a web interface to svn)

389 :  p2 = ap_strchr_c(ents[i].scheme, ':');  /* is it a partial URL? */
390 :  if (strcmp(ents[i].scheme, "*") == 0 ||
391 :      (ents[i].use_regex && ap_regexec(ents[i].regexp, url, 0, NULL, 0)) ||
392 :      (p2 == NULL && strcasecmp(scheme, ents[i].scheme) == 0) ||
393 :      (p2 != NULL &&
394 :       strncasecmp(url, ents[i].scheme, strlen(ents[i].scheme)) == 0)) {

ap_regexec returns 0 on success, so the condition in line 391 matches 
if the regex didn't match, which is the reverse of the expected 
behaviour. Changing line 391 to

391 :      (ents[i].use_regex && ap_regexec(ents[i].regexp, url, 0, NULL, 0) == 0) ||

produced the expected behaviour [1].
This code seems to have been unchanged since the directive was 
introduced in 2.0.35, almost three years ago, and I cannot find any 
relevant bug reports; in fact, googling for ProxyRemoteMatch did not 
yield any indication that anyone has ever tried to use the directive, 
much less succeeded. This would seem to be a reasonable case for 
deleting the directive, although I am now using it (with a patched 
httpd, of course) so I would be slightly resistant to this.

The particular configuration I'm using is to tunnel http and https 
through an ssh tunnel to a gateway machine inside a firewalled network; 
the relevant hostnames are not in the external DNS, of course, so I 
wanted to proxy only requests which match the internal domain suffix. 
This is not a feature of any browser I know of; browsers seem to have 
proxy exceptions rather than proxy affirmations. So I ended up with two 
proxy servers, one on a gateway in my local network, and another one on 
the gateway in the remote network; the two gateways are connected 
through an ssh tunnel:

Local gateway:
  ProxyRequests On
  <Proxy *>
    Order allow,deny
    allow from 10.
  </Proxy>
  ProxyRemoteMatch ^(http://)?[^/]*.internal.dns.suffix 
http://localhost:
  # (http://)? is to also allow proxying of CONNECT. Surprisingly, it 
works.

Remote gateway (separate apache instance, this is almost the entire 
config file):

  Listen localhost:
  ProxyRequests On
  AllowCONNECT various ports listening for https
where the ssh tunnel connects port  on the local gateway to 
localhost: on the remote gateway.

If anyone has a better way of accomplishing this, I'll withdraw my 
objection to deprecating ProxyRemoteMatch.

---
[1]: I would personally have written that code somewhat differently to 
avoid doing non-regex matches in the regex case:

  if (ents[i].use_regex) {
     if (ap_regexec(ents[i].regexp, url, 0, NULL, 0) != 0) continue;
  } else if (strcmp(ents[i].scheme, "*") != 0) {
     if (ap_strchr_c(ents[i].scheme, ':') == NULL) {
         if (strcasecmp(scheme, ents[i].scheme) != 0) continue;
     } else {
         if (strncasecmp(url, ents[i].scheme, strlen(ents[i].scheme)) != 0)
             continue;
     }
  }
  /* if we get here, we've got a match */




Proposal: mod_acceptfilter

2004-12-13 Thread Rici Lake
With reference to [Re: svn commit: r111596]
The following proposal attempts to provide a simple way of configuring
a server which supports multiple protocols (http, ftp, nntp) while
allowing the administrator and/or packager to configure OS kernel-
implemented accept filters, as provided by FreeBSD and to a lesser
extent by Linux. (And probably other OS's.)
On 13-Dec-04, at 11:00 AM, Paul Querna wrote:
I just tested it here, and it Fails on FreeBSD if accf_http is loaded 
in the kernel.

Perhaps both the accept filter and defer accept code belongs somewhere 
in modules/http/* ?
It seems to me that accept filters need to be configured for specific
protocols and in a way consistent with the specific operating system.
After an interesting discussion on IRC, I volunteered to write up the
following proposal (which I take full responsibility for; I don't
claim that it represents any consensus).
The essential point here is that accept filters need to be established
before the call to accept(). Consequently, the setting will be specific
to the listener. In the current architecture, the protocol module is not
invoked until too late to make this determination. It therefore seems
reasonable to provide a separate module, mod_acceptfilter, which handles
the configuration of accept filters.
However, protocol modules will need to indicate which protocol they
handle. This could be done by a callback which registers the protocol[s]
which they handle, or it could be done by using the name of the protocol
module directly as a key. I personally prefer the former solution, but
the second one might be possible for legacy-compatibility.
The proposal is then:
1. Core directives
1.a: The Listen directive is extended to allow the optional 
specification
 of a protocol:

  Listen [transport/][ip:]port [protocol]
where
  transport is either a domain specification from protocols(5),
like tcp or udp, or it might even (in some future time)
be a specification like namedpipe (although the use of
transports which do not identify themselves with ip numbers
implies significant changes). (This is not core to the proposal.)
  port is either a number or a service from services(5)
  protocol is either the name of a protocol registered by a protocol
module or the name of the protocol module itself.
  If protocol is omitted, then if a service name is provided and a
  protocol module has registered that name, then that name is used
  to bind the protocol module to the listener. Otherwise, http is
  assumed.
1.b: The assignment of protocols to listeners can be overridden in a
  <VirtualHost> section by means of the new Protocol directive:
  Protocol (registered-protocol | module-name)
  The argument is interpreted as above.
1.c: protocol modules register themselves as providing a protocol
  by providing a hook to ap_register_protocol; this is called with
  the name of a protocol, and the module either returns OK or DECLINED.
2. mod_acceptfilter
This new module would implement two directives:
  DefaultAcceptFilter (protocol | *) (os-dependent identifier+ | none)
  AcceptFilter (os-dependent identifier+ | none)
The first of these is only available in the main configuration section; the
second only in <VirtualHost> sections. Although this specification makes it
possible to use the same directive name for both options, I personally
believe that that particular form of parsimony is confusing for people
writing configuration files.

If there is an AcceptFilter directive within the <VirtualHost> section
applicable to a particular ip:port, that directive will apply. Otherwise,
if there is a DefaultAcceptFilter for the protocol bound to the
<VirtualHost> section, that directive will apply. Otherwise, none is
assumed.

mod_acceptfilter will then use the first AcceptFilter in the list which is
available, using an interface provided by APR. If no accept filter is
available, none will be used (that is, no accept filter will be placed
on the connection). This is not to be considered an error.
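
For what it's worth, APR already wraps the FreeBSD setsockopt; a sketch of
what mod_acceptfilter might do per listener, where lr is the listen record
and p a config pool (both assumed):

#if APR_HAS_SO_ACCEPTFILTER
    /* "httpready" is the name accf_http registers with the kernel */
    rv = apr_socket_accept_filter(lr->sd, apr_pstrdup(p, "httpready"),
                                  apr_pstrdup(p, ""));
    if (rv != APR_SUCCESS) {
        /* fall through to the next identifier in the list, or to none */
    }
#endif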

The intent is to allow Apache packagers for specific OS distributions to
create reasonable default accept filter configurations for the particulars
of the particular OS. The FreeBSD mechanism, for example, is extensible
through
kernel modules. So, a FreeBSD packager who had also implemented
an accf_ldap filter could provide the directive:

  DefaultAcceptFilter ldap accf_ldap accf_data
If some protocol module had registered as providing the ldap protocol,
mod_acceptfilter would attempt to enable accf_ldap on listeners assigned
to the ldap protocol; failing that, it would attempt to enable 
accf_data.
In the event that the config file was being used on, for example, Linux,
APR might be able to substitute the Linux equivalent of accf_data.

Regardless of the DefaultAcceptFilter specifications, an Apache
administrator with specific needs could simply place an AcceptFilter
directive in a particular virtual host.
-
Config file 

Proposal: Listen sections

2004-12-13 Thread Rici Lake
(An addendum to the previous mod_acceptfilter proposal.)
From my experience on #apache, it appears that Mere Mortals have
a great deal of difficulty successfully configuring virtual hosts.
I think this is understandable; the algorithm used to allocate
requests to virtual host sections is complicated, and it conflates
two radically different concepts: the assignment of a listener to
a particular ip:port (or a particular ip), and the assignment of
named virtual hosts within an ip[:port] listener.
Successfully configuring a named virtual host, in the current
configuration system, requires co-ordinating three directives
which often appear in different parts of the configuration:
Listen, NameVirtualHost, and VirtualHost
Minor discrepancies in the ip:port specifications in these
directives can cause named virtual hosts to fail to work
without providing any useful error messages.
In addition, not all protocols support the concept of named
virtual hosts.
I'm interested in peoples' reactions to the following proposal:
Directive:
  <Listen ([transport/](ip | * | _default_)[:port])+>
Context: main configuration
This is the combination of an existing Listen directive with an
existing (non-named) Virtual Host directive; that is, you could
replace
Listen 1.2.3.4:80
<VirtualHost 1.2.3.4:80>
  #...
</VirtualHost>
with
<Listen 1.2.3.4:80>
  # ...
</Listen>
Directive:
  <NameVirtualHost (name | _default_)+>
Context: <Listen> section
This provides the definition of a single named virtual host within
a single Listener; it replaces the existing NameVirtualHost and
<VirtualHost> directives. At most one <NameVirtualHost> may be
tagged with the _default_ name; if there is one it will be used
if no name matches or no virtual host name is provided by the
client.
It is an error to provide a <NameVirtualHost> in a <Listen> section
for a protocol which does not support named virtual hosts.
It is an error to provide more than one <NameVirtualHost _default_>
section in the same <Listen> section.
If no <NameVirtualHost _default_> is provided, the first <NameVirtualHost>
section will be used as a default, for backwards pseudo-compatibility,
but a warning will be generated.

---
Example configurations:
1) Simple configuration, a number of virtual hosts with
   different document roots:
Current:

Listen *:80
#...
#...
NameVirtualHost *:80
#...
#...
<VirtualHost *:80>
  ServerName this.does.not.match.any.name.com
  DocumentRoot /var/www/noname
  DirectoryIndex stopUsingNetscape2.html
</VirtualHost>
<VirtualHost *:80>
  ServerName www.example.com
  ServerAlias example.com
  DocumentRoot /var/www/example
</VirtualHost>
<VirtualHost *:80>
  ServerName www.example2.com
  ServerAlias example2.com
  DocumentRoot /var/www/example2
</VirtualHost>
Proposed:
-
<Listen *:http>
  <NameVirtualHost _default_>
    DocumentRoot /var/www/noname
    DirectoryIndex stopUsingNetscape2.html
  </NameVirtualHost>
  <NameVirtualHost www.example.com example.com>
    DocumentRoot /var/www/example
  </NameVirtualHost>
  <NameVirtualHost www.example2.com example2.com>
    DocumentRoot /var/www/example2
  </NameVirtualHost>
</Listen>

2) A configuration with named virtual hosts on one
ip and non named virtual hosts on another one
Current:

Listen *:80
#...
#...
NameVirtualHost 1.2.3.4:80
#...
#...
<VirtualHost 1.2.3.4:80>
  ServerName this.does.not.match.any.name.com
  DocumentRoot /var/www/noname
  DirectoryIndex stopUsingNetscape2.html
</VirtualHost>
<VirtualHost 1.2.3.4:80>
  ServerName www.example.com
  ServerAlias example.com
  DocumentRoot /var/www/example
</VirtualHost>
<VirtualHost 1.2.3.4:80>
  ServerName www.example2.com
  ServerAlias example2.com
  DocumentRoot /var/www/example2
</VirtualHost>
<VirtualHost 1.2.3.5:80>
  # This is not a named virtual host, but it is
  # not locally obvious
  ServerName www.example3.com
  DocumentRoot /var/www/example3
</VirtualHost>
Proposed:
-
<Listen 1.2.3.4:http>
  <NameVirtualHost _default_>
    DocumentRoot /var/www/noname
    DirectoryIndex stopUsingNetscape2.html
  </NameVirtualHost>
  <NameVirtualHost www.example.com example.com>
    DocumentRoot /var/www/example
  </NameVirtualHost>
  <NameVirtualHost www.example2.com example2.com>
    DocumentRoot /var/www/example2
  </NameVirtualHost>
</Listen>
<Listen 1.2.3.5:http>
  ServerName www.example3.com
  DocumentRoot /var/www/example3
</Listen>
--
3) I am aware that this proposal does not provide for the
current possibility of doing something like:
Listen 1.2.3.4:80
Listen 192.168.1.1:80
NameVirtualHost *:80
<VirtualHost 1.2.3.4:80>
  # Only available on the external interface
  ServerName www.example.com
</VirtualHost>
<VirtualHost *:80>
  # Available on both interfaces
  ServerName www.example2.com
</VirtualHost>
<VirtualHost 192.168.1.1:80>
  # Only available on the internal interface
  ServerName www.example3.com
</VirtualHost>
I don't believe this is a common configuration case, although
it certainly exists. Such 

Re: mod_comment (was: svn commit: r106879 ...)

2004-11-30 Thread Rici Lake
On 29-Nov-04, at 11:02 AM, William A. Rowe, Jr. wrote:
Yes, but our API dictates that they are looking for '>' ...
(in the case of this module) - which is a bad thing IMHO (and in
Greg's and Ryan's as well)...
Sorry, I don't understand that. The search for '>' is in the first line
of the directive; arg is simply the arg delivered by the config
apparatus.
The point of the following offending code is to ensure that the
directive was:
  <Comment This is a comment>
and not:
  <Comment This is a comment
I'm actually neutral on whether it is worth doing this check, but all 
the
other section directives do it, and while it seems a bit puritanical, it
does enforce the vaguely HTML-like section syntax.

This is the offending code:
endp = ap_strrchr_c(arg, '>');
if (endp == NULL) {
    return unclosed_directive(cmd);
}
So how is that different from parsing any other standard directive? It
is no different than mod_access, for example, checking that the first
argument to Allow is the word "from".
and this is the good code:
*(ap_directive_t **)dummy = NULL;
return ap_soak_end_container(cmd, "Comment");
It will be noted that this module does not in any way refer to
the fp hidden away in the structs. All it does is use the
public API. If the internal functioning of that API changes, the
module will continue to work.
The problem is, I doubt this module inserts its contents into
the config tree.
That is true, it doesn't. The *(ap_directive_t **)dummy = NULL;
statement tells the config apparatus that there are no directives
to insert. This is the part of the module that I personally object
to; that behaviour is non-standard, undocumented, and a kludge.
However, everybody does it.
So if we spew the contents (using mod_info,
or any other tool based on our config tree API) we lose the
contents.
I would be happy to fix this :) It is worth noting that EXEC_ON_READ
directives are all removed from the config_tree, so that mod_info
cannot show useful things like LoadModule directives.


Re: svn commit: r106879 - /httpd/httpd/trunk/CHANGES /httpd/httpd/trunk/server/core.c

2004-11-29 Thread Rici Lake
On 29-Nov-04, at 4:05 AM, André Malo wrote:
* Paul Querna wrote:
What is wrong with <IfDefine 0>?
It's ugly and non-intuitive.
Just out of curiousity, what do people think is wrong with the
Comment directive implemented by this micro-module:
https://ssl.bulix.org/projects/rici/file/apache/mod_comment.c?rev=49



Re: mod_comment (was: svn commit: r106879 ...)

2004-11-29 Thread Rici Lake
On 29-Nov-04, at 10:10 AM, Greg Stein wrote:
So no... I don't like mod_comment. It uses a feature that I dearly
wish to see removed from httpd.
I agree 100% on removing that feature. If it were gone, something
like mod_comment would have to be implemented in the core, I suppose.
However, I was not really proposing it as a module (except maybe
cheekily); rather as a proposed syntax which seems to me to be
more user-friendly than <IfDefine 0>. Of course, tastes differ.
I'll leave my comments on rewriting the config parsing (which I
agree needs to be done) for a longer message; Paul Querna and I
have been talking about this, too. However, I don't think it is
fair to simply point the finger at the invasive mod_perl.
For example, mod_macro is a very simple, easily audited module,
which is extremely useful, and which requires exactly the same
access to the configuration stream. By the same token, the
environment variable expansion in config files is a hack which
would have been better put into an optional module.
All of this suggests that some mechanism ought to exist for modules
to intervene in the configuration process; I think we are
completely in agreement that the current mechanism is not the
right one.
Rici


Re: svn commit: r106879 - /httpd/httpd/trunk/CHANGES /httpd/httpd/trunk/server/core.c

2004-11-29 Thread Rici Lake
Rephrasing the question slightly:
On 29-Nov-04, at 9:44 AM, Rici Lake wrote:
Just out of curiousity, what do people think is wrong with the
Comment directive implemented by this micro-module:
I meant to ask: What do people think about a <Comment>
directive, which would simply absorb the configuration
stream up to the matching </Comment> line, parsing only
to the extent necessary to match section opens and closes?
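
For the curious, the registration for such a section is a single
command_rec entry; a sketch (comment_section being a hypothetical
handler), which of course still leans on the EXEC_ON_READ mechanism
criticized elsewhere in this thread:

static const command_rec comment_cmds[] = {
    AP_INIT_RAW_ARGS("<Comment", comment_section, NULL,
                     EXEC_ON_READ | OR_ALL,
                     "a block comment, absorbed up to </Comment>"),
    {NULL}
};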
Thanks to Greg Stein for pushing me to be clearer about the
question.
Rici


Re: cvs commit: httpd-2.0/server protocol.c

2004-10-25 Thread Rici Lake
On 25-Oct-04, at 3:37 AM, Graham Leggett wrote:

  +else if (r->connection->keepalive != AP_CONN_KEEPALIVE) {
  +ap_log_rerror(APLOG_MARK, APLOG_NOTICE, rv, r,
  +              "request line read error.");
  +}
Is it possible to put a more descriptive message than "request line
read error"? If I was to see that in a logfile, I would have
absolutely no idea whatsoever what it was trying to tell me :(
I believe that ap_log_rerror() inserts the OS error description, 
resulting in something like:

[client 127.0.0.1] (70007)The timeout specified has expired: request 
line read error

Of course, this is OS dependent.


Re: cvs commit: httpd-2.0/server protocol.c

2004-10-25 Thread Rici Lake
On 25-Oct-04, at 11:04 PM, Roy T. Fielding wrote:
This is not an error that the server admin can solve -- it is normal
life on the Internet.  We really shouldn't be logging it except when
on DEBUG level.
That was my first reaction, too. However, Ivan Ristic pointed out that
(in some cases, anyway) it is the result of a DoS attack, and there may
be something a server admin can do about it. Or, if not, at least they
might know why their server is suddenly performing badly.
For example, we had a problem report on #apache a couple of days ago
which turned out, after considerable investigation, to be the result
of a single host ip issuing hundreds of request connections in a few
minutes. Whether this was a deliberate attack or simply a buggy
client is not clear (to me) but the temporary solution of blocking
the ip address was certainly within the server admin's abilities.


Re: cvs commit: httpd-2.0/server protocol.c

2004-10-25 Thread Rici Lake
With all due respect, I don't think the following scenario
will be logged by the patch; it only reports when the attempt
to read the initial request line fails, not when the socket
is closed prior to data transfer terminating. At least, that
was the intent.
I'll leave it to Ivan to try and make the case within the
context of mod_security, if he chooses to. Reinstating the
notification message was basically his request in the first
place.
On 26-Oct-04, at 12:14 AM, [EMAIL PROTECTED] wrote:
 All major browsers will abandon pending connect threads
 for a web page whenever you hit the BACK button, as well.
 Connects in progress at the socket level will still complete
 but no data will be sent because the threads have all died.
 Happens 24x7x365.



Re: Is there a limit to using with-module directive ???

2004-10-18 Thread Rici Lake
On 18-Oct-04, at 9:03 PM, Bennett, Tony - CNF wrote:
It only builds the module specified in the last --with-module 
directive.
 Is this a limitation with that directive ???
Yes, the directive only handles one module.


Possible error in apr_rgetline_core

2004-09-21 Thread Rici Lake
Lines 250-263 of server/protocol.c:
250    /* Would this overrun our buffer?  If so, we'll die. */
251    if (n < bytes_handled + len) {
252        *read = bytes_handled;
253        if (*s) {
254            /* ensure this string is terminated */
255            if (bytes_handled < n) {
256                (*s)[bytes_handled-1] = '\0';
257            }
258            else {
259                (*s)[n-1] = '\0';
260            }
261        }
262        return APR_ENOSPC;
263    }
The first time through this loop, bytes_handled will be 0. If the 
buffer was provided, rather than being allocated by ap_rgetline_core, 
and the first read exceeded the maximum length (n), then line 256 will 
set the byte *before* the buffer to 0.

I believe that the change introduced at revision 1.152 is incorrect; 
the test at line 255 ensures that the buffer will not be overrun, so 
the change from bytes_handled to bytes_handled-1 in line 256 was 
unnecessary.

Rici


Re: Possible error in apr_rgetline_core

2004-09-21 Thread Rici Lake
On 21-Sep-04, at 11:03 AM, Joe Orton wrote:
pass 2: bytes_handled = 8191, *s = 8191-byte region
Quite right: I hadn't looked closely enough at the case where there was 
no user-supplied buffer :(

Ok, bytes_handled can never be > n, right?
So the following ought to work:
255    if (bytes_handled > 0) {
256        (*s)[bytes_handled-1] = '\0';
257    }
258    else {
259        (*s)[0] = '\0';
260    }



Re: Bug 31126: Reiser4

2004-09-09 Thread Rici Lake
I want to make it clear here that I have never used Reiser4, so on a 
personal level the issue is somewhat academic.

On 9-Sep-04, at 1:11 AM, André Malo wrote:
SUSv3/POSIX disagrees as well. So I think, Mr. Reiser should just fix,
what's broken. It's not such an uncommon case to test an arbitrary file
path, if it's possible to open it.
So, Apache is only going to work on POSIX systems, you're saying?
Even under POSIX, there are cases where this particular piece of error 
analysis could fail. For example, the redundant open on 
pathname/.htaccess could return ENAMETOOLONG even though pathname was 
within limits and names a regular file. (Admittedly, that is a 
pathological case, but...)

EACCES is just the wrong error, because it has different semantics.
I tend to agree, but it's not totally clear. Reiser4's files can also 
function as directories, so ENOTDIR is incorrect. ENOENT seems like a 
reasonable choice, but POSIX also mandates the return of EACCES instead 
of ENOENT if search permissions are not present. I don't know enough 
about the Reiser filesystem to know how one specifies search 
permissions on a file which can also be a directory.

It is worth noting that in the particular circumstances which give 
rise
to this error, ap_directory_walk could tell if the path refers to a
directory or a file; it should be possible to avoid walking too far,
although there may well be edge cases I haven't thought of.

- Race condition.
What about just changing line 930 of server/request.c from
   if (r->finfo.filetype
to
   if (r->finfo.filetype && *r->path_info
All that will do is not optimise on the last segment; I don't see how 
not performing the optimization could create a race condition that 
didn't already exist.



Re: Bug 31126: Reiser4

2004-09-09 Thread Rici Lake
On 9-Sep-04, at 12:48 PM, William A. Rowe, Jr. wrote:
You miss a case with this patch...
Consider a request for /content/foo/index.php/extra-path-info
The path doesn't end where you think it does, and directory walk
would usually handle this case.  Perhaps a patch to perform an
extra sanity check on EACCES results is a better fix, at least
introduced by a compile-time flag?
I'm sure you know the code better than I do. I thought that
r->finfo.filetype would be 0 in this case, since extra-path-info
doesn't exist; in which case the optimisation is already being skipped.
But I might be completely out to lunch.
R,




Re: readTrivial enhancement request

2004-09-08 Thread Rici Lake
On 6-Sep-04, at 10:37 AM, Ivan Ristic wrote:
[ The request is trivial to implement (at least I think so),
  but the feature itself is very important. ]
Perhaps I don't understand the request, but wouldn't it be
straightforward for a module like mod_security to implement
this feature by using one of the connection hooks, perhaps
create_connection? Or even by registering an input filter
at the beginning of the chain?
I'm assuming that the kind of DoS attacks you are looking for
include, for example, sending request lines one octet at a time,
and that the intent is to trigger only on the first
request in a persistent connection, although conceivably it
would be a good idea to start a timeout when the first byte of
a subsequent request line comes in, in case the denial of
service attack is even more subtle. Reporting timeouts in
between requests in a persistent connection would trigger
the warning in normal operation with a compliant browser,
which is presumably undesirable.
Rici


Re: Bug 31126: Reiser4

2004-09-08 Thread Rici Lake
I put the gory details on Bugzilla 
(http://nagoya.apache.org/bugzilla/show_bug.cgi?id=31126); I honestly 
don't know what the best way to fix this problem is, though.

Basically, ap_directory_walk will, under certain circumstances, attempt 
to read an .htaccess file from a complete filepath; that is, given the 
path /path/to/file, it will *also* try /path/to/file/.htaccess. This is 
not because it has been lied to by the filesystem; it does this to 
avoid excess calls to lstat() when it thinks it can get away with it.

ap_parse_htaccess compensates for this by treating ENOTDIR as though it 
were ENOENT. The filesystems I have kicking around all seem to return 
ENOTDIR in this case, but apparently Reiser4 returns EACCES. I don't 
think this is wrong -- others may disagree -- but returning a 
security violation rather than revealing the (non)existence of a 
resource is a common approach. Given the semantics of the Reiser4 
filesystem, it is possibly even a reasonable choice. In any event, 
trying to interpret the precise semantics of error returns from random 
filesystems is a mug's game.
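
A sketch of the compensation being described (a paraphrase, not the
actual ap_parse_htaccess source; filename is assumed to be something
like "/path/to/file/.htaccess"):

    apr_status_t status = ap_pcfg_openfile(&f, r->pool, filename);
    if (status != APR_SUCCESS) {
        if (APR_STATUS_IS_ENOENT(status) || APR_STATUS_IS_ENOTDIR(status)) {
            return OK;            /* no per-directory config at this level */
        }
        return HTTP_FORBIDDEN;    /* anything else -- e.g. Reiser4's EACCES
                                     -- is reported as a security problem */
    }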

It is worth noting that in the particular circumstances which give rise 
to this error, ap_directory_walk could tell if the path refers to a 
directory or a file; it should be possible to avoid walking too far, 
although there may well be edge cases I haven't thought of. Otherwise, 
not optimising away the lstat() in the last step of the directory walk 
may be the best solution.

Alternatively, of course, ap_parse_htaccess could simply regard *any* 
error as equivalent to the non-existence of the .htaccess file.



Re: Bug 18388: cookies

2004-08-29 Thread Rici Lake
Is it metadata about the body? In many cases, a cookie is
semantically information about the session; it might have
nothing to do with the particular body at all.
On 29-Aug-04, at 5:57 PM, Jim Jagielski wrote:
I myself would define the cookie header as an entity header,
since it *is* meta data about the body, but I can also see
it as a more traditional response header as well.
But wouldn't adding new info about the response (either
as a response header or entity header) invalidate it
actually *being* 304 (Not Modified)? In both cases,
it *is* new information about the response, in which
case the full response has, in a sense, been
modified?
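
To make the disputed case concrete (header values hypothetical), the
question is whether a response like this is self-contradictory:

    HTTP/1.1 304 Not Modified
    ETag: "xyzzy"
    Set-Cookie: session=s1; Path=/

If Set-Cookie is entity metadata, it arguably has no business on a
body-less 304; if it is metadata about the session, sending it does not
contradict the claim that the entity itself is unmodified.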



[PATCH] Corrected some problems with mod_info.c

2004-08-28 Thread Rici Lake
Bug report with patch file attached: 
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=30919

mod_info seems to get all confused in cases like this:

<Location /x>
Order deny,allow
Allow from all
AuthType Basic
AuthName Foo
AuthUserFile .htpasswd
<LimitExcept PUT>
   require valid-user
</LimitExcept>
</Location>

The <Location> block is terminated with </LimitExcept> and the 
configuration scan terminates prematurely.

Changing the block to this makes it work:

<Location /x>
Order deny,allow
Allow from all
AuthType Basic
AuthName Foo
<LimitExcept PUT>
   require valid-user
</LimitExcept>
AuthUserFile .htpasswd
</Location>
This particularly affects viewing configuration directives in vhosts.


Re: The Byterange filter -- a new design -- feel free to rip it to shreds

2004-08-02 Thread Rici Lake
On 1-Aug-04, at 4:41 PM, Justin Erenkrantz wrote:
Instead, what mod_proxy could do is expose a user directive that says 
'no representation transformation allowed' and act accordingly: bypass 
all filters except for the network filters.
In fact, if the filter chain cannot optimise byte ranges, there is a 
good argument for simply rejecting the byte-range request altogether, 
or at least for a configuration directive which would allow that.

Otherwise, the server cluster is open to a very simple DoS attack: 
repeatedly request one byte of an ISO image and wait for the cluster 
backbone to collapse.
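
For illustration, the attack request is as cheap as (URL hypothetical):

    GET /isos/huge.iso HTTP/1.1
    Host: mirror.example.org
    Range: bytes=5000000-5000000

The client receives a one-byte 206, but if the range can only be
applied after the unoptimised filter chain has run, the entire file
still crosses the backbone for every such request.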



Re: [patch] perchild.c (Re: setjmp/longjmp vs try/throw/catch)

2004-07-20 Thread Rici Lake
On 20-Jul-04, at 3:33 PM, Tsuyoshi SASAMOTO wrote:
Please refer to this discussion about thread safety of setjmp/longjmp:
 
http://groups.google.com/groups? 
[EMAIL PROTECTED]

The signal-to-noise ratio in that thread is very low :)

It is clear that setjmp/longjmp are *not* signal-safe; furthermore,
signals interact badly with threads. It is certainly not ok to use
longjmp in a signal handler, since there is no way of knowing which
thread a signal handler might be running in. This does not make
setjmp/longjmp thread unsafe (IMHO).

As Posix memorably states, using longjmp with a jump buffer from a  
different thread context is a questionable practice ... not worthy of  
standardization.
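
A minimal sketch of the distinction (illustrative only):

    #include <setjmp.h>

    /* Fine: the jump buffer never leaves the thread that created it. */
    static void per_thread_unwind(void)
    {
        jmp_buf env;                  /* lives on this thread's stack */

        if (setjmp(env) == 0) {
            /* ... a deep call chain may longjmp(env, 1) to unwind,
               provided it runs in this same thread ... */
        }
    }

    /* Questionable (the POSIX wording quoted above): stashing env
       somewhere global and longjmp'ing to it from another thread --
       or from a signal handler, which may run in any thread. */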

R.


Re: Better error logging for mod_access

2004-06-09 Thread Rici Lake
Fair enough. I guess I am being sensitive here, because the
last time I submitted a patch to some other project, I did
it with -u and got told that I should use -c. :)
Anyway, I apologise for being grumpy and look forward to comments
on the patch itself.
On 9-Jun-04, at 8:29 AM, Greg Marr wrote:
At 10:18 PM 6/8/2004, Rici Lake wrote:
The patch is now posted to bugzilla as [Bug 29450]. I believe that 
conforms to the patches.html document cited below. Although that 
document says -C3 is acceptable, I have submitted it in the 
preferred -u format (which I also prefer, actually).
It says use -u, but if you absolutely *can't* do -u, then of the 
remaining options, -C3 is the best, or at least, that is what it is 
meant to say.
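
For reference, the two invocations under discussion (file names
hypothetical):

    diff -u  mod_access.c.orig mod_access.c > better-logging.patch   # unified
    diff -C3 mod_access.c.orig mod_access.c > better-logging.patch   # 3 lines of context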




Re: Better error logging for mod_access

2004-06-05 Thread Rici Lake
Quoting Joshua Slive [EMAIL PROTECTED]:

 On Sat, 5 Jun 2004, Jeff Trawick wrote:
  What about enabling via a per-dir directive? Looks very useful, but
  some sites might have a significant number of new entries in the
  error log serving no useful purpose (for them).

 The problem with per-dir is that the user might not know exactly where in
 the config file he should be editing. If they did, they probably
 wouldn't have a problem finding the relevant allow/deny in the first
 place.

I agree.

 
 What about keeping the old error message, but logging an *additional*
 message at loglevel debug with the additional info?

That is certainly possible but I was trying to make things as easy as possible 
for newbies. I think having both modules available, and using a -D flag to 
select which one to load would probably work out better. But that is just one 
person's opinion.
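
A sketch of that -D idea (module and file names hypothetical): both
variants are installed, and the verbose one is selected by starting the
server with httpd -DVERBOSE_ACCESS.

    <IfDefine VERBOSE_ACCESS>
    LoadModule access_module modules/mod_access_verbose.so
    </IfDefine>
    <IfDefine !VERBOSE_ACCESS>
    LoadModule access_module modules/mod_access.so
    </IfDefine>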

 
 Two other notes:
 - Personally, I hate arbitrary path-size limitations.  They always bite
 you when you least expect it.  What about dynamically sizing this?

That was how I did it the first time, and then I thought that someone would 
complain about the possible storage usage.

Actually, the best thing would be to use the pointer from the directive 
structure, but I could not convince myself that it was in the right pool. If 
someone who understands Apache 2's configuration system better than I do 
confirmed that the filename in the directive structure is allocated in a pool 
which is still live when the directive is being used, I would happily change it.
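
The worry, in code terms (a sketch; my_dir_conf and the handler are
hypothetical, but cmd->pool and cmd->directive->filename are the real
Apache 2.x fields):

    typedef struct {
        const char *config_file;   /* where the directive was written */
        int line_num;
    } my_dir_conf;                 /* hypothetical per-directory config */

    static const char *remember_location(cmd_parms *cmd, void *dconf,
                                         const char *arg)
    {
        my_dir_conf *conf = dconf;

        /* If cmd->directive->filename were known to live in a pool that
           outlasts configuration parsing, storing the pointer would do.
           Copying into cmd->pool sidesteps the question: */
        conf->config_file = apr_pstrdup(cmd->pool, cmd->directive->filename);
        conf->line_num = cmd->directive->line_num;
        return NULL;
    }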

 - You'll get more people to read your patches if you use diff -u.

Tastes vary. I can post it that way if you prefer. But two people reading it 
in 11 minutes isn't bad :)

Rici.