Re: Inclusion of mpm-itk into HEAD
On 27-Jun-07, at 1:01 PM, Joshua Slive wrote:
> On 6/27/07, Nick Kew [EMAIL PROTECTED] wrote:
>> This is a problem that could be solved by documentation.
> Maybe not quite as simple, but when the alternative is accepting new
> connections whilst running as root... Here's a start:
> http://wiki.apache.org/httpd/Recipes/Privilege_Separation

I just added a section to
http://wiki.apache.org/httpd/Recipes/Different_UserIDs_Using_Reverse_Proxy
which could also use a bit of polishing.

It uses mod_rewrite to do the proxying, rather than virtual hosts, which makes it possible to avoid having to change the proxy server's configuration when adding or deleting user servers. If the user servers are listening on high ports, then they can be started as the user/group rather than as root, and the owner can have quite a bit of flexibility in configuring their server. It's quite possible that less reliance on .htaccess files would actually compensate for the additional cost of running multiple server instances.
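The mod_rewrite-based proxying described above might look something like the following sketch. The map file path, port scheme, and URL layout are all hypothetical illustrations, not taken from the wiki recipe:

```apache
# Front-end (privileged) server: map /~user/ URLs to per-user backend
# servers listening on high ports.  The userports.txt map file (a
# hypothetical path) pairs each username with its backend port, so user
# servers can be added or removed without touching this configuration.
RewriteEngine On
RewriteMap userport txt:/usr/local/apache2/conf/userports.txt

# Only proxy when the user has a registered backend port
RewriteCond ${userport:$1} ^[0-9]+$
RewriteRule ^/~([^/]+)(/.*)?$ http://127.0.0.1:${userport:$1}$2 [P,L]
```

Since mod_rewrite re-reads a txt: map when its modification time changes, adding or deleting a line in the map file is enough to add or remove a user server.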
Re: Inclusion of mpm-itk into HEAD
On 27-Jun-07, at 6:07 PM, Joshua Slive wrote:
>> Good point. I moved some of this discussion into its own section,
>> since it applies equally to the main example.
> Yes, that's much better. I also removed your comments about needing
> separate LockFile/etc locations, since it's not true in recent
> versions. (These files are created with the pid of the parent process
> appended to ensure they are unique.)

I actually tested that config file, and one of the stupid mistakes I made was not putting a PidFile directive into each config. As far as I can see, the ScoreBoardFile also needs a unique name; only the lock file has the pid appended. Also, I had to manually set UseCanonicalName Off, even though the manual says that's the 2.2 default. (This is a 2.2.4 installation from the FreeBSD port, although I didn't set up a jailed test.)

In any event, if you're not starting the user servers as root -- which seems like the most secure setup -- then it's quite possible that they will not have write access to the /var subdirectories. If they were started in a (chroot) jail, then none of that would be an issue. The other issues I ran into were fairly minor: I had to map the various modules and the mime.types file into places where the user servers could see them.

I personally think this is quite a practical solution for mass virtual hosting, but it would take a bit more work to document and test. It might well be useful to have a sort of kiosk version of apache httpd, in which particular settings (for example, the listening interfaces and some of the mpm tuning directives) were locked down in some fashion. (Not everyone is going to want to use FreeBSD jails.) The goal would be to allow the user to directly write their own httpd configuration, thus avoiding the need to use .htaccess files.
Re: [mod_wombat] UserData Metatables
Brian McCallister wrote:
> I am leaning towards changing all the userdata types (request_rec,
> server_rec, conn_rec, apr_table_t) in mod_wombat to have anonymous
> metatables rather than the current setup where they share a metatable.

+1

> The drawback is that we lose the ~type safety of being able to use
> luaL_checkudata(...) to pop userdata with a specific metatable out of
> the stack.

There are other ways of doing this. One good way is to keep the metatable as the environment of the instance methods (and, if necessary, of the metamethods). Then the check is really simple (and quite a lot faster than luaL_checkudata). Code (from memory, I don't have my computer with me):

    static Foo *check_self (lua_State *L) {
      if (!lua_getmetatable(L, 1) || !lua_rawequal(L, -1, LUA_ENVIRONINDEX)) {
        luaL_typerror(L, 1, "Foo");
      }
      lua_pop(L, 1);
      return lua_touserdata(L, 1);
    }

The self-test is the most common, but it is often also desirable to check individual parameters for specific types, either prescriptively or polymorphically. This can be done safely by keeping the type's name as a field in the metatable, say __typename, and also using the luaL_newmetatable mechanism in parallel. luaL_newmetatable just stores the metatable in the registry with the type name as a key, so we can do the same thing, also putting the type name as a key in the metatable:

    int my_newmetatable (lua_State *L, const char *typename) {
      lua_pushstring(L, typename);
      lua_pushvalue(L, -1);                    /* dup the name */
      lua_gettable(L, LUA_REGISTRYINDEX);      /* ... name meta-or-nil */
      switch (lua_type(L, -1)) {
        default:
          return luaL_error(L, "Invalid registration of %s: not a table",
                            typename);
        case LUA_TTABLE:
          /* Make sure it satisfies the constraint */
          lua_getfield(L, -1, "__typename");
          if (!lua_rawequal(L, -1, -3))
            luaL_error(L, "Invalid registration of %s: typename not set",
                       typename);
          /* Everything's ok, fix the stack up */
          lua_pop(L, 1);
          lua_replace(L, -2);
          return 0;
        case LUA_TNIL:
          /* Need a new table */
          lua_pop(L, 1);                       /* S: ... name */
          lua_createtable(L, 0, 4);            /* S: ... name meta */
          lua_insert(L, -2);                   /* S: ... meta name */
          lua_pushvalue(L, -1);                /* S: ... meta name name */
          lua_setfield(L, -3, "__typename");   /* S: ... meta name */
          lua_pushvalue(L, -2);                /* S: ... meta name meta */
          lua_settable(L, LUA_REGISTRYINDEX);  /* S: ... meta */
          return 1;
      }
    }

Now we can verify that a metatable is correct by getting the __typename field out of it, and making sure that the metatable registered under that typename is this metatable:

    /* Only use this with arguments (positive index); see
     * lauxlib for how to normalize the index if necessary.
     *
     * Returns true if the indicated argument is some
     * valid type; in that case, leaves the typename on
     * top of the stack. Otherwise, returns 0 and does
     * not change the stack.
     */
    int my_checksometype (lua_State *L, int argindex) {
      if (lua_getmetatable(L, argindex)) {
        lua_getfield(L, -1, "__typename");  /* ... meta name */
        lua_pushvalue(L, -1);               /* ... meta name name */
        lua_rawget(L, LUA_REGISTRYINDEX);   /* ... meta name registered-meta */
        if (lua_rawequal(L, -1, -3)) {
          /* OK, fix the stack */
          lua_pop(L, 1);
          lua_replace(L, -2);
          return 1;
        }
        lua_pop(L, 3);  /* restore the stack */
      }
      return 0;  /* Not ours */
    }

This leaves the typename on top of the stack for possible polymorphism.
If we're just interested in a single type, we can just check with strcmp:

    static void *my_checktype (lua_State *L, int argindex,
                               const char *typename) {
      if (my_checksometype(L, argindex)
          && 0 == strcmp(lua_tostring(L, -1), typename)) {
        lua_pop(L, 1);  /* Ditch the typename */
        return lua_touserdata(L, argindex);
      }
      luaL_typerror(L, argindex, typename);
      return NULL;  /* Never reached */
    }

A simple way to do polymorphism would be to just run through the possibilities, but if there are a lot of them, it's also possible to use a dispatch table (i.e. a Lua table) kept as an upvalue, with the typename as a key:

    /* See example below. */
    int my_dispatch (lua_State *L, int argindex, int vindex,
                     const char *msg) {
      const luaL_Reg *entry;
      if (!my_checksometype(L, argindex)) {
        /* In case we want to dispatch on lua types */
        if (lua_isnoneornil(L, argindex))
          lua_pushliteral(L, "nil");
        else
          lua_pushstring(L, lua_typename(L, lua_type(L, argindex)));
      }
      /* Look it up in the dispatch table */
      lua_rawget(L, vindex);
      entry = lua_touserdata(L, -1);
      if (entry) {
        /* Got it, just call it */
        lua_pop(L, 1);
        return entry->func(L);
      }
      return luaL_typerror(L, argindex, msg);
    }

To use this, we need to set up a dispatch table, and attach it to a method. A helper function comes in handy. This function expects the stack to end with the method table followed by 'nups' upvalues, and puts the dispatch table as upvalue
Re: [PATCH] mod_wombat: add table_get and table_set
A few comments intermingled into the patch:

Brian McCallister wrote:
On Apr 30, 2007, at 2:02 PM, Akins, Brian wrote:

    --- apr_lua.c (revision 0)
    +++ apr_lua.c (revision 0)
    @@ -0,0 +1,55 @@
    +#include "apr.h"
    +#include "apr_tables.h"
    +
    +#include "lua.h"
    +#include "lauxlib.h"
    +#include "lualib.h"
    +
    +#define lua_unboxpointer(L,i) (*(void **)(lua_touserdata(L, i)))
    +
    +static apr_table_t* check_apr_table(lua_State* L, int index) {
    +    luaL_checkudata(L, index, "Apr.Table");
    +    apr_table_t* t = (apr_table_t*)lua_unboxpointer(L, index);
    +    return t;
    +}
    +
    +static int lua_table_set(lua_State* L) {
    +    apr_table_t *t = check_apr_table(L, 1);
    +    const char* key = luaL_checkstring(L, 2);
    +    const char* val = luaL_checkstring(L, 3);
    +
    +    apr_table_set(t, key, val);

In Lua, setting to nil is equivalent to deletion. So I think this should be:

    if (lua_isnoneornil(L, 3))
        apr_table_unset(t, key);
    else {
        const char *val = luaL_checkstring(L, 3);
        apr_table_set(t, key, val);
    }

    +    return 0;
    +}
    +
    +static int lua_table_get(lua_State* L) {
    +    apr_table_t *t = check_apr_table(L, 1);
    +    const char* key = luaL_checkstring(L, 2);
    +    const char *val = apr_table_get(t, key);

val might be NULL; the function should return nil in that case:

    if (val == NULL)
        lua_pushnil(L);
    else
        lua_pushstring(L, val);

    +    lua_pushstring(L, val);
    +    return 1;
    +}
    +
    +static const luaL_reg lua_table_methods[] = {
    +    {"set", lua_table_set},
    +    {"get", lua_table_get},
    +    {0, 0}
    +};

> Even though these are static, we might want to be careful in naming as
> these are reaching into lua's namespace (lua_* and luaL_*).

Agreed. Also, it's misleading -- they are APR tables, not Lua tables.

    +
    +int apr_lua_init(lua_State *L, apr_pool_t *p) {
    +    luaL_newmetatable(L, "Apr.Table");
    +    luaL_openlib(L, "apr_table", lua_table_methods, 0);
    +    lua_pushstring(L, "__index");
    +    lua_pushstring(L, "get");
    +    lua_gettable(L, 2);
    +    lua_settable(L, 1);
    +
    +    lua_pushstring(L, "__newindex");
    +    lua_pushstring(L, "set");
    +    lua_gettable(L, 2);
    +    lua_settable(L, 1);
    +
    +    return 0;
    +}

This is poor Lua style.
You shouldn't assume (or require) the stack to be empty at the start of the function. Use top-relative (negative) indices instead -- they are no slower. Also use lua_{get,set}field for clarity, and luaL_register (luaL_openlib is deprecated):

    luaL_register(L, "apr_table", lua_table_methods);
    luaL_newmetatable(L, "Apr.Table");
    lua_getfield(L, -2, "get");
    lua_setfield(L, -2, "__index");
    lua_getfield(L, -2, "set");
    lua_setfield(L, -2, "__newindex");
Re: SSL downloads faster than non SSL?
On 3-Aug-05, at 8:11 AM, William A. Rowe, Jr. wrote:
> In the APR library, yes, we translate 'apr_sendfile' to TransmitFile()
> on win32. Some other magic occurs to obtain a file handle which can be
> passed to TransmitFile. But there are enough flaws in the TF() api
> that perhaps this would be better defaulted to 'off'.

+1
Re: How can shared modules parse the configuration tree(ap_conftree)???
On 27-Jun-05, at 4:47 AM, luca regini wrote:
> I see that the symbols ap_conftree and ap_top_module are not defined
> in any header but only in the source code. mod_info uses these
> symbols. I need to do something similar to what mod_info does, but in
> a shared module. Is there a way to do it? Does the fact that
> ap_conftree and ap_top_module are not defined in any header mean that
> the API is not stable and will probably change? I need to parse the
> configuration tree because I would like to change the value of a
> cookie programmatically in an input filter. However, the right value
> for the cookie is directory-dependent, and per-directory configuration
> is not available in an input filter. So my idea was to deduce the
> right value by parsing the per-directory directives I was interested
> in at module startup.

That's a really bad idea. The fact that it is difficult should be a good hint that it is contrary to the design of Apache.

I gather the problem you're running into is that the headers are read before the configuration merge is done (for reasons I think are pretty obvious). However, cookies are certainly not used until the configuration merge is done. So why not do whatever you want to do with the cookies in a hook which runs at an appropriate time? There are lots to choose from. The input filter could put whatever information it wants to pass to the hook in the module's per-request structure.
Re: 2.1.5 available for testing
On 20-Jun-05, at 8:55 AM, jean-frederic clere wrote:
> and GET with content-length. I think that is not forbidden in the
> rfc... But what about returning HTTP_BAD_REQUEST if Content-Length is
> not 0?

I don't believe RFC 2616 forbids GET requests from having entity-bodies, so it could be argued that rejecting such a request a priori is non-conformant. Obviously, if the handler for the request does not expect an entity-body -- which would be the case in all the handlers I've seen -- the request ought to be rejected.

On the other hand, from a pragmatic point of view, proxying GETs with entity-bodies to unknown downstream servers may cause them to misbehave. So there is a good argument for mod_proxy to reject such requests unless specifically instructed otherwise.
Re: Multiple AAA providers
On 27-May-05, at 10:53 AM, Jess Holle wrote:
> Russell Howe wrote:
>> Jess Holle wrote:
>>> Is there any remaining/ongoing interest in this development area?
>>> The need to authenticate a single resource against multiple
>>> disparate (non-failover/non-redundant) LDAP servers looms large, and
>>> I'd like to think that this would be part of Apache 2.2 soon...
>>> [I'd rather not have to hack this in in a narrow, special-cased,
>>> hackish way myself...]
>> I have a JAAS LoginModule which I wrote for Jetty that does exactly
>> this (if I understand what you mean, that is :). At work, I have our
>> website authentication first checking OpenLDAP, then falling back to
>> Win2k Active Directory.
> [By disparate/non-failover/non-redundant, I mean that each LDAP would
> be checked for a given user until that user entry was found (at which
> point no other LDAPs would be checked for the given user, regardless
> of the success/failure of the bind). This differs from strictly
> failover LDAPs, wherein Apache keeps trying to contact LDAP URLs until
> it finds one that responds (is up) and then just uses that one as the
> LDAP -- we have that now, but it does not help in these use cases.]
>> I want to be able to do the same from Apache, and am pretty tempted
>> to start coding up a module to do it.
> That would be a great grand unified theory (and I see it as useful),
> but what I care most about is multiple LDAPs. If we could just have
> the existing mod_auth_ldap handle multiple LDAPs (beyond a strict
> failover capacity) that would be *huge*. If we can't get the grand
> unified approach, I'd at least like to see multiple LDAP handling.

I'm very interested in implementing this myself. To make what I'm doing more generally useful, I'd like to know what people expect from the implementation of Require after a multiple-LDAP search. Should you be able to put the LDAP server name in a Require? Or are you only concerned with "require valid-user"?
Re: [1.3 PATCH] rework suppress-error-charset feature
On 1-May-05, at 7:04 PM, Jeff Trawick wrote:
> I found a description of the problem: When Apache issues a redirect in
> response to a client request, the response includes some actual text
> to be displayed in case the client can't (or doesn't) automatically
> follow the redirection. Apache ordinarily labels this text according
> to the character set which it uses, which is ISO-8859-1. However, if
> the redirection is to a page that uses a different character set, some
> broken browser versions will try to use the character set from the
> redirection text rather than the actual page. This can result in
> Greek, for instance, being incorrectly rendered.

If the action is required to compensate for a browser bug, wouldn't it be better to leave it as an environment variable and set it with BrowserMatch?
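The BrowserMatch approach suggested above might look like the following sketch, using the special-purpose suppress-error-charset environment variable. The User-Agent pattern is a placeholder; the actual affected browser versions would have to be identified:

```apache
# Hypothetical sketch: suppress the charset label on redirect bodies
# only for the broken browser versions, rather than server-wide.
# "BuggyBrowser/1.0" is a placeholder pattern, not a real product.
BrowserMatch "BuggyBrowser/1\.0" suppress-error-charset
```

This keeps conformant clients receiving a properly labelled response while working around the rendering bug for the clients that need it.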
Re: Proposed patch: always cleanup brigades in ap_pass_brigade
On 26-Apr-05, at 5:28 AM, Joe Orton wrote:
> This is really just a "should we provide extra API guarantees to
> prevent users shooting themselves in the foot" debate, right? Let's
> agree to disagree :)

Fair enough. I guess where I'm coming from is that I'm a user of the API, and my feet have been shot at enough times already :)

>> In any event, a draft docstring. This does not reflect my view, and I
>> don't know if there is actually a consensus view, but it may reflect
>> the current state of affairs:
> It got linewrapped (format=flowed?) but otherwise I think this looks
> good - thanks a lot.

Sorry about that. The first time I saw RFC 2646, I checked the date to see whether it was a prank. Sadly, no -- it's standards track. And the problem with standards is that people *will* go and implement them, no matter how dumb they might be...

> Final chance for any dissenter to object? Else I'll commit this...

My final thoughts: We've agreed, I think, on the following:

1) The caller retains ownership of the brigade. The callee *must not* destroy it.

2) The caller has made a contract that every bucket in the brigade will remain valid until the callee returns. It does not guarantee anything beyond that, so the callee *must* explicitly set aside any buckets it wants to retain.

3) The callee may manipulate the contents of the brigade as it sees fit during the span of the call. That is, it may insert and remove buckets, pass the brigade to another filter, read buckets (possibly morphing the brigade), etc. However, if it removes a bucket, that bucket becomes its responsibility.

I hope that's clear from the docstring I wrote. So the only point of disagreement is the state of the brigade on return.
Number 3 above does not permit any semantics to be attached to the state of the brigade on return; any such semantics would have to be a private agreement between caller and callee, and the filter mechanism does not lend itself to such private agreements, because it is not easy to guarantee that no other filter intervenes in the filter chain. If someone wanted to stack their own filters in this manner, they could just call the filter function directly rather than using ap_pass_brigade.

So the current state of play is: the callee MAY empty the brigade; the caller MUST clean up the brigade if it wishes to reuse it. This leads to precisely the duplication of calls to apr_brigade_cleanup that people seem to be objecting to.

Better in that case would be: the callee MUST empty the brigade (and, in accordance with (2) and (3) above, either delete or set aside the buckets it removes). In that case, the caller need not clean up the brigade; it is permitted to assume that it is empty on return and need not waste cycles on a pointless apr_brigade_cleanup.

If there is any chance of agreement on this, I can produce a modified docstring on a moment's notice. :)

Rici
Moving on to input filters :)
The docstring in util_filter.h for ap_get_brigade says:

 * Get the current bucket brigade from the next filter on the filter
 * stack. The filter returns an apr_status_t value. If the bottom-most
 * filter doesn't read from the network, then ::AP_NOBODY_READ is returned.
 * The bucket brigade will be empty when there is nothing left to get.

The best reference for the behaviour of ap_get_brigade would appear to be core_input_filter. Its behaviour differs from this docstring; in particular, end of input is (sometimes) signalled with an eos bucket being returned in the brigade. I propose clarifying the last line of that docstring:

 * If block is APR_NONBLOCK_READ and no data is currently available,
 * the brigade will be empty on return, and the function will return
 * APR_SUCCESS.
 *
 * If no more data is available, the brigade will contain an eos bucket
 * on return, and the function will return APR_SUCCESS. A subsequent
 * call to ap_get_brigade will return APR_EOF without altering the
 * brigade.
 *
 * In no case will APR_EAGAIN be returned. Other error conditions may
 * be returned, with or without data in the brigade.
 *
 * Input filters are expected to conform to the above interface.

The above is not quite the behaviour of core_input_filter, but I think it could be, with a few minor tweaks, none of which will cost cycles. As I read the code, it behaves as follows:

In MODE_READBYTES: With APR_BLOCK_READ, on EOF the filter will first return a brigade containing an EOS bucket, with status APR_SUCCESS. A subsequent call to the filter will return an empty brigade with status APR_EOF. It should not otherwise return an empty brigade. With APR_NONBLOCK_READ, an empty brigade will be returned with status APR_SUCCESS if the socket has no data ready OR if an EOF was encountered. There appears to be no way to distinguish between these cases other than by doing a blocking read. (Note 1)

In MODE_EXHAUSTIVE: Regardless of read blocking, the socket bucket is simply appended to the returned brigade, followed by an EOS bucket. APR_SUCCESS is always returned.

In MODE_GETLINE and MODE_SPECULATIVE: With APR_BLOCK_READ, on EOF the filter will return an empty brigade with status APR_SUCCESS. A subsequent call to the filter will return an empty brigade with status APR_EOF. (Notes 2 and 3) With APR_NONBLOCK_READ, the filter will behave as in MODE_READBYTES.

Note 1: I believe this behaviour to be incorrect. Line 285 of server/core_filters.c is:

    else if (block == APR_BLOCK_READ && len == 0) {

At this point, it has already checked for APR_EAGAIN and error status returns. Consequently, a zero-length read in either blocking or non-blocking mode ought to be an EOF. I think this should be changed to:

    else if (len == 0) {

which would insert the EOS bucket in the case of a non-blocking get_brigade. However, I don't know whether it would confuse consumers who were not expecting it. It would actually probably be simpler to insert the eos bucket in the ctx->b brigade when it is built, rather than inserting it in response to various eof-like conditions later on in the code.

Note 2: In MODE_GETLINE, no attempt is made to insert an eos bucket in either blocking or non-blocking mode; in the case of EOF, I believe that the socket bucket will remove itself, leading to an eventual return of APR_EOF. This seems inconsistent with MODE_READBYTES.

Note 3: In MODE_SPECULATIVE, the eos bucket is deliberately not created. A comment says that the next call will return an EOS bucket, but as far as I can see, once ctx->b has been emptied, as it will have been by the call to apr_bucket_delete at line 294, the next call to the filter will return an empty brigade with status APR_EOF.

Finally: I think there is a possibility that the socket bucket's read function will lose data if the underlying call to apr_socket_recv returns a non-zero-length buffer along with a non-APR_SUCCESS status. However, this may not actually ever happen. It would only really be serious in the case of an APR_EAGAIN return.
Re: Moved the ldap connection timeout call (was: Re: svn commit: r160707 )
On 26-Apr-05, at 6:58 PM, Brad Nicholes wrote:
> I just checked in a patch to move the ldap_set_option() call from
> trying to set the connection timeout globally to ldap connection
> specific. I don't have a good way to test that it fixes the problem,
> but if somebody can verify the fix, I will propose it for backport.

Note that there is a bug in the OpenLDAP SDK up to about 2.2.21 which causes OpenLDAP to crash when the global connection timeout is set (ITS#3487). It ends up doing a double free because it copies pointers instead of contents in the global config structure. I've had to explicitly set LDAPConnectionTimeout -1 in order to prevent the default timeout being applied, on a couple of installations where I couldn't upgrade the SDK.
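The workaround described above is a one-line configuration change; sketched here for reference:

```apache
# Workaround used on installations stuck with an affected OpenLDAP SDK
# (up to about 2.2.21, ITS#3487): disable the connection timeout so the
# crashing ldap_set_option() code path is never exercised.
LDAPConnectionTimeout -1
```

Once the SDK is upgraded, the directive can be removed and the normal timeout behaviour restored.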
Re: RFC: Who owns a passed brigade?
On 25-Apr-05, at 10:15 AM, Cliff Woolley wrote:
>> 4. switch to allocate brigades out of the bucket allocator
> Didn't we try #4 before and something bad happened? I don't remember
> what that would have been, I just feel certain #4 has been previously
> discussed at least.

I think the something bad would be the buildup of brigades in the connection pool, if some filter continued to allocate a new brigade for every invocation. Also, if a brigade was not emptied and the bucket resources needed to be freed (e.g. heap buckets), those would build up as well. Allocating the brigade out of the bucket allocator but continuing to register the cleanup with a pool (say, the request pool, if that were appropriate) might work, but it would be fragile.

The heated discussion at http://marc.theaimsgroup.com/?t=10403978553&r=2&w=2 is probably relevant.
Re: RFC: Who owns a passed brigade?
On 25-Apr-05, at 8:29 AM, Joe Orton wrote:
> I think it's fine to leave the *contents* of the brigade as undefined
> after a call to ap_pass_brigade. (That's essentially the case now,
> since it's not explicitly defined to be empty: I don't know of any
> particular bugs which have been caused by this?)

Undefined contents make me nervous. It is very easy to define the results: just do a cleanup in ap_pass_brigade. (This will also protect against some resource leaks in failed error handling, for example.)

I don't know of any particular bugs which have been caused by this either, but it would be very easy to create one. Take a filter which expects the brigade to be emptied and feed it into a filter which does not empty the brigade. Then watch the output quadratically expand. I don't believe there are any such bugs in the distribution, because the only output filter in the distribution that I can find which does not empty its brigade is mod_case_filter.c (see below). Furthermore, most of the interesting filters in the distribution actually create a new brigade for every invocation, which I think we can all agree is a bad idea.

Some filters, such as mod_deflate.c, do expect the brigade to be cleaned. In this case, it is unlikely that a bug will be noticed, because few third-party filters will get inserted after mod_deflate.c. If filters were modified to always use the same bucket brigade for communication with downstream filters, you might start to see this error with third-party filters which don't clean their input brigade, unless every filter religiously calls apr_brigade_cleanup after ap_pass_brigade. But if you were going to do that, you might as well put the call to apr_brigade_cleanup in ap_pass_brigade. The cost of an unnecessary call is minimal, and it simplifies the API.

Note: I honestly believe mod_case_filter.c should be removed from the distribution, unless someone is willing to rewrite it as a proper model of how to write an output filter. It is not useful as a module, and its only use in the distribution is to serve as an example for module authors, which it definitely is not.
Re: RFC: Who owns a passed brigade?
On 25-Apr-05, at 12:05 PM, Joe Orton wrote:
> Fragile, why? That's exactly the right approach as far as I can see.

Because the pool might have a lifetime *longer than* the bucket allocator. OK, that's not likely in the current architecture, so it would be a concern only for non-httpd applications.

That was a different heated debate, I think :)
Proposed patch: always cleanup brigades in ap_pass_brigade
Regardless of any other changes to the brigade API, this seems to me to be a good idea:

Index: util_filter.c
===================================================================
--- util_filter.c (revision 158730)
+++ util_filter.c (working copy)
@@ -500,6 +500,7 @@
 AP_DECLARE(apr_status_t) ap_pass_brigade(ap_filter_t *next,
                                          apr_bucket_brigade *bb)
 {
+    apr_status_t result = AP_NOBODY_WROTE;
     if (next) {
         apr_bucket *e;
         if ((e = APR_BRIGADE_LAST(bb)) && APR_BUCKET_IS_EOS(e) && next->r) {
@@ -523,9 +524,10 @@
             }
         }
     }
-        return next->frec->filter_func.out_func(next, bb);
+        result = next->frec->filter_func.out_func(next, bb);
+        apr_brigade_cleanup(bb);
     }
-    return AP_NOBODY_WROTE;
+    return result;
 }

 AP_DECLARE(apr_status_t) ap_save_brigade(ap_filter_t *f,

Index: util_filter.h
===================================================================
--- util_filter.h (revision 158730)
+++ util_filter.h (working copy)
@@ -299,7 +299,8 @@
  * Pass the current bucket brigade down to the next filter on the filter
  * stack. The filter returns an apr_status_t value. If the bottom-most
  * filter doesn't write to the network, then ::AP_NOBODY_WROTE is returned.
- * The caller relinquishes ownership of the brigade.
+ * The caller retains ownership of the brigade. The brigade will be
+ * empty on return.
  * @param filter The next filter in the chain
  * @param bucket The current bucket brigade
  */
For Review: mod_ext_filter.c
Patch to use a single temporary brigade instead of creating a new one on each invocation (also does a little code reuse):

Index: mod_ext_filter.c
===================================================================
--- mod_ext_filter.c (revision 158730)
+++ mod_ext_filter.c (working copy)
@@ -70,6 +70,7 @@
 #if APR_FILES_AS_SOCKETS
     apr_pollset_t *pollset;
 #endif
+    apr_bucket_brigade *bb_tmp;
 } ef_ctx_t;

 module AP_MODULE_DECLARE_DATA ext_filter_module;
@@ -614,6 +615,12 @@
         }
     }

+    /*
+     * Create a temporary brigade for the output filter. This is not
+     * always used, but it's easier to just have it.
+     */
+    ctx->bb_tmp = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
+
     if (dc->debug >= DBGLVL_SHOWOPTIONS) {
         ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, f->r,
                       "%sfiltering `%s' of type `%s' through `%s', cfg %s",
@@ -747,10 +754,8 @@
     apr_status_t rv;
     char buf[4096];
     apr_bucket *eos = NULL;
-    apr_bucket_brigade *bb_tmp;

     dc = ctx->dc;
-    bb_tmp = apr_brigade_create(r->pool, c->bucket_alloc);

     for (b = APR_BRIGADE_FIRST(bb);
          b != APR_BRIGADE_SENTINEL(bb);
@@ -769,15 +774,14 @@
             /* Good cast, we just tested len isn't negative */
             if (len > 0 &&
-                (rv = pass_data_to_filter(f, data, (apr_size_t)len, bb_tmp))
+                (rv = pass_data_to_filter(f, data, (apr_size_t)len,
+                                          ctx->bb_tmp))
                     != APR_SUCCESS) {
                 return rv;
             }
         }

     apr_brigade_cleanup(bb);
-    APR_BRIGADE_CONCAT(bb, bb_tmp);
-    apr_brigade_destroy(bb_tmp);
+    APR_BRIGADE_CONCAT(bb, ctx->bb_tmp);

     if (eos) {
         /* close the child's stdin to signal that no more data is coming;
@@ -800,30 +804,12 @@
         }
     }

-    do {
-        len = sizeof(buf);
-        rv = apr_file_read(ctx->proc->out, buf, &len);
-        if ((rv && !APR_STATUS_IS_EOF(rv) && !APR_STATUS_IS_EAGAIN(rv)) ||
-            dc->debug >= DBGLVL_GORY) {
-            ap_log_rerror(APLOG_MARK, APLOG_DEBUG, rv, r,
-                          "apr_file_read(child output), len %" APR_SIZE_T_FMT,
-                          !rv ? len : -1);
-        }
-        if (APR_STATUS_IS_EAGAIN(rv)) {
-            if (eos) {
-                /* should not occur, because we have an APR timeout
-                 * in place */
-                AP_DEBUG_ASSERT(1 != 1);
-            }
-            return APR_SUCCESS;
-        }
-
-        if (rv == APR_SUCCESS) {
-            b = apr_bucket_heap_create(buf, len, NULL, c->bucket_alloc);
-            APR_BRIGADE_INSERT_TAIL(bb, b);
-        }
-    } while (rv == APR_SUCCESS);
+    rv = drain_available_output(f, bb);
+    if (APR_STATUS_IS_EAGAIN(rv)) {
+        /* should not occur, because we have an APR timeout in place */
+        AP_DEBUG_ASSERT(!eos);
+        return APR_SUCCESS;
+    }

     if (!APR_STATUS_IS_EOF(rv)) {
         return rv;
For review: mod_include.c (only initialize once)
I didn't reindent, so that the patch would be clearer.

Index: mod_include.c
===================================================================
--- mod_include.c (revision 158730)
+++ mod_include.c (working copy)
@@ -3542,7 +3542,6 @@
     intern->end_seq_len = strlen(intern->end_seq);
     intern->undefined_echo = conf->undefined_echo;
     intern->undefined_echo_len = strlen(conf->undefined_echo);
-    }

     if ((parent = ap_get_module_config(r->request_config, include_module))) {
         /* Kludge --- for nested includes, we want to keep the subprocess
@@ -3599,6 +3598,7 @@
         apr_table_setn(r->subprocess_env, "QUERY_STRING_UNESCAPED",
                        ap_escape_shell_cmd(r->pool, arg_copy));
     }
+    }

     return send_parsed_content(f, b);
 }
Re: Proposed patch: always cleanup brigades in ap_pass_brigade
On 25-Apr-05, at 1:30 PM, Justin Erenkrantz wrote:
> --On Monday, April 25, 2005 12:47 PM -0500 Rici Lake
> [EMAIL PROTECTED] wrote:
>> Regardless of any other changes to the brigade API, this seems to me
>> to be a good idea:
>> [ap_pass_brigade patch quoted in full above]
> This introduces a really large overhead as we'd be calling the cleanup
> for *every* output filter we invoke. So, -0.. -- justin

Can you quantify that large overhead? In the case of a 'compliant' filter, it consists of:

-- a function call
-- an if (!APR_BRIGADE_EMPTY(bb)) test which fails
-- a return
-- and the loss of a possible tail-call optimization

I can't imagine that all that amounts to as much as 100 cycles; if there were 20 filters in the chain and that were called every 4k of output (and both of those figures are wildly conservative), that would amount to half a cycle per byte. If you wanted to save the function call, you could replace

    apr_brigade_cleanup(bb);

with an open-coded version of the cleanup.

The point is that it's fine to say somewhere "filters must clean their input brigades", but what if one doesn't? That will create a bug which pops up occasionally in the interaction between two unrelated modules. Good interfaces don't have such easily eliminated gotchas, imho. In any event, if everyone knows that ap_pass_brigade is going to clean up the brigade, then other redundant calls to apr_brigade_cleanup can be removed.
Re: Proposed patch: always cleanup brigades in ap_pass_brigade
On 25-Apr-05, at 3:36 PM, Justin Erenkrantz wrote:

--On Monday, April 25, 2005 3:06 PM -0500 Rici Lake [EMAIL PROTECTED] wrote:

Can you quantify that large overhead? In the case of a 'compliant' filter, it consists of:

With Joe's suggestion for 'creator must destroy', this step is wholly unnecessary and only overcomplicates things. -- justin

The creator must destroy the brigade when it's done with it -- if I understand Joe correctly. Creating and destroying a brigade on every invocation of a filter is a lot more overhead than a few extraneous unfulfilled while loops -- even if the brigade is allocated in a bucket_allocator. Even without the buildup of pool memory for the brigade objects, creating a brigade requires the registration (and later deregistration, during brigade_destroy) of a pool cleanup function.

If we accept that the contents of a brigade are undefined when ap_pass_brigade returns, the caller has three options:

-- call cleanup and reuse the brigade
-- call destroy (which will first call cleanup)
-- drop it on the floor and let destroy be called by the cleanup function

In no case can it make any use of the brigade's contents (they're undefined), so cleanup must be called just after ap_pass_brigade returns. If ap_pass_brigade does the call itself, it reduces the number of places in the code where cleanup is called, and possibly frees resources slightly earlier in the case where the brigade creator dropped the brigade on the floor.

The cost is a few extraneous if (!APR_BRIGADE_EMPTY(bb)) statements (or moral equivalent). This is actually a macro which expands to a fairly simple pointer comparison. That's hardly a complication -- it's a simplification (from the perspective of the caller of ap_pass_brigade). And it's hardly an enormous overhead.

/me continues the tradition of creating flames out of bucket brigades. You'd think it would be the reverse :)
For review: teach mod_include.c to recycle brigades
Index: mod_include.c
===
--- mod_include.c (revision 158730)
+++ mod_include.c (working copy)
@@ -3060,7 +3060,7 @@
     struct ssi_internal_ctx *intern = ctx->intern;
     request_rec *r = f->r;
     apr_bucket *b = APR_BRIGADE_FIRST(bb);
-    apr_bucket_brigade *pass_bb;
+    apr_bucket_brigade *pass_bb = ctx->pass_bb;
     apr_status_t rv = APR_SUCCESS;
     char *magic; /* magic pointer for sentinel use */
@@ -3076,9 +3076,6 @@
         return ap_pass_brigade(f->next, bb);
     }

-    /* All stuff passed along has to be put into that brigade */
-    pass_bb = apr_brigade_create(ctx->pool, f->c->bucket_alloc);
-
     /* initialization for this loop */
     intern->bytes_read = 0;
     intern->error = 0;
@@ -3145,7 +3142,6 @@
     if (!APR_BRIGADE_EMPTY(pass_bb)) {
         rv = ap_pass_brigade(f->next, pass_bb);
         if (rv != APR_SUCCESS) {
-            apr_brigade_destroy(pass_bb);
             return rv;
         }
     }
@@ -3170,7 +3166,7 @@
     }

     if (rv != APR_SUCCESS) {
-        apr_brigade_destroy(pass_bb);
+        apr_brigade_cleanup(pass_bb);
         return rv;
     }
@@ -3384,7 +3380,7 @@
     DEBUG_INIT(ctx, f, pass_bb);
     rv = handle_func(ctx, f, pass_bb);
     if (rv != APR_SUCCESS) {
-        apr_brigade_destroy(pass_bb);
+        apr_brigade_cleanup(pass_bb);
         return rv;
     }
 }
@@ -3450,13 +3446,11 @@
     /* if something's left over, pass it along */
     if (!APR_BRIGADE_EMPTY(pass_bb)) {
-        rv = ap_pass_brigade(f->next, pass_bb);
+        return ap_pass_brigade(f->next, pass_bb);
     }
     else {
-        rv = APR_SUCCESS;
-        apr_brigade_destroy(pass_bb);
+        return APR_SUCCESS;
     }
-    return rv;
 }
@@ -3542,8 +3536,11 @@
     intern->end_seq_len = strlen(intern->end_seq);
     intern->undefined_echo = conf->undefined_echo;
     intern->undefined_echo_len = strlen(conf->undefined_echo);
+
+    /* Initialize the pass brigade */
+    ctx->pass_bb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
 }

 if ((parent = ap_get_module_config(r->request_config, include_module))) {
     /* Kludge --- for nested includes, we want to keep the subprocess
      * environment of the base document (for compatibility); that means
@@ -3599,6 +3596,6 @@
     apr_table_setn(r->subprocess_env, "QUERY_STRING_UNESCAPED",
                    ap_escape_shell_cmd(r->pool, arg_copy));
 }

 return send_parsed_content(f, b);
 }

Index: mod_include.h
===
--- mod_include.h (revision 158730)
+++ mod_include.h (working copy)
@@ -85,6 +85,9 @@
     /* currently configured time format */
     const char *time_str;

+    /* bucket brigade for passing to next filter */
+    apr_bucket_brigade *pass_bb;
+
     /* pointer to internal (non-public) data, don't touch */
     struct ssi_internal_ctx *intern;
 } include_ctx_t;
Pool buckets, transient buckets and bucket_setaside
Why do pool buckets do an automatic setaside? Should they?

No other bucket type does this; consequently, if you want to hold on to a bucket beyond the scope of the call in which you received the brigade, you must explicitly call bucket_setaside. Without examining the bucket type, a practice which should normally be discouraged, it is impossible to know whether a given bucket will auto-setaside or not; consequently, bucket_setaside should always be called (possibly via ap_save_brigade). So it would seem that either all bucket types should do auto-setaside or that none of them should.

It would undoubtedly be convenient if all buckets did this -- it would remove the necessity for ap_save_brigade -- but I believe there is an issue with the model. Consider a pool bucket which is placed into an existing brigade. If this brigade is not explicitly cleaned up, which would defuse the pool bucket's setaside mechanism, then both the brigade's and the pool bucket's cleanup functions will run when the pool is cleaned up. The bucket's cleanup function is likely to run first, however, since it is likely to be newer than the brigade it was put into. That will create an unnecessary heap copy of the bucket, which will then be freed when the pool cleanup for the brigade runs, slightly later.

If brigades are (almost) always explicitly cleaned or destroyed, and buckets are always explicitly set aside, then what is the point of the overhead of registering auto-setaside functions for every pool bucket? Conversely, is there a reasonable mechanism for avoiding the unnecessary setaside and allowing the removal of the requirement to explicitly setaside buckets?
Re: For review: teach mod_include.c to recycle brigades
On 25-Apr-05, at 4:27 PM, André Malo wrote:

* Rici Lake wrote:

+1 in general, but ... put it into the internal structure, please. That's what I've created it for (binary compat, and only making public what matters to the public). nd

-- "Muschelflucht-Zusatzeinrichtung." Shell escape is clear enough, but "Zusatzeinrichtung"? Extension? Feature. -- found in de.org.ccc

Oops, sorry. Revised patch; it includes both the bug fix and the recycling of bucket brigades, and is still insufficiently tested. Also includes an explicit cleanup, while we continue thrashing out that issue. (nd -- do you have a test suite, by any chance?)

Index: mod_include.c
===
--- mod_include.c (revision 158730)
+++ mod_include.c (working copy)
@@ -173,6 +173,7 @@
     apr_size_t bytes_read;
     apr_bucket_brigade *tmp_bb;
+    apr_bucket_brigade *pass_bb;

     request_rec *r;
     const char *start_seq;
@@ -3060,7 +3061,7 @@
     struct ssi_internal_ctx *intern = ctx->intern;
     request_rec *r = f->r;
     apr_bucket *b = APR_BRIGADE_FIRST(bb);
-    apr_bucket_brigade *pass_bb;
+    apr_bucket_brigade *pass_bb = intern->pass_bb;
     apr_status_t rv = APR_SUCCESS;
     char *magic; /* magic pointer for sentinel use */
@@ -3076,9 +3077,6 @@
         return ap_pass_brigade(f->next, bb);
     }

-    /* All stuff passed along has to be put into that brigade */
-    pass_bb = apr_brigade_create(ctx->pool, f->c->bucket_alloc);
-
     /* initialization for this loop */
     intern->bytes_read = 0;
     intern->error = 0;
@@ -3145,7 +3143,6 @@
     if (!APR_BRIGADE_EMPTY(pass_bb)) {
         rv = ap_pass_brigade(f->next, pass_bb);
         if (rv != APR_SUCCESS) {
-            apr_brigade_destroy(pass_bb);
             return rv;
         }
     }
@@ -3170,7 +3167,7 @@
     }

     if (rv != APR_SUCCESS) {
-        apr_brigade_destroy(pass_bb);
+        apr_brigade_cleanup(pass_bb);
         return rv;
     }
@@ -3384,7 +3381,7 @@
     DEBUG_INIT(ctx, f, pass_bb);
     rv = handle_func(ctx, f, pass_bb);
     if (rv != APR_SUCCESS) {
-        apr_brigade_destroy(pass_bb);
+        apr_brigade_cleanup(pass_bb);
         return rv;
     }
 }
@@ -3450,13 +3447,14 @@
     /* if something's left over, pass it along */
     if (!APR_BRIGADE_EMPTY(pass_bb)) {
+        /* return ap_pass_brigade(f->next, pass_bb); */
         rv = ap_pass_brigade(f->next, pass_bb);
+        apr_brigade_cleanup(pass_bb);
+        return rv;
     }
     else {
-        rv = APR_SUCCESS;
-        apr_brigade_destroy(pass_bb);
+        return APR_SUCCESS;
     }
-    return rv;
 }
@@ -3542,8 +3540,10 @@
     intern->end_seq_len = strlen(intern->end_seq);
     intern->undefined_echo = conf->undefined_echo;
     intern->undefined_echo_len = strlen(conf->undefined_echo);
-    }
+    /* Initialize the pass brigade */
+    intern->pass_bb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
+
     if ((parent = ap_get_module_config(r->request_config, include_module))) {
         /* Kludge --- for nested includes, we want to keep the subprocess
          * environment of the base document (for compatibility); that means
@@ -3599,6 +3599,7 @@
         apr_table_setn(r->subprocess_env, "QUERY_STRING_UNESCAPED",
                        ap_escape_shell_cmd(r->pool, arg_copy));
     }
+    }

 return send_parsed_content(f, b);
 }
Re: Pool buckets, transient buckets and bucket_setaside
On 25-Apr-05, at 5:04 PM, Cliff Woolley wrote:

Why do pool buckets do an automatic setaside? Should they?

They set themselves aside when the pool in which their data was allocated is in the process of being destroyed. This is necessary because no other bucket type has a similarly unpredictable lifetime.

Surely this is equally true of any pool-related resource. For example, a file which has been opened and has a pool cleanup registered would have exactly the same lifetime as a pool-allocated string.

Even transient buckets have a lifetime that is well-known: they last until the call chain returns. Pool buckets might not even live that long. Don't think just in terms of httpd's usage of buckets here -- this is an APR construct.

I hadn't considered the case of a pool being released within the call chain :) But that would still apply equally to a file, no? Is that actually likely, though, even leaving httpd aside? Or, to put it another way, if you're not using bucket brigades in the httpd style, at *exactly what* point must you set aside a bucket?

If there were any way to have transient buckets automatically set themselves aside, then I would argue that that should be done, too. Of course there isn't, since there is no such thing as a callback that says "the call chain has returned now". But at least as it stands now, we have a minimum guaranteed time that all of the buckets' data will live -- as long as the call chain lives.

Actually, you could achieve this very easily if a brigade had a cleanup list -- in other words, if it were a sort of degenerate pool. Then you could register the transient's auto-setaside function against the brigade's cleanup list, as long as you were prepared to ensure that brigade_cleanup were called on the brigade when the call chain returned. That's what got me started thinking about this in the first place.
I think that lifetime and datatype are probably independent aspects of a bucket, and the design really ought, in some idealistic long-term definition of 'ought', to be refactored accordingly. *Any* resource might be pool-'allocated', might be transient, might be immortal. The games played with file buckets and the pool parameter to bucket_setaside are a demonstration, I think. Taking that view, pool buckets are not particularly special, or maybe I'm missing something obvious.

If file and socket buckets (and other such bucket types) could be taught to do auto-setaside, then ap_save_brigade could disappear and you could actually retain a partial brigade between filter calls with a BRIGADE_CONCAT. That has a certain charm, but I have the niggling feeling that it wouldn't actually work out in practice, because of the timing of cleanup calls I referred to earlier.

If brigades are (almost) always explicitly cleaned or destroyed, and buckets are always explicitly setaside, then what is the point of the overhead of registering auto-setaside functions for every pool bucket?

If registering a pool cleanup that doesn't get called (on a bucket type that is rarely used in httpd, no less) has a measurable performance impact, I'd be extremely surprised.

Well, me too, but then I was surprised at how much the overhead of an 'if empty' check seems to matter. :) However, if they were used a lot, the overhead might actually be measurable. It wouldn't be registering the cleanup so much as cancelling it -- registering a cleanup only requires snapping a new link onto a chain, but cancelling it involves a linear scan through the cleanups, which could easily turn quadratic.
Re: Proposed patch: always cleanup brigades in ap_pass_brigade
On 25-Apr-05, at 4:26 PM, Joe Orton wrote:

On Mon, Apr 25, 2005 at 03:58:59PM -0500, Rici Lake wrote:

If we accept that the contents of a brigade are undefined when ap_pass_brigade returns, the caller has three options: -- call cleanup and reuse the brigade -- call destroy (which will first call cleanup) -- drop it on the floor and let destroy be called by the cleanup function. In no case can it make any use of the brigade's contents (they're undefined), so cleanup must be called just after ap_pass_brigade returns.

But in the latter two cases, adding an apr_brigade_cleanup() call to ap_pass_brigade() is completely redundant. To me, you need to make the argument that the former case is so prevalent that it's worth an additional API guarantee, and the additional (small) overhead in ap_pass_brigade. I don't see it really...

I guess that depends on whether most filters use and recycle pass brigades. I suspect that most internal filters don't (and possibly have the model which Joe Schaefer describes, where the brigade is simply passed through), whereas most third-party filters probably either do or should. Having now looked at quite a few output filters, I've come to the conclusion that the normal model is to create and destroy brigades on every invocation, which I can't help but think is a lot less efficient than it could be.

But I actually disagree with the premise: the issue is not how prevalent the case is, it is how likely it is to create a bug which is triggered by an interaction between unrelated modules. Mitigating the creation of such bugs is, in my mind, an important part of component API design, particularly if the mitigation is low-cost. Of course, everyone has their own concept of beauty, and mine is probably the least important of anyone participating in this debate.

In any event, a draft docstring.
This does not reflect my view, and I don't know if there is actually a consensus view, but it may reflect the current state of affairs:

Index: util_filter.h
===
--- util_filter.h (revision 158730)
+++ util_filter.h (working copy)
@@ -103,8 +103,8 @@
  * @name Filter callbacks
  *
  * This function type is used for filter callbacks. It will be passed a
- * pointer to this filter, and a bucket containing the content to be
- * filtered.
+ * pointer to this filter, and a bucket brigade containing the content
+ * to be filtered.
  *
  * In filter->ctx, the callback will find its context. This context is
  * provided here, so that a filter may be installed multiple times, each
@@ -120,10 +120,15 @@
  * or output filter chains and before any data is generated to allow the
  * filter to prepare for processing.
  *
- * The *bucket structure (and all those referenced by ->next and ->prev)
- * should be considered const. The filter is allowed to modify the
- * next/prev to insert/remove/replace elements in the bucket list, but
- * the types and values of the individual buckets should not be altered.
+ * The bucket brigade always belongs to the caller, but the filter
+ * is free to use the buckets within it as it sees fit. Normally,
+ * the brigade will be returned empty. Buckets *may not* be retained
+ * between successive calls to the filter unless they have been
+ * set aside with a call to apr_bucket_setaside. Typically this will
+ * be done with ap_save_brigade(). Buckets removed from the brigade
+ * become the responsibility of the filter, which must arrange for
+ * them to be deleted, either by doing so directly or by inserting
+ * them in a brigade which will subsequently be destroyed.
  *
  * For the input and output filters, the return value of a filter should be
  * an APR status value. For the init function, the return value should
@@ -299,9 +304,13 @@
  * Pass the current bucket brigade down to the next filter on the filter
  * stack. The filter returns an apr_status_t value. If the bottom-most
  * filter doesn't write to the network, then ::AP_NOBODY_WROTE is returned.
- * The caller relinquishes ownership of the brigade.
  * @param filter The next filter in the chain
  * @param bucket The current bucket brigade
+ *
+ * @remark Ownership of the brigade is retained by the caller. On return,
+ *         the contents of the brigade are UNDEFINED, and the caller must
+ *         either call apr_brigade_cleanup or apr_brigade_destroy on
+ *         the brigade.
  */
 AP_DECLARE(apr_status_t) ap_pass_brigade(ap_filter_t *filter,
                                          apr_bucket_brigade *bucket);
Module code review (long). Was Re: RFC: Who owns a passed brigade?
On 22-Apr-05, at 6:19 AM, Nick Kew wrote:

However, I think it is a fallacy that a cleaned-up brigade is not too big to wait for pool cleanup. The brigade itself is about four pointers; and the corresponding pool cleanup is another four pointers, so that's a total of around 32 bytes, not a lot. But suppose that a brigade is created for every 8k of filtered data,

Are there applications like that in the wild? I'd have thought that sounds like the kind of case where you definitely want to re-use one brigade, with a brigade_cleanup after each 8Kb pass_brigade.

Indeed. But the filter does not know, in general, how much data it will be called upon to filter, nor how many times it will be invoked. Being invoked once every 8 kb should be pretty common, though, since that seems to be the recommendation on how often a filter should pass a brigade.

Are there applications like that in the wild? I don't know, but it seems likely that there are. None of the following examples are definitive, but they seem illustrative. I cite these examples only because they are available; it was not an attempt to criticize anyone's code in particular, but rather to show the result of the lack of clear documentation on how to write an output filter.

In summary, of five modules examined, three of them create new brigades on every invocation without destroying the previous one. One of them fails to empty the passed brigade. One of them destroys the passed brigade. Two of them do one-time initialization on every invocation of the filter. The only output filter I looked at which seems to conform to brigades-are-retained, buckets-are-passed is mod_deflate.

This is the loop from mod_deflate.c, which seems to be a reasonable model:

 1 static apr_status_t deflate_out_filter(ap_filter_t *f,
 2                                        apr_bucket_brigade *bb)
 3 {   // ... declarations
 4     if (APR_BRIGADE_EMPTY(bb)) {
 5         return APR_SUCCESS;
 6     }
 7     if (!ctx) {   // ... determine whether we should filter the content
 8         /* We're cool with filtering this. */
 9         ctx = f->ctx = apr_pcalloc(r->pool, sizeof(*ctx));
10         ctx->bb = apr_brigade_create(r->pool, f->c->bucket_alloc);
           // ... more initialization
11     }
12     while (!APR_BRIGADE_EMPTY(bb))
13     {
14         e = APR_BRIGADE_FIRST(bb);
           // ... check if e is metadata, process data in e
15         apr_bucket_delete(e);
16     }
17     apr_brigade_cleanup(bb);
18     return APR_SUCCESS;
19 }

Notes:

lines 4-6: these only serve to avoid initialization in the event that the filter only receives empty brigades. I don't know if this is necessary.

line 7: all initialization is performed only once, if the ctx structure has not yet been created.

lines 12, 15: all buckets are removed from the brigade and deleted.

line 17: the call to cleanup is presumably redundant, since there is no break in the while block. But there could be.

mod_case_filter is supposedly an example of how to write filters, so it is likely that its code will be copied by naive filter writers. Here is its loop, in essence:

 1 static apr_status_t CaseFilterOutFilter(ap_filter_t *f,
 2                                         apr_bucket_brigade *pbbIn)
 3 {   // ... declarations
 4     pbbOut = apr_brigade_create(r->pool, c->bucket_alloc);
 5     for (pbktIn = APR_BRIGADE_FIRST(pbbIn);
 6          pbktIn != APR_BRIGADE_SENTINEL(pbbIn);
 7          pbktIn = APR_BUCKET_NEXT(pbktIn))
 8     {   // ... check for EOS
 9         apr_bucket_read(pbktIn, &data, &len, APR_BLOCK_READ);
           // ... do the transform
10         pbktOut = apr_bucket_heap_create(buf, len, apr_bucket_free,
11                                          c->bucket_alloc);
12         APR_BRIGADE_INSERT_TAIL(pbbOut, pbktOut);
13     }
14     return ap_pass_brigade(f->next, pbbOut);
15 }

Notes:

line 4: a new brigade is created on every invocation.

lines 5-7: the input brigade is iterated but the buckets are not removed.

line 14: the input brigade is not cleaned, and the output brigade is not destroyed.

OK, here's mod_charset_lite, which is flagged as a learning experience (for the author), but I believe can actually be found in the wild. The filter function itself was a bit much to include here (it does seem to empty its input brigade), but the key is in the following function, which is called every time the filter decides it has output available:

 1 static apr_status_t send_downstream(ap_filter_t *f, const char *tmp,
 2                                     apr_size_t len)
 3 {   // ... declarations
 4     bb = apr_brigade_create(r->pool, c->bucket_alloc);
 5     b = apr_bucket_transient_create(tmp, len, c->bucket_alloc);
 6     APR_BRIGADE_INSERT_TAIL(bb, b);
 7     rv =
Re: RFC: Who owns a passed brigade?
On 22-Apr-05, at 9:32 AM, Joe Orton wrote:

The issue here is really which party can *destroy* a brigade, right?

Or perhaps which party *must* destroy a brigade. This is much less of an issue if neither party creates a new brigade on every filter invocation. But some do.

In the current code where brigades are never really destroyed, the fact that apr_brigade_cleanup() is called everywhere is not really harmful, is it?

No; what would be harmful would be apr_brigade_cleanup() not being called on a non-empty brigade.

It looked to me like it was going to be much simpler to adapt filters shipped in httpd to match the creator-must-destroy model than to match a consumer-must-destroy model, despite the ap_pass_brigade docstring.

That seems to be the case. It's certainly easier to audit for. Sure, but it applies equally to buckets. It seems that the consensus is creator-must-destroy-brigades; consumer-must-remove-buckets. That's not easy to audit for, which is why I think ap_pass_brigade should guarantee the latter condition.

In creator-must-destroy it is still fine for a consumer to call apr_brigade_cleanup() on a brigade it is passed, of course; that's just equivalent to consuming all the buckets, after all.

Sure. Consequently, it's ok for ap_pass_brigade to do the call. Then a consumer is free to either consume the buckets or not.
RFC: Who owns a passed brigade?
After a filter calls ap_pass_brigade(f->next, bb), who owns bb?

The name of the function might lead one to believe that ownership of the brigade was transferred, but inspection of distributed filters shows that it has not been; both mod_deflate and mod_include expect the brigade to be usable (and apparently empty) after return from ap_pass_brigade. This would imply that the rule is that ownership of the brigade is retained by the caller, but that ownership of the buckets in the brigade is transferred to the callee. If this is the case, I believe that it should be clearly stated in the API documentation. Otherwise, it is inevitable that people writing filters will get it wrong. (And it would also suggest that the function should be renamed ap_pass_buckets().)

It also leads to the question of what the expected behaviour is if the callee does not empty the passed brigade. Is it reasonable for a called filter to expect to be able to leave a bucket in the passed brigade so as to receive it on the next call? That would seem eccentric, although I believe it is what would happen in the case of mod_deflate and mod_include, at least.

If the brigade is required to have been emptied when the receiving filter returns, then would it not make more sense for ap_pass_brigade (or ap_pass_buckets) to call apr_brigade_cleanup just before returning? This would avoid the situation where a called filter religiously cleans up the brigade, in order to conform to the API expectations, but the calling filter also cleans up the brigade, in order to avoid worrying about misbehaving downstream filters.

If, on the other hand, buckets can be left in the brigade for retention between filter invocations, it must be clear that the passing filter must always use the same brigade for every call to ap_pass_brigade(), or otherwise arrange for buckets left in the brigade to be repassed.
I note that apr_brigade_split() seems to be oriented towards a model where ownership of the brigade is passed, since it creates a new brigade. For the model where brigade ownership is retained, it would be better to have a function which shifted buckets from one brigade to another one.

There are a number of possible bugs resulting from a lack of clarity on brigade ownership. For example:

-- if a filter believes that brigade ownership has been passed to it, it may call apr_brigade_destroy on the brigade. This will not cause any obvious misbehaviour, but the pool cleanup on the brigade will have been removed, and if the request terminates on an error, it is conceivable that heap buckets later added to the brigade will leak.

-- if a filter believes that it is passing ownership, it will not call apr_brigade_destroy on the brigade. This is only serious in the case where the filter creates a new brigade on every invocation (as, for example, mod_include does). In that case, the pool's cleanup list will grow dramatically for large requests. (There are a couple of issues in bugzilla which seem to be showing this symptom.)

-- as mentioned above, if a filter does not understand that it is being passed ownership of the buckets, it may leave them in the brigade and then receive them again on a subsequent invocation. However, the upstream filter may choose to send a different brigade on the subsequent invocation, so the filter cannot rely on this behaviour.

My vote would be for a clear statement that ownership of brigades is retained and ownership of buckets is transferred, along with a documented call to apr_brigade_cleanup() in ap_pass_brigade to precisely define the expected state of the brigade on return.
Re: RFC: Who owns a passed brigade?
On 21-Apr-05, at 3:58 PM, Cliff Woolley wrote:

FWIW, the documentation says: "The caller relinquishes ownership of the brigade." This obviously differs from what some of our own filters are doing -- and from my memory of past history. It makes sense that it should be this way, though I think at some point in the past we just guaranteed that the core_output_filter always emptied the brigades it was passed, so it wouldn't have mattered until other filters that didn't behave that way started getting inserted in front of c_o_f. I'll keep digging.

I had seen that note in the documentation; in fact, I wrote several filters on the basis of that statement, but they misbehaved in odd ways, which is why I started to dig into the question. It is clearly at variance with practice.

FWIW, I think the (apparent) practice, where the caller relinquishes ownership of the buckets but not the brigade itself, is more efficient, since it avoids a lot of brigade construction and destruction. But it's a slightly odd convention. I don't think it makes that much difference, provided that there is a clear statement and that everyone follows it.
Re: RFC: Who owns a passed brigade?
On 21-Apr-05, at 5:51 PM, Nick Kew wrote:

Rici Lake wrote: FWIW, I think the (apparent) practice, where the caller relinquishes ownership of the buckets but not the brigade itself, is more efficient since it avoids a lot of brigade construction and destruction.

Agreed. And it works for any situation, as either party can do a cleanup to keep resource-use down, and a cleaned-up brigade is not too big to wait for pool cleanup.

"Either party can do a cleanup" does not give either party the responsibility. If neither party did the cleanup, that would be a problem. Lack of clear responsibility is always a problem, and not just with computer programs. I favour having ap_pass_brigade do it. If the brigade must be cleaned before ap_pass_brigade returns, then it might as well be done by ap_pass_brigade.

However, I think it is a fallacy that a cleaned-up brigade is not too big to wait for pool cleanup. The brigade itself is about four pointers, and the corresponding pool cleanup is another four pointers, so that's a total of around 32 bytes -- not a lot. But suppose that a brigade is created for every 8k of filtered data, and you run 4 GB of data through that without destroying any brigades. That's half a million brigades and half a million registered cleanups (linearly searched on kill, so you'd better hope the cleanups are destroyed in the right order).

See http://issues.apache.org/bugzilla/show_bug.cgi?id=23567 and http://issues.apache.org/bugzilla/show_bug.cgi?id=33382; Paul Querna's patch to fix the latter bug clearly shows how you can drown in empty brigades (or rather, how to avoid drowning in them).

Logically it makes a lot of sense to run brigade_cleanup once per pass_brigade, and it's harmless if the brigade was already cleaned.

But it's a slightly odd convention. I don't think it makes that much difference, provided that there is a clear statement and that everyone follows it.

Creator-is-responsible is simple logic.
Other options are more complex and bug-prone, as we then have issues like whether to destroy on every call or only on EOS. Or maybe on FLUSH. Also, creator-is-responsible is a logic that can apply symmetrically to both input and output filters without mindbending.

I didn't mean to say that creator-is-responsible is odd. What I meant was that it is odd that ownership of the buckets gets transferred but not of the brigades. Creator-is-responsible is fine. Transferred ownership is fine. Part of one and part the other is odd. Not unworkable -- just odd. And thus in need of clear documentation.

Dropping destroy from the API altogether and leaving that to the pool would also work, AFAICS.

-1, too fragile. See above bugzillas.
Re: AP_MODE_EATCRLF considered indigestible
On 18-Apr-05, at 9:43 AM, Greg Ames wrote:

if the network data for pipelined request n+1 isn't already present during ap_read_request for request n, it is highly unlikely to arrive by the time we start writing the response for request n.

It may well arrive during the time we're writing the response, though, if it is based on information from the response.

if a new request does arrive during that window it's no big deal. we might write one small packet to the network and nothing breaks.

I agree with this. I doubt whether the optimization actually buys enough to make it worth the trouble. Also, if we ever get it wrong and leave unflushed data at the end of a request, it will look like the request is stalled.
Re: mod_deflate setting Vary
On 17-Apr-05, at 5:30 PM, Roy T. Fielding wrote:

On Apr 17, 2005, at 1:25 PM, [EMAIL PROTECTED] wrote:

+
+  *) mod_deflate: Merge the Vary header, instead of setting it. Fixes
+     applications that send the Vary header themselves, and also apply
+     mod_deflate as an output filter. [Paul Querna]

-    apr_table_setn(r->headers_out, "Vary", "Accept-Encoding");
+    apr_table_mergen(r->headers_out, "Vary", "Accept-Encoding");

Is deflate setting Vary here because it is changing the Content-Encoding, or is this part also done for Transfer-Encoding?

It's setting the Content-Encoding, at this point based on its interpretation of the Accept-Encoding header. mod_deflate does not implement Transfer-Encoding as far as I can see.

It doesn't quite get the Accept-Encoding right. It treats:

    Accept-Encoding: gzip;q=0

as accepting gzip encoding, and it treats:

    Accept-Encoding: *

as not accepting gzip encoding. I could not find any bug reports complaining about this, so I don't think that fixing it is high priority. But I've put it on my tuit list anyway. For what that's worth.

The use of setn rather than mergen does create caching issues, although it happens to work with mod_negotiation (because mod_negotiation puts Vary: headers in err_headers_out instead of headers_out). I couldn't find any bug reports complaining about the caching issue either, but the fix is trivial.

It should not be setting Vary for changes to the Transfer-Encoding.

Roy
Re: svn commit: r159941 - httpd/httpd/branches/simple-conf/Makefile.in
This didn't quite get it right, either. It will overwrite the .conf file unless the .conf file is httpd.conf (rather than progname.conf):

--- Makefile.in	Sun Apr 17 20:51:18 2005
+++ Makefile.in.new	Sun Apr 17 20:51:50 2005
@@ -78,7 +78,7 @@
 	    if [ $$i = httpd.conf ]; then \
 		file=`echo $$i|sed s/.*.conf/$(PROGRAM_NAME).conf/`; \
 	    fi; \
-	    if test ! -f $(DESTDIR)$(sysconfdir)/$$i; then \
+	    if test ! -f $(DESTDIR)$(sysconfdir)/$$file; then \
 		$(INSTALL_DATA) $(DESTDIR)$(sysconfdir)/original/$$i $(DESTDIR)$(sysconfdir)/$$file; \
 	    fi; \
 	done ; \
Re: AP_MODE_EATCRLF considered indigestible
On 15-Apr-05, at 11:06 AM, Justin Erenkrantz wrote: The problem with this code is that it'll only read 8 characters worth of CRLFs. We'd probably want to retry the reads until we don't get more CRLF pairs. It's not determinate how many extra CRLFs we will get. Are there browsers which send more than four extraneous CRLF's but still support keepalive? If this code reads four CRLF's without finding any data, it will assume that there is no immediate incoming request, and send a flush down the filter chain. That does no harm aside from wasting some cycles; on the assumption that the number of cycles wasted is not that enormous and the number of browsers which exhibit that particular behaviour is not huge, I'd say it's not really worth completely retrying the read. I believe that the current code may exhibit the opposite behaviour: it can falsely report that data is present and thereby omit the flush with the result that a response is not terminated until the following request. If the connection is not being pipelined, the following request is unlikely to show up until the response is terminated, so that would lead to a request which stalls very near to the end. I believe I may have actually seen this symptom in real life, now that I think about it. So, yes, I'd be fine with removing EATCRLF for 2.2 if you provide a tested patch. ;-) -- justin OK, I'll start by at least compiling and testing what I posted.
Re: AP_MODE_EATCRLF considered indigestible
On 15-Apr-05, at 10:56 AM, Greg Ames wrote: the reason that this and the corresponding 1.3 BUFF logic exists is to minimize tinygrams - ip packets that are less than a full mtu size. tinygrams definitely degrade network performance and trigger things like Nagle's algorithm. in this particular case we are trying to combine response data from two or more different pipelined requests into a single packet. Yes, I understood that. My question was (a) whether an HTTP response is ever tiny enough to truly qualify as a tinygram, and (b) whether the work done to avoid sending them is worth the cost. Clearly, some HTTP responses are less than the MSS for some connections. But they are hardly one-byte packets. It would certainly be possible to simply set NODELAY on the socket to disable Nagle, and it's not obvious to me that there would be a huge degradation of network throughput resulting from this. Still, if it truly is an issue, there has got to be a better way to accomplish it.
AP_MODE_EATCRLF considered indigestible
I was taking a look at the implementation of the renamed (but still misleading) AP_MODE_EATCRLF, in an attempt to figure out what a conforming input filter ought to do if it's handed that mode. Some comments:

1) The loop in server/core_filters.c:226:

    while (c < str + len) {
        if (*c == APR_ASCII_LF)
            c++;
        else if (*c == APR_ASCII_CR && *(c + 1) == APR_ASCII_LF)
            c += 2;
        else
            return APR_SUCCESS;
    }

If the last character of the bucket being examined is an ASCII CR, this will first look beyond the end of the bucket's buffer to check for an LF; assuming it doesn't find one in the random byte it looks at, it will then return APR_SUCCESS even if the following bucket (if any) started with an LF. So that's presumably wrong on two counts.

2) The only place I can see that this mode is actually used is in modules/http/http_request.c, where it is used to decide whether or not to flush the filter chain before logging. The comment at line 226 of that file seems reasonable:

    /*
     * We want to flush the last packet if this isn't a pipelining connection
     * *before* we start into logging. Suppose that the logging causes a DNS
     * lookup to occur, which may have a high latency. If we hold off on
     * this packet, then it'll appear like the link is stalled when really
     * it's the application that's stalled.
     */

Presumably that's true whether or not the connection is pipelining; why is it acceptable to make a link appear stalled in a pipelining connection but not a keepalive connection?

3) In general, AP_MODE_EATCRLF seems to me to be a tight coupling between a particular function in http_request.c and a particular input filter (core_input_filter). The expense is added complexity for every input filter. Removing the mode altogether would mean that either every request was flushed through the filter chain even in pipelining mode, or that ap_process_request would have to come up with some other way of detecting pending input.
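A bounds-checked version of that scan is straightforward. This is an illustrative standalone rewrite, not a patch against core_filters.c (the APR_SUCCESS return is replaced by a plain 0/1 result, and cross-bucket CRLFs would still need handling by the caller):

```c
#include <stddef.h>

#define ASCII_LF 0x0A
#define ASCII_CR 0x0D

/* Returns 1 if every byte in [str, str + len) belongs to an LF or CRLF
 * sequence, 0 otherwise. Unlike the loop quoted above, a trailing CR
 * never causes a read past the end of the buffer: the bounds check on
 * c + 1 comes before the dereference. */
static int all_crlf(const char *str, size_t len)
{
    const char *c = str;
    while (c < str + len) {
        if (*c == ASCII_LF)
            c++;
        else if (*c == ASCII_CR && c + 1 < str + len && *(c + 1) == ASCII_LF)
            c += 2;
        else
            return 0;
    }
    return 1;
}
```

A trailing lone CR is simply reported as "not all CRLF" here; whether that should instead trigger a peek at the next bucket is the second bug described above.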
Unless there is good empirical evidence of the pipelined filter flush leading to performance problems, the first of these options would seem attractive. I presume the main reason to avoid flushing between pipelined requests would be a sequence of WebDAV requests with short responses, although there is also the common case of browsers pipelining a request for favicon.ico before the page actually requested by the user. But are the responses so short that the flush would have an enormous impact? The favicon.ico response from a random iconless server comes out at 458 bytes, for example.
mod_proxy handling of error returns from scheme handlers
Just a query: I was looking at the code for mod_proxy.c and I wonder if the behaviour is really correct. At line 632, it loops through the configured proxy hosts, trying proxy_run_scheme_handler on each one. If it receives any return code other than DECLINED or HTTP_BAD_GATEWAY, it accepts the result. But then at line 656, having failed to find a proxy which will handle the request, it tries the request directly. In this case, it retries on any error return if max_attempts has not been reached. This would cause a retry, for example, on an authentication required status return, which seems odd. Perhaps I am missing something. Rici
ProxyAuthOverride
This is a small patch to mod_proxy which implements a single directive: ProxyAuthOverride 403|404|Off It overrides 401 returns from the origin server and substitutes them with the indicated status code. This may be an unusual use case, but it does come up once in a while when trying to proxy origin servers which insist on sending 401 to allow the user to upgrade to an authenticated session; in some cases it may be desirable to proxy only anonymous connections to such servers. (It would be important to scrub Authorisation: headers from such a request.) A version for 2.0.52 has been lightly tested: http://primero.ricilake.net/proxy2.patch (the patch applies cleanly to 2.0.53). A version for TRUNK is also available: http://primero.ricilake.net/proxy21.patch but I am not completely certain about the interaction with balancers and the post_request hook. Comments are welcome.
Re: mod_proxy handling of error returns from scheme handlers
On 8-Apr-05, at 1:40 PM, Graham Leggett wrote: In theory (if I understand it correctly) it wouldn't - because it would only try the direct connection on DECLINED or bad gateway - if authentication was denied, the result would be neither DECLINED nor bad gateway, and so the direct connection would never be tried in this case. I guess it raises the issue of why the direct connection is attempted. Presumably it is possible that all the configured downstream proxies refused to pass on the request to the origin server, but that *this* server succeeds in passing the request to the origin server. In that case, the result could be anything that the origin server returned. No?
Re: simple-conf branch
On 6-Apr-05, at 11:43 AM, Phil Lello wrote: As part of this, it could be useful to generate a RunningConfig.cnf file as part of httpd startup, which would be a merged config file with comments indicating which file set the option (and possibly which options have taken defaults). You can easily get most of that information with the version of mod_info in 2.1.
Re: Default Modules
On 6-Apr-05, at 12:56 PM, Mads Toftum wrote: mod_asis: yes - no I'd prefer - most as it is rarely used but not totally useless. Others mentioned mod_ssl which I think is too much trouble to be worth enabling other than when requested explicitly - there's the whole crypto regs issue and it does link in another lib, which is something that I prefer limiting to when it is actually needed. So what are the criteria for yes, no, most and all? I think it would be more productive to come up with common criteria than to argue about individual modules. Perhaps "all" should just go away. It obviously does not really mean all and it is hard to come up with a good description of what it does mean. As I said earlier, a lot of people seem to be surprised at what "all" does not include (ssl and proxy, for example). And as no tool is provided to list what "all" and "most" actually do, people are pretty well left to their own intuitions. The current list (as produced by the shell script I pasted in an earlier message) is:

3. all cern_meta
3. all log_forensic
3. all mime_magic
3. all unique_id
3. all usertrack
3. all version

So what do those six modules have in common?
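The script itself wasn't reproduced in this message; a hypothetical reconstruction of that kind of one-liner, shown here against two sample APACHE_MODULE declarations rather than a real httpd source tree (the field positions and the config.m4 layout are assumptions, and only the simple five-argument form is handled), might be:

```shell
# Against a real tree, the input would come from something like:
#   grep -h 'APACHE_MODULE(' modules/*/config*.m4
# Here two sample lines stand in for that output.
printf '%s\n' \
  'APACHE_MODULE(cern_meta, CERN-type meta files, , , all)' \
  'APACHE_MODULE(usertrack, user-session tracking, , , all)' |
sed -e 's/^ *APACHE_MODULE(//' -e 's/)$//' |   # strip the macro wrapper
awk -F', *' '{ print $NF, $1 }' |              # last field = default group
sort
```

The last field of each declaration is the module's default enablement group (yes, no, most, all), which is exactly the information the list above summarizes.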
Re: Flushing lingering CGI processes
On 31-Mar-05, at 6:16 AM, Nick wrote: As a refinement; The timer firstly sends a signal to the CGI program which may be caught by the CGI program to generate some form of output. This output will prove whether the http connection is still open. If the CGI program doesn't output anything, a TERM is sent. If still no output and process is still running, a KILL is sent. This seems overly complex to me, and furthermore a bit difficult to actually implement in a CGI. How is a CGI to know whether it is hung? If it could figure that out, it could kill itself easily enough, but I believe it to be a computationally intractable problem related to the halting problem; certainly it is not something I'd like to try in a bash script. :) It seems to me that a simple timer would be sufficient; however, it would be nice to be able to configure individual CGIs (or directories of CGIs) with longer or shorter timers; I have seen instances of webapps with CGIs which compute for several minutes, or even longer. (Presumably, the users of such webapps are very patient or really need the results.) Rici
Re: Reformat httpd-win.conf
On 30-Mar-05, at 12:52 PM, Sander Striker wrote: #EnableSendfile off I'd suggest uncommenting that for a win32 config file.
Re: svn commit: r125465 httpd/httpd/trunk/modules/ldap/util_ldap.c
On 22-Mar-05, at 5:17 PM, Graham Leggett wrote: This is also broken - the LDAPTrustedClientCert is supposed to have scope on a per directory basis. To fix this, we would need to add a directory config creator, and a directory config merger, is that correct? Sure, but then there is still a problem with .htaccess files. The client_certs array will have been allocated in the request pool during the merge operation for the .htaccess file, but it then may be copied into the global connection cache; currently, the code only copies the array header (at line 525). So you'd end up with dangling pointers after the request finished. If, on the other hand, the strings were copied into the server or config pools, then they would slowly consume memory. The most plausible solution might be to manually manage connection cache memory with malloc instead of using pool-allocated memory. Although now that I look at the file again, I see that it never would have worked anyway because at line 1546, the function util_ldap_set_trusted_client_cert stores the certificate in st-global_certs instead of st-client_certs. I also wonder about the two apr_array_append calls at line 1671 in util_ldap_merge_config. The second one would mean that the client certs specified in LDAPTrustedClientCerts would be appended to the list of client certs inherited from some containing section. This might be counter-intuitive if the certs are supposed to be directory scoped. I'm not sure what the use case for this directive would be, so it's hard to know for sure.
util_ldap.c redux
While I've got util_ldap.c open in front of me, I also noticed that the following code occurs a number of times:

    /*
     * If any LDAP operation fails due to LDAP_SERVER_DOWN, control returns here.
     */
    start_over:
    if (failures++ > 10) {
        return result;
    }
    if (LDAP_SUCCESS != (result = util_ldap_connection_open(r, ldc))) {
        return result;
    }
    /* do something with ldc and goto start_over if it returns LDAP_SERVER_DOWN */

Now, util_ldap_connection_open either returns immediately, if the connection appears to be bound, or it tries to bind the connection, looping 10 times on LDAP_SERVER_DOWN (line 337; I agree with the comment at that point in the code). So in order for the start_over loop to occur more than twice, it would be necessary that: some operation returns LDAP_SERVER_DOWN; util_ldap_connection_open succeeds in reopening the connection with fewer than 10 retries; but the operation returns LDAP_SERVER_DOWN again. Not only does this seem unlikely, it also seems like a situation where it would be better to return a 503 than to keep retrying. The correct logic would be something like:

1. if the connection is bound, try the operation
2. if the operation returns LDAP_SERVER_DOWN or the connection was not bound, call util_ldap_connection_open and if it returns success, try the operation

I'm not sure under what circumstances various LDAP SDKs return LDAP_SERVER_DOWN, but I suppose it would generally be because:

1) an existing connection was dropped, perhaps because of an idle time-out
2) there is no LDAP server listening on the interface at all and the connection was refused
3) the LDAP server failed to accept the connection within some timeout

In the first case, one or two retries are in order. In the second case, retrying is unlikely to change the situation at all. And in the third case, the user is likely to have long since gone somewhere else. So I think that 10 retries within connection_open is excessive as well; three is probably more than enough.
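That corrected logic is easy to sketch as a standalone function. The names, the callback shapes, and the return codes below are invented for illustration (util_ldap would use its connection struct, ldc, and the SDK's real LDAP_SERVER_DOWN constant):

```c
/* Illustrative return codes; real LDAP SDKs define their own values. */
enum { LDAP_OK_RC = 0, LDAP_SERVER_DOWN_RC = -1 };

typedef int (*ldap_op_fn)(void *conn);
typedef int (*ldap_reopen_fn)(void *conn);

/* Try the operation once; if and only if it reports the server down,
 * reopen the connection once and retry once. No ten-iteration loops:
 * if the retry also fails, the caller can return a 503. */
static int run_with_reconnect(void *conn, ldap_reopen_fn reopen, ldap_op_fn op)
{
    int rc = op(conn);
    if (rc != LDAP_SERVER_DOWN_RC)
        return rc;              /* success, or an error retrying won't fix */
    rc = reopen(conn);          /* a single reconnect attempt */
    if (rc != LDAP_OK_RC)
        return rc;              /* fail fast rather than loop */
    return op(conn);            /* one retry, then give up */
}

/* Tiny stubs for demonstration: the op fails once, then succeeds. */
static int demo_calls;
static int demo_op(void *c) { (void)c; return demo_calls++ == 0 ? LDAP_SERVER_DOWN_RC : LDAP_OK_RC; }
static int demo_reopen(void *c) { (void)c; return LDAP_OK_RC; }
```

This handles the dropped-idle-connection case (one reconnect fixes it) while refusing to spin on a server that is genuinely down.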
Re: svn commit: r125465 httpd/httpd/trunk/modules/ldap/util_ldap.c
The referenced commit added two OR_ALL directives to util_ldap: LDAPTrustedClientCert LDAPTrustedMode However, that module has no per-directory config struct; all configuration is stored in per-server structs. Consequently, as far as I can see, the OR_ALL directives are unworkable. In particular, LDAPTrustedClientCert, if invoked in an .htaccess file, will: a) store a pointer to request-pool-allocated strings in a config-pool-allocated array b) tromp on changes being made in another thread reading an .htaccess file in the same vhost. Even if it were not for that, it seems odd to me that the configuration merge function appends the parent array of client certs to the current array of client certs; I think that is counter to the idea that a directory might have its own local collection of client certs. (Accumulating the global certs seems more reasonable.) As it stands, if my analysis is correct, the use of LDAPTrustedClientCert in an .htaccess file would probably cause a child to segfault; in a threaded mpm, the consequences would be even more serious. Until a more thorough review is done, it would probably be a good idea to change the directives to RSRC_CONF, or disable them altogether. Rici
Re: do we still want sendfile enabled with our default conf files?
Bill, What's the performance problem? There was someone on #apache yesterday who claimed to experience the following problem: On a local switched network, Apache2 on Win2K3 transferred at 800kbps, while IIS on the same box transferred at 8-11mbps. When he replaced the cable to the server with a shorter one, Apache speeded up to 4 mbps but IIS still ran at full speed. He tried it with both Windows and Mac OS X clients, and experienced the same symptom. Could this be related to sendfile? Rici. On 17-Mar-05, at 11:12 AM, Bill Stoddard wrote: Joshua Slive wrote: Jeff Trawick wrote: IMHO, all default conf files except for highperformance[-std].conf should have the EnableSendfile off directive uncommented, under the theory that people who want the extra performance can take an action to re-enable sendfile and then pay attention to the web server behavior afterwards but people who just want the darn thing to work shouldn't be bothered. Some percentage of the latter group will have problems they are ill-equipped to handle, because some feature that is not important to them is enabled. We've discussed this before and nothing ever happened. I'm willing to bet that the experiences of a few more unlucky users since the last discussion may have shifted the balance. Comments? +1, as before. From the users' perspective, sendfile results in unexplained corruption or uninterpretable error messages pretty much any time a network filesystem is used to host content (and random other times on win32). Hey Joshua, Guess I've not been paying attention... what type of problems have you seen with sendfile (TransmitFile) on Win32? We do have a performance problem (will leave out details unless you're interested) but I was not aware of corruption problems... Bill
Re: do we still want sendfile enabled with our default conf files?
OK, the person came back, I got him to disable sendfile, and apache started transferring at full speed. I believe sendfile should be disabled by default, at least on Windows. Rici
ProxyRemoteMatch brokenness
The ProxyRemoteMatch directive is supposed to use a regex to redirect certain proxy requests to a remote proxy server (as I understand the documentation). I actually needed that for a configuration (see below) and was puzzled to find that it doesn't work: (line numbers from APACHE_2_0_BRANCH, because I can't find a web interface to svn)

389:    p2 = ap_strchr_c(ents[i].scheme, ':');  /* is it a partial URL? */
390:    if (strcmp(ents[i].scheme, "*") == 0 ||
391:        (ents[i].use_regex && ap_regexec(ents[i].regexp, url, 0, NULL, 0)) ||
392:        (p2 == NULL && strcasecmp(scheme, ents[i].scheme) == 0) ||
393:        (p2 != NULL &&
394:         strncasecmp(url, ents[i].scheme, strlen(ents[i].scheme)) == 0)) {

ap_regexec returns 0 on success, so the condition in line 391 matches if the regex didn't match, which is the reverse of the expected behaviour. Changing line 391 to

391:        (ents[i].use_regex && ap_regexec(ents[i].regexp, url, 0, NULL, 0) == 0) ||

produced the expected behaviour [1]. This code seems to have been unchanged since the directive was introduced in 2.0.35, almost three years ago, and I cannot find any relevant bug reports; in fact, googling for ProxyRemoteMatch did not yield any indication that anyone has ever tried to use the directive, much less succeeded. This would seem to be a reasonable case for deleting the directive, although I am now using it (with a patched httpd, of course) so I would be slightly resistant to this. The particular configuration I'm using is to tunnel http and https through an ssh tunnel to a gateway machine inside a firewalled network; the relevant hostnames are not in the external DNS, of course, so I wanted to proxy only requests which match the internal domain suffix. This is not a feature of any browser I know of; browsers seem to have proxy exceptions rather than proxy affirmations.
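The inverted test is easy to demonstrate with POSIX regexec(), which ap_regexec wraps: 0 means the pattern matched, and REG_NOMATCH is nonzero, so a bare truth test on the return value selects exactly the URLs that do not match. A small self-contained check (the pattern and helper name are illustrative only):

```c
#include <stddef.h>
#include <regex.h>

/* Returns 1 if url matches pattern, 0 otherwise. Note the explicit
 * "== 0": regexec() reports a match by returning zero, which is the
 * comparison the mod_proxy code above omits. */
static int url_matches(const char *pattern, const char *url)
{
    regex_t re;
    int rc;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;               /* treat a bad pattern as no match */
    rc = regexec(&re, url, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Dropping the `== 0` here would make the function return 1 for every URL the pattern fails to match, which is precisely the observed ProxyRemoteMatch behaviour.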
So I ended up with two proxy servers, one on a gateway in my local network, and another one on the gateway in the remote network; the two gateways are connected through an ssh tunnel:

Local gateway:

ProxyRequests On
<Proxy *>
    Order allow,deny
    allow from 10.
</Proxy>
ProxyRemoteMatch ^(http://)?[^/]*.internal.dns.suffix http://localhost:
# (http://)? is to also allow proxying of CONNECT.

Surprisingly, it works.

Remote gateway (separate apache instance, this is almost the entire config file):

Listen localhost:
ProxyRequests On
AllowCONNECT various ports listening for https

where the ssh tunnel connects port on the local gateway to localhost: on the remote gateway. If anyone has a better way of accomplishing this, I'll withdraw my objection to deprecating ProxyRemoteMatch.

---

[1]: I would personally have written that code somewhat differently to avoid doing non-regex matches in the regex case:

if (ents[i].use_regex) {
    if (ap_regexec(ents[i].regexp, url, 0, NULL, 0) != 0)
        continue;
}
else if (strcmp(ents[i].scheme, "*") != 0) {
    if (ap_strchr_c(ents[i].scheme, ':') == NULL) {
        if (strcasecmp(scheme, ents[i].scheme) != 0)
            continue;
    }
    else {
        if (strncasecmp(url, ents[i].scheme, strlen(ents[i].scheme)) != 0)
            continue;
    }
}
/* if we get here, we've got a match */
Proposal: mod_acceptfilter
With reference to [Re: svn commit: r111596] The following proposal attempts to provide a simple way of configuring a server which supports multiple protocols (http, ftp, nntp) while allowing the administrator and/or packager to configure OS kernel- implemented accept filters, as provided by FreeBSD and to a lesser extent by Linux. (And probably other OS's.) On 13-Dec-04, at 11:00 AM, Paul Querna wrote: I just tested it here, and it Fails on FreeBSD if accf_http is loaded in the kernel. Perhaps both the accept filter and defer accept code belongs somewhere in modules/http/* ? It seems to me that accept filters need to be configured for specific protocols and in a way consistent with the specific operating system. After an interesting discussion on IRC, I volunteered to write up the following proposal (which I take full responsibility for; I don't claim that it represents any consensus). The essential point here is that accept filters need to be established before the call to accept(). Consequently, the setting will be specific to the listener. In the current architecture, the protocol module is not invoked until too late to make this determination. It therefore seems reasonable to provide a separate module, mod_acceptfilter, which handles the configuration of accept filters. However, protocol modules will need to indicate which protocol they handle. This could be done by a callback which registers the protocol[s] which they handle, or it could be done by using the name of the protocol module directly as a key. I personally prefer the former solution, but the second one might be possible for legacy-compatibility. The proposal is then: 1. 
Core directives

1.a: The Listen directive is extended to allow the optional specification of a protocol:

Listen [transport/][ip:]port [protocol]

where transport is either a domain specification from protocols(5), like tcp or udp, or it might even (in some future time) be a specification like namedpipe (although the use of transports which do not identify themselves with ip numbers implies significant changes). (This is not core to the proposal.) port is either a number or a service from services(5). protocol is either the name of a protocol registered by a protocol module or the name of the protocol module itself. If protocol is omitted, then if a service name is provided and a protocol module has registered that name, that name is used to bind the protocol module to the listener. Otherwise, http is assumed.

1.b: The assignment of protocols to listeners can be overridden in a VirtualHost section by means of the new Protocol directive:

Protocol registered-protocol-or-module-name

The argument is interpreted as above.

1.c: Protocol modules register themselves as providing a protocol by providing a hook to ap_register_protocol; this is called with the name of a protocol, and the module either returns OK or DECLINED.

2. mod_acceptfilter

This new module would implement two directives:

DefaultAcceptFilter (protocol | *) (os-dependent identifier+ | none)
AcceptFilter (os-dependent identifier+ | none)

The first of these is only available in the main configuration section; the second only in VirtualHost sections. Although this specification makes it possible to use the same directive name for both options, I personally believe that that particular form of parsimony is confusing for people writing configuration files. If there is an AcceptFilter directive within the VirtualHost section applicable to a particular ip:port, that directive will apply. Otherwise, if there is a DefaultAcceptFilter for the protocol bound to the VirtualHost section, that directive will apply.
Otherwise, none is assumed. mod_acceptfilter will then use the first AcceptFilter in the list which is available, using an interface provided by APR. If no accept filter is available, none will be used (that is, no accept filter will be placed on the connection). This is not to be considered an error. The intent is to allow Apache packagers for specific OS distributions to create reasonable default accept filter configurations for the particulars of the particular OS. The FreeBSD mechanism, for example, is extensible through kernel modules. So, a FreeBSD packager who had also implemented an accf_ldap filter could provide the directive: DefaultAcceptFilter ldap accf_ldap accf_data If some protocol module had registered as providing the ldap protocol, mod_acceptfilter would attempt to enable accf_ldap on listeners assigned to the ldap protocol; failing that, it would attempt to enable accf_data. In the event that the config file was being used on, for example, Linux, APR might be able to substitute the Linux equivalent of accf_data. Regardless of the DefaultAcceptFilter specifications, an Apache administrator with specific needs could simply place an AcceptFilter directive in a particular virtual host.
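Putting the proposal's pieces together, a configuration might look something like the following sketch. None of these directives exist in any released httpd; the protocol bindings and filter lists are illustrative only:

```apache
# Bind protocols to listeners (proposal section 1.a):
Listen tcp/*:80 http
Listen tcp/*:119 nntp

# Packager-supplied defaults (proposal section 2); the first available
# filter in each list wins, otherwise no accept filter is used:
DefaultAcceptFilter http accf_http accf_data
DefaultAcceptFilter nntp accf_data

<VirtualHost *:119>
    # Per-vhost override: disable kernel accept filtering entirely.
    AcceptFilter none
</VirtualHost>
```

On an OS with no matching filters, every listener would silently fall back to none, which the proposal explicitly treats as a non-error.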
Proposal: Listen sections
(An addendum to the previous mod_acceptfilter proposal.) From my experience on #apache, it appears that Mere Mortals have a great deal of difficulty successfully configuring virtual hosts. I think this is understandable; the algorithm used to allocate requests to virtual host sections is complicated, and it conflates two radically different concepts: the assignment of a listener to a particular ip:port (or a particular ip), and the assignment of named virtual hosts within an ip[:port] listener. Successfully configuring a named virtual host, in the current configuration system, requires co-ordinating three directives which often appear in different parts of the configuration: Listen, NameVirtualHost, and VirtualHost. Minor discrepancies in the ip:port specifications in these directives can cause named virtual hosts to fail to work without providing any useful error messages. In addition, not all protocols support the concept of named virtual hosts. I'm interested in people's reactions to the following proposal:

Directive: Listen ([transport/](ip | * | _default_)[:port])+
Context: main configuration

This is the combination of an existing Listen directive with an existing (non-named) VirtualHost directive; that is, you could replace

Listen 1.2.3.4:80
<VirtualHost 1.2.3.4:80>
    #...
</VirtualHost>

with

<Listen 1.2.3.4:80>
    # ...
</Listen>

Directive: NameVirtualHost (name | _default_)+
Context: Listen section

This provides the definition of a single named virtual host within a single Listener; it replaces the existing NameVirtualHost and VirtualHost directives. At most one NameVirtualHost may be tagged with the _default_ name; if there is one, it will be used if no name matches or no virtual host name is provided by the client. It is an error to provide a NameVirtualHost in a Listen section for a protocol which does not support named virtual hosts. It is an error to provide more than one NameVirtualHost _default_ section in the same Listen section.
If no NameVirtualHost _default_ is provided, the first NameVirtualHost section will be used as a default, for backwards pseudo-compatibility, but a warning will be generated.

---

Example configurations:

1) Simple configuration, a number of virtual hosts with different document roots:

Current:

Listen *:80
#...
NameVirtualHost *:80
#...
<VirtualHost *:80>
    ServerName this.does.not.match.any.name.com
    DocumentRoot /var/www/noname
    DirectoryIndex stopUsingNetscape2.html
</VirtualHost>
<VirtualHost *:80>
    ServerName www.example.com
    ServerAlias example.com
    DocumentRoot /var/www/example
</VirtualHost>
<VirtualHost *:80>
    ServerName www.example2.com
    ServerAlias example2.com
    DocumentRoot /var/www/example2
</VirtualHost>

Proposed:

<Listen *:http>
    <NameVirtualHost _default_>
        DocumentRoot /var/www/noname
        DirectoryIndex stopUsingNetscape2.html
    </NameVirtualHost>
    <NameVirtualHost www.example.com example.com>
        DocumentRoot /var/www/example
    </NameVirtualHost>
    <NameVirtualHost www.example2.com example2.com>
        DocumentRoot /var/www/example2
    </NameVirtualHost>
</Listen>

2) A configuration with named virtual hosts on one ip and non-named virtual hosts on another one:

Current:

Listen *:80
#...
NameVirtualHost 1.2.3.4:80
#...
<VirtualHost 1.2.3.4:80>
    ServerName this.does.not.match.any.name.com
    DocumentRoot /var/www/noname
    DirectoryIndex stopUsingNetscape2.html
</VirtualHost>
<VirtualHost 1.2.3.4:80>
    ServerName www.example.com
    ServerAlias example.com
    DocumentRoot /var/www/example
</VirtualHost>
<VirtualHost 1.2.3.4:80>
    ServerName www.example2.com
    ServerAlias example2.com
    DocumentRoot /var/www/example2
</VirtualHost>
<VirtualHost 1.2.3.5:80>
    # This is not a named virtual host, but it is
    # not locally obvious
    ServerName www.example3.com
    DocumentRoot /var/www/example3
</VirtualHost>

Proposed:

<Listen 1.2.3.4:http>
    <NameVirtualHost _default_>
        DocumentRoot /var/www/noname
        DirectoryIndex stopUsingNetscape2.html
    </NameVirtualHost>
    <NameVirtualHost www.example.com example.com>
        DocumentRoot /var/www/example
    </NameVirtualHost>
    <NameVirtualHost www.example2.com example2.com>
        DocumentRoot /var/www/example2
    </NameVirtualHost>
</Listen>
<Listen 1.2.3.5:http>
    ServerName www.example3.com
    DocumentRoot /var/www/example3
</Listen>

---

3) I am aware that this proposal does not provide for the current possibility of doing something like:

Listen 1.2.3.4:80
Listen 192.168.1.1:80
NameVirtualHost *:80
<VirtualHost 1.2.3.4:80>
    # Only available on the external interface
    ServerName www.example.com
</VirtualHost>
<VirtualHost *:80>
    # Available on both interfaces
    ServerName www.example2.com
</VirtualHost>
<VirtualHost 192.168.1.1:80>
    # Only available on the internal interface
    ServerName www.example3.com
</VirtualHost>

I don't believe this is a common configuration case, although it certainly exists. Such
Re: mod_comment (was: svn commit: r106879 ...)
On 29-Nov-04, at 11:02 AM, William A. Rowe, Jr. wrote:

Yes, but our API dictates that they are looking for ... (in the case of this module) - which is a bad thing IMHO (and in Greg's and Ryan's as well)...

Sorry, I don't understand that. The search for '>' is in the first line of the directive; arg is simply the arg delivered by the config apparatus. The point of the following offending code is to ensure that the directive was:

<Comment This is a comment>

and not:

<Comment This is a comment

I'm actually neutral on whether it is worth doing this check, but all the other section directives do it, and while it seems a bit puritanical, it does enforce the vaguely HTML-like section syntax. This is the offending code:

endp = ap_strrchr_c(arg, '>');
if (endp == NULL) {
    return unclosed_directive(cmd);
}

So how is that different from parsing any other standard directive? It is no different than mod_access, for example, checking that the first argument to Allow is the word "from".

and this is the good code:

*(ap_directive_t **)dummy = NULL;
return ap_soak_end_container(cmd, "<Comment>");

It will be noted that this module does not in any way refer to the fp hidden away in the structs. All it does is use the public API. If the internal functioning of that API changes, the module will continue to work.

The problem is, I doubt this module inserts its contents into the config tree.

That is true, it doesn't. The *(ap_directive_t **)dummy = NULL; statement tells the config apparatus that there are no directives to insert. This is the part of the module that I personally object to; that behaviour is non-standard, undocumented, and a kludge. However, everybody does it.

So if we spew the contents (using mod_info, or any other tool based on our config tree API) we lose the contents.

I would be happy to fix this :) It is worth noting that EXEC_ON_READ directives are all removed from the config_tree, so that mod_info cannot show useful things like LoadModule directives.
Re: svn commit: r106879 - /httpd/httpd/trunk/CHANGES /httpd/httpd/trunk/server/core.c
On 29-Nov-04, at 4:05 AM, André Malo wrote:

* Paul Querna wrote: What is wrong with <IfDefine 0>?

It's ugly and non-intuitive.

Just out of curiosity, what do people think is wrong with the Comment directive implemented by this micro-module: https://ssl.bulix.org/projects/rici/file/apache/mod_comment.c?rev=49
Re: mod_comment (was: svn commit: r106879 ...)
On 29-Nov-04, at 10:10 AM, Greg Stein wrote:

So no... I don't like mod_comment. It uses a feature that I dearly wish to see removed from httpd.

I agree 100% on removing that feature. If it were gone, something like mod_comment would have to be implemented in the core, I suppose. However, I was not really proposing it as a module (except maybe cheekily); rather as a proposed syntax which seems to me to be more user-friendly than <IfDefine 0>. Of course, tastes differ.

I'll leave my comments on rewriting the config parsing (which I agree needs to be done) for a longer message; Paul Querna and I have been talking about this, too. However, I don't think it is fair to simply point the finger at the invasive mod_perl. For example, mod_macro is a very simple, easily audited module, which is extremely useful, and which requires exactly the same access to the configuration stream. By the same token, the environment variable expansion in config files is a hack which would have been better put into an optional module. All of this suggests that some mechanism ought to exist for modules to intervene in the configuration process; I think we are completely in agreement that the current mechanism is not the right one.

Rici
Re: svn commit: r106879 - /httpd/httpd/trunk/CHANGES /httpd/httpd/trunk/server/core.c
Rephrasing the question slightly:

On 29-Nov-04, at 9:44 AM, Rici Lake wrote:

Just out of curiosity, what do people think is wrong with the Comment directive implemented by this micro-module:

I meant to ask: What do people think about a Comment directive, which would simply absorb the configuration stream up to the matching </Comment> line, parsing only to the extent necessary to match section opens and closes?

Thanks to Greg Stein for pushing me to be clearer about the question.

Rici
Re: cvs commit: httpd-2.0/server protocol.c
On 25-Oct-04, at 3:37 AM, Graham Leggett wrote:

+        else if (r->connection->keepalive != AP_CONN_KEEPALIVE) {
+            ap_log_rerror(APLOG_MARK, APLOG_NOTICE, rv, r,
+                          "request line read error.");
+        }

Is it possible to put a more descriptive message than "request line read error"? If I was to see that in a logfile, I would have absolutely no idea whatsoever what it was trying to tell me :(

I believe that ap_log_rerror() inserts the OS error description, resulting in something like:

[client 127.0.0.1] (70007)The timeout specified has expired: request line read error

Of course, this is OS dependent.
Re: cvs commit: httpd-2.0/server protocol.c
On 25-Oct-04, at 11:04 PM, Roy T. Fielding wrote: This is not an error that the server admin can solve -- it is normal life on the Internet. We really shouldn't be logging it except when on DEBUG level. That was my first reaction, too. However, Ivan Ristic pointed out that (in some cases, anyway) it is the result of a DoS attack, and there may be something a server admin can do about it. Or, if not, at least they might know why their server is suddenly performing badly. For example, we had a problem report on #apache a couple of days ago which turned out, after considerable investigation, to be the result of a single host ip issuing hundreds of request connections in a few minutes. Whether this was a deliberate attack or simply a buggy client is not clear (to me) but the temporary solution of blocking the ip address was certainly within the server admin's abilities.
Re: cvs commit: httpd-2.0/server protocol.c
With all due respect, I don't think the following scenario will be logged by the patch; it only reports when the attempt to read the initial request line fails, not when the socket is closed prior to data transfer terminating. At least, that was the intent. I'll leave it to Ivan to try and make the case within the context of mod_security, if he chooses to. Reinstating the notification message was basically his request in the first place. On 26-Oct-04, at 12:14 AM, [EMAIL PROTECTED] wrote: All major browsers will abandon pending connect threads for a web page whenever you hit the BACK button, as well. Connects in progress at the socket level will still complete but no data will be sent because the threads have all died. Happens 24x7x365.
Re: Is there a limit to using with-module directive ???
On 18-Oct-04, at 9:03 PM, Bennett, Tony - CNF wrote: It only builds the module specified in the last --with-module directive. Is this a limitation with that directive ??? Yes, the directive only handles one module.
Possible error in apr_rgetline_core
Lines 250-263 of server/protocol.c:

250    /* Would this overrun our buffer? If so, we'll die. */
251    if (n < bytes_handled + len) {
252        *read = bytes_handled;
253        if (*s) {
254            /* ensure this string is terminated */
255            if (bytes_handled < n) {
256                (*s)[bytes_handled-1] = '\0';
257            }
258            else {
259                (*s)[n-1] = '\0';
260            }
261        }
262        return APR_ENOSPC;
263    }

The first time through this loop, bytes_handled will be 0. If the buffer was provided, rather than being allocated by ap_rgetline_core, and the first read exceeded the maximum length (n), then line 256 will set the byte *before* the buffer to 0.

I believe that the change introduced at revision 1.152 is incorrect; the test at line 255 ensures that the buffer will not be overrun, so the change from bytes_handled to bytes_handled-1 in line 256 was unnecessary.

Rici
Re: Possible error in apr_rgetline_core
On 21-Sep-04, at 11:03 AM, Joe Orton wrote:

pass 2: bytes_handled = 8191, *s = 8191-byte region

Quite right: I hadn't looked closely enough at the case where there was no user-supplied buffer :(

Ok, bytes_handled can never be > n, right? So the following ought to work:

255    if (bytes_handled > 0) {
256        (*s)[bytes_handled-1] = '\0';
257    }
258    else {
259        (*s)[0] = '\0';
260    }
Re: Bug 31126: Reiser4
I want to make it clear here that I have never used Reiser4, so on a personal level the issue is somewhat academic.

On 9-Sep-04, at 1:11 AM, André Malo wrote:

SUSv3/POSIX disagrees as well. So I think, Mr. Reiser should just fix what's broken. It's not such an uncommon case to test an arbitrary file path, if it's possible to open it.

So, Apache is only going to work on POSIX systems, you're saying? Even under POSIX, there are cases where this particular piece of error analysis could fail. For example, the redundant open on pathname/.htaccess could return ENAMETOOLONG even though pathname was within limits and names a regular file. (Admittedly, that is a pathological case, but...)

EACCES is just the wrong error, because it has different semantics.

I tend to agree, but it's not totally clear. Reiser4's files can also function as directories, so ENOTDIR is incorrect. ENOENT seems like a reasonable choice, but POSIX also mandates the return of EACCES instead of ENOENT if search permissions are not present. I don't know enough about the Reiser filesystem to know how one specifies search permissions on a file which can also be a directory.

It is worth noting that in the particular circumstances which give rise to this error, ap_directory_walk could tell if the path refers to a directory or a file; it should be possible to avoid walking too far, although there may well be edge cases I haven't thought of.

- Race condition.

What about just changing line 930 of server/request.c from

if (r->finfo.filetype

to

if (r->finfo.filetype && *r->path_info

All that will do is not optimise on the last segment; I don't see how not performing the optimization could create a race condition that didn't already exist.
Re: Bug 31126: Reiser4
On 9-Sep-04, at 12:48 PM, William A. Rowe, Jr. wrote:

You miss a case with this patch... Consider a request for /content/foo/index.php/extra-path-info The path doesn't end where you think it does, and directory walk would usually handle this case. Perhaps a patch to perform an extra sanity check on EACCES results is a better fix, at least introduced by a compile-time flag?

I'm sure you know the code better than I do. I thought that r->finfo.filetype would be 0 in this case, since extra-path-info doesn't exist; in which case the optimisation is already being skipped. But I might be completely out to lunch.

R,
Re: readTrivial enhancement request
On 6-Sep-04, at 10:37 AM, Ivan Ristic wrote: [ The request is trivial to implement (at least I think so), but the feature itself is very important. ] Perhaps I don't understand the request, but wouldn't it be straightforward for a module like mod_security to implement this feature by using one of the connection hooks, perhaps create_connection? Or even by registering an input filter at the beginning of the chain? I'm assuming that the kind of DoS attacks you are looking for include, for example, sending request lines one octet at a time, and that the intent is to trigger only on the first request in a persistent connection, although conceivably it would be a good idea to start a timeout when the first byte of a subsequent request line comes in, in case the denial of service attack is even more subtle. Reporting timeouts in between requests in a persistent connection would trigger the warning in normal operation with a compliant browser, which is presumably undesirable. Rici
Re: Bug 31126: Reiser4
I put the gory details on Bugzilla (http://nagoya.apache.org/bugzilla/show_bug.cgi?id=31126); I honestly don't know what the best way to fix this problem is, though.

Basically, ap_directory_walk will, under certain circumstances, attempt to read an .htaccess file from a complete filepath; that is, given the path /path/to/file, it will *also* try /path/to/file/.htaccess. This is not because it has been lied to by the filesystem; it does this to avoid excess calls to lstat() when it thinks it can get away with it. ap_parse_htaccess compensates for this by treating ENOTDIR as though it were ENOENT.

The filesystems I have kicking around all seem to return ENOTDIR in this case, but apparently Reiser4 returns EACCES. I don't think this is wrong -- others may disagree -- but returning a security violation rather than revealing the (non)existence of a resource is a common approach. Given the semantics of the Reiser4 filesystem, it is possibly even a reasonable choice. In any event, trying to interpret the precise semantics of error returns from random filesystems is a mug's game.

It is worth noting that in the particular circumstances which give rise to this error, ap_directory_walk could tell if the path refers to a directory or a file; it should be possible to avoid walking too far, although there may well be edge cases I haven't thought of. Otherwise, not optimising away the lstat() in the last step of the directory walk may be the best solution. Alternatively, of course, ap_parse_htaccess could simply regard *any* error as equivalent to the non-existence of the .htaccess file.
Re: Bug 18388: cookies
Is it metadata about the body? In many cases, a cookie is semantically information about the session; it might have nothing to do with the particular body at all.

On 29-Aug-04, at 5:57 PM, Jim Jagielski wrote:

I myself would define the cookie header as an entity header, since it *is* meta data about the body, but I can also see it as a more traditional response header as well. But wouldn't adding new info about the response (either as a response header or entity header) invalidate it actually *being* 304 (Not Modified)?

In both cases, it *is* new information about the response, in which case the full response has, in a sense, been modified?
[PATCH] Corrected some problems with mod_info.c
Bug report with patch file attached: http://nagoya.apache.org/bugzilla/show_bug.cgi?id=30919

mod_info seems to get all confused in cases like this:

<Location /x>
  Order deny,allow
  Allow from all
  AuthType Basic
  AuthName Foo
  AuthUserFile .htpasswd
  <LimitExcept PUT>
    require valid-user
  </LimitExcept>
</Location>

The <Location> block is terminated with </LimitExcept> and the configuration scan terminates prematurely. Changing the block to this makes it work:

<Location /x>
  Order deny,allow
  Allow from all
  AuthType Basic
  AuthName Foo
  <LimitExcept PUT>
    require valid-user
  </LimitExcept>
  AuthUserFile .htpasswd
</Location>

This particularly affects viewing configuration directives in vhosts.
Re: The Byterange filter -- a new design -- feel free to rip it to shreds
On 1-Aug-04, at 4:41 PM, Justin Erenkrantz wrote: Instead, what mod_proxy could do is expose a user directive that says 'no representation transformation allowed' and act accordingly: bypass all filters except for the network filters. In fact, if the filter chain cannot optimise byte ranges, there is a good argument for simply rejecting the byte-range request altogether, or at least to have a configuration directive which would allow that. Otherwise, the server cluster is open to a very simple DoS attack: repeatedly request one byte in an ISO image and wait for the cluster backbone to collapse.
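No such directive exists today. As a hedged sketch of a stopgap an admin could deploy in the meantime, assuming mod_headers is loaded, the incoming Range headers can simply be stripped so the server always answers with the full representation:

```apache
# Stopgap sketch (assumes mod_headers): discard byte-range requests
# at the front end so a one-byte range on a huge file cannot force
# repeated full-entity work behind the proxy. Clients then receive
# a 200 with the complete entity instead of a 206.
RequestHeader unset Range
RequestHeader unset Request-Range
```

This trades away legitimate resumable downloads, which is why a purpose-built directive that rejects rather than silently ignores ranges would be preferable.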
Re: [patch] perchild.c (Re: setjmp/longjmp vs try/throw/catch)
On 20-Jul-04, at 3:33 PM, Tsuyoshi SASAMOTO wrote: Please refer to this discussion about thread safety of setjmp/longjmp: http://groups.google.com/groups? [EMAIL PROTECTED] The signal-to-noise ratio in that thread is very low :) It is clear that setjmp/longjmp are *not* signal-safe; furthermore, signals interact badly with threads. It is certainly not ok to use longjmp in a signal handler, since there is no way of knowing which thread a signal handler might be running in. This does not make setjmp/longjmp thread unsafe (IMHO). As Posix memorably states, using longjmp with a jump buffer from a different thread context is a questionable practice ... not worthy of standardization. R.
Re: Better error logging for mod_access
Fair enough. I guess I am being sensitive here, because the last time I submitted a patch to some other project, I did it with -u and got told that I should use -c. :) Anyway, I apologise for being grumpy and look forward to comments on the patch itself.

On 9-Jun-04, at 8:29 AM, Greg Marr wrote:

At 10:18 PM 6/8/2004, Rici Lake wrote:

The patch is now posted to bugzilla as [Bug 29450]. I believe that conforms to the patches.html document cited below. Although that document says -C3 is acceptable, I have submitted it in the preferred -u format (which I also prefer, actually).

It says use -u, but if you absolutely *can't* do -u, then of the remaining options, -C3 is the best, or at least, that is what it is meant to say.
Re: Better error logging for mod_access
Quoting Joshua Slive [EMAIL PROTECTED]:

On Sat, 5 Jun 2004, Jeff Trawick wrote:

What about enabling via a per-dir directive? Looks very useful, but some sites might have a significant number of new entries in error log serving no useful purpose (for them).

The problem with per-dir is that the user might not know exactly where in the config file he should be editing. If they did, they probably wouldn't have a problem finding the relevant allow/deny in the first place.

I agree.

What about keeping the old error message, but logging an *additional* message at loglevel debug with the additional info?

That is certainly possible, but I was trying to make things as easy as possible for newbies. I think having both modules available, and using a -D flag to select which one to load, would probably work out better. But that is just one person's opinion.

Two other notes: - Personally, I hate arbitrary path-size limitations. They always bite you when you least expect it. What about dynamically sizing this?

That was how I did it the first time, and then I thought that someone would complain about the possible storage usage. Actually, the best thing would be to use the pointer from the directive structure, but I could not convince myself that it was in the right pool. If someone who understands Apache 2's configuration system better than me said the filename in the directive structure is allocated in a pool which is still live when the directive is being used, I would happily change it.

- You'll get more people to read your patches if you use diff -u.

Tastes vary. I can post it that way if you prefer. But two people reading it in 11 minutes isn't bad :)

Rici.