Re: RFC: Who owns a passed brigade?

2005-04-22 Thread Nick Kew
Rici Lake wrote:

 I favour having ap_pass_brigade do it.

 ... as discussed in IRC.  Yes, I like that too.

 However, I think it is a fallacy that a cleaned-up brigade is not too
 big to wait for pool cleanup. The brigade itself is about four pointers;
 and the corresponding pool cleanup is another four pointers, so that's a
 total of around 32 bytes, not a lot. But suppose that a brigade is
 created for every 8k of filtered data,

Hmmm.

Are there applications like that in the wild?  I'd have thought that
sounds like the kind of case where you definitely want to re-use
one brigade, with a brigade_cleanup after each 8Kb pass_brigade.
To create and destroy half a million brigades, including of course
registering and removing their cleanups, seems well worth avoiding.
So long as the creator is responsible (and the brigade guaranteed
to be reusable), that is trivial for the programmer.
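
A minimal sketch of that reuse pattern, for concreteness (illustrative
only: reuse_ctx and reuse_out_filter are invented names, and EOS/metadata
handling and error checking are elided):

#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

typedef struct {
    apr_bucket_brigade *bb;   /* created once per request, then reused */
} reuse_ctx;

static apr_status_t reuse_out_filter(ap_filter_t *f, apr_bucket_brigade *in)
{
    reuse_ctx *ctx = f->ctx;
    apr_status_t rv;

    if (!ctx) {
        ctx = f->ctx = apr_pcalloc(f->r->pool, sizeof(*ctx));
        ctx->bb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
    }

    /* consume the passed buckets into the long-lived brigade */
    while (!APR_BRIGADE_EMPTY(in)) {
        apr_bucket *e = APR_BRIGADE_FIRST(in);
        APR_BUCKET_REMOVE(e);
        APR_BRIGADE_INSERT_TAIL(ctx->bb, e);
    }

    rv = ap_pass_brigade(f->next, ctx->bb);

    /* empty the brigade but keep it for the next invocation: nothing is
     * created or destroyed per call, and no per-call cleanup is registered */
    apr_brigade_cleanup(ctx->bb);
    return rv;
}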

-- 
Nick Kew


Re: RFC: Who owns a passed brigade?

2005-04-22 Thread Joe Orton
On Thu, Apr 21, 2005 at 07:05:34PM -0500, Rici Lake wrote:
 
 On 21-Apr-05, at 5:51 PM, Nick Kew wrote:
 
 Rici Lake wrote:
 
 FWIW, I think the (apparent) practice, where the caller relinquishes
 ownership of the buckets but not the brigade itself, is more efficient
 since it avoids a lot of brigade construction and destruction.
 
 Agreed.  And it works for any situation, as either party can do a
 cleanup to keep resource-use down, and a cleaned up brigade is not
 too big to wait for pool cleanup.
 
 "Either party can do a cleanup" does not give either party the
 responsibility. If neither party did the cleanup, that would be a
 problem. Lack of clear responsibility is always a problem, and not just
 with computer programs.

The issue here is really which party can *destroy* a brigade, right? 
In the current code where brigades are never really destroyed the fact
that apr_brigade_cleanup() is called everywhere is not really harmful,
is it?

It looked to me like it was going to be much simpler to adapt filters
shipped in httpd to match the creator-must-destroy model than to match
a consumer-must-destroy model, despite the ap_pass_brigade docstring.
It's certainly easier to audit for.

In creator-must-destroy it is still fine for a consumer to call
apr_brigade_cleanup() on a brigade it is passed, of course; that's just
equivalent to consuming all the buckets, after all.
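
For instance, a consumer that wanted to discard everything it is handed
could legitimately do no more than this (a sketch; discard_out_filter is
an invented name, not a filter shipped in httpd):

static apr_status_t discard_out_filter(ap_filter_t *f,
                                       apr_bucket_brigade *bb)
{
    /* same effect as removing and deleting every bucket in bb; the
     * brigade structure itself survives and stays the creator's to
     * destroy under creator-must-destroy */
    apr_brigade_cleanup(bb);
    return APR_SUCCESS;
}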

joe


Re: mod_cache caching the 301 Moved Permanently

2005-04-22 Thread r . pluem


Sander Striker wrote:
 [EMAIL PROTECTED] wrote:
 
 The problem seems to be that the proxied backend server that is
 cached via mod_disk_cache originally delivers HTTP status 301 and the
 Location http://www.beach-clothing.com/where-to-buy/, but once cached,
 mod_disk_cache delivers HTTP status 200 instead of 301 (while correctly
 redelivering the Location header).
 I have not verified this myself so far, but it seems to be the problem
 to me.
 
 
 This wouldn't surprise me one bit.  The 2.1 branch has seen quite a bit
 of churn in this
 area.
 
 Any chance you could give 2.1 a go and see if that works correctly?

Sorry, not enough time right now to check this on a live system. But I think
what is missing is something like the following (untested) patch (done against
2.0.54, but the code in trunk is the same in this part):


--- mod_cache.c.orig    2005-04-11 17:47:03.0 +0200
+++ mod_cache.c         2005-04-22 23:43:19.0 +0200
@@ -220,6 +220,8 @@ static int cache_out_filter(ap_filter_t
     ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, r->server,
                  "cache: running CACHE_OUT filter");
 
+    /* restore status of cached response */
+    r->status = cache->handle->cache_obj->info.status;
     /* recall_headers() was called in cache_select_url() */
     cache->provider->recall_body(cache->handle, r->pool, bb);

I found no place in the code where the saved status code is restored to the
response, so I think this could be the right place to do so.
Could somebody who is more familiar with the code crosscheck this? Sander? Justin?

 
 Sander

Regards

RĂ¼diger






Re: mod_cache caching the 301 Moved Permanently

2005-04-22 Thread r . pluem


Olaf van der Spek wrote:
 On 4/22/05, Justin Erenkrantz [EMAIL PROTECTED] wrote:
 

[..cut..]


I don't get it.  What's your problem?  -- justin
 
  
 The 'here' link is to http://www.beach-clothing.com:8080/where-to-buy/
 while he wants it to be to http://www.beach-clothing.com/where-to-buy/
 
 

His problem is a different one. If you type
http://www.beach-clothing.com/where-to-buy (without the trailing slash), you
get the following headers (first my request, then the server's reply):

GET /where-to-buy HTTP/1.1
Host: www.beach-clothing.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050319
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plai
n;q=0.8,image/png,*/*;q=0.5
Accept-Language: de,en;q=0.8,de-de;q=0.5,en-gb;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Date: Fri, 22 Apr 2005 20:39:58 GMT
Server: Apache
Content-Type: text/html; charset=iso-8859-1
Location: http://www.beach-clothing.com/where-to-buy/
Cache-Control: max-age=432000
Expires: Tue, 26 Apr 2005 04:58:38 GMT
Content-Length: 256
Age: 142880
Keep-Alive: timeout=3, max=20
Connection: Keep-Alive


The problem seems to be that the proxied backend server that is cached via
mod_disk_cache originally delivers HTTP status 301 and the Location
http://www.beach-clothing.com/where-to-buy/, but once cached, mod_disk_cache
delivers HTTP status 200 instead of 301 (while correctly redelivering the
Location header). I have not verified this myself so far, but it seems to be
the problem to me.

I think the difference between the 'here' link and the Location header can be
explained by the ProxyPassReverse directive, which rewrites the Location
header but not the HTML code in the error page.

Regards

RĂ¼diger





Re: RFC: Name Based Virtual Host Callback

2005-04-22 Thread Paul Querna
Paul Querna wrote:

 Instead I have tested a simple callback method, that will only run the
 callback for the matching virtual hosts.  Attached is a prototype patch.

Attached is an updated patch.

If no one cares, I will commit it to trunk tonight.

This does include a minor MMN bump.

-Paul
Index: server/vhost.c
===
--- server/vhost.c  (revision 164089)
+++ server/vhost.c  (working copy)
@@ -977,7 +977,57 @@
     }
 }
 
+/**
+ * For every virtual host on this connection, call func_cb.
+ */
+AP_DECLARE(int) ap_vhost_iterate_given_conn(conn_rec *conn,
+                                            ap_vhost_iterate_conn_cb func_cb,
+                                            void* baton)
+{
+    server_rec *s;
+    server_rec *last_s;
+    name_chain *src;
+    apr_port_t port;
+    int rv = 0;
 
+    if (conn->vhost_lookup_data) {
+        last_s = NULL;
+        port = conn->local_addr->port;
+
+        for (src = conn->vhost_lookup_data; src; src = src->next) {
+            server_addr_rec *sar;
+
+            /* We only consider addresses on the name_chain which have a
+             * matching port.
+             */
+            sar = src->sar;
+            if (sar->host_port != 0 && port != sar->host_port) {
+                continue;
+            }
+
+            s = src->server;
+
+            if (s == last_s) {
+                /* we've already done a callback for this vhost. */
+                continue;
+            }
+
+            last_s = s;
+
+            rv = func_cb(baton, conn, s);
+
+            if (rv != 0) {
+                break;
+            }
+        }
+    }
+    else {
+        rv = func_cb(baton, conn, conn->base_server);
+    }
+
+    return rv;
+}
+
 /* Called for a new connection which has a known local_addr.  Note that the
  * new connection is assumed to have conn->server == main server.
  */
Index: include/http_vhost.h
===
--- include/http_vhost.h    (revision 164089)
+++ include/http_vhost.h    (working copy)
@@ -53,6 +53,30 @@
  const char *arg);
 
 /**
+ * Callback function for every Name Based Virtual Host.
+ * @param baton Opaque user object
+ * @param conn The current Connection
+ * @param s The current Server
+ * @see ap_vhost_iterate_given_conn
+ * @return 0 on success, any non-zero return will stop the iteration.
+ */
+typedef int(*ap_vhost_iterate_conn_cb)(void* baton, conn_rec* conn, server_rec* s);
+
+/**
+ * For every virtual host on this connection, call func_cb.
+ * @param conn The current connection
+ * @param func_cb Function called for every Name Based Virtual Host for this 
+ *connection.
+ * @param baton Opaque object passed to func_cb.
+ * @return The return value from func_cb.
+ * @note If func_cb returns non-zero, the function will return at this point, 
+ *   and not continue iterating the virtual hosts.
+ */
+AP_DECLARE(int) ap_vhost_iterate_given_conn(conn_rec *conn,
+ap_vhost_iterate_conn_cb func_cb,
+void* baton);
+
+/**
  * given an ip address only, give our best guess as to what vhost it is 
  * @param conn The current connection
  */
Index: include/ap_mmn.h
===
--- include/ap_mmn.h    (revision 164089)
+++ include/ap_mmn.h    (working copy)
@@ -93,6 +93,7 @@
  *                      symbols from the public sector, and decorated real_exit_code
  *                      with ap_ in the win32 os.h.
  * 20050305.0 (2.1.4-dev) added pid and generation fields to worker_score
+ * 20050305.1 (2.1.5-dev) added ap_vhost_iterate_given_conn.
  */
 
 #define MODULE_MAGIC_COOKIE 0x41503230UL /* AP20 */
@@ -100,7 +101,7 @@
 #ifndef MODULE_MAGIC_NUMBER_MAJOR
 #define MODULE_MAGIC_NUMBER_MAJOR 20050305
 #endif
-#define MODULE_MAGIC_NUMBER_MINOR 0 /* 0...n */
+#define MODULE_MAGIC_NUMBER_MINOR 1 /* 0...n */
 
 /**
  * Determine if the server's current MODULE_MAGIC_NUMBER is at least a
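
For illustration, a caller of the new API might look roughly like this
(count_vhosts_cb and the integer baton are invented for the example; only
the typedef and ap_vhost_iterate_given_conn() itself come from the patch):

#include "httpd.h"
#include "http_vhost.h"

/* count the name-based virtual hosts configured for this connection */
static int count_vhosts_cb(void *baton, conn_rec *conn, server_rec *s)
{
    int *count = baton;
    (*count)++;
    return 0;   /* returning non-zero would stop the iteration */
}

static int count_vhosts(conn_rec *c)
{
    int n = 0;
    ap_vhost_iterate_given_conn(c, count_vhosts_cb, &n);
    return n;
}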


Module code review (long). Was Re: RFC: Who owns a passed brigade?

2005-04-22 Thread Rici Lake
On 22-Apr-05, at 6:19 AM, Nick Kew wrote:

 However, I think it is a fallacy that a cleaned-up brigade is not too
 big to wait for pool cleanup. The brigade itself is about four pointers;
 and the corresponding pool cleanup is another four pointers, so that's a
 total of around 32 bytes, not a lot. But suppose that a brigade is
 created for every 8k of filtered data,

 Are there applications like that in the wild?  I'd have thought that
 sounds like the kind of case where you definitely want to re-use
 one brigade, with a brigade_cleanup after each 8Kb pass_brigade.

Indeed. But the filter does not know, in general, how much data it will
be called upon to filter, nor how many times it will be invoked. Being
invoked once every 8 kb should be pretty common, though, since that
seems to be the recommendation on how often a filter should pass a
brigade.

Are there applications like that in the wild? I don't know, but it
seems likely that there are. None of the following examples are
definitive, but they seem illustrative.

I cite these examples only because they are available; this is not an
attempt to criticize anyone's code in particular, but rather to show
the result of the lack of clear documentation on how to write an output
filter.

In summary, of five modules examined, three of them create new brigades
on every invocation without destroying the previous one. One of them
fails to empty the passed brigade. One of them destroys the passed
brigade. Two of them do one-time initialization on every invocation of
the filter.

The only output filter I looked at which seems to conform to
brigades-are-retained, buckets-are-passed is mod_deflate. This is the
loop from mod_deflate.c, which seems to be a reasonable model:

  1   static apr_status_t deflate_out_filter(ap_filter_t *f,
  2                                          apr_bucket_brigade *bb)
  3   {
      // ... declarations
  4       if (APR_BRIGADE_EMPTY(bb)) {
  5           return APR_SUCCESS;
  6       }
  7       if (!ctx) {
      // ... determine whether we should filter the content
  8           /* We're cool with filtering this. */
  9           ctx = f->ctx = apr_pcalloc(r->pool, sizeof(*ctx));
 10           ctx->bb = apr_brigade_create(r->pool, f->c->bucket_alloc);
      // ... more initialization
 11       }
 12       while (!APR_BRIGADE_EMPTY(bb))
 13       {
 14           e = APR_BRIGADE_FIRST(bb);
      // ... check if e is metadata, process data in e
 15           apr_bucket_delete(e);
 16       }
 17       apr_brigade_cleanup(bb);
 18       return APR_SUCCESS;
 19   }
Notes:
lines 4-6
   these only serve to avoid initialization in the event that the filter
   only receives empty brigades. I don't know if this is necessary.
line 7
   all initialization is performed only once, if the ctx structure has
   not yet been created
lines 12, 15
   all buckets are removed from the brigade and deleted
line 17
   the call to cleanup is presumably redundant since there is no break
   in the while block. But there could be.

mod_case_filter is supposedly an example of how to write filters, so it
is likely that its code will be copied by naive filter writers. Here is
its loop, in essence:

  1   static apr_status_t CaseFilterOutFilter(ap_filter_t *f,
  2                                           apr_bucket_brigade *pbbIn)
  3   {
      // ... declarations
  4       pbbOut = apr_brigade_create(r->pool, c->bucket_alloc);
  5       for (pbktIn = APR_BRIGADE_FIRST(pbbIn);
  6            pbktIn != APR_BRIGADE_SENTINEL(pbbIn);
  7            pbktIn = APR_BUCKET_NEXT(pbktIn))
  8       {
      // ... check for EOS
  9           apr_bucket_read(pbktIn, &data, &len, APR_BLOCK_READ);
      // ... do the transform
 10           pbktOut = apr_bucket_heap_create(buf, len, apr_bucket_free,
 11                                            c->bucket_alloc);
 12           APR_BRIGADE_INSERT_TAIL(pbbOut, pbktOut);
 13       }
 14       return ap_pass_brigade(f->next, pbbOut);
 15   }

Notes:
line 4:
   a new brigade is created on every invocation.
lines 5-7:
   the input brigade is iterated but the buckets are not removed
line 14:
   the input brigade is not cleaned, and the output brigade is not
   destroyed.
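
For comparison, the same loop restructured to the brigades-are-retained,
buckets-are-consumed model might look roughly like this (an untested
sketch, not a proposed patch; it assumes the output brigade can live in
f->ctx, and it keeps the module's upper-casing transform):

static apr_status_t CaseFilterOutFilter(ap_filter_t *f,
                                        apr_bucket_brigade *pbbIn)
{
    request_rec *r = f->r;
    conn_rec *c = r->connection;
    apr_bucket_brigade *pbbOut = f->ctx;
    apr_bucket *pbktIn, *pbktOut;
    apr_status_t rv;
    const char *data;
    char *buf;
    apr_size_t len, n;

    /* create the output brigade once and keep it across invocations */
    if (!pbbOut) {
        pbbOut = f->ctx = apr_brigade_create(r->pool, c->bucket_alloc);
    }

    while (!APR_BRIGADE_EMPTY(pbbIn)) {
        pbktIn = APR_BRIGADE_FIRST(pbbIn);

        if (APR_BUCKET_IS_EOS(pbktIn)) {
            /* move the EOS bucket across rather than copying it */
            APR_BUCKET_REMOVE(pbktIn);
            APR_BRIGADE_INSERT_TAIL(pbbOut, pbktIn);
            continue;
        }

        apr_bucket_read(pbktIn, &data, &len, APR_BLOCK_READ);

        /* same upper-casing transform as the original */
        buf = apr_bucket_alloc(len, c->bucket_alloc);
        for (n = 0; n < len; ++n) {
            buf[n] = apr_toupper(data[n]);
        }
        pbktOut = apr_bucket_heap_create(buf, len, apr_bucket_free,
                                         c->bucket_alloc);
        APR_BRIGADE_INSERT_TAIL(pbbOut, pbktOut);

        /* consume the input bucket */
        apr_bucket_delete(pbktIn);
    }

    rv = ap_pass_brigade(f->next, pbbOut);
    /* empty the retained brigade for the next invocation */
    apr_brigade_cleanup(pbbOut);
    return rv;
}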

OK, here's mod_charset_lite, which is flagged as a learning experience
(for the author), but which I believe can actually be found in the wild. The
filter function itself was a bit much to include here (it does seem to
empty its input brigade), but the key is in the following function,
which is called every time the filter decides it has output available:

  1   static apr_status_t send_downstream(ap_filter_t *f, const char *tmp,
  2                                       apr_size_t len)
  3   {
      // ... declarations
  4       bb = apr_brigade_create(r->pool, c->bucket_alloc);
  5       b = apr_bucket_transient_create(tmp, len, c->bucket_alloc);
  6       APR_BRIGADE_INSERT_TAIL(bb, b);
  7       rv = 

Re: mod_cache caching the 301 Moved Permanently

2005-04-22 Thread Sander Striker
[EMAIL PROTECTED] wrote:

 The problem seems to be that the proxied backend server that is cached via
 mod_disk_cache originally delivers HTTP status 301 and the Location
 http://www.beach-clothing.com/where-to-buy/, but once cached, mod_disk_cache
 delivers HTTP status 200 instead of 301 (while correctly redelivering the
 Location header). I have not verified this myself so far, but it seems to be
 the problem to me.

This wouldn't surprise me one bit.  The 2.1 branch has seen quite a bit of
churn in this area.

Any chance you could give 2.1 a go and see if that works correctly?
Sander


Re: RFC: Who owns a passed brigade?

2005-04-22 Thread Rici Lake
On 22-Apr-05, at 9:32 AM, Joe Orton wrote:
 The issue here is really which party can *destroy* a brigade, right?

Or perhaps which party *must* destroy a brigade. This is much less of
an issue if neither party creates a new brigade on every filter
invocation. But some do.

 In the current code where brigades are never really destroyed the fact
 that apr_brigade_cleanup() is called everywhere is not really harmful,
 is it?

No, what would be harmful would be apr_brigade_cleanup() not being
called on a non-empty brigade.

 It looked to me like it was going to be much simpler to adapt filters
 shipped in httpd to match the creator-must-destroy model than to match
 a consumer-must-destroy model, despite the ap_pass_brigade docstring.

That seems to be the case.

 It's certainly easier to audit for.

Sure, but it applies equally to buckets. It seems that the consensus is
creator-must-destroy-brigades; consumer-must-remove-buckets. That's
not easy to audit for, which is why I think ap_pass_brigade should
guarantee the latter condition.

 In creator-must-destroy it is still fine for a consumer to call
 apr_brigade_cleanup() on a brigade it is passed, of course; that's just
 equivalent to consuming all the buckets, after all.

Sure. Consequently, it's OK for ap_pass_brigade to do the call. Then a
consumer is free to either consume the buckets or not.
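
Concretely, the guarantee could be as small as something like the
following sketch (not the real util_filter.c code, which does more than
is shown here; the point is only the apr_brigade_cleanup() after the
call):

AP_DECLARE(apr_status_t) ap_pass_brigade(ap_filter_t *next,
                                         apr_bucket_brigade *bb)
{
    if (next) {
        apr_status_t rv = next->frec->filter_func.out_func(next, bb);
        /* whatever the downstream filter did, hand the caller back an
         * empty brigade, so consumer-must-remove-buckets always holds */
        apr_brigade_cleanup(bb);
        return rv;
    }
    return AP_NOBODY_WROTE;
}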