Re: mod_cache: store_body() bites off more than it can chew
Graham Leggett wrote: On 06 Sep 2010, at 11:00 PM, Paul Querna wrote: Isn't this problem an artifact of how all bucket brigades work, and is present in all output filter chains? An output filter might be called multiple times, but a single bucket can still contain a 4gb chunk easily. It seems to me it would be better to think about this holistically down the entire output filter chain, rather than building in special case support for this inside mod_cache's internal methods? In the cache case, thinking about it a bit the in and out brigades are probably unavoidable, as the cache is a special case in that it wants to write the data twice, once to the cache, a second time to the rest of the filter stack. Right now, the cache is forced to read the complete brigade to cache it, no option to give up early. And the cache has no choice but to keep the brigade buckets in the brigade so that they can be passed a second time up the filter stack, no deleting buckets as you go like you normally would. Read one 4GB file bucket in the cache, and in the process the file bucket gets morphed into 1/2 million heap buckets, oops. With two brigades, one in, one out, the in brigade can have the buckets removed as they are consumed, as normal, and moved to the out brigade. The cache can quit at any time, and the code following knows what data to write to the network (out), and what data to loop round and resend to the cache (in). The cache provider could choose to quit and ask to be called again either because writing took too long, or too much data was read (and in the process became heap buckets), either reason is fine. 
That said, following on your suggestion of thinking about this in the general sense, it would be really nice if the filter stack had the option to say "I have bitten off as much of the brigade as I am prepared to chew on right now, and the leftovers are still in the brigade, can you call me back with this data, maybe with more data added, and I'll try to swallow some more?". In theory, that would mean all handlers (or entities that sent data) would no longer be allowed to make the blind assumption that the filter stack was willing to consume every possible set of buckets the handler wanted to send, and that the stack had the right to say "I'm full, give me a second to chew on this". This wouldn't need separate brigades, probably just a return code that meant EAGAIN, and that was expected to be honoured by handlers. Regards, Graham

-- Retrieving bodies from the cache has a similar scalability issue. The CACHE_OUT filter makes a single call to the provider's recall_body(). The entire body must be placed in a single brigade which is sent along the filter chain with a single ap_pass_brigade() call. If a custom provider is using heap buckets and the body is large, then this can consume too much memory. It would be better to loop, asking the provider repeatedly for portions of the body until the provider provides an EOS bucket. Is there interest in a patch implementing this approach? Thanks, Paul
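The two-brigade store_body() contract discussed in this thread can be modelled with toy types. This is an illustrative sketch only: the bucket/brigade structures and the byte budget parameter are mock stand-ins, not the real APR/httpd API. Consumed buckets move from the in brigade to the out brigade, and the provider may give up early and ask to be called again:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy "brigade": a linked list of fixed-size buckets. */
struct bucket { size_t len; struct bucket *next; };
struct brigade { struct bucket *head; };

#define AGAIN 1   /* provider stopped early: call back with the rest of "in" */
#define DONE  0   /* provider consumed everything it was given */

static struct bucket *make_bucket(size_t len, struct bucket *next)
{
    struct bucket *b = malloc(sizeof *b);
    b->len = len;
    b->next = next;
    return b;
}

/* Cache at most `budget` bytes per call.  Consumed buckets migrate from
 * `in` to `out`, so the caller still knows what to pass up the stack.
 * (This toy prepends to `out`, so ordering there is reversed.) */
static int store_body(struct brigade *in, struct brigade *out, size_t budget)
{
    size_t written = 0;
    while (in->head) {
        struct bucket *b = in->head;
        if (written + b->len > budget)
            return AGAIN;            /* leftovers stay in `in` */
        written += b->len;           /* "write" b to the cache */
        in->head = b->next;          /* remove from in... */
        b->next = out->head;         /* ...and move to out */
        out->head = b;
    }
    return DONE;
}
```

Here the caller knows that whatever remains in `in` must be looped back round to the cache, while everything in `out` is safe to pass up the filter stack.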
RE: mod_cache: store_body() bites off more than it can chew
Plüm, Rüdiger, VF-Group wrote: -Original Message- From: Graham Leggett Sent: Montag, 13. September 2010 16:35 To: dev@httpd.apache.org Subject: Re: mod_cache: store_body() bites off more than it can chew On 13 Sep 2010, at 4:18 PM, Plüm, Rüdiger, VF-Group wrote: It is not a problem for mod_disk_cache as you say, but I guess he meant for 3rd party providers that could only deliver the cached responses via heap buckets. The cache provider itself puts the bucket in the brigade, and has the power to put any bucket into the brigade it likes, including its own custom-developed buckets. The fact that brigades become heap buckets when read is a property of our bucket brigades; they aren't a restriction applied by the cache. For example, in the large disk cache patch, a special bucket was invented that represented a file that was not yet completely present, and that blocked waiting for more data if the in-flight cache file was not yet all there. There was no need to change the API to support this scenario; the cache just dropped the special bucket into the brigade and it was done. Yeah, but in a tricky way, which is absolutely fine and cool if you cannot change the API, but the question is: Is this the way providers should go and does the API look like it should? Regards Rüdiger Hi, I'm familiar with the FILE bucket and have considered implementing a new bucket type that would have similar morphing properties for our custom 3rd party cache provider. Currently a handler has the ability to call ap_pass_brigade multiple times and hence can produce large bodies in small chunks. The CACHE_OUT filter as currently implemented does not offer that, forcing a 3rd party provider to implement their own bucket type if HEAP buckets would occupy too much memory. Changing the CACHE_OUT filter to call recall_body() repeatedly until an EOS is obtained is a small change. More importantly, it won't affect existing providers as they'll produce a brigade with an EOS bucket on their first invocation.
Custom bucket types may be a better approach, but shouldn't the CACHE_OUT filter be able to send the content in multiple brigades in the same way a handler would? Thanks, Paul
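The proposed CACHE_OUT loop can be modelled the same way. recall_body() is the provider hook named in the thread, but the types below are mocks, under the assumption that the provider hands back one bounded chunk per call and flags EOS when the body is exhausted:

```c
#include <assert.h>
#include <stddef.h>

/* Mock provider state: just a count of body bytes still to deliver. */
struct provider { size_t remaining; };

/* Mock recall_body(): hand back at most `max` bytes per call and set
 * *eos once the body is exhausted, instead of building one huge brigade. */
static size_t recall_body(struct provider *p, size_t max, int *eos)
{
    size_t n = p->remaining < max ? p->remaining : max;
    p->remaining -= n;
    *eos = (p->remaining == 0);
    return n;
}

/* Mock CACHE_OUT: loop, "passing" each chunk down the filter chain, so
 * memory use is bounded by the chunk size rather than the body size. */
static int cache_out_filter(struct provider *p, size_t chunk, size_t *passes)
{
    int eos = 0;
    *passes = 0;
    while (!eos) {
        (void)recall_body(p, chunk, &eos);
        (*passes)++;            /* one ap_pass_brigade() per chunk */
    }
    return 0;
}
```

An existing provider that delivers everything at once simply sets EOS on its first invocation, so the loop degenerates to today's single-call behaviour.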
Re: mod_cache: store_body() bites off more than it can chew
Graham Leggett wrote: Given that the make-cache-writes-atomic problem requires a change to the data format, it may be useful to look at this now, before v2.4 is baked, which will happen soon. How much of a performance boost is the use-null-terminated-strings? Regards, Graham -- If mod_disk_cache's on disk format is changing, now may be an opportunity to investigate some options to improve performance of httpd as a caching proxy. Currently headers and data are in separate files. If they were in a single file, the operating system is given more indication that these two items are tightly coupled. For example, when the headers are read in, the O/S can readahead and buffer part of the body. A difficulty with this could be refreshing the headers after a response to a conditional GET. If the headers are at the start of the file and they change size, then they may overwrite the start of the existing body. You could leave room for expansion (risks wasted space and may not be enough) or you could put the headers at the end of the file (may not benefit from readahead). On a similar theme, would filesystem extended attributes be suitable for storing the headers? The cache file's contents would be the entity body. A problem with this approach could be portability. However the APR could abstract this, reverting to separate files on platforms/filesystems that didn't offer extended attributes. http://en.wikipedia.org/wiki/Extended_file_attributes I haven't tested extended attributes to see if they offer performance gains over separate header and body files. However it seems cleaner to have both parts in one file. It should also eliminate race conditions where headers/body could get out of sync. Thanks, Paul
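One of the layouts discussed above (headers at the end of the file, so that refreshing them after a conditional GET never overwrites the body) can be prototyped with a fixed-size trailer that records where the headers live. This is a hypothetical sketch, not mod_disk_cache's actual on-disk format:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fixed-size trailer at the very end of the file, locating the headers. */
struct trailer { long hdr_off; long hdr_len; };

/* Layout: [body][headers][trailer].  A header refresh appends new
 * headers and a new trailer; the body at offset 0 is never touched. */
static int write_entry(FILE *f, const char *hdrs, const char *body)
{
    struct trailer t;
    fwrite(body, 1, strlen(body), f);
    t.hdr_off = ftell(f);
    t.hdr_len = (long)strlen(hdrs);
    fwrite(hdrs, 1, strlen(hdrs), f);
    fwrite(&t, sizeof t, 1, f);
    return 0;
}

/* Seek to the trailer, then to the headers it points at. */
static int read_headers(FILE *f, char *buf, size_t bufsz)
{
    struct trailer t;
    fseek(f, -(long)sizeof t, SEEK_END);
    fread(&t, sizeof t, 1, f);
    if ((size_t)t.hdr_len >= bufsz)
        return -1;
    fseek(f, t.hdr_off, SEEK_SET);
    fread(buf, 1, (size_t)t.hdr_len, f);
    buf[t.hdr_len] = '\0';
    return 0;
}
```

The trade-off mentioned in the mail still applies: headers at the end of the file may not benefit from readahead triggered by reading the start of the file.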
Re: [PATCH] tproxy2 patch to the apache 2.2.15
JeHo Park wrote: snip yes, i see, so i also made a tproxy4 apache patch for httpd 2.2.9 and tested it on a debian linux box successfully! the software version i tested looks below -- kernel: vanilla 2.6.31 [tproxy4 included as default ] apache: 2.2.9 [tproxy4 patch applied] iptables: 1.4.3 ebtables: 2.0.8 -- i tested the tproxy4 apache successfully on debian lenny. but i met some strange things, that was .. the same tproxy4 software did not operate correctly on CentOS. the main environment my team and i develop in is not debian but CentOS, so i had to give up tproxy4. this is why i made the tproxy2 apache patch... for the 2.6.18 CentOS kernel :-( Can you share your tproxy4-based patches? I think they're more interesting as they'll work across more distributions in the future. RHEL6 beta has tproxy4 support, as will CentOS6 in time. Your tproxy4 work will become usable when your main environment upgrades. Here's a post showing tproxy history; it recommends against tproxy2: https://lists.balabit.hu/pipermail/tproxy/2008-November/000994.html Bazsi suggests starting with tproxy4 for 2.6.17 and propagating that forward to a 2.6.18 kernel. The tproxy4 API looks easier to use than tproxy2. Unfortunately I didn't find the tproxy4 for 2.6.17 kernel patch. really ? great! i didn't know that ! Hopefully you can locate the tproxy4 for 2.6.17 patch as that would allow Apache to work consistently in both your environment and with 2.6.28+ kernels. but i wonder whether Bazsi backported the tproxy4 kernel patch to the kernel 2.6.17 or 2.6.18 anyway recently, i applied my tproxy2 patch - exactly speaking, i modified and inserted a little code into the existing patch --- at some commercial sites and then i found ..maybe .. tproxy2 is not real transparency.. because i had to insert some route information to the box for packet routing problems.
However most important is to have future-proof Apache changes that will be compatible with distros other than just CentOS5/RHEL5, for example RHEL6. Although you're tied to CentOS5 now, I think Apache trunk would benefit more from tproxy4 patches. The tproxy2 work has a limited future. Incidentally, how are you managing the iptables rules? Is it assumed that these will be set up before Apache httpd is started? Or do you think Apache should own the rules, creating them at startup and removing them on shutdown? yes, i see, both tproxy2 and tproxy4 need some L2 bridge, L3 or route rules via iptables etc, so i always insert the rules before or after starting apache httpd. and i hope Apache doesn't own the rules. i call the deletion of the rules from the box software bypass :-) i think Apache httpd doesn't need to own the rules .. for easier debugging and other usages .. Handling the iptables rules within Apache would present difficulties. For example if Apache died/crashed, the rules could be left lingering. Perhaps it's best not to pollute Apache with operating system networking setup, especially non-portable settings that are unique to Linux. Thanks, Paul
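For reference, the tproxy4 setup described in the kernel documentation combines an iptables TPROXY rule with a fwmark routing table, along the lines below. The proxy listen port 3128 is an arbitrary example, and as discussed, these rules would be inserted outside httpd:

```shell
# Divert packets belonging to established, locally-bound sockets
iptables -t mangle -N DIVERT
iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
iptables -t mangle -A DIVERT -j MARK --set-mark 1
iptables -t mangle -A DIVERT -j ACCEPT

# Deliver marked packets locally despite their foreign destination address
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100

# Redirect new port-80 flows to the proxy without rewriting addresses
iptables -t mangle -A PREROUTING -p tcp --dport 80 \
    -j TPROXY --tproxy-mark 0x1/0x1 --on-port 3128
```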
Re: [PATCH] tproxy2 patch to the apache 2.2.15
JeHo Park wrote: hello Daniel thanks for your interest. - Original Message - From: Daniel Ruggeri drugg...@primary.net To: dev@httpd.apache.org Sent: Wednesday, August 04, 2010 9:11 AM Subject: Re: [PATCH] tproxy2 patch to the apache 2.2.15 On 8/3/2010 9:57 AM, JeHo Park wrote: hello ~ it's my first mail to apache dev .. and i am a beginner with apache. :-) Anyway ... recently, i wrote a transparent proxy [tproxy2] patch for httpd-2.2.15 because i needed a web proxy and needed to know the source address of any client who tries to connect to my web server and after all, i tested the performance of my patched tproxy with AVALANCHE 2900. if anyone asks me for the performance result, i will send it to him [the test result pdf is big] *- here is the platform information this patch applies to ---* 1. OS CentOS release 5.2 (Final) 2. KERNEL Linux version 2.6.18-194.el5-tproxy2 (r...@localhost.localdomain) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #10 SMP Wed May 26 17:35:19 KST 2010 3. iptables iptables-1.3.8 + tproxy2 supporting patch *-- here is the usage of the tproxy2 patched httpd configuration ---* httpd.conf:

<VirtualHost 192.168.200.1:80>
    ProxyTproxy On              # On/Off flag
    ProxyTPifaddr 192.168.200.1 # IP address of bridge interface br0. example) br0 = eth0 + eth1
</VirtualHost>

i attach the kernel tproxy2 patch for the kernel above [2.6.18-194.el5-tproxy2], the httpd-2.2.15 tproxy2 patch and the kernel configuration for tproxy2. above all, i want to know whether my patch is acceptable or not .. and want feedback from anyone :-) JeHo; Hi, can you help me understand what the usage case is for this patch? as far as i know, there are other modules for IP transparency, for example tproxy4 and X-Forwarded-For ...etc.
but tproxy4 is only available from kernel version 2.6.24 and above. X-Forwarded-For makes the L3, L4 security box unavailable, because the main function of X-Forwarded-For is to make the web server know the client IP address; we can't be sure whether there is some other security box [L3, L4 .. firewall] between the proxy and the web server, and in this respect, X-Forwarded-For makes the security box unavailable. What service or capability does it provide that is not currently available? i just tested the patch in my local network. it worked right and i did a performance test with the avalanche. but i didn't test it in the field .. and various network environments. so i hope many people use and test this patch -- Daniel Ruggeri Hi JeHo, Thank you for sharing your patches. I was unable to use your Apache patches on Fedora 13 (kernel 2.6.33). I didn't use your kernel patch since tproxy4.1 was merged into the Linux kernel at 2.6.28. You've patched tproxy2 into the CentOS/RHEL 2.6.18 kernel. tproxy2 behaves differently from tproxy4.1, hence it's to be expected that your userspace patches don't work with 2.6.28+ kernels. Here's a post showing tproxy history; it recommends against tproxy2: https://lists.balabit.hu/pipermail/tproxy/2008-November/000994.html Bazsi suggests starting with tproxy4 for 2.6.17 and propagating that forward to a 2.6.18 kernel. The tproxy4 API looks easier to use than tproxy2. Unfortunately I didn't find the tproxy4 for 2.6.17 kernel patch. However most important is to have future-proof Apache changes that will be compatible with distros other than just CentOS5/RHEL5, for example RHEL6. Incidentally, how are you managing the iptables rules? Is it assumed that these will be set up before Apache httpd is started? Or do you think Apache should own the rules, creating them at startup and removing them on shutdown? Thanks, Paul
Re: Talking about proxy workers
Paul Fee wrote: Rainer Jung wrote: Minor additions inside. On 06.08.2010 14:49, Plüm, Rüdiger, VF-Group wrote: -Original Message- From: Paul Fee Sent: Freitag, 6. August 2010 14:44 To: dev@httpd.apache.org Subject: Re: Talking about proxy workers Also, is it possible to setup these three reuse styles for a forward proxy? 1: No reuse, close the connection after this request. Yes, this is the default. 2: Reuse connection, but only for the client that caused its creation. No. Even if you configure pooled connections like in the example given in 3, the connections are returned to the pool after each request/response cycle. They are not directly associated with the client connection. But: if the MPM is prefork, the client connection is handled by a single process which doesn't handle any other requests during the life of the client connection. Since pools are process local, in this case the pool will always return the same connection (the only connection in the pool). Note that this pooled connection will not be closed when the client connection is closed. It can live longer or shorter than the client connection and you can't tie their lifetime together. Whether the proxy operates in forward or reverse mode doesn't matter, it only matters how the pool aka worker is configured. See 3. 3: Pool connection for reuse by any client. Yes, but this is needed separately for every origin server you forward to: <Proxy http://www.frequentlyused.com/> # Set an arbitrary parameter to trigger the creation of a worker ProxySet keepalive=on </Proxy> Pools are associated with workers, and workers are identified by origin URL. In case of a reverse proxy you often only have a few origin servers, so pooling works fine. In case of a forward proxy you often have an enormous number of origin servers, each only visited every now and then. So using persistent connections is less effective.
It would only make some sense, if we could optionally tie together client connections and origin server connections. Regards, Rainer I'm using the worker MPM, so connection sharing between clients can happen. As you've pointed out, pooling works well for reverse proxies as there are few backends and the hit rate is high. For forward proxies, there are numerous destinations and the pool hit rate will be low. The pool has a cost due to multi-threaded access to a single data structure, I presume locks protect the connection pool. Locks can limit scalability. I'm wondering if pools should be restricted to the reverse proxy case. Forward proxies would couple the proxy-origin server connection to the client side connection. Since connections cannot be shared, there's no need for locking. We'd lose the opportunity to share, but since the probability of a pool hit by another client is low, that loss should be acceptable. Essentially, I'm asking if it would make sense to implement 2: Reuse connection, but only for the client that caused its creation. This could be a configurable proxy worker setting. Thanks, Paul Here's a suggestion to refine connection pooling for forward proxies. Can a connection to an origin server be tightly coupled to the client connection for the lifetime of the client connection? Then, when the client connection closes, the origin server connection can be placed in the pool for possible reuse by another incoming client connection. This would allow a client to reuse its own origin server connection without having to do a lookup in the pool. We'd save on lookup costs and pool locking. However connections would still go into the pool after the client has finished with them, for the potential benefit of other clients. When a client makes a request, mod_proxy looks to see if there's an existing origin server connection coupled to the request_rec. If there's no connection or the connection is not to the correct origin server, then perform a pool lookup.
If that fails, then create a fresh connection. Does this sound feasible? Can we do this with mod_proxy already? Is it worth implementing? Thanks, Paul
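The lookup order proposed above can be sketched with mock types. This is a toy model, not mod_proxy code: the origin strings, the array-backed pool, and the path codes are all invented for illustration. The order is: reuse the coupled connection, else search the shared pool (which would need locking in real life), else open a fresh connection; on client close the coupled connection returns to the pool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct conn { char origin[64]; };

#define POOL_MAX 8
struct pool { struct conn *slots[POOL_MAX]; int n; };

/* Linear search: remove and return a pooled connection to `origin`. */
static struct conn *pool_get(struct pool *p, const char *origin)
{
    for (int i = 0; i < p->n; i++)
        if (strcmp(p->slots[i]->origin, origin) == 0) {
            struct conn *c = p->slots[i];
            p->slots[i] = p->slots[--p->n];
            return c;
        }
    return NULL;
}

/* *path reports which route was taken:
 * 0 = coupled reuse (no pool, no lock), 1 = pool hit, 2 = fresh. */
static struct conn *acquire(struct conn **coupled, struct pool *p,
                            const char *origin, int *path)
{
    if (*coupled) {
        if (strcmp((*coupled)->origin, origin) == 0) {
            *path = 0;
            return *coupled;
        }
        if (p->n < POOL_MAX)        /* wrong origin: recycle via the pool */
            p->slots[p->n++] = *coupled;
        *coupled = NULL;
    }
    struct conn *c = pool_get(p, origin);
    if (c)
        *path = 1;
    else {
        c = calloc(1, sizeof *c);
        strncpy(c->origin, origin, sizeof c->origin - 1);
        *path = 2;
    }
    *coupled = c;                   /* coupled for the client's lifetime */
    return c;
}

/* On client close, the coupled connection goes back to the pool. */
static void client_close(struct conn **coupled, struct pool *p)
{
    if (*coupled) {
        if (p->n < POOL_MAX)
            p->slots[p->n++] = *coupled;
        else
            free(*coupled);
    }
    *coupled = NULL;
}
```

The common case (a client re-requesting from the same origin) never touches the pool, which is where the lock-contention saving would come from.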
Re: Talking about proxy workers
Rainer Jung wrote: Minor additions inside. On 06.08.2010 14:49, Plüm, Rüdiger, VF-Group wrote: -Original Message- From: Paul Fee Sent: Freitag, 6. August 2010 14:44 To: dev@httpd.apache.org Subject: Re: Talking about proxy workers Also, is it possible to setup these three reuse styles for a forward proxy? 1: No reuse, close the connection after this request. Yes, this is the default. 2: Reuse connection, but only for the client that caused its creation. No. Even if you configure pooled connections like in the example given in 3, the connections are returned to the pool after each request/response cycle. They are not directly associated with the client connection. But: if the MPM is prefork, the client connection is handled by a single process which doesn't handle any other requests during the life of the client connection. Since pools are process local, in this case the pool will always return the same connection (the only connection in the pool). Note that this pooled connection will not be closed when the client connection is closed. It can live longer or shorter than the client connection and you can't tie their lifetime together. Whether the proxy operates in forward or reverse mode doesn't matter, it only matters how the pool aka worker is configured. See 3. 3: Pool connection for reuse by any client. Yes, but this is needed separately for every origin server you forward to: <Proxy http://www.frequentlyused.com/> # Set an arbitrary parameter to trigger the creation of a worker ProxySet keepalive=on </Proxy> Pools are associated with workers, and workers are identified by origin URL. In case of a reverse proxy you often only have a few origin servers, so pooling works fine. In case of a forward proxy you often have an enormous number of origin servers, each only visited every now and then. So using persistent connections is less effective. It would only make some sense, if we could optionally tie together client connections and origin server connections.
Regards, Rainer I'm using the worker MPM, so connection sharing between clients can happen. As you've pointed out, pooling works well for reverse proxies as there are few backends and the hit rate is high. For forward proxies, there are numerous destinations and the pool hit rate will be low. The pool has a cost due to multi-threaded access to a single data structure, I presume locks protect the connection pool. Locks can limit scalability. I'm wondering if pools should be restricted to the reverse proxy case. Forward proxies would couple the proxy-origin server connection to the client side connection. Since connections cannot be shared, there's no need for locking. We'd lose the opportunity to share, but since the probability of a pool hit by another client is low, that loss should be acceptable. Essentially, I'm asking if it would make sense to implement 2: Reuse connection, but only for the client that caused its creation. This could be a configurable proxy worker setting. Thanks, Paul
Re: OS Keep-alive on forward proxy
Rainer Jung wrote: snip The default worker for forward proxying does not use connection pooling in the naive sense. It closes each connection after each request. Regardless of pooling, since that's httpd's internal implementation, is there a reason for defaulting to non-persistent TCP connections on the wire? I've read that the HTTP/1.0 protocol's specification for persistence was weak and that Netscape Navigator's Proxy-Connection: keep-alive header didn't fix the issue. Therefore for HTTP/1.0 mod_proxy would not create a persistent connection to the next hop (e.g. the origin server). However my understanding was that for HTTP/1.1 the protocol was good enough to work correctly over proxy chains and that the hop-by-hop Connection header was adequate for negotiating each step en route from the client to the origin server. I would like mod_proxy to use persistent connections for HTTP/1.1; are there reasons for sacrificing this performance improvement? Thanks, Paul
Re: Talking about proxy workers
Mark Watts wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 06/08/10 12:13, Jeff Trawick wrote: On Fri, Aug 6, 2010 at 3:54 AM, Rainer Jung rainer.j...@kippdata.de wrote: On 05.08.2010 21:30, Eric Covener wrote: http://people.apache.org/~rjung/httpd/trunk/manual/mod/mod_proxy.html.en#workers A direct worker is usually configured using any of ProxyPass, ProxyPassMatch or ProxySet. I don't know much about Proxy, but can this hammer home a bit more that these directives create a new worker implicitly based on [some parts of?] the destination URL? Good point. I updated the patch and HTML page to stress the identification of workers by their URL. And what happens when there's overlap? There's a warning box at the end of the Workers section talking about that. I slightly rephrased it to also contain the term overlap. New patch: http://people.apache.org/~rjung/patches/mod_proxy_docs_workers-v2.patch nits: + There are two builtin workers, the default forward proxy worker and the built-in + optionally included in a <directive module="mod_proxy">Proxy</directive> + directive. How about using container at the end instead of directive? (shrug) + . Direct workers can use connection pooling, + HTTP Keep-Alive and individual configurations for example + for timeouts. (dumb question: what's the diff between keepalive and connection pooling? reuse for one client vs. reuse for any client?) That last part sounds a little awkward. Maybe something like this: A number of processing options can be specified for direct workers, including connection pooling, HTTP Keep-Alive, and I/O timeout values. + Which options are available is depending on the + protocol used by the worker (and given in the origin server URL). + Available protocols include <code>ajp</code>, <code>fcgi</code>, + <code>ftp</code>, <code>http</code> and <code>scgi</code>.</p> + The set of options available for the worker depends on the protocol, which is specified in the origin server URL.
Available protocols include + <p>A balancer worker is created, if its worker URL uses no comma + <p>Worker sharing happens, if the worker URLs overlap. More precisely + if the URL of some worker is a leading substring of the URL of another + worker defined later in the configuration file. <p>Worker sharing happens if the worker URLs overlap, which occurs when the URL of some worker is a leading substring of the URL of another worker defined later in the configuration file. + In this case the later worker isn't actually created. Instead the previous + worker is used. The benefit is, that there is only one connection pool, + so connections are more often reused. Unfortunately all the configuration attributes + given explicitly for the later worker overwrite the respective configuration + of the previous worker!</p> This sounds like a discussion of pros and cons. There's no pro, since the user didn't intend to configure it this way, right? Can we have some examples put in that section - it's a little wordy and I found it a little hard to understand, and I use mod_proxy quite a lot! Mark Can I request an example of how to setup a worker for a forward proxy? Directives such as ProxyPass are for reverse proxies. Can parameters such as disablereuse be used with a <Proxy> block? Also, is it possible to setup these three reuse styles for a forward proxy? 1: No reuse, close the connection after this request. 2: Reuse connection, but only for the client that caused its creation. 3: Pool connection for reuse by any client. Could you provide example configurations for these please? Thanks, Paul
RE: Talking about proxy workers
Plüm, Rüdiger, VF-Group wrote: 3: Pool connection for reuse by any client. Yes, but this is needed separately for every origin server you forward to:

<Proxy http://www.frequentlyused.com/>
    # Set an arbitrary parameter to trigger the creation of a worker
    ProxySet keepalive=on
</Proxy>

Can I use wildcards to enable connection pooling for all forward proxy destinations?

<Proxy *>
    ProxySet keepalive=on
</Proxy>

Thanks, Paul
Re: OS Keep-alive on forward proxy
Rainer Jung wrote: snip And yes: the forward proxy does *not* do HTTP Keepalive. Technical reason: the connections to the origin server are pooled and retrieved from and returned to the pool for each request. A forward proxy usually talks to many different origin servers. Keeping those connections open in a naive way would lead to a lot of not well used pools. Assuming that during one client connection the origin server often is used for multiple requests this could be improved, but would bloat the already complicated proxy code even more. Has mod_proxy operated in that way for a while now? I gained most of my experience with mod_proxy using Apache 2.0.X. My understanding was that proxy-to-OS connections were tightly coupled to the client-to-proxy connection. There was a deliberate decision not to reuse proxy-OS connections for requests coming from other client-proxy connections as this may be a security risk. The OS may attribute authorization to a connection and a subsequent request on this persistent connection could inherit these attributes. Each HTTP request *should* be stateless and hence the next request on the same socket should be independent, but there was the risk that a remote (non-Apache) origin server may not work that way. If the proxy-OS connection is pooled and reused by a different client-proxy request, does that risk confusing an origin server that expects all requests on the same connection to come from the same client? ... or have I misunderstood your description? Thanks, Paul
RE: OS Keep-alive on forward proxy
Plüm, Rüdiger, VF-Group wrote: -Original Message- From: Paul Fee Sent: Donnerstag, 5. August 2010 11:18 To: dev@httpd.apache.org Subject: Re: OS Keep-alive on forward proxy Rainer Jung wrote: snip And yes: the forward proxy does *not* do HTTP Keepalive. Technical reason: the connections to the origin server are pooled and retrieved from and returned to the pool for each request. A forward proxy usually talks to many different origin servers. Keeping those connections open in a naive way would lead to a lot of not well used pools. Assuming that during one client connection the origin server often is used for multiple requests this could be improved, but would bloat the already complicated proxy code even more. Has mod_proxy operated in that way for a while now? I gained Since 2.2. most of my experience with mod_proxy using Apache 2.0.X. My understanding was that proxy to OS connections were tightly coupled to the client to proxy That was true in 2.0.x yes. connection. There was a deliberate decision not to reuse proxy-OS connections for requests coming from other client-proxy connections as this may be a security risk. The OS may attribute authorization to a connection and a subsequent request on this persistent connection could inherit these attributes. Each HTTP request *should* be stateless and hence the next request on the same socket should be independent, but there was the risk that a remote (non-Apache) origin server may not work that way. If the proxy-OS connection is pooled and reused by a different client-proxy request, does that risk confusing an origin server that expects all requests on the same connection to come from the same client? It would be a bug in this server to expect them to originate from the same client, as you correctly state that HTTP is a stateless protocol. Nevertheless you can turn off connection pooling in the case you are dealing with a faulty origin server.
Regards Rüdiger That's useful information; it's not mentioned in the overview of new features in 2.2 and I missed it in the detailed changelog. Thanks for correcting my misunderstanding. Regarding disabling connection pooling, I looked at the source and see two ways to achieve this: 1) The disablereuse parameter of the ProxyPass directive. 2) The proxy-initial-not-pooled Apache environment variable set on a per-request basis. Both these relate to reverse proxy requests. Does connection pooling apply to forward proxy requests? If so, are there configuration options to control it? Would disabling connection pooling fix the defect that Ryujiro Shibuya reported (with the penalty of losing the performance gains from pooling)? Thanks, Paul
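For illustration, the two mechanisms named above might be used like this in a reverse proxy context (the backend host and the /legacy/ path are made-up examples):

```apache
# 1) Per-mapping: never reuse backend connections for this ProxyPass
ProxyPass /app http://backend.example.com/app disablereuse=on

# 2) Per-request: mark matching requests so their initial backend
#    connection is not taken from the pool
SetEnvIf Request_URI "^/legacy/" proxy-initial-not-pooled
```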
Re: mod_deflate handling of empty initial brigade
Bryan McQuade wrote: Are there any cases where it's important for ap_pass_brigade to pass on an empty brigade? Doesn't sound like it, but since this is a core library change I want to double check. When handling a CONNECT request, the response will have no body. In mod_proxy, the CONNECT handler currently skips most filters and writes via the connection filters. However there is a block of #if 0 code which intends to send only a FLUSH bucket down the filter chain. That's not quite the case of an entirely empty brigade, but it seems close enough to warrant highlighting. Thanks, Paul
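The distinction being raised can be made concrete with toy types (mocks, not the real APR bucket API): a FLUSH-only brigade carries zero bytes of data, yet it is not the same as an empty brigade, so a "skip empty brigades" shortcut would still have to let it through.

```c
#include <assert.h>
#include <stddef.h>

/* Toy buckets: DATA buckets carry bytes, FLUSH is a metadata bucket. */
enum btype { DATA, FLUSH };
struct bucket { enum btype type; size_t len; struct bucket *next; };
struct brigade { struct bucket *head; };

static int brigade_empty(const struct brigade *bb)
{
    return bb->head == NULL;
}

/* Total payload bytes, ignoring metadata buckets. */
static size_t brigade_data_len(const struct brigade *bb)
{
    size_t n = 0;
    for (const struct bucket *b = bb->head; b; b = b->next)
        if (b->type == DATA)
            n += b->len;
    return n;
}
```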
RE: Age calculation in mod_cache.
Plüm, Rüdiger, VF-Group wrote: -Original Message- From: Ryujiro Shibuya Sent: Mittwoch, 14. April 2010 03:35 To: dev@httpd.apache.org Subject: Age calculation in mod_cache. Hello, A minor issue in the age calculation in mod_cache [ap_cache_current_age() in cache_util.c] has been found. In some unusual conditions, the age of cached content can be calculated as a negative value. The negative age value will be cast to a huge unsigned integer later, and then an inappropriate Age header, e.g. Age: 4294963617 (= more than 135 years), may be returned to the client. In my opinion, the negative age should be adjusted to zero, at least. What are your thoughts? Makes sense. Fixed in trunk as r933886. Regards Rüdiger Hi Rüdiger, Can you educate me on how this can be merged onto the 2.2.x branch? Does a specific request need to be made or do most trunk changes get merged automatically? Thanks, Paul
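The symptom in the report falls out of plain C conversion rules: a small negative age pushed through an unsigned 32-bit conversion becomes an enormous Age value. The sketch below shows the arithmetic and the kind of clamp the thread proposes (the function name is invented; this is not the actual r933886 diff):

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a possibly-negative age to zero before it is ever treated as
 * unsigned, so a bogus huge Age header cannot be emitted. */
static uint32_t age_header_value(int32_t age)
{
    if (age < 0)
        age = 0;
    return (uint32_t)age;
}
```

Note that 4294963617 is exactly 2^32 - 3679, i.e. a computed age of -3679 seconds reinterpreted as unsigned.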
Eliminating absolute paths on installation
Hello all, After building Apache httpd, I find that the httpd executable has explicit knowledge of its ultimate install location as specified with: ./configure --prefix=<install location> Items with this absolute knowledge include: ServerRoot (e.g. httpd implicitly knows where to find its config file.) RPATH (used by the dynamic linker to locate APR libraries.) This is a problem for me as the install location is not always known at build time. Also, if I give someone a built version of httpd, they cannot install it multiple times on one host due to the absolute paths. I expect the ServerRoot item could easily cope with relative paths. Whether the starting point is the current working directory or the directory in which the httpd application resides can be up for debate. The RPATH is slightly different. Before installation libtool creates a script httpd which can be used to run the real httpd which is in the .libs directory. It uses LD_LIBRARY_PATH to temporarily override the RUNPATH stored within the ELF object. However LD_LIBRARY_PATH should be avoided in general use. The RPATH is populated by the -R (or -rpath) linker option. $ORIGIN is a token which the runtime linker interprets as the directory in which the ELF object resides. The current RPATH can be seen with: (Linux) objdump -p httpd | grep PATH (Solaris) dump -Lv httpd | grep PATH RPATH <install root>/lib Replacing this with $ORIGIN/../lib would cause the httpd executable to search for the APR libraries in ../lib relative to itself. Hence we could now build Apache httpd without advance knowledge of where it is to be installed. This would be very useful for me. Any thoughts? Thanks, Paul -- ___ Surf the Web in a faster, safer and easier way: Download Opera 9 at http://www.opera.com Powered by Outblaze
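The inspection and relink steps described above look roughly like this on the command line. The `<objects and libs>` placeholder stands for whatever the build would normally pass to the link step; this is a sketch of the idea, not the actual httpd build rules:

```shell
# Inspect the RPATH currently baked into the binary
objdump -p httpd | grep -i rpath     # Linux
dump -Lv httpd | grep PATH           # Solaris

# Relink with a relocatable RPATH; the single quotes keep the shell
# from expanding $ORIGIN, which must reach the linker literally
cc -o httpd <objects and libs> -Wl,-rpath,'$ORIGIN/../lib'
```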
Re: Eliminating absolute paths on installation
- Original Message - From: Guy Hulbert [EMAIL PROTECTED] To: dev@httpd.apache.org Subject: Re: Eliminating absolute paths on installation Date: Wed, 13 Dec 2006 08:16:08 -0500 On Wed, 2006-13-12 at 13:16 +0100, Paul Fee wrote: This is a problem for me as the install location is not always known at build time. Also, if I give someone a built version of httpd, they cannot install it multiple times on one host due to the absolute paths. Why do they need more than one ? -- --gh Hi Guy, The main motivation is that I don't want to dictate the install location to people that are using my builds of httpd. Secondly, I have multiple people testing httpd and my module. I want to increase machine utilisation and allow multiple installations on one box. It may be possible to arrange for them to share a common httpd, but ideally each installation would be self-contained. For example, different httpd versions may be built with different options. The only resource that different instances must avoid contending over should be the TCP port that httpd listens on. Another scenario would be an httpd server in active service and the need to install a new version (in a different directory) for testing without removing the active version. It would be good if httpd had the option to be built without advance knowledge of its install location. Without eliminating absolute paths, I find myself heading down the path of OS virtualisation, which to me seems very heavyweight for installing multiple instances of one application. Thanks, Paul
Re: Eliminating absolute paths on installation
- Original Message - From: Joe Orton [EMAIL PROTECTED] To: dev@httpd.apache.org Subject: Re: Eliminating absolute paths on installation Date: Wed, 13 Dec 2006 14:33:03 + On Wed, Dec 13, 2006 at 01:16:35PM +0100, Paul Fee wrote: The RPATH is slightly different. The only way to avoid the RPATH (in general) is to link APR/APR-util statically, which can only be achieved by not building the shared libraries. So passing --disable-shared to configure may work, though this is not a configuration that gets any testing at all AFAIK, so it may not work, but bug reports are welcome. Having libtool use $ORIGIN-relative RPATHs would certainly be a neat hack for platforms which support that; patches would have to go to the libtool team ;) Alternatively, you can get tools which munge ELF binaries post-build - chrpath is the commonly used one IIRC. Regards, joe Hi Joe, A problem avoided is a problem solved! My build of httpd has a small number of (non OS supplied) dependencies: libaprutil libexpat libapr Hence rolling these statically into the one httpd executable sounds feasible. I don't have other apps linking the same APR shared objects, hence I won't be losing opportunities to share .so files in memory. Editing an existing RPATH with chrpath sounds interesting, but dangerous. I wouldn't be surprised if it had a limitation such as the replacement RPATH not being able to exceed the length of the original RPATH, but I could cope with that. Also, delving into the subtleties of libtool sounds a bit intimidating. Anyway, you and Jeff have provided useful pointers. Thanks, Paul
Re: Re: De-Chunking
- Original Message - From: Christian V. [EMAIL PROTECTED] To: dev@httpd.apache.org Subject: Re: De-Chunking Date: Wed, 08 Nov 2006 09:59:08 +0100 Christian V. wrote: Nick Kew wrote: On Tue, 07 Nov 2006 11:24:05 +0100 Christian V. [EMAIL PROTECTED] wrote: Hi, I'm running a third-party web service authentication module that hangs when the request coming from the client is split into different chunks. I don't have access to the module or to the client either, so I'm thinking of writing an input filter that collects all the chunks and passes them to the downstream filter or handler. Is that possible? It's possible, yes. Whether it'll fix the problem for you is not certain. I'd suggest starting with a quick hack (or a dechunking proxy in front of your server) to test it first, if you really can't get the source. Maybe the proxy will fix it, but it will not be the solution, so I think I'm going to write the module filter. But I need to know how Apache handles multi-chunk requests, as I'm not able to find this information. Does the request arrive entirely at my filter in the form of bucket brigades before being passed to downstream modules, or are brigades passed down as soon as they come? (I hope I explained it well) Thanks a lot, Chris. Let me explain further, as I'm not 100% sure the problem is the 3rd party module, and other people may have met the same issue: [ CLIENT ] -- [ APACHE R.PROXY (SSL) + 3rd MODULE ] -- [WEB SERVICE] The Web Service receives requests from both Java and .Net clients. Our problem is the following. The .Net clients (we have one in C# and one in VB, both programmed with Visual Studio 2005) will split the client's XML request into multiple 1024-byte packets.
This happens only over HTTPS, and causes problems with the 3rd party module. To debug this we have programmed an Apache module on the reverse proxy that dumps the stream of data as it receives it from the clients, and our .Net client, over HTTPS, splits it up into multiple chunks, as seen here: [Tue Oct 31 15:09:56 2006] [notice] (IN) bucketdumper: mode READBYTES; blocking; 8192 bytes [Tue Oct 31 15:09:57 2006] [notice] (IN) - (AFTER bucket_read) -\tbucketdumper:\tbytes: 1024 - lenght read: 1024 - data: ?xml version=1.0 encoding=utf-8?soap:Envelope xmlns:soap=http://schemas.xmlsoap.org/soap/envelope/; xmlns:soapenc=http://schemas.xmlsoap.org/soap/encoding/; xmlns:tns=http://www.acme.com.com/wsdl/HelloMoto.wsdl; xmlns:types=http://www.acme.com.com/wsdl/HelloMoto.wsdl/encodedTypes; xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance; xmlns:xsd=http://www.w3.org/2001/XMLSchema;soap:Body soap:encodingStyle=http://schemas.xmlsoap.org/soap/encoding/;q1:sayHello xmlns:q1=urn:examples:HelloMotoTAG1 xsi:type=xsd:stringTES/TAG1TAG2 xsi:type=xsd:stringTES/TAG2TAG3 xsi:type=xsd:stringTES/TAG3TAG4 xsi:type=xsd:stringTES/TAG4TAG5 xsi:type=xsd:stringTES/TAG5TAG6 xsi:type=xsd:stringTES/TAG6TAG7 xsi:type=xsd:stringTES/TAG7TAG8 xsi:type=xsd:stringTES/TAG8TAG9 xsi:type=xsd:stringTES/TAG9TAG10 xsi:type=xsd:stringTES/TAG10TAG11 xsi:type=xsd:stringTES/TAG11TAG12 xsi:type=xsd:stringTEST/TAG12/q1:sayHello/soap:Body/soap:Envelope - [Tue Oct 31 15:09:57 2006] [notice] (IN) Complete Bucket : [Tue Oct 31 15:09:57 2006] [notice] (IN) bucketdumper: mode READBYTES; blocking; 8192 bytes [Tue Oct 31 15:09:58 2006] [notice] (IN) - (AFTER bucket_read) -\tbucketdumper:\tbytes: 1 - lenght read: 1 - data: - [Tue Oct 31 15:09:58 2006] [notice] (IN) - (AFTER bucket_read) -\tbucketdumper:\tbytes: 0 - lenght read: 0 - data: - Note how the XML is 1025 bytes long, and gets sent in one 1024-byte packet first, followed by a second 1-byte packet (that contains just the final ).
This does not happen over HTTP, where the entire XML arrives in one 1025-byte-long data chunk. Also, our Java clients do not split up the XML when posting over HTTPS, regardless of how long it is. Here is a request made by one of our Java clients: [Tue Oct 31 15:12:57 2006] [notice] (IN) bucketdumper: mode READBYTES; blocking; 8192 bytes [Tue Oct 31 15:12:57 2006] [notice] (IN) - (AFTER bucket_read) -\tbucketdumper:\tbytes: 4333 - lenght read: 4333 - data: ?xml version=1.0 encoding=UTF-8?soapenv:Envelope xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/; xmlns:xsd=http://www.w3.org/2001/XMLSchema; xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance;soapenv:BodyberegnLivorno xmlns=http://prognosiRiservata.acme.com;startinputns1:channel xmlns:ns1=http://;LIV/ns1:channelns2:clientId xmlns:ns2=http://;TEST/ns2:clientIdns3:clientPassword xsi:nil=true xmlns:ns3=http:///ns4:test xmlns:ns4=http://;false/ns4:testns5:useCache mlns:ns36=http://data.prognosiRiservata.acme.com;true/ns36:returnerBeskrivelserns37:returnerThreadSide
Header compression (or lack of) in mod_proxy
Hello all, I'm using Apache as an HTTP proxy. Regarding the request and response headers, I've done some tests and noticed different behaviour in the request and response directions. The request headers are compressed (i.e. headers with the same name are merged into one header, comma separated). e.g. hdr: value1 hdr: value2 becomes, hdr: value1, value2 This is due to ap_get_mime_headers_core() calling apr_table_compress(). It occurs in protocol.c before Apache even detects that the incoming request is a proxy request. The response headers, on the other hand, are read by mod_proxy in ap_proxy_read_headers(), which calls apr_table_add() but not apr_table_compress(). RFC 2616 states that header compression MUST be allowed, i.e. it's optional; therefore Apache's behaviour is compliant. However, if a proxy sits between a non-compliant client and/or server, then it may be best to leave the headers in their original form. If a direct connection works and a proxied connection fails, then the proxy will be perceived as the problem. Could someone point out a reason for the different behaviour in the request and response paths? How about making the behaviour configurable so that it's consistent in both directions and, if necessary, the headers can be left in their original uncompressed form? By the way, my tests were on httpd 2.0.59; however, reading the source for 2.2.3 suggests it has the same behaviour. Thanks for your time, Paul
Re: Header compression (or lack of) in mod_proxy
Sorry for the double post, I thought my first post got dropped. But it was my fault because I hadn't subscribed. Anyway, more below... - Original Message - From: Graham Leggett [EMAIL PROTECTED] To: dev@httpd.apache.org Subject: confirm subscribe to dev@httpd.apache.org Paul Fee wrote: Could someone point out a reason for the different behaviour in the request and response path? Cookies. Cookie headers cannot be compressed as the RFC says they should be, so proxy works around this by not compressing headers. Thanks for that, now I see the problem. The response could contain a Set-Cookie header, such as: Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 09-Nov-99 23:12:40 GMT The problem is that the date may contain a , character. Use of this reserved character for a purpose other than separating multiple header values means that the Set-Cookie header cannot be compressed. Fortunately, in the case of a request, the Cookie header will not contain a date, hence the same problem is not present. How about making the behaviour configurable so that it's consistent in both directions and if necessary the headers can be left in their original uncompressed form? In theory, the idea that it be consistent in both directions is not unreasonable - it follows the principle of being lenient in what we accept. Therefore, we've eliminated the idea of calling apr_table_compress() for the response due to Set-Cookie. Would you foresee any issues with disabling the apr_table_compress() call for the request? I'd like to add the option to leave request headers in the form in which they were received from the client. Today, multiple headers with the same name are compressed when read from the client. Therefore, Apache modules reading the headers will see a single string with comma-separated values. However, modules could in theory also add new headers, so we could have something like: hdr: value1, value2 hdr: value3 If apr_table_compress() was not called, would that break anything?
Would modules expect comma-separated values, or would they be designed to cope with both representations as RFC2616 says they MUST? If we assume that the rest of Apache will cope with both representations, then disabling the call to apr_table_compress() in ap_get_mime_headers_core() will not cause problems. Of course, we should keep it configurable and perhaps have the default set to enabled so as not to force new behaviour on users. Thanks, Paul
Re: De-Chunking
- Original Message - From: Christian V. [EMAIL PROTECTED] To: dev@httpd.apache.org Subject: De-Chunking Date: Tue, 07 Nov 2006 11:24:05 +0100 Hi, I'm running a third-party web service authentication module that hangs when the request coming from the client is split into different chunks. I don't have access to the module or to the client either, so I'm thinking of writing an input filter that collects all the chunks and passes them to the downstream filter or handler. Is that possible? I would almost expect that if a module's filter is of the appropriate type then it will not see the underlying representation (e.g. chunked or not). However, that impression may be due to me usually working with output filters. The same may not be true for input from the client. Also, Apache 2.2 mod_proxy has a feature to dechunk request bodies, see: http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#request-bodies It sounds like you're on a web server rather than a proxy, but the mod_proxy implementation may provide you with some clues. Hope that helps, Paul