Re: Overriding mod_rewrite from another module
OK, I tried to find a more robust alternative but could not. I was thinking I could duplicate whatever mod_rewrite does to set the request filename, but that appears to be complex and probably no less brittle.

I have another query on this. In reality we do *not* want our rewritten resources to be associated with a filename at all. Apache should never look for such things in the file system under ../htdocs -- they will not be there. We also do not need it to validate or authenticate on these static resources.

In particular, we have found that there is some path through Apache that imposes what looks like a file-system-based limitation on URL segments (e.g. around 256 bytes). This limitation is inconvenient and, as far as I can tell, superfluous. URL limits imposed by proxies and browsers are more like 2k bytes, which would allow us to encode more metadata in URLs (e.g. sprites).

Is there some magic setting we could put into the request structure to tell Apache not to interpret the request as being mapped from a file, but just to pass it through to our handler?

Thanks!
-Josh

On Sat, Jan 1, 2011 at 6:24 AM, Ben Noordhuis i...@bnoordhuis.nl wrote:
> On Sat, Jan 1, 2011 at 00:16, Joshua Marantz jmara...@google.com wrote:
>> Thanks for the quick response and the promising idea for a hack. Looking at mod_rewrite.c this does indeed look a lot more surgical, if, perhaps, fragile, as mod_rewrite.c doesn't expose that string-constant in any formal interface (even as a #define in a .h). Nevertheless the solution is easy-to-implement and easy-to-test, so...thanks!
>
> You're welcome, Joshua. :)
>
> You could try persuading a core committer to add this as a (semi-)official extension. Nick Kew reads this list, Paul Querna often idles in #node.js at freenode.net.
>
>> I'm also still wondering if there's a good source of official documentation for the detailed semantics of interfaces like ap_hook_translate_name. Neither a Google Search, a stackoverflow.com search, nor the Apache Modules book (http://www.amazon.com/Apache-Modules-Book-Application-Development/dp/0132409674/ref=sr_1_1?ie=UTF8qid=1293837117sr=8-1) offers much detail. code.google.com fares a little better but just points to 4 existing usages.
>
> This question comes up often. In my experience the online documentation is almost always outdated, incomplete or outright wrong. I don't bother looking things up, I go straight to the source.
>
> It's a kind of job security, I suppose. There are only a handful of people that truly and deeply understand Apache. We can ask any hourly rate we want!
Re: Overriding mod_rewrite from another module
I have implemented Ben's hack in mod_pagespeed in http://code.google.com/p/modpagespeed/source/detail?r=345 . It works great. But I am concerned that a subtle change to mod_rewrite.c will break this hack silently. We would catch it in our regression tests, but the large number of Apache users who have downloaded mod_pagespeed do not generally run our regression tests.

I have another idea for a solution that I'd like to see opinions on. Looking at Nick Kew's book, it seems like I could set request->filename to whatever I wanted, return OK, but then also shunt off access_checker for my rewritten resources. The access checking on mod_pagespeed resources is redundant, because the resource will either be served from cache (in which case it had to be authenticated to get into the cache in the first place) or will be decoded and the original resource(s) fetched from the same server with full authentication.

I'd appreciate any comments on this approach.

-Josh
Re: Overriding mod_rewrite from another module
On Mon, Jan 3, 2011 at 4:50 PM, Ben Noordhuis i...@bnoordhuis.nl wrote:
>> This means that returning OK from my handler does not prevent mod_authz_host's handler from being called.
>
> You're mistaken, Joshua. The access_checker hook by default is empty. mod_authz_host is a module and it can be disabled (if you're on a Debian/Ubuntu system, run `a2dismod authz_host` and reload Apache).

My perspective is that my team has implemented an Apache module that was launched on Nov 3 2010. Since its launch, we've received a variety of compatibility reports involving other modules, notably mod_rewrite. My goal is not to remove authentication from the server, only to keep it from messing with my module's rewritten resources. The above statement is just observing that, while it's possible to shunt off mod_rewrite by returning OK from an upstream handler, the same is not true of mod_authz_host because it's invoked with a different magic macro.

> With respect to the URL length, I'm fairly sure it's nearly 8K (grep for HUGE_STRING_LEN in core_filters.c).

There may exist some buffer in Apache that's 8k. But I have traced through failing requests earlier that were more like 256 bytes. This was reported as mod_pagespeed Issue 9 (http://code.google.com/p/modpagespeed/issues/detail?id=9) and resolved by limiting the number of css files that could be combined together so that we did not exceed the pathname limitations. I'm pretty sure it was due to some built-in filter or core element in httpd trying to map the URL to a filename (which is not necessary as far as mod_pagespeed is concerned) and bumping into an OS path limitation (showing up as 403 Forbidden).

> I confess I'm not entirely sure what you are trying to accomplish. You're serving up custom content and you're afraid mod_rewrite is going to munch the URL? Or is it more involved than that?

That's exactly right. The simplest example is that mod_pagespeed can infinitely extend the cache lifetime of a js file, without compromising the site owner's ability to propagate changes quickly, by putting an md5-hash of the file's content into the URL.

    old: <script src="scripts/hacks.js"></script>
    new: <script src="scripts/hacks.js.pagespeed.ce.HASH.js"></script>

If some mod_rewrite rule munges scripts/hacks.js.pagespeed.ce.HASH.js, then mod_pagespeed will fail to serve it.

The issue is most simply stated in a Stack Overflow article: http://stackoverflow.com/questions/4099659/mod-rewrite-mod-pagespeed-rewritecond

In this case, the user had hand-entered a mod_rewrite rule that broke mod_pagespeed, so it made sense for him to fix it there. However, we have heard reports of other cases where a user installs some content-generation software that generates mod_rewrite rules that break mod_pagespeed. Such users may not even know what mod_rewrite is, so they can't easily work around the broken rules. This issue is reported as mod_pagespeed Issue 63 (http://code.google.com/p/modpagespeed/issues/detail?id=63).

Hope this clears things up. I'm still interested in your opinion on my solution where I (inspired by your hack) save the original URL in request->notes and then use *that* in my resource handler in lieu of request->unparsed_uri. This change is now committed to svn trunk (but not released in a formal patch) as http://code.google.com/p/modpagespeed/source/detail?r=348 .

-Josh
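The "save the original URL in a request note" idea from this message can be sketched roughly as follows. This is a simplified model, not the code from r=348: `struct request` is a stand-in for `request_rec`, the `saved_url` field stands in for one entry in `request->notes` (in a real module that would presumably be written with `apr_table_setn` and read back with `apr_table_get`), and `model_rewrite` is a hypothetical rule that mangles the URL.

```c
#include <stddef.h>
#include <string.h>

#define DECLINED -1   /* same value httpd.h uses */

/* Stand-in for request_rec. saved_url plays the role of a private entry
 * in request->notes; url is what the handler would otherwise read
 * (request->unparsed_uri in the thread). */
struct request {
    const char *url;
    const char *saved_url;
};

/* Registered to run before mod_rewrite: remember the URL as it arrived. */
static int save_original_url(struct request *r)
{
    r->saved_url = r->url;
    return DECLINED;   /* do not interfere with the other translate_name hooks */
}

/* Hypothetical stand-in for a rewrite rule that corrupts the URL. */
static int model_rewrite(struct request *r)
{
    r->url = "/mangled";
    return DECLINED;
}

/* The resource handler prefers the saved copy when one exists. */
static const char *url_for_handler(const struct request *r)
{
    return r->saved_url != NULL ? r->saved_url : r->url;
}
```

The point of the design is that mod_rewrite is left alone entirely; the handler simply ignores its output for these resources.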
Re: Overriding mod_rewrite from another module
> The access checking on mod_pagespeed resources is redundant, because the resource will either be served from cache (in which case it had to be authenticated to get into the cache in the first place) or will be decoded and the original resource(s) fetched from the same server with full authentication.

Re: suppressing mod_authz_host: This doesn't sound like it guards against a user that meets the AAA conditions causing the resource to be cached and served to users who would not have met the AAA restrictions.

Maybe you are missing a map_to_storage callback to tell the core that this thing will really, really not be served from the filesystem.

Re: suppressing rewrite. Your comments in the src imply that rewrite is doing some of what you're also suppressing in server/core.c:ap_core_translate_name().

Also, it's odd that your scheme for suppressing mod_rewrite wasn't a no-op for rewrite in htaccess context, since those rules use the RUN_ALL fixups hook to do their magic, but maybe you're catching a break there?
Re: Overriding mod_rewrite from another module
On Mon, Jan 3, 2011 at 6:15 PM, Eric Covener cove...@gmail.com wrote:
>> The access checking on mod_pagespeed resources is redundant, because the resource will either be served from cache (in which case it had to be authenticated to get into the cache in the first place) or will be decoded and the original resource(s) fetched from the same server with full authentication.
>
> Re: suppressing mod_authz_host: This doesn't sound like it guards against a user that meets the AAA conditions causing the resource to be cached and served to users who would not have met the AAA restrictions.

This is a good point, but I think I'm covered. mod_pagespeed will only rewrite resources that are publicly cacheable. What does AAA stand for? Authorization Authentication in Apache or something? In any case I've abandoned, for the moment, the attempt to bypass mod_authz_host on a per-request basis.

> Maybe you are missing a map_to_storage callback to tell the core that this thing will really, really not be served from the filesystem.

I was not aware of the concept of a map_to_storage callback at all. I will have to investigate. This may be very helpful. Thanks.

> Re: suppressing rewrite. Your comments in the src imply that rewrite is doing some of what you're also suppressing in server/core.c:ap_core_translate_name().
>
> Also, it's odd that your scheme for suppressing mod_rewrite wasn't a no-op for rewrite in htaccess context, since these use the RUN_ALL fixups hook to do its magic, but maybe you're catching a break there?

It's quite possible that the previous hack, where we use the note mod_rewrite_rewritten, would break if mod_rewrite.c:hook_uri2file's functional component could get called by mod_rewrite.c:hook_fixup, but I haven't analyzed the module deeply enough to understand it at that level. But I think the present hack, where we don't turn off mod_rewrite but just ignore its output via our own request note, will be more robust. At least I hope it will.

In my testing 2 weeks ago I had trouble invoking mod_rewrite from .htaccess. I'll have to try again.

-Josh
Re: Overriding mod_rewrite from another module
On Mon, Jan 3, 2011 at 23:19, Joshua Marantz jmara...@google.com wrote:
> My goal is not to remove authentication from the server; only from messing with my module's rewritten resource. The above statement is just observing that, while it's possible to shunt off mod_rewrite by returning OK from an upstream handler, the same is not true of mod_authz_host because it's invoked with a different magic macro.

My bad, I parsed your post as 'mod_authz_host is a core module and cannot be removed', which is obviously false but not what you meant.

Yes, all access_checker hooks are run. You can't prevent it but you can catch the 403 on the rebound and complain loudly in the logs.

Actually, that's a lie. You can prevent it and that might also answer this next bit...

> There may exist some buffer in Apache that's 8k. But I have traced through failing requests earlier that were more like 256 bytes. This was reported as mod_pagespeed Issue 9 (http://code.google.com/p/modpagespeed/issues/detail?id=9) and resolved by limiting the number of css files that could be combined together so that we did not exceed the pathname limitations. I'm pretty sure it was due to some built-in filter or core element in httpd trying to map the URL to a filename (which is not necessary as far as mod_pagespeed is concerned) and bumping into an OS path limitation (showing up as 403 Forbidden).

This might be the doing of core_map_to_storage(). Never run into it myself (with URLs up to 4K, anyway) but there you go.

Okay, here is a dirty secret: if you hook map_to_storage and return DONE, you bypass Apache's authentication stack - and nearly all other hooks too. Probably an exceedingly bad idea.

You can however use it to prevent core_map_to_storage() from running. Just return OK and you're set.

> I'm still interested in your opinion on my solution where I (inspired by your hack) save the original URL in request->notes and then use *that* in my resource handler in lieu of request->unparsed_uri. This change is now committed to svn trunk (but not released in a formal patch) as http://code.google.com/p/modpagespeed/source/detail?r=348 .

Sounds fine, that's the kind of stuff request notes are for.
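To make the map_to_storage suggestion concrete, here is a small self-contained model. It assumes map_to_storage is dispatched RUN_FIRST style (the first hook that does not decline wins), which is what "just return OK" relies on. Everything else is hypothetical: the `.pagespeed.` marker test, the 255-byte segment limit (the thread only says "around 256 bytes"), and `model_core_map_to_storage`, which is a guess at the failure mode discussed above rather than a copy of what core_map_to_storage() does.

```c
#include <stddef.h>
#include <string.h>

#define OK              0     /* values as in httpd.h */
#define DECLINED       -1
#define HTTP_FORBIDDEN  403
#define MODEL_SEGMENT_LIMIT 255   /* assumed, not taken from Apache */

/* Length of the longest '/'-separated segment in a path. */
static size_t longest_segment(const char *path)
{
    size_t best = 0, cur = 0;
    for (; *path != '\0'; path++) {
        if (*path == '/') {
            cur = 0;
        } else if (++cur > best) {
            best = cur;
        }
    }
    return best;
}

/* Our map_to_storage hook: claim module resources so nothing maps them to disk. */
static int pagespeed_map_to_storage(const char *uri)
{
    return strstr(uri, ".pagespeed.") != NULL ? OK : DECLINED;
}

/* Rough model of a filesystem-backed fallback that rejects over-long names. */
static int model_core_map_to_storage(const char *uri)
{
    return longest_segment(uri) > MODEL_SEGMENT_LIMIT ? HTTP_FORBIDDEN : OK;
}

/* RUN_FIRST dispatch: the first hook that does not decline decides the outcome. */
static int run_map_to_storage(const char *uri)
{
    int rv = pagespeed_map_to_storage(uri);
    if (rv != DECLINED) {
        return rv;
    }
    return model_core_map_to_storage(uri);
}
```

Returning OK rather than DONE matters here: per the message above, DONE would also skip the authentication stack and most other hooks.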
Re: Overriding mod_rewrite from another module
I answered my own question by implementing it and failing. You can't bypass mod_authz_host because it gets invoked via the magic macro:

    AP_IMPLEMENT_HOOK_RUN_ALL(int,access_checker, (request_rec *r), (r), OK, DECLINED)

This means that returning OK from my handler does not prevent mod_authz_host's handler from being called.

I came up with a simpler idea that does not require depending on string-literals in mod_rewrite.c. I still add a translate_name hook to run prior to mod_rewrite, but I don't try to prevent mod_rewrite from corrupting my URL. Instead I just squirrel away the uncorrupted URL in my own entry in request->notes so that I can use that rather than request->unparsed_uri downstream when processing the request.

This seems to work well. The only drawback is that if the site admin adds a mod_rewrite rule that mutates mod_pagespeed's resource name into something that does not pass authentication, then mod_authz_host will reject the request before I can process it. This seems like a reasonable tradeoff, as that configuration would likely be borked in other ways besides mod_pagespeed resources.

Commentary would be welcome.

-Josh
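The difference between the two hook styles discussed in this message can be illustrated with a toy dispatcher. This is a model of the semantics only, written without Apache headers; `run_first`, `run_all`, and the two example hooks are hypothetical names, and the status values mirror httpd.h. With RUN_FIRST (used for translate_name) the first non-DECLINED result ends the chain, while with RUN_ALL (used for access_checker) both OK and DECLINED let the chain continue.

```c
#include <stddef.h>

#define OK        0
#define DECLINED -1

typedef int (*hook_fn)(int *calls);

/* RUN_FIRST style: stop at the first hook that returns anything but DECLINED. */
static int run_first(hook_fn *hooks, size_t n, int *calls)
{
    for (size_t i = 0; i < n; i++) {
        int rv = hooks[i](calls);
        if (rv != DECLINED) {
            return rv;
        }
    }
    return DECLINED;
}

/* RUN_ALL style with ok = OK and decline = DECLINED: keep going on either,
 * stop only when a hook returns some other status such as an HTTP error. */
static int run_all(hook_fn *hooks, size_t n, int *calls)
{
    for (size_t i = 0; i < n; i++) {
        int rv = hooks[i](calls);
        if (rv != OK && rv != DECLINED) {
            return rv;
        }
    }
    return OK;
}

/* Our hook, first in the chain, says OK. */
static int ours_returns_ok(int *calls) { (*calls)++; return OK; }

/* A later module that would deny access (403) if it gets to run. */
static int later_module(int *calls) { (*calls)++; return 403; }
```

Run over the chain `{ ours_returns_ok, later_module }`, `run_first` calls one hook and yields OK, whereas `run_all` calls both and yields 403, which matches the behavior reported above for mod_authz_host.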
Re: Overriding mod_rewrite from another module
On Mon, Jan 3, 2011 at 22:07, Joshua Marantz jmara...@google.com wrote:
> I answered my own question by implementing it and failing. You can't bypass mod_authz_host because it gets invoked via the magic macro:
>
>     AP_IMPLEMENT_HOOK_RUN_ALL(int,access_checker, (request_rec *r), (r), OK, DECLINED)
>
> This means that returning OK from my handler does not prevent mod_authz_host's handler from being called.

You're mistaken, Joshua. The access_checker hook by default is empty. mod_authz_host is a module and it can be disabled (if you're on a Debian/Ubuntu system, run `a2dismod authz_host` and reload Apache).

With respect to the URL length, I'm fairly sure it's nearly 8K (grep for HUGE_STRING_LEN in core_filters.c).

> I still add a translate_name hook to run prior to mod_rewrite, but I don't try to prevent mod_rewrite from corrupting my URL. Instead I just squirrel away the uncorrupted URL in my own entry in request->notes so that I can use that rather than request->unparsed_uri downstream when processing the request.
>
> This seems to work well. The only drawback is if the site admin adds a mod_rewrite rule that mutates mod_pagespeed's resource name into something that does not pass authentication, then mod_authz_host will reject the request before I can process it. This seems like a reasonable tradeoff as that configuration would likely be borked in other ways besides mod_pagespeed resources.

I confess I'm not entirely sure what you are trying to accomplish. You're serving up custom content and you're afraid mod_rewrite is going to munch the URL? Or is it more involved than that?
Overriding mod_rewrite from another module
I need to find the best way to prevent mod_rewrite from renaming resources that are generated by a different module, specifically mod_pagespeed. This needs to be done from within mod_pagespeed, rather than asking the site admin to tweak his rule set.

By reading mod_rewrite.c, I found a mechanism that appears to work. But it has its own issues and I'm having trouble finding any relevant doc about the mechanism:

    ap_hook_translate_name(bypass_translators, NULL, NULL, APR_HOOK_FIRST - 1);

bypass_translators returns OK for resources generated by the module, preventing mod_rewrite from disturbing them. It returns DECLINED for other resources.

The trouble is that httpd seems to report error messages in the log for the lack of a filename. We can set request->filename to something, but that causes the requests to fail completely on some servers. We haven't yet isolated the difference between servers that can handle the fake filename and ones that can't.

Is there a better way to solve the original problem: preventing mod_rewrite from corrupting mod_pagespeed's resources?

Or is there better doc on the semantics of the request.filename field in the context of a resource that is not stored as a file? Or on ap_hook_translate_name?
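As a rough illustration of the decision bypass_translators makes, here is a sketch that compiles without Apache headers. OK and DECLINED are redefined locally with the values httpd.h uses, the hook takes a plain URI string where a real translate_name hook would take a `request_rec *`, and `looks_like_pagespeed_resource` with its `.pagespeed.` marker test is a hypothetical stand-in for however the module actually recognizes its own URLs.

```c
#include <stddef.h>
#include <string.h>

/* Local copies of the httpd.h status codes so the sketch builds standalone. */
#define OK        0
#define DECLINED -1

/* Hypothetical test: treat any URI containing ".pagespeed." as module-generated. */
static int looks_like_pagespeed_resource(const char *uri)
{
    return uri != NULL && strstr(uri, ".pagespeed.") != NULL;
}

/* Stand-in for the translate_name hook registered ahead of mod_rewrite. */
static int bypass_translators(const char *uri)
{
    if (looks_like_pagespeed_resource(uri)) {
        return OK;        /* claim the request; later translate_name hooks are skipped */
    }
    return DECLINED;      /* not ours; let mod_rewrite and the core translate it */
}
```

Because returning OK also skips the core's own translation step, nothing sets a filename for the request, which is consistent with the log errors described in the message.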
Re: Overriding mod_rewrite from another module
On Fri, Dec 31, 2010 at 18:17, Joshua Marantz jmara...@google.com wrote:
> Is there a better way to solve the original problem: preventing mod_rewrite from corrupting mod_pagespeed's resources?

From memory and from a quick peek at mod_rewrite.c: in your translate_name hook, set a mod_rewrite_rewritten note in r->notes with value 0 and return DECLINED. That'll trick mod_rewrite into thinking that it has already processed the request.
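The trick described here can be modeled in a few lines. This is only a sketch of the idea: `struct request` and the `note_get`/`note_set` helpers are minimal stand-ins for `request_rec` and its notes table (a real module would presumably use `apr_table_setn` on `r->notes`), and `model_rewrite_hook` is a deliberately simplified guess at how a rewriting hook might honor the note, based on the description above rather than on mod_rewrite's source.

```c
#include <stddef.h>
#include <string.h>

#define DECLINED  -1
#define MAX_NOTES  8

/* Minimal stand-in for request_rec: a URI plus a tiny notes table. */
struct note { const char *key; const char *val; };
struct request {
    const char *uri;
    struct note notes[MAX_NOTES];
    size_t n_notes;
};

static const char *note_get(const struct request *r, const char *key)
{
    for (size_t i = 0; i < r->n_notes; i++) {
        if (strcmp(r->notes[i].key, key) == 0) {
            return r->notes[i].val;
        }
    }
    return NULL;
}

static void note_set(struct request *r, const char *key, const char *val)
{
    for (size_t i = 0; i < r->n_notes; i++) {
        if (strcmp(r->notes[i].key, key) == 0) {
            r->notes[i].val = val;
            return;
        }
    }
    if (r->n_notes < MAX_NOTES) {
        r->notes[r->n_notes].key = key;
        r->notes[r->n_notes].val = val;
        r->n_notes++;
    }
}

/* Our hook, registered to run before the rewriting hook. */
static int mark_as_rewritten(struct request *r)
{
    if (strstr(r->uri, ".pagespeed.") != NULL) {   /* hypothetical marker test */
        note_set(r, "mod_rewrite_rewritten", "0");
    }
    return DECLINED;   /* let the remaining translate_name hooks run */
}

/* Simplified model of a rewriting hook that skips already-marked requests. */
static int model_rewrite_hook(struct request *r)
{
    if (note_get(r, "mod_rewrite_rewritten") != NULL) {
        return DECLINED;        /* believes the URI was already processed */
    }
    r->uri = "/rewritten";      /* hypothetical rule that would mangle the URI */
    return DECLINED;
}
```

Unlike returning OK, this leaves the rest of the translate_name chain running, so the core still gets to map the request as usual.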
Re: Overriding mod_rewrite from another module
Thanks for the quick response and the promising idea for a hack. Looking at mod_rewrite.c this does indeed look a lot more surgical, if, perhaps, fragile, as mod_rewrite.c doesn't expose that string-constant in any formal interface (even as a #define in a .h). Nevertheless the solution is easy-to-implement and easy-to-test, so...thanks!

I'm also still wondering if there's a good source of official documentation for the detailed semantics of interfaces like ap_hook_translate_name. Neither a Google Search, a stackoverflow.com search, nor the Apache Modules book (http://www.amazon.com/Apache-Modules-Book-Application-Development/dp/0132409674/ref=sr_1_1?ie=UTF8qid=1293837117sr=8-1) offers much detail. code.google.com fares a little better but just points to 4 existing usages.

-Josh