I second Stuart's kudos. Replacing EZproxy with an Apache proxy sounds just crazy enough to be brilliant. I could see an open source recipe book taking shape: how to accomplish EZproxy functions using Apache modules and their directives. I think that might end up being more useful than yet another standalone proxy application.
-- Scott

On 01/29/14, stuart yeates wrote:
> Thank you Andrew, that is insanely useful.
>
> cheers
> stuart
>
> On 30/01/14 12:00, Andrew Anderson wrote:
> > When OCLC first announced their purchase of EZproxy a few years ago, we started a low-priority research project to see what the alternatives were and what it would take to bring them into a production-ready state. The two open source solutions we evaluated were Squid and Apache HTTPd. We considered other options (e.g. Apache Traffic Server), but limited the research to these two pieces of software since they are already widely used and familiar to most system administrators.
> >
> > Long story short, Squid did not support URL rewriting in a way we felt we could support well: it required either patches to the core C++ server code, an external rewriting process, or an ICAP server implementation. Some of that has improved a bit since the original evaluation, but the built-in support for URL rewriting may still need some time to mature. Another aspect of Squid that did not seem to be a good fit was that its authentication mechanisms are somewhat limited compared with those of Apache HTTPd.
> >
> > So we moved on to evaluating Apache HTTPd with the mod_proxy family of modules. While Apache HTTPd does not support the advanced cache federation features that Squid does, it has grown into a robust proxy solution in its own right, and the 2.4 release appears to have all of the required pieces out of the box, now that mod_proxy_html is part of the standard distribution. In addition to basic URL rewriting support, you get full HTTP protocol support, mature IPv6 support, GZIP support, just about any authentication mechanism you need, a server that can easily self-host content, and a built-in HTTP object cache.
> >
> > How would it work?
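The configuration discussed in this message depends on several httpd 2.4 modules being loaded. The following is a minimal sketch of the LoadModule lines it appears to need, assuming a Fedora/RHEL-style layout where module paths are relative to ServerRoot; the CacheRoot path is an assumption, and many distributions already load some of these modules by default.

```apache
# Modules used by the ProQuest VirtualHost example (Apache HTTPd 2.4).
# Assumes ServerRoot contains a "modules/" directory, as on Fedora/RHEL.
LoadModule proxy_module          modules/mod_proxy.so
LoadModule proxy_http_module     modules/mod_proxy_http.so
LoadModule rewrite_module        modules/mod_rewrite.so
LoadModule headers_module        modules/mod_headers.so
LoadModule deflate_module        modules/mod_deflate.so
LoadModule cache_module          modules/mod_cache.so
LoadModule cache_disk_module     modules/mod_cache_disk.so
LoadModule allowmethods_module   modules/mod_allowmethods.so

# mod_proxy_html relies on mod_xml2enc for character-set handling
LoadModule xml2enc_module        modules/mod_xml2enc.so
LoadModule proxy_html_module     modules/mod_proxy_html.so

# Only needed for the ExtFilterDefine workaround for mislabeled MIME types
LoadModule ext_filter_module     modules/mod_ext_filter.so

# "Require valid-user" comes from mod_authz_user (with mod_authz_core);
# the authentication modules depend on the login method you choose.

# mod_cache_disk needs a cache directory writable by the httpd user.
# This path is an assumption; adjust it for your system.
CacheRoot "/var/cache/httpd/proxy"
```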
> >
> > Here’s the current EZproxy stanza for ProQuest:
> >
> > HTTPHeader X-Requested-With
> > HTTPHeader Accept-Encoding
> > Title ProQuest
> > URL http://search.proquest.com/ip
> > DJ proquest.com
> > HJ gateway.proquest.com
> > DJ umi.com
> > HJ fedsearch.proquest.com
> > HJ literature.proquest.com
> > DJ conquest-leg-insight.com
> > DJ conquestsystems.com
> > DJ m.search.proquest.com
> > DJ media.proquest.com
> > NeverProxy order.proquest.com
> > NeverProxy rss.proquest.com
> >
> > Here’s an Apache HTTPd configuration for ProQuest that accomplishes much of the same functionality for the main search.proquest.com interface:
> >
> > <VirtualHost _default_:80>
> >     ServerName search.proquest.com.fqdn
> >
> >     ProxyRequests Off
> >     ProxyVia On
> >
> >     RewriteEngine On
> >     RewriteRule ^/(.*) http://search.proquest.com/$1 [P]
> >
> >     <Location "/">
> >         AllowMethods GET POST OPTIONS
> >         ProxyPassReverse http://search.proquest.com/
> >         ProxyPassReverseCookieDomain search.proquest.com search.proquest.com.fqdn
> >         CacheEnable disk
> >         SetOutputFilter INFLATE;DEFLATE
> >         Header append Vary User-Agent env=!dont-vary
> >         # Put Authentication directives here
> >         ErrorDocument 401 /path/to/login
> >         Require valid-user
> >     </Location>
> > </VirtualHost>
> >
> > A few notes on this:
> >
> > - There is no need for NeverProxy: if you do not define a VirtualHost for a hostname, it is not proxied. So instead of HJ and DJ lines, you add a new VirtualHost block for each hostname that needs to be proxied. The astute will ask “what about services that have dozens or hundreds of host entries, like Sage?” Those can be handled by mod_proxy_express (the ProxyExpress* directives) in Apache HTTPd.
> >
> > - There is no need for HTTPHeader: since Apache HTTPd is a full HTTP proxy/server, it supports all HTTP headers natively.
> >
> > - Some of the hostnames that are in EZproxy stanzas are not needed, and some are legacy hostnames that the vendor no longer uses.
> >
> > - Some of the hostnames that are in EZproxy stanzas are for CDN-hosted content that requires no special access (e.g. the JavaScript/CSS/graphics assets that make up the vendor’s user interface). Another example: how many of you have “DJ google.com” in one of your stanzas? Now how many of you have registered your IP addresses with Google in any way? Outside of Google Scholar, I suspect the answers to those questions are “nearly everyone” and “nearly no one”, respectively.
> >
> > - Some of the hostnames are for things that no sane person would do: how many people run their discovery services through their EZproxy server, rather than authenticating their discovery platform by IP address directly with the vendors?
> >
> > - Something this configuration does that EZproxy does not is enable object caching. This can easily save 30-50% of your upstream bandwidth usage (EZproxy’s Proxy/ProxySSL directives can achieve the same result by routing through an external caching proxy server).
> >
> > - More complex vendor platforms (e.g. Gale Cengage) need ProxyHTML directives and ProxyHTMLURLMap configured, plus multiple VirtualHost sections, to get them fully working. These can be a little fun to get working initially.
> >
> > - Some services need redirects edited to work correctly and not break out of the proxy:
> >
> >     Header edit Location http://vendor/ http://vendor.fqdn/
> >
> > - Some vendors send the wrong Content-Type header for the MIME type, and mod_proxy_html exposes this in some cases as it rewrites the page.
> > There may be a better way to do this, but this is what I threw together for testing:
> >
> >     <Location "/badpath">
> >         ProxyHTMLEnable Off
> >         SetOutputFilter INFLATE;dummy-html-to-plain
> >         ExtFilterOptions LogStderr Onfail=remove
> >     </Location>
> >     ExtFilterDefine dummy-html-to-plain mode=output intype=text/html \
> >         outtype=text/plain cmd="/bin/cat -"
> >
> > So what’s currently missing in the Apache HTTPd solution?
> >
> > - Services that use an authentication token (predominantly ebook vendors) need special support written. I have been considering mod_lua for this, so that the support would be relatively easy to maintain for someone who is not hard-core technical.
> >
> > - Services that are not IP authenticated, but use one of the form-based authentication variants. I suspect that injecting a script tag into the page, pointing to JavaScript that handles the form fill/submission, might be a sane approach here. This should also cleanly deal with the ASP.NET abominations that use __PAGESTATE to store sessions client-side instead of server-side.
> >
> > - EZproxy’s built-in DNS server (enabled with the “DNS” directive) would need to be replaced by a separate DNS server (there are several options to choose from).
> >
> > - In this setup, standard systems-level management and reporting tools would be used instead of the /admin interface in EZproxy.
> >
> > - In this setup, the functionality of the EZproxy /menu URL would need to be handled externally. This may not be a real issue: many academic sites already use an LMS or portal system rather than EZproxy to direct students to resources, so this feature may not be as critical to replicate.
> >
> > - And of course, extensive testing. While the above ProQuest configuration works for the main ProQuest search interface, it won’t work for everyone, everywhere just yet.
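The notes above describe adding one VirtualHost per proxied hostname, mapping links with ProxyHTMLURLMap, and editing Location headers on redirects. Put together, a second VirtualHost might look roughly like the following. This is an untested sketch under stated assumptions: proxy.example.edu is a hypothetical proxy domain standing in for the ".fqdn" suffix, gateway.proquest.com comes from the stanza’s HJ line, and the ProxyHTMLLinks entries are a small subset of those in the example proxy-html.conf shipped with httpd 2.4.

```apache
# Hypothetical VirtualHost replacing "HJ gateway.proquest.com".
# "proxy.example.edu" is an assumed proxy domain.
<VirtualHost _default_:80>
    ServerName gateway.proquest.com.proxy.example.edu

    ProxyRequests Off
    ProxyVia On

    RewriteEngine On
    RewriteRule ^/(.*) http://gateway.proquest.com/$1 [P]

    # mod_proxy_html in 2.4 only rewrites the element/attribute pairs
    # declared with ProxyHTMLLinks; this is a partial list.
    ProxyHTMLLinks a      href
    ProxyHTMLLinks img    src
    ProxyHTMLLinks link   href
    ProxyHTMLLinks script src
    ProxyHTMLLinks form   action

    <Location "/">
        ProxyPassReverse http://gateway.proquest.com/
        ProxyPassReverseCookieDomain gateway.proquest.com gateway.proquest.com.proxy.example.edu

        # Rewrite absolute links in HTML pages so users stay inside the
        # proxy, for this host and for sibling hosts with their own blocks.
        ProxyHTMLEnable On
        ProxyHTMLURLMap http://gateway.proquest.com/ http://gateway.proquest.com.proxy.example.edu/
        ProxyHTMLURLMap http://search.proquest.com/ http://search.proquest.com.proxy.example.edu/

        # Same decompress/recompress chain as the search.proquest.com
        # example; httpd orders INFLATE, proxy-html and DEFLATE by
        # filter type, so the rewriter sees uncompressed markup.
        SetOutputFilter INFLATE;DEFLATE

        # ProxyPassReverse above only covers this host, so redirects to
        # a sibling host are rewritten explicitly (pattern is a regex).
        Header edit Location ^http://search\.proquest\.com/ http://search.proquest.com.proxy.example.edu/

        CacheEnable disk
        # Put Authentication directives here
        Require valid-user
    </Location>
</VirtualHost>
```

Each remaining hostname in the stanza (fedsearch, literature, and so on) would get a block of the same shape; for vendors with very many hostnames, the notes above point to mod_proxy_express instead of hand-written blocks.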
> >
> > Bottom line: yes, Apache HTTPd is a viable EZproxy alternative if you have a system administrator who knows their way around Apache HTTPd, and you are willing to spend some time getting to know your vendor services intimately.
> >
> > All of this testing was done on Fedora 19 with the 2.4 version of HTTPd, which should be available in RHEL7/CentOS7 soon. So by the time hard decisions have to be made about EZproxy vs. something else, that something else may very well be Apache HTTPd with vendor-specific configuration files.
> >
>
> --
> Stuart Yeates
> Library Technology Services http://www.victoria.ac.nz/library/

--
Scott Prater
Shared Development Group
General Library System
University of Wisconsin - Madison
pra...@wisc.edu