I second Stuart's kudos. Replacing EZproxy with an Apache proxy sounds just 
crazy enough to be brilliant. I could see an open source recipe book taking 
shape: how to accomplish EZproxy functions using Apache modules and their 
directives. I think that might end up being more useful than yet another 
standalone proxy application.

-- Scott
On 01/29/14, stuart yeates wrote:
> Thank you Andrew, that is insanely useful.
> 
> cheers
> stuart
> 
> On 30/01/14 12:00, Andrew Anderson wrote:
> >When OCLC first announced their purchase of EZproxy a few years ago, we 
> >started a low-priority research project to see what the alternatives were 
> >and what it would take to bring them into a production-ready state. The two 
> >open source solutions we evaluated were Squid and Apache HTTPd. We 
> >considered other options (e.g. Apache Traffic Server), but limited the 
> >research to these two pieces of software since they are already widely used 
> >and familiar to most system administrators.
> >
> >Long story short, Squid did not support URL rewriting in a way that we felt 
> >could be supported well: it required either patches to the core C++ server 
> >code, an external rewriting process, or an ICAP server implementation. Some 
> >of that has improved a bit since the original evaluation, but the built-in 
> >support for URL rewriting may still need some time to mature. Squid also 
> >did not seem to be a good fit because its authentication mechanisms are 
> >somewhat limited compared to Apache HTTPd.
> >
> >So we moved on to evaluating Apache HTTPd with the mod_proxy family of 
> >modules. While Apache HTTPd does not support the same advanced cache 
> >federation features as Squid, it has grown to be a robust proxy solution in 
> >its own right, and the 2.4 release appears to have all of the required 
> >pieces out of the box, including the mod_proxy_html module functionality. 
> >In addition to basic 
> >URL rewriting support, you get full HTTP protocol support, mature IPv6 
> >support, GZIP support, just about any authentication mechanism you need, a 
> >server that you can self-host content with easily, as well as a built-in 
> >HTTP object cache.
> >
> >How would it work?
> >
> >Here’s the current EZproxy stanza for ProQuest:
> >
> >HTTPHeader X-Requested-With
> >HTTPHeader Accept-Encoding
> >Title ProQuest
> >URL http://search.proquest.com/ip
> >DJ proquest.com
> >HJ gateway.proquest.com
> >DJ umi.com
> >HJ fedsearch.proquest.com
> >HJ literature.proquest.com
> >DJ conquest-leg-insight.com
> >DJ conquestsystems.com
> >DJ m.search.proquest.com
> >DJ media.proquest.com
> >NeverProxy order.proquest.com
> >NeverProxy rss.proquest.com
> >
> >Here’s an Apache HTTPd configuration for ProQuest that accomplishes much 
> >of the same functionality for the main search.proquest.com interface:
> >
> ><VirtualHost _default_:80>
> >    ServerName search.proquest.com.fqdn
> >
> >    ProxyRequests Off
> >    ProxyVia On
> >
> >    RewriteEngine On
> >    RewriteRule ^/(.*) http://search.proquest.com/$1 [P]
> >
> >    <Location "/">
> >        AllowMethods GET POST OPTIONS
> >        ProxyPassReverse http://search.proquest.com/
> >        ProxyPassReverseCookieDomain search.proquest.com search.proquest.com.fqdn
> >        CacheEnable disk
> >        SetOutputFilter INFLATE;DEFLATE
> >        Header append Vary User-Agent env=!dont-vary
> >        # Put Authentication directives here
> >        ErrorDocument 401 /path/to/login
> >        Require valid-user
> >    </Location>
> ></VirtualHost>
> >
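> >A minimal sketch of what could replace the authentication placeholder in 
> >the Location block above, assuming HTTP Basic authentication against a 
> >local password file. The realm name and file path are hypothetical, and 
> >another provider (LDAP, database, form-based login) would slot in the same 
> >way. The existing "Require valid-user" line then enforces the login:
> >
> >    AuthType Basic
> >    # Hypothetical realm name shown in the browser prompt
> >    AuthName "Library Resources"
> >    AuthBasicProvider file
> >    # Hypothetical path; the file can be created with htpasswd
> >    AuthUserFile "/path/to/htpasswd"
> >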
> >A few notes on this:
> >
> >- There is no need for NeverProxy: if you do not define a VirtualHost for 
> >the hostname, it is not proxied. So instead of HJ and DJ lines, you add a 
> >new VirtualHost block for each hostname that needs to be proxied. The astute 
> >will ask “what about services that have dozens or hundreds of host entries, 
> >like Sage?” Those can be handled by the mod_proxy_express module in Apache 
> >HTTPd.
> >
> >- There is no need for HTTPHeader: since Apache HTTPd is a full HTTP 
> >proxy/server, it supports all HTTP headers natively.
> >
> >- Some of the hostnames that are in EZproxy stanzas are not needed, and some 
> >are legacy hostnames that are no longer used by the vendor.
> >
> >- Some of the hostnames that are in EZproxy stanzas are for CDN hosted 
> >content that requires no special access (e.g. JavaScript/CSS/graphics assets 
> >that make up the vendor’s user interface). Another example: how many of you 
> >have “DJ google.com” in one of your stanzas? Now how many of you registered 
> >your IP addresses with Google in any way? Outside of Google Scholar, I 
> >suspect the answers to those questions are “nearly everyone” and “nearly no 
> >one”, respectively.
> >
> >- Some of the hostnames are for things that no sane person would do: How 
> >many people run their discovery services through their EZproxy server vs. 
> >authenticating their discovery platform by IP address with vendors directly?
> >
> >- Something that this configuration does that EZproxy does not do is enable 
> >object caching. This can easily save 30-50% of your upstream bandwidth usage 
> >(Proxy/ProxySSL in EZproxy can achieve the same result with an external 
> >caching proxy server).
> >
> >- More complex vendor platforms (e.g. Gale Cengage) need ProxyHTML 
> >directives and ProxyHTMLURLMap configured, and multiple VirtualHost sections 
> >to get them fully working. These can be a little fun to get working 
> >initially.
> >
> >- Some services need redirects edited to work correctly, and not break out 
> >of the proxy:
> >
> >     Header edit Location http://vendor/ http://vendor.fqdn/
> >
> >- Some vendors send wrong HTTP headers for the MIME type, and mod_proxy_html 
> >exposes this in some cases as it rewrites the page. There may be a better 
> >way to do this, but this is what I threw together for testing:
> >
> >     <Location "/badpath">
> >         ProxyHTMLEnable Off
> >         SetOutputFilter INFLATE;dummy-html-to-plain
> >         ExtFilterOptions LogStdErr Onfail=remove
> >     </Location>
> >     ExtFilterDefine dummy-html-to-plain mode=output intype=text/html \
> >         outtype=text/plain cmd="/bin/cat -"
> >
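> >As a minimal sketch of the mod_proxy_express approach mentioned for 
> >services with many hostnames: the module looks up the request's Host header 
> >in a DBM file and proxies to the backend URL stored there. The hostnames 
> >and the map path below are placeholders, not a tested vendor configuration:
> >
> >     # express-map.txt (one line per proxied hostname):
> >     #   host1.vendor.fqdn   http://host1.vendor
> >     #   host2.vendor.fqdn   http://host2.vendor
> >     # Convert it to a DBM file with:
> >     #   httxt2dbm -i express-map.txt -o /path/to/emap
> >
> >     ProxyExpressEnable On
> >     ProxyExpressDBMFile /path/to/emap
> >
> >Similarly, a sketch of the mod_proxy_html directives for a more complex 
> >platform, assuming the same placeholder vendor hostnames as the redirect 
> >example and that the link definitions the filter relies on (ProxyHTMLLinks, 
> >typically shipped in a proxy-html.conf file) are already loaded:
> >
> >     <Location "/">
> >         ProxyHTMLEnable On
> >         # Rewrite absolute vendor links in HTML so they stay on the proxy
> >         ProxyHTMLURLMap http://vendor/ http://vendor.fqdn/
> >     </Location>
> >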
> >So what’s currently missing in the Apache HTTPd solution?
> >
> >- Services that use an authentication token (predominantly ebook vendors) 
> >need special support written. I have been entertaining using mod_lua for 
> >this to make this support relatively easy for someone who is not hard-core 
> >technical to maintain.
> >
> >- Services that are not IP authenticated, but use one of the form-based 
> >authentication variants. I suspect that injecting a script tag into the 
> >page, pointing to JavaScript that handles the form fill/submission, might 
> >be a sane approach here. This should also cleanly deal with the ASP.NET 
> >abominations that use __VIEWSTATE to store state client-side instead of 
> >server-side.
> >
> >- EZproxy’s built-in DNS server (enabled with the “DNS” directive) would 
> >need to be handled using a separate DNS server (there are several options to 
> >choose from).
> >
> >- In this setup, standard systems-level management and reporting tools would 
> >be used instead of the /admin interface in EZproxy.
> >
> >- In this setup, the functionality of the EZproxy /menu URL would need to be 
> >handled externally. This may not be a real issue, as many academic sites 
> >already use LMS or portal systems instead of EZproxy to direct students 
> >to resources, so this feature may not be as critical to replicate.
> >
> >- And of course, extensive testing. While the above ProQuest stanza works 
> >for the main ProQuest search interface, it won’t work for everyone, 
> >everywhere just yet.
> >
> >Bottom line: Yes, Apache HTTPd is a viable EZproxy alternative if you have a 
> >system administrator who knows their way around Apache HTTPd, and you are 
> >willing to spend some time getting to know your vendor services intimately.
> >
> >All of this testing was done on Fedora 19 for the 2.4 version of HTTPd, 
> >which should be available in RHEL7/CentOS7 soon, so about the time that hard 
> >decisions are to be made regarding EZproxy vs something else, that something 
> >else may very well be Apache HTTPd with vendor-specific configuration files.
> >
> 
> 
> -- 
> Stuart Yeates
> Library Technology Services http://www.victoria.ac.nz/library/

-- 
Scott Prater
Shared Development Group
General Library System
University of Wisconsin - Madison
pra...@wisc.edu
