This post belongs in the Code4Lib Journal ...

Shirley

On Wed, Jan 29, 2014 at 3:00 PM, Andrew Anderson <and...@lirn.net> wrote:

> When OCLC first announced its purchase of EZproxy a few years ago, we started a low-priority research project to see what the alternatives were and what it would take to bring them to a production-ready state. The two open source solutions we evaluated were Squid and Apache HTTPd. We considered other options (e.g. Apache Traffic Server), but limited the research to these two because they are already widely used and familiar to most system administrators.
>
> Long story short, Squid did not support URL rewriting in a way we felt we could support well: it required patches to the core C++ server code, an external rewriting process, or an ICAP server implementation. Some of that has improved a bit since the original evaluation, but the built-in support for URL rewriting may still need some time to mature. Squid also seemed a poor fit because its authentication mechanisms are somewhat limited compared with those of Apache HTTPd.
>
> So we moved on to evaluating Apache HTTPd with the mod_proxy family of modules. While Apache HTTPd lacks Squid's advanced cache federation features, it has grown into a robust proxy solution in its own right, and the 2.4 release appears to have all of the required pieces out of the box, including mod_proxy_html. In addition to basic URL rewriting, you get full HTTP protocol support, mature IPv6 support, GZIP support, just about any authentication mechanism you need, a server that can easily host your own content, and a built-in HTTP object cache.
>
> How would it work?
>
> Here's the current EZproxy stanza for ProQuest:
>
>     HTTPHeader X-Requested-With
>     HTTPHeader Accept-Encoding
>     Title ProQuest
>     URL http://search.proquest.com/ip
>     DJ proquest.com
>     HJ gateway.proquest.com
>     DJ umi.com
>     HJ fedsearch.proquest.com
>     HJ literature.proquest.com
>     DJ conquest-leg-insight.com
>     DJ conquestsystems.com
>     DJ m.search.proquest.com
>     DJ media.proquest.com
>     NeverProxy order.proquest.com
>     NeverProxy rss.proquest.com
>
> Here's an Apache HTTPd configuration for ProQuest that accomplishes much of the same functionality for the main search.proquest.com interface:
>
>     <VirtualHost _default_:80>
>         ServerName search.proquest.com.fqdn
>
>         ProxyRequests Off
>         ProxyVia On
>
>         RewriteEngine On
>         RewriteRule ^/(.*) http://search.proquest.com/$1 [P]
>
>         <Location "/">
>             AllowMethods GET POST OPTIONS
>             ProxyPassReverse http://search.proquest.com/
>             ProxyPassReverseCookieDomain search.proquest.com search.proquest.com.fqdn
>             CacheEnable disk
>             SetOutputFilter INFLATE;DEFLATE
>             Header Append Vary User-Agent env=!dont-vary
>             # Put authentication directives here
>             ErrorDocument 401 /path/to/login
>             Require valid-user
>         </Location>
>     </VirtualHost>
>
> A few notes on this:
>
> - There is no need for NeverProxy: if you do not define a VirtualHost for a hostname, it is not proxied. So instead of HJ and DJ lines, you add a new VirtualHost block for each hostname that needs to be proxied. The astute will ask, "What about services that have dozens or hundreds of host entries, like Sage?" Those can be handled by the mod_proxy_express module in Apache HTTPd.
>
> - There is no need for HTTPHeader: since Apache HTTPd is a full HTTP proxy/server, it supports all HTTP headers natively.
>
> - Some of the hostnames in EZproxy stanzas are not needed, and some are legacy hostnames that the vendor no longer uses.
>
> - Some of the hostnames in EZproxy stanzas are for CDN-hosted content that requires no special access (e.g. the JavaScript/CSS/graphics assets that make up the vendor's user interface). Another example: how many of you have "DJ google.com" in one of your stanzas? Now how many of you have registered your IP addresses with Google in any way? Outside of Google Scholar, I suspect the answers to those questions are "nearly everyone" and "nearly no one", respectively.
>
> - Some of the hostnames are for things that no sane person would do: how many people run their discovery service through their EZproxy server instead of having the discovery platform authenticated by IP address directly with the vendors?
>
> - Something this configuration does that EZproxy does not is enable object caching. This can easily save 30-50% of your upstream bandwidth usage (Proxy/ProxySSL in EZproxy can achieve the same result with an external caching proxy server).
>
> - More complex vendor platforms (e.g. Gale Cengage) need ProxyHTML directives and ProxyHTMLURLMap configured, plus multiple VirtualHost sections, to get them fully working. These can be a little fun to get working initially.
>
> - Some services need their redirects edited so that they work correctly and do not break out of the proxy:
>
>     Header edit Location http://vendor/ http://vendor.fqdn/
>
> - Some vendors send the wrong MIME type in their HTTP headers, and mod_proxy_html exposes this in some cases as it rewrites the page. There may be a better way to do this, but this is what I threw together for testing:
>
>     <Location "/badpath">
>         ProxyHTMLEnable Off
>         SetOutputFilter INFLATE;dummy-html-to-plain
>         ExtFilterOptions LogStderr Onfail=remove
>     </Location>
>     ExtFilterDefine dummy-html-to-plain mode=output intype=text/html outtype=text/plain cmd="/bin/cat -"
>
> So what's currently missing in the Apache HTTPd solution?
>
> - Services that use an authentication token (predominantly ebook vendors) need special support written. I have been considering mod_lua for this, so that the support is relatively easy to maintain for someone who is not hard-core technical.
>
> - Services that are not IP-authenticated but instead use one of the form-based authentication variants. I suspect that injecting a script tag into the page, pointing to JavaScript that handles the form fill and submission, might be a sane approach here. This should also deal cleanly with the ASP.NET abominations that use __PAGESTATE to store sessions client-side instead of server-side.
>
> - EZproxy's built-in DNS server (enabled with the "DNS" directive) would need to be replaced by a separate DNS server (there are several options to choose from).
>
> - In this setup, standard systems-level management and reporting tools would be used instead of the /admin interface in EZproxy.
>
> - In this setup, the functionality of the EZproxy /menu URL would need to be handled externally. This may not be a real issue: many academic sites already use LMS or portal systems rather than EZproxy to direct students to resources, so this feature may not be critical to replicate.
>
> - And of course, extensive testing. While the ProQuest configuration above works for the main ProQuest search interface, it won't work for everyone, everywhere just yet.
>
> Bottom line: yes, Apache HTTPd is a viable EZproxy alternative if you have a system administrator who knows their way around Apache HTTPd and you are willing to spend some time getting to know your vendor services intimately.
>
> All of this testing was done on Fedora 19 with the 2.4 version of HTTPd, which should be available in RHEL7/CentOS7 soon. So about the time that hard decisions have to be made regarding EZproxy vs. something else, that something else may very well be Apache HTTPd with vendor-specific configuration files.
>
> --
> Andrew Anderson, Director of Development, Library and Information Resources Network, Inc.
> http://www.lirn.net/ | http://www.twitter.com/LIRNnotes | http://www.facebook.com/LIRNnotes
>
> On Jan 29, 2014, at 14:42, Margo Duncan <mdun...@uttyler.edu> wrote:
>
> > Would you *have* to be hosted? We're in a rural part of the USA and network connections from here to anywhere aren't great, so we try to host almost everything we can. EZProxy really is "EZ" to host yourself.
> >
> > Margo
> >
> > -----Original Message-----
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of stuart yeates
> > Sent: Wednesday, January 29, 2014 1:40 PM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] EZProxy changes / alternatives ?
> >
> > The text I've seen talks about "[e]xpanded reporting capabilities to support management decisions" in forthcoming versions and nudges customers towards the hosted solution.
> >
> > Since we're in .nz, they'd put our hosted proxy server in .au, but the network connection between .nz and .au is via the continental .us, which puts an extra trans-Pacific network loop in 99% of our proxied network connections.
> >
> > cheers
> > stuart
> >
> > On 30/01/14 03:14, Ingraham Dwyer, Andy wrote:
> >> In April 2013, OCLC announced changes to its license model for North America. EZProxy's license moves from a one-time purchase of US$495 to an *annual* fee of $495, or to their hosted service, with the fee depending on the scale of service. The old one-time purchase license has not been offered for sale since July 1, 2013. I don't have any details about pricing for other parts of the world.
> >>
> >> An important thing to recognize here is that they cannot legally change the terms of a license that is already in effect. The software you have purchased under the old license is still yours to use, indefinitely. OCLC has even released several maintenance updates during 2013 that are available to current license-holders. In fact, they released V5.7 in early January 2014 and made it available to all license-holders. However, all updates after that version are only available to holders of the yearly subscription. The hosted product is updated to the most current version automatically.
> >>
> >> My recommendation is: if your installation of EZProxy works, don't change it. Yet. Upgrade your installation to the last version available under the old license, and use that for as long as you can. At this point, no world-changing features have been added to the product. There is speculation that IPv6 support will be the next big feature-add, but I haven't heard anything official. Start planning and budgeting for a change, either to the yearly fee, the cost of hosted, or some as-yet-undetermined alternative. But I see no need to start paying now for updates you don't need.
> >>
> >> -Andy
> >>
> >> Andy Ingraham Dwyer
> >> Infrastructure Specialist
> >> State Library of Ohio
> >> 274 E. 1st Avenue
> >> Columbus, OH 43201
> >> library.ohio.gov
> >>
> >>
> >> -----Original Message-----
> >> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of stuart yeates
> >> Sent: Tuesday, January 28, 2014 10:03 PM
> >> To: CODE4LIB@LISTSERV.ND.EDU
> >> Subject: Re: [CODE4LIB] EZProxy changes / alternatives ?
> >>
> >> I probably should have been more specific.
> >>
> >> Does anyone have experience switching from EzProxy to anything else?
> >>
> >> Is anyone else aware of the coming OCLC changes and considering switching?
> >>
> >> Does anyone have a worked example like: "My EzProxy config for site Y looked like A; after the switch, my X config for site Z looked like B"?
> >>
> >> I'm aware of this good article:
> >> http://journal.code4lib.org/articles/7470
> >>
> >> cheers
> >> stuart
> >>
> >>
> >> On 29/01/14 15:24, stuart yeates wrote:
> >>> We've just received notification of forthcoming changes to EZProxy, which will require us to pay an arm and a leg for future versions to install locally and/or host with OCLC AU with a ~10,000 km round trip.
> >>>
> >>> What are the alternatives?
> >>>
> >>> cheers
> >>> stuart
> >>
> >>
> >> --
> >> Stuart Yeates
> >> Library Technology Services http://www.victoria.ac.nz/library/
> >
> >
> > --
> > Stuart Yeates
> > Library Technology Services http://www.victoria.ac.nz/library/
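stuart asked for a worked "my EZproxy config looked like A, my X config looks like B" example, and Andrew's notes describe the mapping: each proxied hostname gets its own VirtualHost, while NeverProxy and HTTPHeader simply drop away. The Python sketch below automates that mapping as a starting point. It is a hypothetical helper, not an OCLC or Apache tool. The function names, the `.fqdn` suffix (Andrew's placeholder for the proxy's own domain), and the treatment of DJ entries as single exact hosts are assumptions; EZproxy's domain directives also match subdomains, which this sketch ignores. It reuses the directives from Andrew's ProQuest example, adds one `Header edit Location` line per other vendor host (per his note on redirects), and leaves out the caching, compression and authentication lines for brevity.

```python
"""Sketch: render an EZproxy stanza as Apache HTTPd 2.4 VirtualHost blocks.

Hypothetical helper, not an OCLC or Apache tool. Assumptions:
- each proxied host is served as "<vendor host><suffix>" (".fqdn" here);
- D/DJ (domain) entries are treated like H/HJ entries: one exact host each;
- caching, compression and authentication lines are left to the admin.
"""
import re
from urllib.parse import urlsplit

SUFFIX = ".fqdn"  # hypothetical placeholder for the proxy's own domain
# Long and short forms of the EZproxy directives that name proxied hosts.
HOST_DIRECTIVES = {"U", "URL", "H", "Host", "HJ", "HostJavaScript",
                   "D", "Domain", "DJ", "DomainJavaScript"}


def hosts_from_stanza(stanza: str) -> list[str]:
    """Collect the hostnames a stanza proxies, in order, without duplicates."""
    hosts: list[str] = []
    for line in stanza.splitlines():
        parts = line.split(None, 1)
        # Title, HTTPHeader and NeverProxy have no VirtualHost equivalent:
        # a host that gets no VirtualHost is simply not proxied.
        if len(parts) != 2 or parts[0] not in HOST_DIRECTIVES:
            continue
        value = parts[1].strip()
        # Values may be bare hostnames or full URLs; keep only the hostname.
        host = urlsplit(value).hostname if "://" in value else value
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def vhost(host: str, others: list[str], suffix: str = SUFFIX) -> str:
    """One VirtualHost modelled on the ProQuest example in the thread."""
    lines = [
        "<VirtualHost _default_:80>",
        f"    ServerName {host}{suffix}",
        "    ProxyRequests Off",
        "    ProxyVia On",
        "    RewriteEngine On",
        f"    RewriteRule ^/(.*) http://{host}/$1 [P]",
        '    <Location "/">',
        f"        ProxyPassReverse http://{host}/",
        f"        ProxyPassReverseCookieDomain {host} {host}{suffix}",
    ]
    # ProxyPassReverse covers redirects to this host; redirects to the
    # vendor's other proxied hosts are rewritten so users stay on the proxy.
    for other in others:
        lines.append(f"        Header edit Location ^http://{re.escape(other)}/"
                     f" http://{other}{suffix}/")
    lines += [
        "        # Put authentication directives here",
        "    </Location>",
        "</VirtualHost>",
    ]
    return "\n".join(lines)


def stanza_to_vhosts(stanza: str, suffix: str = SUFFIX) -> str:
    """Render one VirtualHost per proxied hostname, separated by blank lines."""
    hosts = hosts_from_stanza(stanza)
    return "\n\n".join(
        vhost(h, [o for o in hosts if o != h], suffix) for h in hosts
    )


if __name__ == "__main__":
    print(stanza_to_vhosts("""Title ProQuest
URL http://search.proquest.com/ip
HJ gateway.proquest.com
NeverProxy order.proquest.com"""))
```

Run against the full ProQuest stanza, this would emit a block for every listed host, including legacy entries such as umi.com, so Andrew's advice about pruning unneeded and CDN-only hostnames still applies. Vendors with hundreds of hosts are better served by mod_proxy_express than by enumerating blocks.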