On 6/14/11 9:17 AM, Dan Scott wrote:
On Tue, Jun 14, 2011 at 08:41:59AM -0400, Bill Erickson wrote:

Hi Dan,

I'd like to suggest we not make this change or at least make the
default significantly lower.  With a 30-second timeout and a slow or
crippled added content provider, it would not take long for the
Apache processes to be gobbled up, leaving EG unusable.

Hmm. I guess as you say below that depends on load and the added content
provider; we've been running with timeout set to 45 seconds and using
the new OpenLibrary Read API where some requests do take a long time to
resolve (30 seconds for an ISBN with many editions is not unusual, at
least in this early stage before they've optimized their own service). I
thought that with caching integrated into added content, the idea was
that the initial request would be costly but subsequent requests would
be cached - therefore spreading out the pain.

Yes, in some environments a high timeout works fine. I think it's very subjective. And, yes, that is the goal of caching. It helps a lot, but obviously it doesn't remove the need to make network calls.
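The cache-aside pattern being described can be sketched as follows. This is a hypothetical illustration, not Evergreen's actual code: the names (`lookup`, `fetch_from_provider`, `AC_TIMEOUT`) are made up, and the dict stands in for whatever shared cache the real added content layer uses.

```python
# Sketch of the caching idea above: the first lookup for an ISBN pays the
# full provider round trip; subsequent lookups are served from the cache,
# "spreading out the pain". All names here are illustrative.
AC_TIMEOUT = 3   # seconds; the per-request cap being debated in this thread
_cache = {}      # stand-in for a shared cache

def lookup(isbn, fetch_from_provider):
    """Return added content for isbn, consulting the cache first."""
    if isbn in _cache:
        return _cache[isbn]                                 # no network call
    result = fetch_from_provider(isbn, timeout=AC_TIMEOUT)  # costly first hit
    if result is not None:
        _cache[isbn] = result   # pay the provider's latency once, reuse after
    return result
```

Note that even with a warm cache, every uncached ISBN still ties up a worker for up to the full timeout, which is why the timeout value matters.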


My preference would be to keep it at 1 w/ the understanding that
users can raise the value if they want to take that risk.  If that's
too aggressive for a default, I could maybe see using 2 or 3
seconds. Anything higher is unsafe, IMO.  Of course, it depends on
the environment.

Keeping it at 1 would be the status quo, and status quo was that I was
seeing plenty of timeouts at that setting both when we had Syndetics as
our AC provider and when we switched to OpenLibrary. 2 or 3 would
definitely be better.

Right, I understand and agree with all of this. I am suggesting that (by default) added content suffer in favor of avoiding denial of service. Since I don't think there is a timeout that will work for everyone, my preference is to default to the safest (or reasonably safe) option, even if it means losing some content.
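For reference, the knob under discussion lives in the `<added_content>` block of `opensrf.xml` on stock installs. Element names below are from memory and should be verified against a local config; the values shown are illustrative:

```xml
<!-- opensrf.xml (fragment): verify element names against your install -->
<added_content>
  <module>OpenILS::WWW::AddedContent::OpenLibrary</module>
  <!-- per-request cap, in seconds; the conservative default argued for here -->
  <timeout>1</timeout>
</added_content>
```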

When I raised the default timeout value on IRC a
week or two back, the general reaction was that 1 seemed low.

I'm sorry I missed that conversation.


If you see AC caching presenting a possible denial of service issue, then
maybe we should just eliminate the caching entirely, or overhaul it so
that it draws from a different pool of Apache processes than the main
Evergreen processes?

I don't see caching as a problem. The problem (as you explain below) is from Apache process gobbling. An overhaul to allow AC calls to pull from a different set of Apache servers would solve the problem. It would have to be a true overhaul to added content delivery, though, given AJAX domain restrictions.
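One way such an overhaul might look, sketched as a hypothetical Apache reverse-proxy config (hostnames, ports, and the backend split are all made up): AC requests stay on the same origin as the catalog, so browser same-origin (AJAX) restrictions are satisfied, but they are handed to a dedicated backend whose workers can block without starving the main Evergreen pool.

```apache
# Hypothetical front-end vhost. Everything is served from one origin, but
# /opac/extras/ac/ is proxied to a separate Apache instance with its own
# worker pool, so slow AC providers only exhaust that pool.
<VirtualHost *:80>
    ServerName catalog.example.org
    ProxyPass        /opac/extras/ac/ http://127.0.0.1:8081/opac/extras/ac/
    ProxyPassReverse /opac/extras/ac/ http://127.0.0.1:8081/opac/extras/ac/
    # ... normal Evergreen configuration for everything else ...
</VirtualHost>
```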

From what you've described, it sounds as though, as it's currently
architected on a single-server system,

Single- or multi-server: since AC requests are spread across all of the Apache servers, the exposure is the same regardless of the number of bricks.

...a sufficient
number of concurrent AC requests would exhaust the available Apache
processes no matter what the timeout value is set to; it's less likely
to happen at 1, but still a denial of service waiting to happen.
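The arithmetic behind that worry can be made concrete with Little's law: the number of workers tied up by a stalled provider is roughly the arrival rate of AC requests times the timeout. The `MAX_CLIENTS` value below is illustrative, not taken from any particular install.

```python
# Back-of-the-envelope, assuming every AC request hangs until the timeout
# fires. Workers simultaneously busy ~= arrival_rate * timeout (Little's law).
def workers_tied_up(arrival_rate_per_sec, timeout_sec):
    return arrival_rate_per_sec * timeout_sec

MAX_CLIENTS = 150  # illustrative Apache prefork MaxClients

# With a 30 s timeout, 5 hung AC requests/second exhaust the whole pool:
print(workers_tied_up(5, 30))  # 150 -> every worker busy, EG unusable
# At a 1 s timeout, the same load ties up only 5 workers:
print(workers_tied_up(5, 1))   # 5
```

This is why lowering the timeout only shifts the threshold: exhaustion is still possible at 1 second, just at a much higher request rate.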

Agreed. It's similar to the Evergreen install instructions, which direct users to set the KeepAliveTimeout value to 1 (instead of the default 25), and for the same reason: we're sacrificing some speed for a reduced likelihood of DoS.
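The directive in question, as it would appear in httpd.conf (the 1-second value is the override recommended in the Evergreen install instructions):

```apache
# httpd.conf: release a kept-alive connection's worker quickly rather than
# holding it idle; trades some client-side speed for DoS resistance.
KeepAlive On
KeepAliveTimeout 1
```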

-b


--
Bill Erickson
| VP, Software Development & Integration
| Equinox Software, Inc. / Your Library's Guide to Open Source
| phone: 877-OPEN-ILS (673-6457)
| email: [email protected]
| web: http://esilibrary.com

Equinox is going to New Orleans! Please visit us at booth 550
at ALA Annual to learn more about Koha and Evergreen.
