Re: dynamically change max client value

2002-11-04 Thread Scott Hess
Based on my experience, this wouldn't be a high-quality solution, it would
be a hack.  I've seen very few cases where load spiked enough to be an
issue, but was transient enough that a solution like this would work - and
in those cases, plain old Unix multitasking normally suffices.

What happens if you implement the solution anyhow is that you get a bunch
of users stuck in the ListenBacklog.  So they'll wait a couple minutes
before their request even starts.  If you have a deep backlog, requests
just pile up so that the machine never gets its head above water.  In the
worst case, clients will time out while their request is still in the
backlog, but since you don't find that out until you try to write the
response to the network, you can very easily do work that can never be
delivered.  Beyond all that, the user experience simply _sucks_.

[Yes, I've done what you suggest, just not using the implementation you
suggest.  It's integrated into an existing custom module, you could also
probably do it with a reverse proxy.  In the end, it was not a productive
solution.]

What I think you really want is a module that will intercept all requests,
and send back "The server is really busy, try again in five minutes" if
the server is too busy by some measure.  You generally want this to be a
super-low-cost option, so that you can spin through requests very quickly.  
Optimally, no externally-blockable pieces (no database connections, no
locking filesystem access, etc).  One relatively simple option might be to
use Squid with a URL redirector which implements the magic check.  If the
machine is not busy, pass the request through to the real server; if the
machine is busy, redirect to a URL which will deliver your message.
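
To make that concrete, here's an untested sketch of the kind of redirector
I mean.  It assumes the classic Squid redirector interface (one request per
line on stdin; print a rewritten URL, or a blank line for "no change", on
stdout) and uses the one-minute load average as the busy test - the URL and
threshold are made up:

#include <stdio.h>
#include <stdlib.h>

#define BUSY_LOAD 8.0   /* made-up threshold -- tune per machine */
#define SORRY_URL "http://static.example.com/busy.html"

int main(void)
{
    char line[8192];

    /* Squid hands us one request per line; answer each line as it arrives. */
    setvbuf(stdout, NULL, _IOLBF, 0);
    while (fgets(line, sizeof(line), stdin) != NULL) {
        double load[1];

        if (getloadavg(load, 1) == 1 && load[0] > BUSY_LOAD) {
            /* Too busy: bounce the client to the canned "try later" page. */
            printf("%s\n", SORRY_URL);
        } else {
            /* Not busy: a blank line tells Squid to pass the URL through. */
            printf("\n");
        }
    }
    return 0;
}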

[Again, yes, I've done this in Apache 1.3, but in code targeted to our
custom modules.  You could certainly do it more generically, I just
haven't had the need.  You might check mod_backhand.]

Later,
scott


On Mon, 4 Nov 2002, David Burry wrote:
> I realize that allowing _everything_ to be dynamically configured via
> SNMP (or signal or something) would probably be too substantial of an
> API change to be considered for the current code base, but it would be
> nice to consider it for some future major revision of Apache
> 
> And it would be more than just "nice" if at least the max client value
> thing could be somehow worked into the current versions of Apache...  
> There is a current very real and very large problem that could be solved
> by this, not just a "nice to have" feature.  This is what I meant to
> emphasize in my original email...
> 
> Dave
> 
> - Original Message -
> From: "Dirk-Willem van Gulik" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Monday, November 04, 2002 9:35 AM
> Subject: Re: dynamically change max client value
> 
> 
> >
> > In my ideal world every config directive would be able to advertise or
> > register an optional 'has changed' hook which, if present, would be
> > called in context whenever a value is somehow updated (through snmp, a
> > configd, signal, whatever). If there is no such hook, the old -update- on
> > graceful restart is the default (though it sure would be nice to have some
> > values also advertise that they need a full shutdown and restart).
> >
> > Of course one could also argue for not just a put but also for a 'get'
> > interface in context :-)
> >
> > Dw
> >
> > On Mon, 4 Nov 2002, David Burry wrote:
> >
> > > Recently there has been a little discussion about an API in apache for
> > > controlling starts, stops, restarts, etc...
> > >
> > > I have an idea that may help me solve a problem I've been having.  The
> > > problem is in limiting the number of processes that will run on a
> > > machine to somewhere below where the machine will keel over and die,
> > > while still being close to the maximum the machine will handle.  The
> > > issue is depending on what the majority of those processes are doing it
> > > changes the maximum number a given machine can handle by a few orders
> > > of magnitude, so a multi-purpose machine that serves, say, static
> > > content and cgi scripts (or other things that vary greatly in machine
> > > resource usage) cannot be properly tuned for maximum performance while
> > > guaranteeing the machine won't die under heavy load.
> > >
> > > The solution I've thought of is... what if Apache had an API that could
> > > be used to say "no more processes, whatever you have NOW is the max!"
> > > or otherwise to dynamically raise or lower the max number (perhaps "oh
> > > there's too many, reduce a bit")  You see, an external monitoring
> > > system could monitor cpu and memory and whatnot and dynamically adjust
> > > apache depending on what it's doing.  This kind of system could really
> > > increase the stability of any large Apache server farm, and help keep
> > > large traffic spikes from killing apache so bad that nobody gets served
> > > anything at all.
> > >
> > > In fact 

Re: A suggested ROADMAP for working 2.1/2.2 forward?

2002-10-18 Thread Scott Hess
On Thu, 17 Oct 2002, William A. Rowe, Jr. wrote:
> With the simultaneous release of Apache 2.1-stable and Apache
> 2.2-development, the Apache HTTP Server project is moving to a more
> predictable stable code branch, while opening the development to forward
> progress without concern for breaking the stable branch.  This document
> explains the rationale behind the two versions and their behavior, going
> forward.

This is great.  I'm our "Apache guy", and 2.0 has been a non-starter.  I
can fairly easily keep up with the 1.3 changes, because they don't require
revising our codebase, so we get the best of both worlds.  What I think
this arrangement would allow me to do is make local adjustments to our 2.1
codebase, and if they prove out in production, I can repackage them as a
patch to 2.2.  Right now, the likelihood that I'll contribute to the most
current development tree is nil, because it's just too different from
where most of my work is done.

Excellent,
scott




Re: [PATCH] Alerting when fnctl is going bad

2002-09-26 Thread Scott Hess

On Wed, 25 Sep 2002, Sander van Zoest wrote:
> On Wed, 25 Sep 2002, Justin Erenkrantz wrote:
> > On Thu, Sep 26, 2002 at 02:11:59AM +0200, Dirk-Willem van Gulik wrote:
> > > ->Makes the wait loop no longer endless - but causes it
> > >   to bail out (and emit some warnings ahead of time) after
> > >   a couple of thousand consecutive EINTRs.
> > Placing a 'magic number' on how many EINTRs is 'failure' doesn't
> > seem right.  -- justin
> 
> Although, things like these have been done many times in the past.
> Especially in BSD.  As long as the number is high enough to where there
> doesn't seem to be an obvious reason to go above that, then I do not see
> why not.

Is there any other way to detect that the fcntl() is bad, other than "we 
got more than X EINTR"?  For instance, in this case I'm guessing it's 
related to interrupts due to the lockfile being on a network filesystem of 
some sort, and it looks like you could have a server run fine for a couple 
days, then drop itself for no obvious reason.

Every time I've ever seen code which does something different after
getting "too many" EINTR responses, and later rolled that code into a
production environment, it's turned out to be wrong.  EINTR never seems to
happen in development environments :-).  If you're getting "too many"
EINTR results, in my experience it means that the code isn't handling
errors correctly somewhere else, and it will bite you in other ways, so
nowadays I always go looking for the _real_ problem.
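
For comparison, the shape I'd expect the lock call to take is a plain
retry-on-EINTR loop that treats anything else as a real error to report -
a rough sketch, not the actual httpd code:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sketch of an EINTR-tolerant lock; the names are mine, not httpd's. */
static int lock_whole_file(int fd)
{
    struct flock lock;
    int rc;

    memset(&lock, 0, sizeof(lock));
    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;    /* l_start = l_len = 0 locks the whole file */

    do {
        rc = fcntl(fd, F_SETLKW, &lock);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0) {
        perror("fcntl(F_SETLKW)");   /* a real error: report it, don't count it */
        return -1;
    }
    return 0;
}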

[Unless, of course, the OS itself has a bug.  But fixing that should
definitely involve code conditional on the buggy OS.]

Later,
scott






RE: E-Kabong resolution: Re: acceptance of El-Kabong into APR

2002-09-12 Thread Scott Hess

On Thu, 12 Sep 2002, Harrie Hazewinkel wrote:
> --On Thursday, September 12, 2002 8:50 AM -0500 "Jenkins, David" 
> > I disagree almost completely.  If you are truly dedicated to the ASF
> > community, you will understand the cautiousness necessary in deciding 
> > who has commit privs.
> 
> I was mainly thinking of bigger pieces of code - code component - and
> for those there are mostly also maintainers needed. Those maintainers
> are mostly first the donator. For small patches I agree not everyone
> should get commit access.

I think it's important to keep in mind that being part of the Apache
deliverables is not the only option.  Contributors can always spin up
their own external open source project, as was done for mod_ssl, mod_perl,
mod_php, etc.  Yes, this places more of a burden on the contributor, but
that's fair in cases where the contributor desires to maintain tighter
control.

A side effect of this is that if the component becomes popular,
integrating it becomes more compelling.

Later,
scott




Re: El-Kabong -- HTML Parser

2002-09-10 Thread Scott Hess

[I am not an Apache contributor, merely a lurker, but...]

On Tue, 10 Sep 2002, Jon Travis wrote:
> These are not coercive tactics.  These are processes which are
> beneficial to both the ASF and Covalent.  I cannot continually monitor
> the progress of this project for eternity.  I'm astonished that this
> deadline email has caused such a response.  This sets an extremely bad
> precedent for other companies (or anyone for that matter) who wants to
> contribute to the ASF.
> 
> Personally (Covalent hat off), it's a bummer that this is your response
> to the donation.  I was the one who originally proposed it to
> management, they agreed to it, and now I've gotten involved in all kinds
> of politics and inflammatory emails.  That's a long way from being
> excited about contributing to the ASF, and sadly seems like more
> trouble than it's worth.

As I said earlier: if all you want is to contribute the code, put a
compatible open source license on it and put it on a publicly accessible
website, somewhere.

From following the thread, I get the feeling you don't want to contribute
it, you want someone to take ownership of it.  A couple points:

 1) Everyone here has a real-life job.
 2) Many of those jobs don't involve Apache directly.
 3) Anyone who's writing code has their own pet projects they want done.
 4) Anyone without a pet project has a choice of dozens/hundreds of
abandoned/unmaintained projects to work on.
 5) Integration work is hard work.

If you really want the ASF to pull this project into the Apache core, your
best bet is to volunteer to integrate it and write some example code.  
After all, you're the one with the code, you're the one who wants to
contribute it to the community.

This isn't specific to the Apache group.  This is just how open source
software works.  And this basic thread happens every couple months on
every open source project I monitor.

As for the inflammatory emails, you must be reading lists that I don't have
access to, because I haven't seen them.  Given that you've essentially asked
the community to prove that it's worthy of accepting your contribution,
I'm actually surprised the responses have been so calm.

Later,
scott




Re: El-Kabong -- HTML Parser

2002-09-09 Thread Scott Hess

I'm not sure I understand what your goal is, here.  The discussion seems
to be +1 for including your parser somewhere in some Apache project in the
future, there's just no clear consensus on where.  Is there any reason you
can't just release your project under the ASF license and be done with it?

Later,
scott

On Mon, 9 Sep 2002, Jon Travis wrote:
> Ok, since I'm not seeing any activity towards getting this 
> integrated, I'd like to set a deadline.  This would help
> me out, since it gives direction as to where the project
> can go, as well as the ASF since political discussion shouldn't
> weigh down the process.  It will just save us all a lot of
> time & energy.
> 
> Anyway, I'd like to give an additional week to the ASF
> to deal with the code.  Next Monday, if it hasn't been
> decided I'll look into other options.
> 
> -- Jon
> 
> 
> On Mon, Sep 09, 2002 at 10:36:21AM -0700, Jon Travis wrote:
> > Time for another ping.  It's been 2 weeks.  Any word?
> > 
> > -- Jon
> > 
> > 
> > On Mon, Aug 26, 2002 at 08:32:16PM -0700, Jon Travis wrote:
> > > Hi all...
> > > Jon Travis here...
> > > 
> > > Covalent has written a pretty keen HTML parser (called el-kabong) 
> > > which we'd like to offer to the ASF for inclusion in APR-util (or
> > > whichever other umbrella it fits under.)  It's faster than 
> > > anything I can find, provides a SAX stylee interface, uses
> > > APR for most of its operations (hash tables, etc.), and has a
> > > pretty nice testsuite.  We use it in our code to re-write HTML on 
> > > the fly.  I would be the initial maintainer of the code.
> > > 
> > > Please voice any interest, thanks.
> > > 
> > > -- Jon
> > > 
> 




Re: Thread-unsafe libraries in httpd-2.0

2002-08-16 Thread Scott Hess

On Thu, 15 Aug 2002, William A. Rowe, Jr. wrote:
> There's no reason to bloat all of Apache and its well-behaved modules
> with extra code, when only a handful of modules care to report that they
> can't be compiled for a threaded architecture.

The strict engineer in me agrees.  The pragmatic engineer in me realizes
that threading issues are hard, and that you're going to get more false
positives (modules allowed to run that shouldn't be) if you make threading
opt-out rather than opt-in.  It's not like this code (or flag) has to be
handled on every request.

[Just in case that wasn't clear - modules should indicate that they are
thread-safe, else the threaded MPMs should abort.  Perhaps it would be
sufficient to simply report an error or alert in the logs, so that when
things go wrong, it occurs to the admin to consider thread-safety issues
alongside other issues.]
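
Hypothetically - and I want to stress this is not the real module API, just
the shape of the idea - the opt-in could be a single flag that a threaded
MPM checks once at startup:

#include <stdio.h>

/* Hypothetical sketch only -- not the actual Apache module structure. */
typedef struct {
    const char *name;
    int declares_thread_safe;    /* the module opts in explicitly */
    /* ... hooks, config handlers, etc. ... */
} module_info;

/* Run once by a threaded MPM at startup, never per-request. */
static int check_thread_safety(const module_info *mods, int nmods)
{
    int i;

    for (i = 0; i < nmods; i++) {
        if (!mods[i].declares_thread_safe) {
            fprintf(stderr, "module %s has not declared itself thread-safe\n",
                    mods[i].name);
            return -1;    /* abort -- or at least log it loudly */
        }
    }
    return 0;
}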

When it comes down to it, we're only talking about a couple extra lines
for all of the standard modules to indicate that they are thread-safe.  
While that road does lead toward creature feep, non-thread-safe code running
in a threaded program is very touchy: it's likely to work in a large number
of cases, then crash with weird, hard-to-debug symptoms in the rest.

Later,
scott





Re: config handling (was: Re: cvs commit: httpd-2.0/server core.c)

2002-05-20 Thread Scott Hess

On Mon, 20 May 2002, Greg Stein wrote:
> On Sat, May 18, 2002 at 12:32:20PM -0500, William A. Rowe, Jr. wrote:
> > On Win32, we load-unload-reload the parent, then load-unload-reload 
> > the child config.  Losing both redundant unload-load sequences will 
> > be a huge win at startup.
> 
> Yup. If we process the tree in a much smarter fashion, then nothing
> should need to be unloaded.

One thing I _like_ about the load-unload-reload is that it generally
forces you (the module author) to consider the graceful restart case,
rather than simply crashing (or getting buggy) the first time someone does
it.  [Sorry if you're using those terms in a technical fashion that I'm
not following.]

OTOH, on Windows the parent and child both have to load things, so you get
a similar effect.

[Speaking of this, one thing I'd like to see for Windows would be a way
for the parent process to cache the config (or parse tree) and pass it
directly to the child, so that you don't have the possibility of changing
config when a new child is spawned due to MaxRequestsPerChild.  Yeah, I
_should_ submit a patch rather than a request.]

Later,
scott




RE: is httpd a valid way to start Apache?

2002-05-16 Thread Scott Hess

On Thu, 16 May 2002, Joshua Slive wrote:
> On Thu, 16 May 2002, Ryan Bloom wrote:
> > My own opinion is that we leave things exactly as they are today.  If
> > you are running the binary by hand, you are taking some responsibility
> > for knowing what you are doing.  That means having the environment
> > variables setup correctly before you start.
> >
> > If you don't want that responsibility, use apachectl to run the 
> > server.  Trying to solve this problem any other way just seems like we 
> > are asking for trouble.
> 
> I think that is exactly what this proposal is saying.  But at the same
> time it is cleaning up apachectl and adding some useful functionality to
> httpd.  As I've said, the current apachectl is over-complicated and the
> split between apachectl and httpd is confusing to some people.  This
> change would clear that up.

Would it make sense to move the httpd binary to .../libexec/httpd?  That
makes it clear that this is an internal binary which you shouldn't run
directly, unless you're really smart.  Then apachectl stays in .../sbin/.

[Idea courtesy of mysql's mysqld.]

Later,
scott




RE: Move perchild to experimental?

2002-04-17 Thread Scott Hess

In my experience this argument always ends with: copy the ,v files, then
cvs rm the old version, with a comment on the order of "moved to
../wherever".  Perhaps with a "moved from .../wherever" comment added to
the new version.  I think it's even ended that way on this list a couple 
times.

Messing with history is bad!

Later,
scott

On Wed, 17 Apr 2002, Ryan Bloom wrote:
> I would much rather move the ,v files.  This is a standard argument on
> this list, and there has never been consensus.  The history is important
> with stuff like MPMs, and doing a cvs rm, cvs add removes the history.
> 
> Ryan
> 
> --
> Ryan Bloom  [EMAIL PROTECTED]
> 645 Howard St.  [EMAIL PROTECTED]
> San Francisco, CA 
> 
> > -Original Message-
> > From: Aaron Bannert [mailto:[EMAIL PROTECTED]]
> > Sent: Wednesday, April 17, 2002 3:12 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Move perchild to experimental?
> > 
> > On Wed, Apr 17, 2002 at 03:10:10PM -0700, Justin Erenkrantz wrote:
> > > Okay, so it seems we have consensus to move it.
> > >
> > > Uh, how do we move it?
> > >
> > > - Delete it and re-add them in the new directory
> > > - Move the .v files on icarus
> > 
> > If you move the ,v files you'll be messing with history, how about just
> > delete and add?
> > 
> > *cough*svn could probably do it*cough*
> > 
> > -aaron
> 




Re: performance: using mlock(2) on httpd parent process

2002-03-20 Thread Scott Hess

On Wed, 20 Mar 2002, Stas Bekman wrote:
> mod_perl child processes save a lot of memory when they can share memory
> with the parent process and quite often we get reports from people that
> they lose that shared memory when the system decides to page out the
> parent's memory pages because they are LRU (least recently used, the
> algorithm used by many memory managers).

I'm fairly certain that this is not an issue.  If a page was shared COW
before being paged out, I expect it will be shared COW when paged back in,
at least for any modern OS.

[To verify that I wasn't talking through my hat, here, I just verified
this using RedHat 7.2 running kernel 2.4.9-21.  If you're interested in my
methodology, drop me an email.]

> I believe that this applies to all httpd modules and httpd itself, the
> more we can share the less memory resources are needed, and usually it
> leads to a better performance.

I'm absolutely _certain_ that unmodified pages from executable files will
be backed by the executable, and will thus be shared by default.

> Therefore my question is there any reason for not using mlockall(2) in
> the parent process on systems that support it and when the parent httpd
> is started as root (mlock* works only within root owned processes).

I don't think mlockall is appropriate for something with the heft of
mod_perl.

Why are the pages being swapped out in the first place?  Presumably
there's a valid reason.  Doing mlockall on your mod_perl would result in
restricting the memory available to the rest of the system.  Whatever is
causing mod_perl to page out would then start thrashing.  Worse, since
mlockall will lock down mod_perl pages indiscriminately, the resulting
thrashing will probably be even worse than what they're seeing right now.
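
To be concrete about what's being proposed: mlockall() is a one-liner, and
that's exactly the problem, because it pins every page the process maps.
Roughly:

#include <stdio.h>
#include <sys/mman.h>

/* What the proposal amounts to, in the root-owned parent httpd. */
static void pin_everything(void)
{
    /* Locks every current and future page of this process into RAM --
       there is no way to say "only the pages shared with the children". */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        perror("mlockall");
}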

Later,
scott




Re: performance: using mlock(2) on httpd parent process

2002-03-20 Thread Scott Hess

On Thu, 21 Mar 2002, Stas Bekman wrote:
> > On Wed, 20 Mar 2002, Stas Bekman wrote:
> > 
> >>mod_perl child processes save a lot of memory when they can share 
> >>memory with the parent process and quite often we get reports from 
> >>people that they lose that shared memory when the system decides to 
> >>page out the parent's memory pages because they are LRU (least 
> >>recently used, the algorithm used by many memory managers).
> >>
> > 
> > I'm fairly certain that this is not an issue.  If a page was shared 
> > COW before being paged out, I expect it will be shared COW when paged 
> > back in, at least for any modern OS.
> 
> But if the system needs to page things out, most of the parent process's
> pages will be scheduled to go first, no? So we are talking about a
> constant page-in/page-out from/to the parent process as a performance
> degradation rather than memory unsharing. Am I correct?

The system is going to page out an approximation of the
least-recently-used pages.  If the children are using those pages, then
they won't be paged out, regardless of what the parent is doing.  [If the
children _aren't_ using those pages, then who cares?]

> > [To verify that I wasn't talking through my hat, here, I just verified
> > this using RedHat 7.2 running kernel 2.4.9-21.  If you're interested in my
> > methodology, drop me an email.]
> 
> I suppose that this could vary from one kernel version to another.

Perhaps, but I doubt it.  I can't really test on older kernels because I
don't have them on any machines I control, but I'd be somewhat
surprised if any OS which runs on modern hardware worked this way.  It
would require the OS to map a given page to multiple places in the
swapfile, which would be significant extra work, and I can't think of any
gains from doing so.

> I'm just repeating the reports posted to the mod_perl list. I've never
> seen such a problem myself, since I try hard to have close to zero swap
> usage.

:-).  In my experience, you can get some really weird stuff happening when
you start swapping mod_perl.  It seems to be stable in memory usage,
though, so long as you have MaxClients set low enough that your maximum
amount of committed memory is appropriate.  Also, I've seen people run
other heavyweight processes, like mysql, on the same system, so that when
the volume spikes, mod_perl spikes AND mysql spikes.  A sure recipe for
disaster.

> [Yes, please let me know your methodology for testing this]

OK, two programs.  bigshare.c:

#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

#define MEGS 256
static char *mem = NULL;
static char vv = 0;

static void handler(int signo)
{
    char val = 0;
    unsigned ii;
    signal(signo, handler);
    for (ii = 0; ii < MEGS*1024*1024; ii++)
        val += mem[ii];
    vv = val;
}

/* main() here is a reconstruction (the archive mangled the original); per
   the description below it allocates and touches the memory, installs the
   handler, and forks so four processes share the pages copy-on-write. */
int main(int argc, char **argv)
{
    unsigned ii;
    mem = calloc(1, MEGS*1024*1024);
    for (ii = 0; ii < MEGS*1024*1024; ii++)
        mem[ii] = 1;
    signal(SIGUSR1, handler);
    fork();
    fork();
    for (;;)
        pause();
}

and makeitswap.c:

#include <stdlib.h>

int main(int argc, char **argv)
{
    char *mem = calloc(1, 384*1024*1024);
    free(mem);
    return 0;
}

These both compile under RedHat 7.2; you might have to adjust the #include
directives for other systems.  Adjust the MEGS value in bigshare.c to be
big enough to matter, but not so big that it causes bigshare itself to
swap.  I chose 1/2 of my real memory size.  The 384 in makeitswap.c is 3/4
of my real memory, so it pushes tons of stuff into swap.

Run bigshare.  Use ps or something appropriate to determine that, indeed,
all four bigshare processes are using up 256M of memory, but it's all
shared.

Then, run makeitswap.  All of the bigshare processes should partly or
fully page out.  Afterwards I was seeing RSS from 260k to 1M on the
bigshare processes.

Then, kill -USR1 one of the bigshare processes.  This causes the process to
re-read all of the memory it earlier allocated, thus it should page in
256M or so.  ps or top should show the RSS rising as it swaps back in.  
You can also use "vmstat 1" to watch it happen (watch the Swap/si column).  
On some systems you may need to use iostat.  More than likely your system
response also goes to heck, because it's spending so much time swapping
data in.  bigshare should end up with RSS about 256M, again.

Then, kill -USR1 another of the bigshare processes.  On my system, this
happened much faster than the first time.  Also, I saw only minimal
swapins in vmstat (128 or so per second, versus >10,000 per second for the 
-USR1 against the first process).  Send -USR1 to other bigshare processes, 
same results.  You can verify that the pages are shared with ps or 
whatever.

> >>Therefore my question is there any reason for not using mlockall(2) in
> >>the parent process on systems that support it and when the parent 
> >>httpd is started as root (mlock* works only within root owned 
> >>processes).
> > 
> > I don't think mlockall is appropriate for something with the heft of
> > mod_perl.
> > 
> > Why are the pages being swapped out in the first place?  Presumably
> > there's a valid reason. 
> 
> Well, the system coming close to zero of real memory available. The
> parent process starts swapping like crazy because most of its pages are
> LRU, slowing the whole system down and if the load doesn'

Re: Parent death should force children suttee

2002-01-31 Thread Scott Hess

On Thu, Jan 31, 2002 at 06:40:01PM -0500, Dale Ghent wrote:
> From a users' standpoint, it would seem more like a bug in apache if
> s/he tries to shut apache down via apachectl, and then start it back up.
> 
> First, the shutdown will fail, because the ppid no longer exists
> (thus producing the "unclean shutdown" message), and when the
> attempt by the bewildered admin is made to start apache again, it fails
> because the childs are still bound to ports and whatnot.
> 
> Although I hold no voting power here, I'd say that the children are to
> die with the parent.

It might be useful to allow conceptual space for it to work either way.
Locally we've modified Apache 1.3.x for FreeBSD to add a "graceful
shutdown" signal at USR2.  Upon receiving this signal, the parent drops a
mark in the scoreboard, and does a shutdown() on the listen socket(s).
This indicates to the OS that it should stop accepting new connections on
the socket.  The existing requests continue to completion, and children
can even accept new requests off of the listen backlog
(ap_max_requests_per_child is bumped up in the children in this case,
because the parent won't spin up new servers if the children die). Once
the backlog is empty, continued accept() calls will result in an error
(ECONNABORTED), and the child bails out.  Basically, this was to play nice
with our load balancer, you can shut down and start up servers with zero
requests lost in transit.
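
Stripped way down (and leaving out the scoreboard and signal plumbing - the
function names here are made up, this isn't our actual patch), the mechanism
amounts to:

#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Parent, on the "graceful shutdown" signal: stop accepting new
   connections, but leave the already-queued backlog for the children. */
static void stop_listening(int listen_fd)
{
    shutdown(listen_fd, SHUT_RD);
}

/* Child accept loop, much simplified. */
static void child_main(int listen_fd)
{
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0) {
            if (errno == EINTR)
                continue;
            if (errno == ECONNABORTED)
                break;            /* backlog drained; bail out cleanly */
            break;                /* anything else: also bail */
        }
        /* handle_request(conn);   -- stand-in for the real request handling */
        close(conn);
    }
}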

If that wasn't cool enough - once the parent calls shutdown() on the
listen sockets, FreeBSD lets us _immediately_ start a new server on the
same sockets.  So, rolling a new build is as simple as "shut the old one
down, start the new one up."

[BTW, I agree with the children killing themselves when the parent goes
away, perhaps configurable between "Kill yourself ASAP" versus "Kill
yourself when you come up for air between requests."  The notion of
killing everything which _looks_ like an Apache child scares me (what if
you're running multiple servers on a box?).]

Later,
scott hess
[EMAIL PROTECTED]