Pier Fumagalli wrote:

> > [EMAIL PROTECTED] wrote:
> >>
> >> Brian Behlendorf wrote:
> >>>
> >>> The load average on apache.org spiked into the 300's a couple times this
> >>> weekend.
> >
> >>> I bet there's some cutoff, maybe it's 800, or 900, where
> >>> the # of parallel connections is just too much and something goes
> >>> nonlinear between Apache and the kernel.
> >>
> >> That's possible.  I will nudge it upward and keep a close eye on the load
> >> averages.

> > OK, with MaxClients at 900, I saw the load averages start to jump.  
> > I'm still seeing erratic ssh response.  Is there some way to monitor network
> > bandwidth utilization in real time?  I wonder if we are maxing out our NICs
> > this morning?  That would make the big downloads take longer, using up more httpd
> > processes.  If that's our problem, maybe we should pursue Joshua Slive's idea
> > of redirecting requests for the most popular downloads to a reliable mirror, like
> > nagoya.
 
> Please, feel free to redirect
> 
> www.apache.org/dist/httpd/(.*)   -> nagoya.apache.org/dist/httpd/$1
> www.apache.org/dist/jakarta/(.*) -> nagoya.apache.org/dist/jakarta/$1
> jakarta.apache.org/builds/(.*)   -> nagoya.apache.org/dist/jakarta/$1
> www.apache.org/dist/xml/(.*)     -> nagoya.apache.org/dist/xml/$1
> xml.apache.org/dist/(.*)         -> nagoya.apache.org/dist/xml/$1
> 
> This should take some load away from daedalus, and maybe give you a better
> chance to investigate the problem further... Right now it's kind of
> unbearable on there...
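For reference, here is a rough, untested sketch of how that mapping could be written with mod_alias's RedirectMatch. It assumes the www.apache.org rules go in the main server config and the jakarta/xml rules go in those sites' own virtual hosts. I haven't checked how daedalus's config is actually laid out, so treat the placement as a guess.

```apache
# www.apache.org (assumed: main server config)
RedirectMatch ^/dist/httpd/(.*)$   http://nagoya.apache.org/dist/httpd/$1
RedirectMatch ^/dist/jakarta/(.*)$ http://nagoya.apache.org/dist/jakarta/$1
RedirectMatch ^/dist/xml/(.*)$     http://nagoya.apache.org/dist/xml/$1

# jakarta.apache.org (assumed: inside its <VirtualHost>)
RedirectMatch ^/builds/(.*)$ http://nagoya.apache.org/dist/jakarta/$1

# xml.apache.org (assumed: inside its <VirtualHost>)
RedirectMatch ^/dist/(.*)$ http://nagoya.apache.org/dist/xml/$1
```

A plain prefix-matching Redirect would do the same job for each of these. RedirectMatch just mirrors the regex notation above. Either way the default is a temporary (302) redirect, so clients keep asking daedalus first.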

Well, I don't feel brave enough to redirect all of that at once without knowing
what's going on in the xml or jakarta projects, and without being able to
monitor nagoya very well.  But I did add:

Redirect /dist/httpd/binaries http://nagoya.apache.org/dist/httpd/binaries

...as well as changing old Redirects from /httpd.apache.org/dist/ to point to
the proper place on nagoya, and bumping MaxClients from 775 to 800.

Please let me know if there are problems due to the Redirects I added.  There is
a list of the top 20 bandwidth eaters from yesterday at
http://www.apache.org/~gregames/top20.bytes_served.10.20 .  I suppose it would
be safe to put in individual redirects for those on the top 20 list that aren't
already covered until daedalus's response is sufficiently good (whatever that
means - maybe sub-minute response for shift-reload of http://xml.apache.org/ ). 
It's getting better already.  
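An individual redirect for one of those top-20 files would be a one-liner along these lines. The path is a made-up placeholder, not an actual entry from the list:

```apache
# Hypothetical path: substitute a real entry from top20.bytes_served.10.20
Redirect /dist/xml/example-1.0.tar.gz http://nagoya.apache.org/dist/xml/example-1.0.tar.gz
```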

httpd folks: there is a buglet or two relating to changing MaxClients via
graceful restart.  If you look at daedalus's server-status, there's a bunch of
processes at the end of the display mostly in "C" state, occasionally "G". 
Grepping for those pids in the error log, I see "long lost child came home"
messages.  I think we're stopping too soon when we search for the child's pid
after getting SIGCHLD.  mod_status is displaying "requests being processed" too
high, as if those long lost children were really there.  Then after I bumped
MaxClients up from 775 to 800, the number of httpd's actually running according
to ps seemed to top out at 777.  The difference there seems to be that we still
have several processes in "G" state in the lower part of the scoreboard.

Greg
