Re: [users@httpd] tuning question

2014-07-18 Thread David Favor

Miles Fidelman wrote:

> Hi Folks,
>
> Every once in a while, a crawler comes along and starts indexing our site
> - and in the process pushes our server's load average through the roof.
>
> Short of blocking the crawlers, can anybody suggest some quick tuning
> adjustments to make, to reduce load (setting the max. number of servers
> and/or requests, renicing processes)?
>
> Yes - my next step is to go pore through manuals - but I expect others
> have done this enough to be able to point me at a few specific config
> file lines to change, and specific commands for identifying and renicing
> processes.
>
> Thanks very much,
>
> Miles Fidelman



http://BadBotBlocker.com is some code I use on client sites to do adaptive 
blocking.

Warning: the code is ugly + needs a good cleanup + packaging for different
OSes.

Pretty simple.

1) Anyone who follows the /bad-spider/ link gets blocked for 1 hour
   via iptables (all ports + protocols)

2) After 1 hour the iptables block rule is removed

3) Add links to /bad-spider/ in every file served,
   so only non-humans ever "see" this link

4) Add /bad-spider/ to robots.txt as disallowed for everyone,
   so well-behaved crawlers never follow it
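
The block/unblock cycle in steps 1 and 2 might look roughly like the Python
sketch below. It is not the BadBotBlocker.com code, only an illustration
under assumptions: the names TemporaryBlocker and iptables_rule, the use of
the INPUT chain with a DROP target, and the IPv4-only handling are
illustrative choices that do not come from the post. Commands go through an
injectable runner so the logic can be tried without root.

```python
import ipaddress
import subprocess
import time

BLOCK_SECONDS = 3600  # one hour, as described in steps 1 and 2


def iptables_rule(action, ip):
    """Build an iptables command; action is "-I" (insert) or "-D" (delete)."""
    # IPv4 only: raises ValueError for anything else (IPv6 would need ip6tables).
    addr = str(ipaddress.IPv4Address(ip))
    # No -p or --dport given, so the rule covers all protocols and ports.
    return ["iptables", action, "INPUT", "-s", addr, "-j", "DROP"]


class TemporaryBlocker:
    """Hypothetical helper: blocks an address and removes the rule later."""

    def __init__(self, run=None, clock=time.time):
        # The default runner really calls iptables and therefore needs root.
        self.run = run or (lambda cmd: subprocess.run(cmd, check=True))
        self.clock = clock
        self.expires = {}  # address -> time at which the block ends

    def block(self, ip):
        """Insert a DROP rule for ip unless it is already blocked."""
        addr = str(ipaddress.IPv4Address(ip))
        if addr in self.expires:
            return
        self.run(iptables_rule("-I", addr))
        self.expires[addr] = self.clock() + BLOCK_SECONDS

    def expire(self):
        """Delete the rule for every address whose hour is up."""
        now = self.clock()
        for addr in [a for a, t in self.expires.items() if t <= now]:
            self.run(iptables_rule("-D", addr))
            del self.expires[addr]
```

In this sketch, whatever handles requests for /bad-spider/ would call
block() with the client address, and a periodic job would call expire().
The table lives only in memory, which matches the "after reboots, all rules
are lost" behaviour described in the post.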

This simple approach has cut down traffic by 80-90% for some of my clients.

Because the rules only last for 1 hour, no site is blocked forever.

Because the rules are adaptive (appear + disappear), there is no maintenance.

After reboots, all rules are lost + the process just starts again.

Very simple code.

- David, Skype: davidfavor

-
To unsubscribe, e-mail: users-unsubscr...@httpd.apache.org
For additional commands, e-mail: users-h...@httpd.apache.org



Re: [users@httpd] tuning question

2014-07-12 Thread Jeff Trawick
On Sat, Jul 12, 2014 at 5:06 PM, Miles Fidelman 
wrote:

> Jeff Trawick wrote:
>
>> On Sat, Jul 12, 2014 at 1:25 PM, Miles Fidelman
>> <mfidel...@meetinghouse.net> wrote:
>>
>>> Hi Folks,
>>>
>>> Every once in a while, a crawler comes along and starts indexing
>>> our site - and in the process pushes our server's load average
>>> through the roof.
>>>
>>> Short of blocking the crawlers, can anybody suggest some quick
>>> tuning adjustments to make, to reduce load (setting the max.
>>> number of servers and/or requests, renicing processes)?
>>
>> Use robots.txt to block access to dynamically generated resources which
>> are expensive to generate and not necessary for search hits?
>>
>> Is it using a lot of concurrent requests, or is the main load issue due to
>> the cost of the requests it is making?
>
> a bit of both



If you want to limit concurrent requests just from web crawlers, try
something like mod_qos.  (See
http://unix.stackexchange.com/questions/37481/throttling-web-crawlers)

If it were me, I'd try to block needless, expensive requests with
robots.txt too.  http://www.robotstxt.org/robotstxt.html
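
As a rough illustration of the robots.txt suggestion, the Python sketch
below checks which paths a rule set would allow, using the standard library's
urllib.robotparser. The Disallow paths (/search and /cgi-bin/) and the agent
name ExampleBot are made-up placeholders, not paths from the site in
question; substitute whatever is actually expensive to generate.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every crawler away from two expensive areas.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /cgi-bin/
"""


def allowed(path, agent="ExampleBot"):
    """Return True if a crawler obeying ROBOTS_TXT may fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/search/results", "/cgi-bin/report.pl"):
        print(path, allowed(path))
```

Note that robots.txt only helps with crawlers that choose to honor it.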



>
>
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is.    Yogi Berra


-- 
Born in Roswell... married an alien...
http://emptyhammock.com/


Re: [users@httpd] tuning question

2014-07-12 Thread Miles Fidelman

Jeff Trawick wrote:
> On Sat, Jul 12, 2014 at 1:25 PM, Miles Fidelman
> <mfidel...@meetinghouse.net> wrote:
>
>> Hi Folks,
>>
>> Every once in a while, a crawler comes along and starts indexing
>> our site - and in the process pushes our server's load average
>> through the roof.
>>
>> Short of blocking the crawlers, can anybody suggest some quick
>> tuning adjustments to make, to reduce load (setting the max.
>> number of servers and/or requests, renicing processes)?
>
> Use robots.txt to block access to dynamically generated resources
> which are expensive to generate and not necessary for search hits?
>
> Is it using a lot of concurrent requests, or is the main load issue due to
> the cost of the requests it is making?

a bit of both



--
In theory, there is no difference between theory and practice.
In practice, there is.    Yogi Berra





Re: [users@httpd] tuning question

2014-07-12 Thread Jeff Trawick
On Sat, Jul 12, 2014 at 1:25 PM, Miles Fidelman 
wrote:

> Hi Folks,
>
> Every once in a while, a crawler comes along and starts indexing our site -
> and in the process pushes our server's load average through the roof.
>
> Short of blocking the crawlers, can anybody suggest some quick tuning
> adjustments to make, to reduce load (setting the max. number of servers
> and/or requests, renicing processes)?
>

Use robots.txt to block access to dynamically generated resources which are
expensive to generate and not necessary for search hits?

Is it using a lot of concurrent requests, or is the main load issue due to
the cost of the requests it is making?



>
> Yes - my next step is to go pore through manuals - but I expect others
> have done this enough to be able to point me at a few specific config file
> lines to change, and specific commands for identifying and renicing
> processes.
>
> Thanks very much,
>
> Miles Fidelman
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is.    Yogi Berra
>


-- 
Born in Roswell... married an alien...
http://emptyhammock.com/


[users@httpd] tuning question

2014-07-12 Thread Miles Fidelman

Hi Folks,

Every once in a while, a crawler comes along and starts indexing our site 
- and in the process pushes our server's load average through the roof.


Short of blocking the crawlers, can anybody suggest some quick tuning 
adjustments to make, to reduce load (setting the max. number of servers 
and/or requests, renicing processes)?


Yes - my next step is to go pore through manuals - but I expect others 
have done this enough to be able to point me at a few specific config 
file lines to change, and specific commands for identifying and renicing 
processes.


Thanks very much,

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is.    Yogi Berra

