Looking for some apache config help to block evil spiders

2009-10-10 Thread Steven W. Orr
I never really checked before, but I have a lot of evil spiders crawling
around my server. Some of them respect my robots.txt file and others do not.
Some of the ones that do are still *very* pushy. So I decided to shut those
bastards off. Here's what I added to my httpd.conf:

RewriteLog logs/rewrite_log
RewriteLogLevel 1

RewriteCond %{HTTP_USER_AGENT}  ^Baiduspider.* [OR]
RewriteCond %{HTTP_USER_AGENT}  ^msnbot.* [OR]
RewriteCond %{HTTP_USER_AGENT}  ^NaverBot.* [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Sogou-Test-Spider.*
RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/4.0.*
RewriteCond %{HTTP_USER_AGENT}  ^T-Mobile Dash.*
RewriteRule .* - [F,L]

and inside each of the virtual domains, I added:

RewriteEngine On
RewriteOptions Inherit

Here's the problem. What I want to see is the rewrite_log telling me what it
has redirected or failed. Instead, I'm getting a line telling me every link
that it does NOT rewrite. For example:

72.30.65.61 - - [10/Oct/2009:14:28:24 --0400] \
[vdom.syslang.net/sid#b7298ed0][rid#b6b488e8/initial] (1) pass through /d1/fn

I have googled my brains out and it seems like others have had the same
questions. I see no answers. If anyone has any idea I'd love to hear it.

I understand that mod_rewrite is complicated, but what I'd like to end up with
is a log of all the spiders that got rejected by my rules. Currently, the
access_log tells me where the attempt is, the error_log tells me nothing, and
the rewrite_log is telling me more than I want with none of what I need.

The goal is to see the spiders bouncing off.

Anyone?

-- 
Time flies like the wind. Fruit flies like a banana. Stranger things have  .0.
happened but none stranger than this. Does your driver's license say Organ ..0
Donor?Black holes are where God divided by zero. Listen to me! We are all- 000
individuals! What if this weren't a hypothetical question?
steveo at syslang.net



-- 
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines

Re: Looking for some apache config help to block evil spiders

2009-10-10 Thread Sharpe, Sam J
2009/10/10 Steven W. Orr ste...@syslang.net:
> RewriteCond %{HTTP_USER_AGENT}  ^Baiduspider.* [OR]
> RewriteCond %{HTTP_USER_AGENT}  ^msnbot.* [OR]
> RewriteCond %{HTTP_USER_AGENT}  ^NaverBot.* [OR]
> RewriteCond %{HTTP_USER_AGENT}  ^Sogou-Test-Spider.*
> RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/4.0.*
> RewriteCond %{HTTP_USER_AGENT}  ^T-Mobile Dash.*
> RewriteRule .* - [F,L]

Are you actually missing the [OR] at the end of the 4th and 5th
RewriteCond lines, or is that a mispaste...

If you are missing the [OR] then you are only matching things that
start with any of the top four matches AND Mozilla/4.0 AND T-Mobile
Dash (somewhat mutually exclusive!)...

Correct that and try again...
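
For reference, a sketch of what the corrected block might look like, assuming
the intent is to refuse a request when any one of the six patterns matches.
Every condition except the last carries [OR]. The pattern containing a space
is quoted here, on the assumption that it would otherwise be split into a
pattern and a flags argument, and the trailing .* is dropped because a
prefix match does not need it:

```apache
# Refuse the request if the User-Agent starts with ANY of these strings.
# Conditions are ANDed by default, so each one except the last needs [OR].
RewriteCond %{HTTP_USER_AGENT}  ^Baiduspider        [OR]
RewriteCond %{HTTP_USER_AGENT}  ^msnbot             [OR]
RewriteCond %{HTTP_USER_AGENT}  ^NaverBot           [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Sogou-Test-Spider  [OR]
# Kept from the original list; this prefix is also sent by some ordinary browsers.
RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/4\.0       [OR]
# Quoted so the space is part of the pattern rather than an argument separator.
RewriteCond %{HTTP_USER_AGENT}  "^T-Mobile Dash"
# "-" means no substitution; F answers 403 Forbidden, L stops further rules.
RewriteRule .* - [F,L]
```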

-- 
Sam



Re: Looking for some apache config help to block evil spiders

2009-10-10 Thread Sharpe, Sam J
2009/10/10 Sharpe, Sam J sam.sharpe+lists.red...@gmail.com:
> 2009/10/10 Steven W. Orr ste...@syslang.net:
>> RewriteCond %{HTTP_USER_AGENT}  ^Baiduspider.* [OR]
>> RewriteCond %{HTTP_USER_AGENT}  ^msnbot.* [OR]
>> RewriteCond %{HTTP_USER_AGENT}  ^NaverBot.* [OR]
>> RewriteCond %{HTTP_USER_AGENT}  ^Sogou-Test-Spider.*
>> RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/4.0.*
>> RewriteCond %{HTTP_USER_AGENT}  ^T-Mobile Dash.*
>> RewriteRule .* - [F,L]
>
> Are you actually missing the [OR] at the end of the 4th and 5th
> RewriteCond lines, or is that a mispaste...
>
> If you are missing the [OR] then you are only matching things that
> start with any of the top four matches AND Mozilla/4.0 AND T-Mobile
> Dash (somewhat mutually exclusive!)...

I found this for a customer today, it's a cracking read and has some
great pre-written ways of blocking this kind of thing:
http://www.askapache.com/htaccess/fight-blog-spam-with-apache.html


-- 
Sam



Re: Looking for some apache config help to block evil spiders

2009-10-10 Thread Steven W. Orr
On 10/10/09 14:37, quoth Steven W. Orr:
> I never really checked before, but I have a lot of evil spiders crawling
> around my server. Some of them respect my robots.txt file and others do not.
> Some of the ones that do are still *very* pushy. So I decided to shut those
> bastards off. Here's what I added to my httpd.conf:
>
> RewriteLog logs/rewrite_log
> RewriteLogLevel 1
>
> RewriteCond %{HTTP_USER_AGENT}  ^Baiduspider.* [OR]
> RewriteCond %{HTTP_USER_AGENT}  ^msnbot.* [OR]
> RewriteCond %{HTTP_USER_AGENT}  ^NaverBot.* [OR]
> RewriteCond %{HTTP_USER_AGENT}  ^Sogou-Test-Spider.*
> RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/4.0.*
> RewriteCond %{HTTP_USER_AGENT}  ^T-Mobile Dash.*
> RewriteRule .* - [F,L]
>
> and inside each of the virtual domains, I added:
>
> RewriteEngine On
> RewriteOptions Inherit
>
> Here's the problem. What I want to see is the rewrite_log telling me what it
> has redirected or failed. Instead, I'm getting a line telling me every link
> that it does NOT rewrite. For example:
>
> 72.30.65.61 - - [10/Oct/2009:14:28:24 --0400] \
> [vdom.syslang.net/sid#b7298ed0][rid#b6b488e8/initial] (1) pass through /d1/fn
>
> I have googled my brains out and it seems like others have had the same
> questions. I see no answers. If anyone has any idea I'd love to hear it.
>
> I understand that mod_rewrite is complicated, but what I'd like to end up with
> is a log of all the spiders that got rejected by my rules. Currently, the
> access_log tells me where the attempt is, the error_log tells me nothing, and
> the rewrite_log is telling me more than I want with none of what I need.
>
> The goal is to see the spiders bouncing off.
>
> Anyone?

On 10/10/09 14:55, quoth Sharpe, Sam J:
> Are you actually missing the [OR] at the end of the 4th and 5th
> RewriteCond lines, or is that a mispaste...

Yes, thanks, I missed that, but that isn't the problem. The problem is that I
want to be able to see what gets rejected in the log files.

> I found this for a customer today, it's a cracking read and has some
> great pre-written ways of blocking this kind of thing:
> http://www.askapache.com/htaccess/fight-blog-spam-with-apache.html

Turns out there are a number of these kinds of pages out there and some of
them are asking the same question I am: How can I see the rejects in the log
files?

Anyone?






Re: Looking for some apache config help to block evil spiders

2009-10-10 Thread Sharpe, Sam J
2009/10/10 Steven W. Orr ste...@syslang.net:
> On 10/10/09 14:37, quoth Steven W. Orr:
>> RewriteCond %{HTTP_USER_AGENT}  ^Baiduspider.* [OR]
>> RewriteCond %{HTTP_USER_AGENT}  ^msnbot.* [OR]
>> RewriteCond %{HTTP_USER_AGENT}  ^NaverBot.* [OR]
>> RewriteCond %{HTTP_USER_AGENT}  ^Sogou-Test-Spider.*
>> RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/4.0.*
>> RewriteCond %{HTTP_USER_AGENT}  ^T-Mobile Dash.*
>> RewriteRule .* - [F,L]
>>
>> The goal is to see the spiders bouncing off.
>
> On 10/10/09 14:55, quoth Sharpe, Sam J:
>> Are you actually missing the [OR] at the end of the 4th and 5th
>> RewriteCond lines, or is that a mispaste...
>
> Yes, thanks, I missed that, but that isn't the problem. The problem is that I
> want to be able to see what gets rejected in the log files.

Your rule didn't match anything, because there are mutually exclusive
options ANDed together - that was my point.

You can't have a user_agent that starts with Mozilla AND Sogou - it
has to be one or the other, so you would have never seen anything in
the logs.

Without access to ALL your rewrite rules, I can't tell you whether
lines such as:

> 72.30.65.61 - - [10/Oct/2009:14:28:24 --0400] \
> [vdom.syslang.net/sid#b7298ed0][rid#b6b488e8/initial] (1) pass through /d1/fn

are hits on the match set you posted above, or hits on another rewrite
rule, but I don't see any evidence that it's the Spider matching rule
that is generating those lines either.

You might also try upping RewriteLogLevel to something higher than 1
to see more detail...
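
As a sketch of that suggestion, assuming the Apache 2.2-style RewriteLog
directives used earlier in the thread. Level 3 is an arbitrary illustrative
choice, and higher levels slow the server noticeably, so this is meant for
short debugging sessions only:

```apache
# Same log file as in the original post.
RewriteLog      logs/rewrite_log
# 0 disables rewrite logging; larger values record more detail about
# how each rule and condition was evaluated.
RewriteLogLevel 3
```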

--
Sam



Re: Looking for some apache config help to block evil spiders

2009-10-10 Thread Tony Nelson
On 09-10-10 14:37:29, Steven W. Orr wrote:
> ... Here's what I added to my httpd.conf:
>
> RewriteLog logs/rewrite_log
> RewriteLogLevel 1
>
> RewriteCond %{HTTP_USER_AGENT}  ^Baiduspider.* [OR]
> RewriteCond %{HTTP_USER_AGENT}  ^msnbot.* [OR]
> RewriteCond %{HTTP_USER_AGENT}  ^NaverBot.* [OR]
> RewriteCond %{HTTP_USER_AGENT}  ^Sogou-Test-Spider.*
> RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/4.0.*
> RewriteCond %{HTTP_USER_AGENT}  ^T-Mobile Dash.*
> RewriteRule .* - [F,L]
>
> and inside each of the virtual domains, I added:
>
> RewriteEngine On
> RewriteOptions Inherit
>
> Here's the problem. What I want to see is the rewrite_log telling me
> what it has redirected or failed. Instead, I'm getting a line telling
> me every link that it does NOT rewrite. For example:
>
> 72.30.65.61 - - [10/Oct/2009:14:28:24 --0400] \
> [vdom.syslang.net/sid#b7298ed0][rid#b6b488e8/initial] (1) pass
> through /d1/fn
>
> I have googled my brains out and it seems like others have had the
> same questions. I see no answers. If anyone has any idea I love to
> hear it.

WAG:  The RewriteRule doesn't actually rewrite anything.  Perhaps 
something would be logged if it did.  You'd probably still have the 
other log lines as well.
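
One possible way to get a log of only the rejected spiders, sketched here
under assumptions rather than confirmed by anyone in the thread: mark the
unwanted agents with an environment variable via mod_setenvif, refuse them
based on that variable, and write a separate access log restricted to marked
requests. The variable name blocked_spider and the file name logs/spider_log
are made up for illustration, and the "combined" log format nickname is
assumed to be defined as in the stock httpd.conf:

```apache
# Mark requests whose User-Agent starts with one of the unwanted strings.
# blocked_spider is a hypothetical variable name; with no explicit value,
# SetEnvIf sets it to 1.
SetEnvIf User-Agent "^Baiduspider"        blocked_spider
SetEnvIf User-Agent "^msnbot"             blocked_spider
SetEnvIf User-Agent "^NaverBot"           blocked_spider
SetEnvIf User-Agent "^Sogou-Test-Spider"  blocked_spider

# Refuse every marked request with 403 Forbidden.
RewriteEngine On
RewriteCond %{ENV:blocked_spider} =1
RewriteRule .* - [F,L]

# Log only the marked requests to their own file (hypothetical path).
CustomLog logs/spider_log combined env=blocked_spider
```

A CustomLog in the main server configuration is not used by virtual hosts
that declare their own CustomLog, so the last line may need repeating inside
each virtual host.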

-- 

TonyN.:'   mailto:tonynel...@georgeanelson.com
  '  http://www.georgeanelson.com/
