Re: LinkWalker

2004-03-19 Thread Chris



I have this same robot on my site.  Can I block this
robot using .htaccess files?




Chris


http://www.truefootball.com
http://www.worldofjerseys.com
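
One way to do that, assuming Apache with mod_setenvif enabled and an
AllowOverride setting that permits access directives (a sketch, not a
tested recipe for this particular robot):

cat >> .htaccess <<'EOF'
# flag any request whose User-Agent contains "LinkWalker"
SetEnvIfNoCase User-Agent "LinkWalker" block_bot
# deny flagged requests (Apache 1.3/2.2 access-control syntax)
Order Allow,Deny
Allow from all
Deny from env=block_bot
EOF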



Re: LinkWalker

2002-01-08 Thread Jesse Goerz

On Tuesday 08 January 2002 01:38, Russell Coker wrote:
> On Mon, 7 Jan 2002 23:31, Nathan Strom wrote:
> > > I have a nasty web spider with an agent name of
> > > LinkWalker downloading everything on my site (including
> > > .tgz files).  Does anyone know anything about it?
> >
> > It's apparently a link-validation robot operated by a
> > company called SevenTwentyFour Incorporated, see:
> > http://www.seventwentyfour.com/tech.html
>
> Oops.
>
> Actually they sent me an offer of a free trial to their
> service (which seems quite useful).  The free trial gave me
> some useful stats and let me fix a bunch of broken links (of
> course I didn't pay).

You can do the same thing with wget:
--spider
   When invoked with this option, Wget will behave as a Web
   spider, which means that it will not download the pages, just
   check that they are there.  You can use it to check your
   bookmarks, e.g. with:

wget --spider --force-html -i bookmarks.html

   This feature needs much more work for Wget to get close to 
   the functionality of real WWW spiders.

You'll be checking more than bookmarks but you get the idea.
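
To spider a whole site instead of a bookmarks file, something along these
lines should work (a sketch; the URL is a placeholder, and the --spider/-r
combination behaves better in newer wget releases):

# recursively check every link on the site, logging to spider.log
wget --spider -r -o spider.log http://www.example.com/
# then pull the failures out of the log
grep -B 2 '404 Not Found' spider.log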

Jesse






Re: LinkWalker

2002-01-08 Thread Marcel Hicking

On 8 Jan 2002, at 9:56, Jesse Goerz wrote:

> On Tuesday 08 January 2002 01:38, Russell Coker wrote:
> > On Mon, 7 Jan 2002 23:31, Nathan Strom wrote:
> > > > I have a nasty web spider with an agent name of
> > > > LinkWalker downloading everything on my site
> > > > (including .tgz files).  Does anyone know anything
> > > > about it?
> > >
> > > It's apparently a link-validation robot operated by a
> > > company called SevenTwentyFour Incorporated, see:
> > > http://www.seventwentyfour.com/tech.html
> >
> > Oops.
> >
> > Actually they sent me an offer of a free trial to their
> > service (which seems quite useful).  The free trial gave
> > me some useful stats and let me fix a bunch of broken
> > links (of course I didn't pay).
>
> You can do the same thing with wget:
> --spider
>    When invoked with this option, Wget will behave as a Web
>    spider, which means that it will not download the pages,
>    just check that they are there.  You can use it to check
>    your bookmarks, e.g. with:
>
> wget --spider --force-html -i bookmarks.html
>
>    This feature needs much more work for Wget to get close
>    to the functionality of real WWW spiders.
>
> You'll be checking more than bookmarks but you get the idea.


In case you are running ht://dig, there's an add-on
on the contributed works page that parses htdig's output
and generates a broken-links report from it.
Since htdig touches every link anyway, it's quite a natural fit.

Cheers,
Marcel


--
   __
 .´  `.
 : :' !   Enjoy
 `. `´   Debian/GNU Linux
   `-   Now even on the 5 Euro banknote!






Re: LinkWalker

2002-01-08 Thread Russell Coker
On Mon, 7 Jan 2002 23:31, Nathan Strom wrote:
> > I have a nasty web spider with an agent name of LinkWalker downloading
> > everything on my site (including .tgz files).  Does anyone know anything
> > about it?
>
> It's apparently a link-validation robot operated by a company called
> SevenTwentyFour Incorporated, see:
> http://www.seventwentyfour.com/tech.html

Oops.

Actually they sent me an offer of a free trial to their service (which seems 
quite useful).  The free trial gave me some useful stats and let me fix a 
bunch of broken links (of course I didn't pay).

Hmm, I wonder if they REALLY downloaded those files or aborted the transfers 
after the first few K (needed to verify that the link was correct).
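
One way to check, assuming Apache's common or combined log format, where
the last numeric field is the number of bytes actually sent (the log path
here is a placeholder):

# print bytes-sent and URL for each successful .tgz request
awk '$7 ~ /\.tgz/ && $9 == 200 {print $10, $7}' /var/log/apache/access.log

If the logged byte counts fall well short of the actual file sizes, the
robot aborted the transfers.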

Anyway I'll remove that line from my iptables configuration now!

> Personally, I think this is a rogue organization -- there was an entry
> from this spider in our logs coming from a Seven24 IP with an HTTP
> referrer of
> www.adultinterracialsexvideos.com/interracialsex/interracialgroupsexsen.html.
> Needless to say, we do not run an adult web site and that referrer site
> does NOT have a link to us. Likely Seven24 is trying to clutter people's
> logs with references as a form of advertising.

A single entry in web logs does not mean much.  If I blocked every origin of 
a bad entry in my web logs I'd be busy all day doing it...

-- 
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/   Postal SMTP/POP benchmark
http://www.coker.com.au/projects.html Projects I am working on
http://www.coker.com.au/~russell/ My home page





Re: LinkWalker

2002-01-07 Thread Nathan Strom

[EMAIL PROTECTED] (Russell Coker) wrote in message 
news:[EMAIL PROTECTED]...
> I have a nasty web spider with an agent name of LinkWalker downloading
> everything on my site (including .tgz files).  Does anyone know anything
> about it?

It's apparently a link-validation robot operated by a company called
SevenTwentyFour Incorporated, see:
http://www.seventwentyfour.com/tech.html

I found your post while searching for information on this robot as I
tracked its spoor through our HTTP logs here.

> I've added the following to my firewall setup to stop further attacks...
>
> # crappy LinkWalker - evil spider that downloads every file including .tgz on
> # the site
> iptables -A INPUT -j logitrej -p tcp -s 209.167.50.25 -d 0.0.0.0/0 --dport www

We were hit from 209.167.50.22; if you want to use iptables to block
this spider, I'd block all of NETBLK-NET-SEVEN24AUU1, 209.167.50.16 -
209.167.50.31.
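
That range is the single CIDR block 209.167.50.16/28, so one rule covers
the whole netblock (a sketch modeled on the rule above; swap DROP for your
own log-and-reject target if you have one):

iptables -A INPUT -p tcp -s 209.167.50.16/28 --dport www -j DROP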


Personally, I think this is a rogue organization -- there was an entry
from this spider in our logs coming from a Seven24 IP with an HTTP
referrer of 
www.adultinterracialsexvideos.com/interracialsex/interracialgroupsexsen.html.
Needless to say, we do not run an adult web site and that referrer
site does NOT have a link to us. Likely Seven24 is trying to clutter
people's logs with references as a form of advertising.






Re: LinkWalker

2002-01-07 Thread Nathan Strom

[EMAIL PROTECTED] (Russell Coker) wrote in message 
news:[EMAIL PROTECTED]...
> I wasn't aware that there was any format to robots.txt; I thought that the
> mere presence of such a file would prevent robots from visiting.

Nope; see:
http://www.robotstxt.org/wc/robots.html






Re: LinkWalker

2002-01-07 Thread Chris Wagner

Bwahahaha!!  Man, that is low.  Advertising to sysadmins through the access
logs...  Sheesh.  But now that you mention 7-24, I think I recognize that.
I think they are a spam marketing outfit.

At 02:31 PM 1/7/02 -0800, Nathan Strom wrote:
> Personally, I think this is a rogue organization -- there was an entry
> from this spider in our logs coming from a Seven24 IP with an HTTP
> referrer of
> www.adultinterracialsexvideos.com/interracialsex/interracialgroupsexsen.html.
> Needless to say, we do not run an adult web site and that referrer
> site does NOT have a link to us. Likely Seven24 is trying to clutter
> people's logs with references as a form of advertising.




--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--

0100



Re: LinkWalker

2002-01-07 Thread Frank Louwers
> site does NOT have a link to us. Likely Seven24 is trying to clutter
> people's logs with references as a form of advertising.

... a practice we see more and more often here as well!  Even
'respectable' major ISPs are starting to do it!

It's a strange world ...

Frank Louwers
Openminds b.v.b.a.





Re: LinkWalker

2001-12-24 Thread Russell Coker

On Mon, 24 Dec 2001 06:42, Jeremy Lunn wrote:
> On Sun, Dec 23, 2001 at 05:41:47PM +0100, Russell Coker wrote:
> > I have a nasty web spider with an agent name of LinkWalker downloading
> > everything on my site (including .tgz files).  Does anyone know anything
> > about it?
>
> Surely you'd be able to disallow access to it with Apache?

Yes, but using iptables is easier.  My Apache setup is complex enough 
already...

-- 
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/   Postal SMTP/POP benchmark
http://www.coker.com.au/projects.html Projects I am working on
http://www.coker.com.au/~russell/ My home page






Re: LinkWalker

2001-12-24 Thread Jeremy Lunn

On Mon, Dec 24, 2001 at 11:43:09AM +0100, Russell Coker wrote:
> > > I have a nasty web spider with an agent name of LinkWalker downloading
> > > everything on my site (including .tgz files).  Does anyone know anything
> > > about it?
> >
> > Surely you'd be able to disallow access to it with Apache?
>
> Yes, but using iptables is easier.  My Apache setup is complex enough
> already...

But that's assuming that it comes from the same IP address.

-- 
Jeremy Lunn
Melbourne, Australia
http://www.jabber.org/ - the next generation of Instant Messaging.






Re: LinkWalker

2001-12-24 Thread Jeff Waugh

quote who=Russell Coker

> > Why don't you just update your robots.txt to explicitly specify which
> > files you do, or don't, allow spiders access to?  If it's a rule-abiding
> > spider, that will be the end of it.
>
> I wasn't aware that there was any format to robots.txt; I thought that the
> mere presence of such a file would prevent robots from visiting.

http://www.searchtools.com/robots/robots-txt.html

- Jeff

-- 
 Funny, I have no trouble distinguishing my mobile phone from the  
   others because it's in my _own fucking pocket_! - Mobile Rage   



LinkWalker

2001-12-23 Thread Russell Coker

I have a nasty web spider with an agent name of LinkWalker downloading 
everything on my site (including .tgz files).  Does anyone know anything 
about it?

I've added the following to my firewall setup to stop further attacks...

# crappy LinkWalker - evil spider that downloads every file including .tgz on
# the site
iptables -A INPUT -j logitrej -p tcp -s 209.167.50.25 -d 0.0.0.0/0 --dport www
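# (logitrej here is a local user-defined chain, presumably one that logs
# and rejects; substitute -j DROP if you have no such chain)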

-- 
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/   Postal SMTP/POP benchmark
http://www.coker.com.au/projects.html Projects I am working on
http://www.coker.com.au/~russell/ My home page






Re: LinkWalker

2001-12-23 Thread Nick Jennings

Why don't you just update your robots.txt to explicitly specify which
files you do, or don't, allow spiders access to?  If it's a rule-abiding
spider, that will be the end of it.

On Sun, Dec 23, 2001 at 05:41:47PM +0100, Russell Coker wrote:
> I have a nasty web spider with an agent name of LinkWalker downloading
> everything on my site (including .tgz files).  Does anyone know anything
> about it?
>
> I've added the following to my firewall setup to stop further attacks...
>
> # crappy LinkWalker - evil spider that downloads every file including .tgz on
> # the site
> iptables -A INPUT -j logitrej -p tcp -s 209.167.50.25 -d 0.0.0.0/0 --dport www
 

-- 
  Nick Jennings






Re: LinkWalker

2001-12-23 Thread Russell Coker

On Sun, 23 Dec 2001 20:28, Nick Jennings wrote:
> Why don't you just update your robots.txt to explicitly specify which
> files you do, or don't, allow spiders access to?  If it's a rule-abiding
> spider, that will be the end of it.

I wasn't aware that there was any format to robots.txt; I thought that the
mere presence of such a file would prevent robots from visiting.

As for rule-abiding spiders, such programs will not download files ending in
.wav, .mp3, .gz, .tgz, or .zip, so I won't even see them.

That's why I usually don't even notice responsible web spiders such as Google
when browsing my web logs!

> On Sun, Dec 23, 2001 at 05:41:47PM +0100, Russell Coker wrote:
> > I have a nasty web spider with an agent name of LinkWalker downloading
> > everything on my site (including .tgz files).  Does anyone know anything
> > about it?
> >
> > I've added the following to my firewall setup to stop further attacks...

-- 
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/   Postal SMTP/POP benchmark
http://www.coker.com.au/projects.html Projects I am working on
http://www.coker.com.au/~russell/ My home page






Re: LinkWalker

2001-12-23 Thread Nick Jennings

On Sun, Dec 23, 2001 at 09:17:54PM +0100, Russell Coker wrote:
> I wasn't aware that there was any format to robots.txt; I thought that the
> mere presence of such a file would prevent robots from visiting.

Here is an example of my robots.txt

User-agent: *
Disallow: /webalizer/
Disallow: /contacts.txt
Disallow: /dl/


> As for rule-abiding spiders, such programs will not download files ending in
> .wav, .mp3, .gz, .tgz, or .zip, so I won't even see them.
>
> That's why I usually don't even notice responsible web spiders such as Google
> when browsing my web logs!

Hmm, I have had spiders grab .tgz's from me before, but not anymore.

User-agent can be set to a specific spider agent-name, or * for all spiders.

-- 
  Nick Jennings






Re: LinkWalker

2001-12-23 Thread Chris Wagner

You should be able to tell if it cares about robots.txt by looking in the
logs to see if it's downloading /robots.txt.  If it is, then something like:

User-agent: LinkWalker
Disallow: /

will keep it off your site.  If it doesn't, then iptables will keep it away.
Robots info:
http://www.global-positioning.com/robots_text_file/index.html
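
A quick way to do that log check, assuming Apache combined-format logs
(which include the agent field; the log path is a placeholder):

# has LinkWalker ever requested robots.txt?
grep LinkWalker /var/log/apache/access.log | grep -c 'GET /robots.txt'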

The fact that it downloads binaries too makes me think it's a site sucker
and not a legit spider.


At 12:30 PM 12/23/01 -0800, Nick Jennings wrote:
> On Sun, Dec 23, 2001 at 09:17:54PM +0100, Russell Coker wrote:
> > I wasn't aware that there was any format to robots.txt; I thought that the
> > mere presence of such a file would prevent robots from visiting.





---=REMEMBER THE WORLD TRADE CENTER=---
___/`   WTC 911   `\___

0100






Re: LinkWalker

2001-12-23 Thread Jeremy Lunn

On Sun, Dec 23, 2001 at 05:41:47PM +0100, Russell Coker wrote:
> I have a nasty web spider with an agent name of LinkWalker downloading
> everything on my site (including .tgz files).  Does anyone know anything
> about it?

Surely you'd be able to disallow access to it with Apache?

-- 
Jeremy Lunn
Melbourne, Australia
http://www.jabber.org/ - the next generation of Instant Messaging.

