I have noticed over the last few weeks that MSN's search bot (http://search.msn.com/msnbot.htm) is ignoring robots.txt entries.

My first indication was that its hit count was much higher than that of any other crawling/slurping bot. I began scanning logs and found specific instances where msnbot had directly requested items under robots.txt-blocked structures. After running some web searches, I soon discovered that this is a global problem. This page has a good rundown of the issue, including M$' response to it (in summary, they don't care):

http://algorhythm.org/archives/2003/06/27/bad_msnbot_bad.html

I decided to block that bot as a whole from the servers I control. Even though the bot ignores most of what it finds in robots.txt, it currently honors statements that are specific to itself. This makes it easy to block from within robots.txt:

    User-agent: msnbot
    Disallow: /

Hope this helps,
Brian

_______________________________________________
RLUG mailing list
[email protected]
http://lists.rlug.org/mailman/listinfo/rlug
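
To sanity-check the robots.txt rule above before deploying it, something like the following rough sketch may help. It uses Python's standard urllib.robotparser module to confirm that the two-line rule denies msnbot while leaving other agents alone. The helper name is_allowed, the example.org URL, and the "SomeOtherBot" agent string are made up for illustration. Note that this only shows how a compliant parser reads the rule; it says nothing about what msnbot itself actually does.

```python
from urllib.robotparser import RobotFileParser

# The rule from the message above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let user_agent fetch url.

    Hypothetical helper for illustration; it parses the robots.txt text
    directly instead of fetching it from a server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL and agent names, not taken from any real log.
    test_url = "http://example.org/private/page.html"
    for agent in ("msnbot", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, test_url) else "blocked"
        print(f"{agent}: {verdict}")
```

Because the rule names msnbot explicitly and no wildcard entry is present, every other agent falls through to the default of being allowed.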
