It looks to me as though Yahoo has some sort of robot defense operating. I was just testing a multi-threaded robot that I use to analyze discussions, including Yahoo's stock market boards. On the first run, it seemed to do fine, but when I tried to run it again a few minutes later, it didn't retrieve anything... so I tried going to the message boards using IE on the same machine. Every page is returning a 403 Forbidden error now -- even when I try to see robots.txt. As far as I know, Yahoo has never even had a robots.txt file.
I'm guessing that the speed of my robot triggered a block against this IP address. Another machine on the same subnet can access the pages just fine. I've been working on the underlying database for the last few weeks, so I haven't run the spider lately; thus, I'm not sure when this behavior might have started. My robot is quite fast and my connection yields throughput of about 1 Mbit/s, so it certainly hit their server fairly hard. But hey, it's Yahoo. If they can't handle getting hit this hard on a mid-day Saturday, it's hard to imagine who can.

No lectures about well-behaved robots, please. I know, I know. The next step for that robot will be to have each thread hit completely different domains. Perhaps each one will rotate through a few domains.

Anybody know what Yahoo might be doing, or what its policy is about robots? I haven't been able to find anything that addresses the issue directly. I don't see anything under its TOS that would clearly apply. If they want to put a limit on robots, I sure would appreciate it if they would say what that limit is...

It's been about 30 minutes now and it seems I'm still blocked. Just checked from another machine -- they still have no robots.txt at all.

Nick -- [EMAIL PROTECTED] (408) 904-7198
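P.S. For anyone curious, here's a rough sketch of the domain-rotation idea -- threads take turns across domains, with a minimum delay enforced per domain so no single host gets hammered. This is just an illustration in Python; the domain names and the delay value are made up, not anything Yahoo has published.

```python
import itertools
import threading
import time
from collections import defaultdict

class DomainRotator:
    """Hand out domains round-robin, enforcing a minimum per-domain delay.

    Hypothetical sketch: domains and min_delay are placeholder values.
    """

    def __init__(self, domains, min_delay=5.0):
        self._cycle = itertools.cycle(domains)
        self._min_delay = min_delay
        self._last_hit = defaultdict(float)  # domain -> time of last fetch
        self._lock = threading.Lock()        # serialize access across threads

    def next_domain(self):
        """Return the next domain, sleeping first if it was hit too recently."""
        with self._lock:
            domain = next(self._cycle)
            wait = self._last_hit[domain] + self._min_delay - time.monotonic()
        if wait > 0:
            time.sleep(wait)  # back off so this domain isn't hit too fast
        with self._lock:
            self._last_hit[domain] = time.monotonic()
        return domain

# Each worker thread would call next_domain() before every fetch, e.g.:
#   rot = DomainRotator(["a.example", "b.example", "c.example"])
#   url = "http://%s/some/page" % rot.next_domain()
```

With N domains in the cycle and a 5-second per-domain delay, the robot as a whole can still run fast while any one server sees at most a request every 5 seconds.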