Some have wondered why anyone would want to abuse Scroogle using Tor. Apart from some malicious types that may be doing it for their own amusement, it looks to me like they are trying to datamine Google -- arguably the largest, most diverse database on the planet.
If you can manage to run a script 24/7 that datamines Google, you can monetize your results. Search engine optimizers would like to be able to do this. So would various directory builders. Doing it by scraping google.com directly is not easy. Scroogle provides 100 links of organic results per request, with less than one-half the byte-bloat that Google delivers for the same links and snippets. It is also much easier to parse Scroogle's simple output page than it is to parse Google's output page.

I spend a couple hours per day blocking abusers. A huge amount of this is done through a couple dozen monitoring programs I've written, but for the most part these programs provide candidates for blocking only, and my wetware is needed to make the final determination.

My efforts to counter abuse occasionally cause some programmers to consider using Tor to get Scroogle's results. About a year ago I began requiring any and all Tor searches at Scroogle to use SSL. Using SSL is always a good idea, but the main reason I did this is that the SSL requirement discouraged script writers who didn't know how to add this to their scripts. This policy helped immensely in cutting back on the abuse I was seeing from Tor.

Now I'm seeing script writers who have solved the SSL problem. This leaves me with the user-agent, the search terms, and as a last resort, blocking Tor exit nodes. If they vary their search terms and user-agents, it can take hours to analyze patterns and accurately block them by returning a blank page. That's the way I prefer to do it, because I don't like to block Tor exit nodes. Those who are most sympathetic with what Tor is doing are also sympathetic with what Scroogle is doing. There's a lot of collateral damage associated with blocking Tor exit nodes, and I don't want to alienate the Tor community except as a last resort.

One reason why Scroogle has lasted for more than six years is that we are nonprofit, and Google knows by now that I don't tolerate abuse.
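
For readers who want something concrete, the approach described above (Tor searches must arrive over SSL, blocked user-agents get a blank page, and monitoring programs only nominate candidates for a human to review) might be sketched roughly as follows. This is an illustration only, not Scroogle's actual code: the function names, the request-log tuple layout, the threshold, and the example addresses are all assumptions made up for the sketch.

```python
from collections import Counter


def should_serve(ip, is_ssl, user_agent, tor_exits, blocked_agents):
    """Return True to pass the search along, False to return a blank page.

    tor_exits and blocked_agents are hypothetical sets maintained elsewhere.
    """
    # Assumed policy from the post: any search arriving via Tor must use SSL.
    if ip in tor_exits and not is_ssl:
        return False
    # User-agents a human has already decided to block get a blank page.
    if user_agent in blocked_agents:
        return False
    return True


def blocking_candidates(requests, tor_exits, threshold):
    """List (user_agent, count) pairs that look like scripted Tor traffic.

    requests is an iterable of (ip, user_agent, search_terms) tuples.
    The result is only a list of candidates; a person makes the final call.
    """
    counts = Counter(ua for ip, ua, _terms in requests if ip in tor_exits)
    return [(ua, n) for ua, n in counts.most_common() if n >= threshold]
```

A script that rotates user-agents and search terms would slip under a simple count like this, which is why pattern analysis by hand is still needed before anything is blocked.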
My job is to stop the abuser before Scroogle passes their search terms to Google. Abusers who use Tor make this more difficult for me. Blocking an IP address is easy, but blocking Tor abusers without alienating other Tor users is more complex.

-- Daniel Brandt