On Fri, 06 Aug 2004, [EMAIL PROTECTED] wrote:

>>>>>> "TZ" == Ted Zlatanov <[EMAIL PROTECTED]> writes:
>
> >>> You misunderstand. If registration is required, a crawler will fail
> >>> anyway,
> >>
> >> Unless the crawler is itself registered. If I wrote a crawler, I'd
> >> keep a database of usernames and passwords for this purpose.
>
>   TZ> That's not a typical web crawler, and obviously not what I meant.
>   TZ> Such databases already exist (e.g. bugmenot) but using them to rip a
>   TZ> page is definitely abusive. Think Google, not rip-off.
>
> i wrote a crawler for a client that did just that (even had paid
> registration for the wall street journal).
Sure, and I can write one too. The context was Slashdot/Google-style caching clients, not targeted research. In that context, using predefined username/password databases so that people don't go to the original page is abusive. I explained that a redirect would handle this, provided the redirect to the mirrored page happened only when the original had been unavailable for N minutes.

Ted
_______________________________________________
Boston-pm mailing list
[EMAIL PROTECTED]
http://mail.pm.org/mailman/listinfo/boston-pm