On Fri, 06 Aug 2004, [EMAIL PROTECTED] wrote:

>>>>>> "TZ" == Ted Zlatanov <[EMAIL PROTECTED]> writes:
> 
>   >>> You misunderstand.  If registration is required, a crawler will fail
>   >>> anyway,
>   >> 
>   >> Unless the crawler is itself registered.  If I wrote a crawler, I'd
>   >> keep a database of usernames and passwords for this purpose.
> 
>   TZ> That's not a typical web crawler, and obviously not what I meant.
>   TZ> Such databases already exist (e.g. bugmenot) but using them to rip a
>   TZ> page is definitely abusive.  Think Google, not rip-off.
> 
> I wrote a crawler for a client that did just that (it even had paid
> registration for the Wall Street Journal).

Sure, and I could write one too.  The context was Slashdot/Google-style
caching clients, not targeted research.  In that context, using
predefined username/password databases so people never visit the
original page is abusive.  As I explained, a redirect would handle
this, provided the redirect to the mirrored page only happened when
the original had been unavailable for N minutes.
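As a rough sketch of that policy (in Python, since no code was given; the function and variable names, the N = 10 minute window, and the URLs are all hypothetical), the cache would track when each original URL last responded and fall back to the mirror only after the window expires:

```python
import time

# Hypothetical sketch of the redirect policy described above: serve the
# mirrored copy only when the original page has been unreachable for at
# least N minutes.  `last_seen_ok` maps each original URL to the time it
# last responded successfully; how that table gets updated (e.g. by a
# background health check) is out of scope here.

UNAVAILABLE_WINDOW = 10 * 60  # N = 10 minutes -- an assumed value


def choose_target(original_url, mirror_url, last_seen_ok, now=None):
    """Return the URL a request should be redirected to.

    Redirect to the mirror only if the original has not responded
    within the last UNAVAILABLE_WINDOW seconds.
    """
    now = time.time() if now is None else now
    last_ok = last_seen_ok.get(original_url)
    if last_ok is not None and now - last_ok < UNAVAILABLE_WINDOW:
        return original_url   # original recently reachable: send readers there
    return mirror_url         # down for >= N minutes: use the cached copy
```

Under this scheme the mirror never diverts traffic from a healthy origin, which is the point of the objection above.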

Ted
_______________________________________________
Boston-pm mailing list
[EMAIL PROTECTED]
http://mail.pm.org/mailman/listinfo/boston-pm
