Just one more comment: such a list could be useful if it's published by a well-identified person or group who can be contacted in case of disagreement or to get off the list.

On 23/06/2011 08:27, Antoine Zimmermann wrote:
On 22/06/2011 23:49, Richard Cyganiak wrote:
On 21 Jun 2011, at 10:44, Martin Hepp wrote:
PS: I will not release the IP ranges from which the trouble
originated, but rest assured, there were top research institutions
among them.

The right answer is: name and shame. That is the way to teach them.

Like Karl said, we should collect information about abusive crawlers
so that site operators can defend themselves. It won't be *that* hard
to research and collect the IP ranges of offending universities.

I started a list here: http://www.w3.org/wiki/Bad_Crawlers

What's the use of this list?
Assume it stays empty, as you hope. What's the use?
Assume it gets filled with names: so what? It does not prove these
crawlers are bad. The authors of the crawlers can just remove themselves
from the list. If a crawler is on the list, chances are that nobody
would notice anyway, especially not the kind of people that Martin is
defending in his email. If a crawler is put on the list because it is
bad and measures are taken, what happens when the crawler gets fixed and
becomes polite? And what if measures are taken while the crawler was not
bad at all to start with?
Surely, this list is utterly useless.

Maybe you can keep the page to describe the problems that bad
crawlers create and the measures that publishers can take to
overcome problematic situations.


AZ



The list is currently empty. I hope it stays that way.

Thank you all, Richard



