
My concern is not really about the idea of blacklisting etc. I am concerned about the means. Certainly a public wiki page is not a good place to put accusations.

On 23/06/2011 11:01, Richard Cyganiak wrote:

On 23 Jun 2011, at 07:27, Antoine Zimmermann wrote:
I started a list here:

What's the use of this list? Assume it stays empty, as you hope.
What's the use?

That should be obvious.

Not to me. Can you elaborate?

Assume it gets filled with names: so what? It does not prove these
crawlers are bad. The authors of the crawlers can just remove
themselves from the list.

Check out the "watch" and "history" tabs on that page.


So on Thursday 23rd, 9:04, user foobar96 wrote that Sindice is a bad
crawler. Then what?

If a crawler is on the list, chances are that nobody would notice
anyway, especially not the kind of people that Martin is defending
in his email.

It takes very little effort to make a copy-paste Apache config
snippet that blocks the offending IP ranges. Pointing the victims of
abusive crawlers to such a snippet is a first-aid measure.

How do you know who the victims are? They somehow have to make themselves known so that they can be directed to the wiki page. If you know the victims, you'd better give them the config snippet directly. A wiki page which is /accusing/ people is much more likely to be inaccurate (or empty) than a wiki page with encyclopedic details on common knowledge.

If a crawler is put on the list because it is bad and measures are
taken, what happens when the crawler gets fixed and becomes polite?
And what if measures are taken while the crawler was not bad at all
to start with?

It shifts some pain from the server operators to the crawler
operators who have to see how they get off the list again. That's a
good thing.

It's a public wiki. It can hardly be simpler to get off the list.

Surely, this list is utterly useless.

It's important to show that the community is taking the issue seriously
and is establishing social norms and processes to deal with problems
as they arise. These processes will start out primitive, but I'd
claim that a wiki page is one step up in sophistication from this
mailing list thread.

I hear you, but not like that, not with a public wiki page.


Best, Richard

Maybe you can keep the page to describe the problems that bad
crawlers create and the measures that publishers can take to
overcome problematic situations.


The list is currently empty. I hope it stays that way.

Thank you all, Richard

Antoine Zimmermann
Researcher at:
Laboratoire d'InfoRmatique en Image et Systèmes d'information
Database Group
7 Avenue Jean Capelle
69621 Villeurbanne Cedex
Tel: +33(0)4 72 43 61 74 - Fax: +33(0)4 72 43 87 13
Lecturer at:
Institut National des Sciences Appliquées de Lyon
20 Avenue Albert Einstein
69621 Villeurbanne Cedex
