Re: Think before you write Semantic Web crawlers

Martin Hepp Tue, 21 Jun 2011 03:42:55 -0700

Thanks for the hint, but I am not talking about "my" servers.

I am talking about a site-owner somewhere in Kentucky running a small shop on 
www.godaddy.com who adds RDF to his site, informs PingTheSemanticWeb, and what 
he gets in turn are wild-west crawlers that bring down his tiny server by 
crawling the same data over and over again from a powerful University network.

These may be small nuisances from a historical perspective, but they may be the 
last trigger for the end of the academic/W3C part of the Semantic Web project.

As a side-note: Even if Google, Bing, and Yahoo say - "yes, folks - you can use 
RDFa and all of the fancy academic SW stuff if you absolutely want, and yes, we 
promise to not punish your sites" - what market share do you expect for RDFa 
and the traditional Web compared with Microdata and "their" way of adding 
structured data?

I bet a bottle of Champagne that the market share of the academic Semantic Web 
movement will be less than 10 % a year from now, even if such a statement were 
made loudly and clearly - not 10 % of the total Web, but 10 % of all 
structured Web content.

If you do not reach the SEO and Web hacker worlds, your project is dead, 
because even the greatest technology advantage that you may bring will be 
nothing against a 90 % dominance as far as the consumption of the data is 
concerned. And if you annoy site-owners, you are out of the game even quicker.

Best

Martin

On Jun 21, 2011, at 12:04 PM, Daniel Herzig wrote:

> 
> Hi Martin,
> 
> Have you tried putting Squid [1] as a reverse proxy in front of your servers 
> and using delay pools [2] to catch hungry crawlers?
> Maybe that helps.
> 
> Cheers,
> Daniel
> 
> [1] http://www.squid-cache.org/
> [2] http://wiki.squid-cache.org/Features/DelayPools
