I remember hearing a couple of times that CorenSearchBot was down, but I just assumed that something so important was being rescued, though I did wonder slightly about the recent net increase in articles on EN wiki. 3,738,826 articles today means we've way overshot the 3 million projection, the 3.5 million prediction is looking distinctly cautious, and even the 4 million by late 2012 projection http://commons.wikimedia.org/wiki/File:Enwikipediapercgrowth.PNG no longer looks much like a ceiling.

Could we get Google and Bing to make an exception for CorenSearchBot? If not, then I'd agree that a spider would make sense, though I've no idea what that would cost. Having our own spider could be useful for other things though, including:

# Bot adding of {{deadlink}} templates.
# Creating our own wayback machine showing webpages as they were when they were cited by our articles.
# A "may have moved here" table, so we could add "possibly moved here" and wayback options to {{deadlink}}.
# A bot to update links as sites reorganise and organisations rebrand; without it we could be mostly deadlinked as early as mid-century.
# A bot that listed probable deaths based on obituaries in reliable sources, and even on updates to subjects' own websites, would also be useful.
# Detecting possible breaches of our copyright would be another potential use, but maybe we just need to rename "what links here" as "what links here (internal)" and add "what links here (external)".

WSC

> Message: 5
> Date: Wed, 14 Sep 2011 17:09:44 +0200
> From: Kim Bruning <k...@bruning.xs4all.nl>
> Subject: Re: [Foundation-l] The WikiNews fork - for lack of a copyvio detection bot half a project was lost
> To: Wikimedia Foundation Mailing List <foundation-l@lists.wikimedia.org>
> Message-ID: <20110914170944.c22...@bruning.lan>
> Content-Type: text/plain; charset=us-ascii
>
> On Wed, Sep 14, 2011 at 10:49:06AM -0500, Aaron Adrignola wrote:
> > CorenSearchBot has not been operational for several months since Yahoo
> > stopped allowing automated queries. Bing's terms of use don't permit
> > this either and apparently the same is true for Google.
>
> It might be useful to have a community operated spider, then? In that way,
> we could also optimize our database for the kinds of queries we need.
>
> sincerely,
> Kim Bruning