> From: thelastguard...@hotmail.com
> To: mediawiki-l@lists.wikimedia.org
> Date: Sat, 28 Feb 2009 12:21:13 -0800
> CC: thelastguard...@hotmail.com
> Subject: [Mediawiki-l] possible revision comparison optimization with diff3?
>
> Hello, I run a sort of semi-busy wiki, and I have been experiencing
> difficulties with its CPU load lately, with the load jumping as high as 140
> at noon (not 1.4, not 14, but ~140). Obviously this brought the site to a
> crawl. After investigating, I found the cause: multiple diff3 comparisons
> being run at the same time.
>
>
>
> Explaining the cause requires a little background. The wiki I run deals
> with the editing of large text files. It is common for a single wiki page to
> contain hundreds of kB of plain text. Normally my servers can handle the
> edit requests for these pages.
>
>
>
> However, it seems that search bots/crawl bots (from both search engines and
> individual users) have been hitting my wiki pretty hard lately. Each of
> these bots tries to copy every page, and this includes the revision history
> of each of these ~100 kB wiki text pages. Since each page can have hundreds
> of edits, every single large text file can spawn hundreds of revision
> history diffs (lighttpd/apache -> php5 -> diff3?).
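>
> To illustrate, each of those requests looks roughly like the following
> (assuming the default index.php-style URLs; the page title and revision IDs
> are made-up examples):
>
>     /index.php?title=Some_Large_Page&diff=123457&oldid=123456
>
> and each such hit can trigger a fresh diff on the server unless the result
> is already cached.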
>
>
>
> I have done some testing on my servers, and I found that each diff3
> comparison of a typical large text page leads to an increase of roughly 3
> in the CPU load.
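>
> (For reference, this kind of measurement can be reproduced with nothing
> fancier than timing the comparison by hand on copies of the revisions; the
> file names below are just placeholders:
>
>     /usr/bin/time diff3 -m rev_old.txt rev_base.txt rev_new.txt > /dev/null
>
> run a few of those in parallel while watching the load average.)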
>
>
> Right now I have implemented a few temporary restrictions (rough config
> sketches follow the list):
>
> 1. Limit the number of connections per IP
>
> 2. Disallow all search bots
>
> 3. Increase the RAM limit in the PHP config file
>
> 4. Use memcached wherever possible (not all of my servers have memcached)
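>
> Concretely, those look roughly like the following (module names, hosts and
> values are just examples, not exact copies of my configs):
>
>     # lighttpd.conf -- cap concurrent connections per IP (1.)
>     server.modules += ( "mod_evasive" )
>     evasive.max-conns-per-ip = 5
>
>     # robots.txt -- turn away all well-behaved crawlers (2.)
>     User-agent: *
>     Disallow: /
>
>     ; php.ini -- raise the per-request memory ceiling (3.)
>     memory_limit = 256M
>
>     // LocalSettings.php -- use memcached where it is installed (4.)
>     $wgMainCacheType = CACHE_MEMCACHED;
>     $wgMemCachedServers = array( '127.0.0.1:11211' );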
>
>
>
> I have some problems with 1. and 2. First of all, 1. doesn't really solve
> the load problem: the slowdown can still occur if multiple bots hit the site
> at the same time.
>
> 2. faces a similar problem. After I edited my robots.txt, I discovered that
> some clowns simply ignore it. Also, only Google supports wildcard patterns
> in robots.txt, so I can't just use Disallow: *diff=* .
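>
> (For the crawlers that do honour wildcards, the rule I have in mind would
> look roughly like this; the exact pattern depends on the URL layout:
>
>     User-agent: Googlebot
>     Disallow: /*diff=
>
> but that obviously does nothing against the bots that ignore robots.txt.)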
>
>
>
> I don't want to break these large text pages up, because that makes it
> harder for my scripts to compile the full texts back together directly from
> the database.
>
>
>
>
>
> So I am turning my attention to system-level optimization. Does anyone have
> experience with tuning diff3? For example, switching to something like
> libxdiff? Or renicing the FastCGI processes? (I use lighttpd.) Or is it
> possible to disable revision comparison altogether for pages older than a
> certain age?
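>
> (By renicing I mean something along these lines, assuming the PHP FastCGI
> workers show up as php-cgi; the priority value is just an example:
>
>     renice 10 -p $(pgrep -f php-cgi)
>
> so that the diff work at least stops starving everything else of CPU.)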
>
>
>
> Thanks for the help
>
>
>
> Tim
>
Maybe these bot accesses are normal, and an effect of the formation of
wikipedia2 and of another project I work on, www.wikilogos.org? Sorry, was I
the creator of wikipedia? Maybe, after all, why not...
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l