On Sun, Nov 25, 2012 at 12:50 PM, jan iversen <j...@apache.org> wrote:
> e) repair "broken redirect", "double redirect", "non categorized pages".
> Remark no information will be deleted/changed.

This can be done using the pywikipedia bot.  There is an account
already in place for running maintenance bot tasks.  I'll contact you
off-list with the details.


> f) Convert all pages to UTF8 (mysql), most pages are defined as latin1, but
> some have UTF8 content, this will not work after an update.

Tables that are truly Latin1 should convert cleanly to UTF-8.  Any
that are tagged as Latin1 but actually contain UTF-8 will not.  This
mismatched encoding is a known issue: the upgrade scripts see Latin1
and attempt to convert the content to UTF-8, except it is already
UTF-8, so you end up with double-encoded content.  My suggestion is to
try the upgrade on the test system first.  If the conversion does
double-encode, it should be visible, especially in languages written
with multi-byte characters (Chinese, for example).  I'll start hunting
for my notes on this (TJ has reminded me a couple of times, but I keep
putting it off).
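
To make the double-encoding failure concrete, here is a minimal Python
sketch.  It is not the upgrade script, and every function name in it is
made up for illustration.  It assumes the column bytes can be read raw,
and it uses Python's "latin-1" codec as a stand-in for MySQL's latin1,
which is not byte-for-byte identical, so treat the check as a heuristic
to try on the test system only.

```python
# Illustrative sketch only: helper names are hypothetical, and Python's
# "latin-1" codec is assumed to approximate MySQL's latin1 charset.

def double_encode(text: str) -> bytes:
    """Simulate the faulty conversion: UTF-8 bytes are misread as
    Latin-1 and then encoded to UTF-8 a second time."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def looks_double_encoded(raw: bytes) -> bool:
    """Heuristic: undo one UTF-8 layer and see whether valid UTF-8
    is still underneath."""
    try:
        inner = raw.decode("utf-8").encode("latin-1")
        inner.decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    # Pure ASCII survives the round trip unchanged; that is not damage.
    return inner != raw


def repair(raw: bytes) -> bytes:
    """Strip the extra encoding layer if the heuristic fires."""
    if looks_double_encoded(raw):
        return raw.decode("utf-8").encode("latin-1")
    return raw


if __name__ == "__main__":
    good = "中文".encode("utf-8")
    bad = double_encode("中文")
    print(len(good), len(bad))
    print(looks_double_encoded(good), looks_double_encoded(bad))
    print(repair(bad).decode("utf-8"))
```

The heuristic can misfire on text that legitimately contains sequences
such as "Ã©", so anything it flags is worth checking by eye before a
bulk repair.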


Clayton
