Re: [Sugar-devel] Edit/audit wikipedia activity
On Tue, Oct 26, 2010 at 4:51 PM, Martin Langhoff martin.langh...@gmail.com wrote:
> On Thu, Oct 21, 2010 at 12:06 PM, Martin Langhoff martin.langh...@gmail.com wrote:
>> Unfortunately, there is a clear need to organise a facility to audit/edit the wikipedia snapshots we have and repack the archive.
>
> Some simple rough mods to server.py to allow local edits

Now with a few minor updates and fixes. Documentation at http://wiki.laptop.org/go/WikiBrowse_Editing

cheers,

m
--
 martin.langh...@gmail.com
 mar...@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff
___
Sugar-devel mailing list
Sugar-devel@lists.sugarlabs.org
http://lists.sugarlabs.org/listinfo/sugar-devel
Re: [Sugar-devel] Edit/audit wikipedia activity
On Thu, Oct 21, 2010 at 12:06 PM, Martin Langhoff martin.langh...@gmail.com wrote:
> Unfortunately, there is a clear need to organise a facility to audit/edit the wikipedia snapshots we have and repack the archive.

Some simple rough mods to server.py to allow local edits -- start server.py with an additional argument (a path to an existing directory) and it'll save its results there. Start it like

    ./server.py 8080 /home/martin/wikiedits

The server shows the changed files, which will go into a 'wiki' subdirectory there. You can check the edits thus:

    diff -ur /home/martin/wikiedits/wiki.orig /home/martin/wikiedits/wiki

And mergeupdates.py to... um, merge those updates:

    bzcat es_PE.xml.bz2.processed | tools/mergeupdates.py //wiki | bzip2 > es_PE.xml.bz2.processed.changed

You'll have to re-create the indexes (look at what woip/sh/process does right after processing the file).

    git clone git://dev.laptop.org/users/martin/wikiserver

cheers,

m
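For anyone who wants to review the edits from a script rather than with diff -ur, here is a minimal sketch in Python. It only assumes what the message above states: that the original and edited articles live side by side in 'wiki.orig' and 'wiki' directories. The function names changed_files and show_diff are made up for this illustration and are not part of the wikiserver code; the on-disk layout inside those directories is also an assumption (plain files, possibly nested).

```python
import difflib
from pathlib import Path


def changed_files(orig_dir, edited_dir):
    """Return sorted relative paths that differ or exist on only one side.

    Hypothetical helper: compares file contents byte for byte, much like
    'diff -r' would, without assuming anything about file names.
    """
    orig, edited = Path(orig_dir), Path(edited_dir)

    def relative_files(root):
        return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}

    changed = []
    for name in sorted(relative_files(orig) | relative_files(edited)):
        a, b = orig / name, edited / name
        if not (a.is_file() and b.is_file()) or a.read_bytes() != b.read_bytes():
            changed.append(name.as_posix())
    return changed


def show_diff(orig_dir, edited_dir, name):
    """Return a unified diff for one relative path, as a single string."""

    def read_lines(path):
        if not path.is_file():
            return []  # a missing file is treated as empty
        text = path.read_text(encoding="utf-8", errors="replace")
        return text.splitlines(keepends=True)

    return "".join(
        difflib.unified_diff(
            read_lines(Path(orig_dir) / name),
            read_lines(Path(edited_dir) / name),
            fromfile=f"wiki.orig/{name}",
            tofile=f"wiki/{name}",
        )
    )


if __name__ == "__main__":
    base = Path("/home/martin/wikiedits")  # path taken from the message above
    if (base / "wiki.orig").is_dir() and (base / "wiki").is_dir():
        for name in changed_files(base / "wiki.orig", base / "wiki"):
            print(show_diff(base / "wiki.orig", base / "wiki", name))
```

This is only a review aid; the actual merge back into the archive would still go through mergeupdates.py as described above.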
Re: [Sugar-devel] Edit/audit wikipedia activity
Hi Martin,

Martin Langhoff martin.langh...@gmail.com writes:
> Do we have any easy way to do this? Is any of the many static wikipedia projects already working on audit/edit/polish facilities we can reuse (that accept our format)?

FYI I forwarded your question to the maintainer of Kiwix (http://www.kiwix.org/) - he is currently actively working on a kiwix.xo activity, and I agree your point is important. I'll follow up with answers, if any.

Best,

--
 Bastien
Re: [Sugar-devel] Edit/audit wikipedia activity
Hi Chris,

(while Mitch does his magic on 1.75... I distract you a bit...)

I am looking at reproducing the "re-compile current es_PE Wikipedia Bundle" process. Looking at the instructions in http://wiki.laptop.org/go/WikiBrowse, it's not 100% clear. For a trivial example, if I have an updated blacklist, how would I re-run the process? Where are the source files for wikipedia content and traffic stats?

I have a checkout of http://dev.laptop.org/git/projects/wikiserver/ and am looking at http://dev.laptop.org/~cjb/eswiki/

My intention is to understand the process better to see how we edit or override content.

thanks!

m
[Sugar-devel] Edit/audit wikipedia activity
Hi list,

we are getting interesting news of not-quite-good content in the Wikipedia snapshots included in the Wikipedia activities.

Unfortunately, there is a clear need to organise a facility to audit/edit the wikipedia snapshots we have and repack the archive.

Do we have any easy way to do this? Is any of the many static wikipedia projects already working on audit/edit/polish facilities we can reuse (that accept our format)?

Has anyone been working on this area at all? Is anyone interested in working on this area?

cheers,

m
Re: [Sugar-devel] Edit/audit wikipedia activity
On 22 October 2010 05:06, Martin Langhoff martin.langh...@gmail.com wrote:
> we are getting interesting news of not-quite-good content in the Wikipedia snapshots included in the Wikipedia activities.

The Schools wiki is quite good, as it has images as well as text on 5000 pages of (British) curriculum-appropriate content. This could be a good place to start.

Tabitha
Re: [Sugar-devel] Edit/audit wikipedia activity
On 10/21/2010 12:06 PM, Martin Langhoff wrote:
> Unfortunately, there is a clear need to organise a facility to audit/edit the wikipedia snapshots we have and repack the archive. Do we have any easy way to do this?

I'm the wrong person to answer this question, but the activity's archive production system does already have support for an article blacklist (and indeed many articles were excluded from the current bundles). I don't know who is in possession of this list, or exactly who took responsibility for producing the most recent version. Nonetheless, excluding articles is easy.

Actually editing article text is not something we have attempted AFAIK. Ideally, I think, we would fix textual problems upstream as they are discovered. The most recent available snapshots for English and Spanish are 10-14 days old, so this strategy does create a delay, during which time things can continue to change.

In general, I believe that auditing wikipedia is a fool's errand. There are 3.5 million articles in English Wikipedia, growing by over a thousand a day. Spanish wikipedia has 650,000 articles. If people want to create snapshots containing only whitelisted articles, that's fine, but many of the links will be broken and the amount of information will be much reduced.

--Ben
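To make the trade-off above concrete, here is a rough Python sketch of excluding articles by title and then counting the links that no longer resolve. Nothing in it comes from the real archive production system, whose blacklist format is not described in this thread: the dict of title to wikitext, the case-insensitive title matching, the simplified [[link]] regex and the function names filter_articles and broken_links are all assumptions made for illustration.

```python
import re

# Assumption: links look like [[Target]], [[Target|label]] or [[Target#Section]].
LINK_RE = re.compile(r"\[\[([^\]|#]+)")


def filter_articles(articles, blacklist):
    """Return a new title -> text dict without the blacklisted titles.

    Hypothetical helper; titles are compared case-insensitively.
    """
    blocked = {title.strip().lower() for title in blacklist}
    return {
        title: text
        for title, text in articles.items()
        if title.strip().lower() not in blocked
    }


def broken_links(articles):
    """Map each title to the sorted link targets missing from the snapshot."""
    present = {title.strip().lower() for title in articles}
    missing = {}
    for title, text in articles.items():
        dead = sorted(
            {
                target.strip()
                for target in LINK_RE.findall(text)
                if target.strip().lower() not in present
            }
        )
        if dead:
            missing[title] = dead
    return missing


if __name__ == "__main__":
    # Toy data, invented for this example.
    snapshot = {
        "Lima": "Capital of [[Peru]], near [[Callao|the port]].",
        "Peru": "See [[Lima]].",
        "Callao": "Port of [[Lima]].",
    }
    kept = filter_articles(snapshot, ["callao"])
    print(sorted(kept))
    print(broken_links(kept))
```

Even in this toy case, removing one article leaves a dangling link in another, which is the effect a whitelist-only snapshot would show at a much larger scale.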