Re: [Sugar-devel] Edit/audit wikipedia activity

2010-11-03 Thread Martin Langhoff
On Tue, Oct 26, 2010 at 4:51 PM, Martin Langhoff
martin.langh...@gmail.com wrote:
 On Thu, Oct 21, 2010 at 12:06 PM, Martin Langhoff
 martin.langh...@gmail.com wrote:
 Unfortunately, there is a clear need to organise a facility to
 audit/edit the wikipedia snapshots we have and repack the archive.

 Some simple rough mods to server.py to allow local edits

Now with a few minor updates and fixes. Documentation at

   http://wiki.laptop.org/go/WikiBrowse_Editing

cheers,


m
-- 
 martin.langh...@gmail.com
 mar...@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff
___
Sugar-devel mailing list
Sugar-devel@lists.sugarlabs.org
http://lists.sugarlabs.org/listinfo/sugar-devel


Re: [Sugar-devel] Edit/audit wikipedia activity

2010-10-26 Thread Martin Langhoff
On Thu, Oct 21, 2010 at 12:06 PM, Martin Langhoff
martin.langh...@gmail.com wrote:
 Unfortunately, there is a clear need to organise a facility to
 audit/edit the wikipedia snapshots we have and repack the archive.

Some simple rough mods to server.py to allow local edits -- start
server.py with an additional argument (a path to an existing
directory) and it'll save its results there.

Start it like this:

  ./server.py 8080 /home/martin/wikiedits

The server shows the changed files, which will go into a 'wiki'
subdirectory there.

You can check the edits thus:

  diff -ur /home/martin/wikiedits/wiki.orig /home/martin/wikiedits/wiki
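
If you only want the list of touched articles rather than the full diff, something like the following might help. It is a minimal sketch, assuming only that wiki.orig and wiki are ordinary directory trees as in the diff command above; the function name changed_files is made up for illustration and is not part of server.py.

```python
import filecmp
import os


def changed_files(orig_dir, edited_dir):
    """Return sorted relative paths that differ or exist on one side only."""

    def relative_files(root):
        # Collect every file under root as a path relative to root.
        found = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                found.add(os.path.relpath(full, root))
        return found

    orig = relative_files(orig_dir)
    edited = relative_files(edited_dir)

    # Files present on one side only count as changed.
    changed = set(orig ^ edited)

    # Files present on both sides are compared byte for byte.
    for rel in orig & edited:
        same = filecmp.cmp(os.path.join(orig_dir, rel),
                           os.path.join(edited_dir, rel),
                           shallow=False)
        if not same:
            changed.add(rel)
    return sorted(changed)
```

Calling changed_files("/home/martin/wikiedits/wiki.orig", "/home/martin/wikiedits/wiki") would return the relative paths of the files that diff -ur reports as different or missing on one side.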

And use mergeupdates.py to... um, merge those updates:

  bzcat es_PE.xml.bz2.processed | tools/mergeupdates.py //wiki | bzip2 > es_PE.xml.bz2.processed.changed
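
The pipeline above decompresses the processed dump, passes it through mergeupdates.py, and recompresses the result. For anyone who would rather drive that from Python, here is a rough sketch of the same decompress / transform / recompress shape. It does not reproduce what mergeupdates.py does: the transform argument is a hypothetical stand-in for the merge step, and treating the dump as UTF-8 text handled line by line is an assumption.

```python
import bz2


def rewrite_bz2(src_path, dst_path, transform):
    """Stream src_path through transform() line by line into dst_path.

    transform is a hypothetical placeholder for the real merge logic;
    it takes one line of text and returns the text to write out.
    """
    with bz2.open(src_path, "rt", encoding="utf-8") as src, \
         bz2.open(dst_path, "wt", encoding="utf-8") as dst:
        for line in src:
            # Nothing is held in memory beyond the current line.
            dst.write(transform(line))
```

Streaming line by line keeps memory use flat, which matters for a dump of this size.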

You'll have to re-create the indexes (look at what woip/sh/process
does right after processing the file).

git clone git://dev.laptop.org/users/martin/wikiserver

cheers,



m


Re: [Sugar-devel] Edit/audit wikipedia activity

2010-10-23 Thread Bastien
Hi Martin,

Martin Langhoff martin.langh...@gmail.com writes:

 Do we have any easy way to do this? Are any of the many static
 Wikipedia projects already working on audit/edit/polish facilities we
 can reuse (that accept our format)?

FYI I forwarded your question to the maintainer of Kiwix
(http://www.kiwix.org/) - he is currently actively working 
on a kiwix.xo activity, and I agree your point is important.

I'll follow up with answers, if any.

Best,

-- 
 Bastien


Re: [Sugar-devel] Edit/audit wikipedia activity

2010-10-22 Thread Martin Langhoff
Hi Chris,

(while Mitch does his magic on 1.75... I distract you a bit...)  -- I
am looking at reproducing the process that rebuilds the current es_PE
Wikipedia bundle.

Looking at the instructions in http://wiki.laptop.org/go/WikiBrowse,
the process is not 100% clear. For a trivial example, if I have an
updated blacklist, how would I re-run the process? Where are the source
files for Wikipedia content and traffic stats?

I have a checkout of http://dev.laptop.org/git/projects/wikiserver/
and am looking at http://dev.laptop.org/~cjb/eswiki/

My intention is to understand the process better to see how we edit or
override content.

thanks!



m


[Sugar-devel] Edit/audit wikipedia activity

2010-10-21 Thread Martin Langhoff
Hi list,

we are getting interesting news of not-quite-good content in the
Wikipedia snapshots included in the Wikipedia activities.

Unfortunately, there is a clear need to organise a facility to
audit/edit the wikipedia snapshots we have and repack the archive.

Do we have any easy way to do this? Are any of the many static
Wikipedia projects already working on audit/edit/polish facilities we
can reuse (that accept our format)?

Has anyone been working on this area at all?

Is anyone interested in working on this area?

cheers,



m


Re: [Sugar-devel] Edit/audit wikipedia activity

2010-10-21 Thread Tabitha Roder
On 22 October 2010 05:06, Martin Langhoff martin.langh...@gmail.com wrote:

 we are getting interesting news of not-quite-good content in Wikipedia
 content included in the Wikipedia activities.


Schools wiki is quite good, as it has images as well as text on 5000 pages of
content appropriate to the (British) curriculum. This could be a good place to
start.

Tabitha


Re: [Sugar-devel] Edit/audit wikipedia activity

2010-10-21 Thread Benjamin M. Schwartz
On 10/21/2010 12:06 PM, Martin Langhoff wrote:
 Unfortunately, there is a clear need to organise a facility to
 audit/edit the wikipedia snapshots we have and repack the archive.
 
 Do we have any easy way to do this?

I'm the wrong person to answer this question, but the activity's archive
production system does already have support for an article blacklist (and
indeed many articles were excluded from the current bundles).  I don't
know who is in possession of this list, or exactly who took responsibility
for producing the most recent version.  Nonetheless, excluding articles is
easy.
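
For illustration only, exclusion by blacklist can be as simple as the sketch below. This is not the archive production system's actual code: the one-title-per-line file format, the "#" comment convention, and both function names are assumptions made for the example.

```python
def load_blacklist(lines):
    """Build a set of titles from an iterable of lines.

    Assumed (hypothetical) format: one article title per line,
    blank lines and lines starting with '#' are ignored.
    """
    titles = set()
    for line in lines:
        title = line.strip()
        if title and not title.startswith("#"):
            titles.add(title)
    return titles


def exclude_blacklisted(titles, blacklist):
    """Yield the titles that are not blacklisted, preserving order."""
    for title in titles:
        if title not in blacklist:
            yield title
```

Because the blacklist is a set, each lookup is constant time, so filtering hundreds of thousands of titles stays cheap.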

Actually editing article text is not something we have attempted AFAIK.
Ideally, I think, we would fix textual problems upstream as they are
discovered.  The most recent available snapshots for English and Spanish
are 10-14 days old, so this strategy does create a delay, during which
time things can continue to change.

In general, I believe that auditing Wikipedia is a fool's errand.  There
are 3.5 million articles in English Wikipedia, growing by over a thousand
a day.  Spanish Wikipedia has 650,000 articles.  If people want to
create snapshots containing only whitelisted articles, that's fine, but
many of the links will be broken and the amount of information will be
much reduced.

--Ben


