I realized that the patch does not apply to the latest version, so I took the liberty of making a fork on GitHub and merging it there: http://github.com/tbaumann/wikireader
I think I cracked the nut, but please have a look if you would be so kind. I'm not sure I got it completely right. (Please ignore the first commit; I did not test properly before checking in. :-/ )

Regards
Tilman Baumann

David Reyes Samblas Martinez wrote:
> Here you have :)
>
> David Reyes Samblas Martinez
> http://www.tuxbrain.com
> Open ultraportable & embedded solutions
> Openmoko, Openpandora, Arduino
> Hey, watch out!!! There's a linux in your pocket!!!
>
> 2009/11/30 Tilman Baumann <til...@baumann.name>:
>> Hi,
>>
>> can you maybe release this as a patch?
>> I'd like to integrate this on GitHub, but I fear I might miss something
>> if I try to pick out the changes by hand.
>>
>> Thanks
>>
>> David Reyes Samblas Martinez wrote:
>>> Sorry for the wait, Thomas.
>>> I was working to solve the broken pipe issue that stops the parser
>>> when it finds an error. I have applied a quick and dirty workaround
>>> using the try-catch technique, and now the process does not stop: it
>>> just skips the faulty article and keeps going :) It logs the faulty
>>> ones in a text file (title and position) for later forensics. My first
>>> guess is that this is not a UTF-8 encoding issue but rather an
>>> unexpected formatting tag that the PHP parser doesn't know how to
>>> deal with.
>>> I am currently parsing the German Wikipedia, with more than 1.3 million
>>> articles:
>>>
>>> Count: 1043000
>>> Failing count: 2
>>>
>>> and it keeps going. I suppose we can sacrifice two articles to have
>>> one million available now :)
>>>
>>> As you requested, I uploaded my working compiled tools [1], but without
>>> any XML sources; it's about 113 MB. If you already have working tools
>>> on your system, you just have to replace
>>> host-tools/offline-renderer/ArticleParser.py with the one attached to
>>> this mail. Then you can forget about crying like a child whose ice
>>> cream has fallen to the floor when, after more than 24h of parsing and
>>> hundreds of thousands of articles passed through the process, you see
>>> this ugly Python error backtrace blablabla and not your desired file :)
>>>
>>> By the way, faultyarticles.txt is saved in the same
>>> host-tools/offline-renderer directory. (I'm too lazy to add a
>>> parameter to change that, so I hardcoded the name of the file. Yes...
>>> don't waste typing on correcting that bad habit, I know.)
>>>
>>> If you are curious about which articles on the German wiki are causing
>>> trouble in dewiki-latest-pages-articles.xml (date 2009-11-20):
>>>
>>> ~Storck Bicycle
>>> 832673
>>> ~Musculus serratus posterior inferior
>>> 857334
>>>
>>> I hope I will upload the German Wikipedia on Sunday... and it
>>> will be available on Monday. Sorry for the wait, but my Asymmetric DSL
>>> is very asymmetric, and uploading 1.5-2 GB (expected file size) will
>>> take a bunch of hours.
>>>
>>> For those who want to compile their own, go for it :) The
>>> Quickreference in the doc directory of the source is all you need to
>>> start working. Just remember that if you have a 64-bit system you
>>> will have to follow the 64-bit method to compile the tools.
>>>
>>> Regards
>>> [1] http://tuxbrain.org/downloads/wikireader/wikireaderbinaries20091127_dsamblas_modified_trycatch.tar.bz2
>>> David Reyes Samblas Martinez
>>> http://www.tuxbrain.com
>>> Open ultraportable & embedded solutions
>>> Openmoko, Openpandora, Arduino
>>> Hey, watch out!!! There's a linux in your pocket!!!
>>>
>>> 2009/11/27 Thomas HOCEDEZ <thomas.hoce...@free.fr>:
>>>> Thomas HOCEDEZ wrote:
>>>>>
>>>>> Hi David,
>>>>>
>>>>> Can you share your scripts & configs to do the same in French (and
>>>>> other languages)?
>>>>> Thanks
>>>>>
>>>>> Thomas
>>>>>
>>>>
>>>> As the mailing list seems to be broken (or users started hibernating
>>>> for winter...), I found the way to compile things step by step by
>>>> myself. I'm now rendering the French Wikipedia. As it started a few
>>>> minutes ago, the result will be available during the weekend (I hope).
>>>>
>>>> I'll also post the way I managed to do so! (I'm at the office for
>>>> now, and I'm leaving...)
>>>>
>>>> Regards to you all!
>>>>
>>>> Thomas
>>>>
>>> _______________________________________________
>>> Openmoko community mailing list
>>> community@lists.openmoko.org
>>> http://lists.openmoko.org/mailman/listinfo/community
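
For anyone reading along without the attachment: the skip-and-log workaround David describes in the quoted mail (wrap the per-article rendering in a try block, skip the faulty article, append its title and position to faultyarticles.txt) could be sketched roughly as below. This is only an illustration, not the code from his ArticleParser.py. The function name `render_all`, the `render` callback and the `(title, position, text)` tuples are hypothetical, the log layout is guessed from the two entries listed in the mail, and it is written for current Python 3 rather than whatever the 2009 tools ran on.

```python
FAULTY_LOG = "faultyarticles.txt"  # file name mentioned in the mail


def render_all(articles, render, log_path=FAULTY_LOG):
    """Render every article, skipping and logging the ones that fail.

    `articles` yields (title, position, text) tuples and `render` is a
    callable taking (title, text). Both are hypothetical stand-ins, not
    names taken from ArticleParser.py.
    Returns a (count, failing_count) pair.
    """
    count = 0
    failing = 0
    with open(log_path, "a", encoding="utf-8") as log:
        for title, position, text in articles:
            count += 1
            try:
                render(title, text)
            except Exception:
                # Quick and dirty on purpose: any error (broken pipe,
                # unexpected tag, ...) costs one article, not the whole run.
                failing += 1
                # Assumed layout: "~title" on one line, position on the next.
                log.write("~%s\n%d\n" % (title, position))
    return count, failing
```

Catching `Exception` this broadly will also hide genuine bugs, so it only makes sense for a long batch run where losing a handful of articles is acceptable. The log keeps enough information to go back and inspect the skipped ones afterwards.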