Actually the tar file seems to be broken. So much for potentially missing stuff. ;)
Tilman Baumann wrote:
> Hi,
>
> can you maybe release this as a patch?
> I'd like to integrate this in github. But I fear I might miss something if I
> try to fiddle out the changes by hand.
>
> Thanks
>
> David Reyes Samblas Martinez wrote:
>> Sorry for the wait Thomas,
>> I was working to solve the broken pipe issue that stops the parser
>> when it finds an error. I have applied a quick and dirty workaround
>> using a try-catch and now the process does not stop; it just
>> skips the faulty article and keeps going :) It logs the faulty ones in
>> a text file (title and position) for later forensics, but my first
>> guess is that it is not a UTF-8 encoding issue but rather an
>> unexpected formatting tag the PHP parser doesn't know how to deal with.
>> Currently parsing the German Wikipedia, with more than 1.3 million
>> articles:
>>
>> Count: 1043000
>> Failing count: 2
>>
>> and it keeps going. I suppose we can sacrifice two articles to have
>> one million available now :)
>>
>> As you requested I uploaded my working compiled tools[1], but without
>> any xml sources. It's about 113Mb, but if you have working tools on
>> your system you just have to replace
>> host-tools/offline-renderer/ArticleParser.py with the one attached to this
>> mail, and you can forget about crying like a child whose ice cream has
>> fallen to the floor when, more than 24h and hundreds of thousands of
>> parsed articles into the process, you see an ugly python error backtrace
>> blablabla and not your desired file :)
>>
>> By the way, faultyarticles.txt is saved in the same
>> host-tools/offline-renderer directory (I'm too lazy to add a
>> parameter to change that and I hardcoded the name of the file, yes...
>> don't waste typing on correcting that bad habit, I know).
>>
>> If you are curious about which articles on the German wiki are causing
>> trouble in dewiki-latest-pages-articles.xml (date 2009-11-20):
>>
>> ~Storck Bicycle
>> 832673
>> ~Musculus serratus posterior inferior
>> 857334
>>
>> Regards. I hope I will upload the German Wikipedia on Sunday... and
>> it will be available on Monday. Sorry for the wait, but my Asymmetric DSL
>> is very asymmetric and uploading 1.5-2 Gb (expected file size) will take
>> a bunch of hours.
>>
>> For those who want to compile their own, go for it :) The
>> Quickreference in the doc directory of the source is all you need to
>> start working. Just remember that if you have a 64 bit system you
>> will have to follow the 64 bit method to compile the tools.
>>
>> Regards
>> [1]http://tuxbrain.org/downloads/wikireader/wikireaderbinaries20091127_dsamblas_modified_trycatch.tar.bz2
>> David Reyes Samblas Martinez
>> http://www.tuxbrain.com
>> Open ultraportable & embedded solutions
>> Openmoko, Openpandora, Arduino
>> Hey, watch out!!! There's a linux in your pocket!!!
>>
>> 2009/11/27 Thomas HOCEDEZ <thomas.hoce...@free.fr>:
>>> Thomas HOCEDEZ wrote:
>>>>
>>>> Hi David,
>>>>
>>>> Can you share your scripts & configs to do the same in French (and
>>>> other languages)?
>>>> Thanks
>>>>
>>>> Thomas
>>>>
>>>
>>> As the mailing list seems to be broken (or users started hibernating
>>> for winter...) I found by myself the way to compile things step by step.
>>> I'm now rendering the French Wikipedia. As it started a few minutes
>>> ago, the result will be available during the weekend (I hope).
>>>
>>> I'll also post the way I managed to do so! (I'm at the office for now,
>>> and I'm leaving...)
>>>
>>> Regards to you all!
>>>
>>> Thomas
>>>
>> _______________________________________________
>> Openmoko community mailing list
>> community@lists.openmoko.org
>> http://lists.openmoko.org/mailman/listinfo/community
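
For anyone trying to reproduce the workaround without the tarball: a minimal sketch of the skip-and-log idea David describes in the quoted mail could look like the following. This is not the code from his ArticleParser.py; `render_all`, `fake_render`, the `(position, title, text)` tuple layout and the exact log layout are assumptions made only for illustration.

```python
import io


def render_all(articles, render, fault_log):
    """Render every article; skip and log the ones whose rendering raises.

    articles  -- iterable of (position, title, text) tuples (assumed layout)
    render    -- callable(title, text); hypothetical stand-in for the real parser call
    fault_log -- writable text file object, e.g. open("faultyarticles.txt", "a")
    """
    rendered = 0
    failed = 0
    for position, title, text in articles:
        try:
            render(title, text)
            rendered += 1
        except Exception:
            # Deliberately broad: one bad article must not end a 24h run.
            failed += 1
            # Title and position, one per line (assumed format).
            fault_log.write("~%s\n%d\n" % (title, position))
    return rendered, failed


def fake_render(title, text):
    # Stand-in renderer that chokes on one made-up tag.
    if "<bad>" in text:
        raise ValueError("unexpected formatting tag in %r" % title)
    return text.upper()


articles = [(1, "A", "ok"), (2, "B", "<bad> tag"), (3, "C", "fine")]
log = io.StringIO()  # in-memory stand-in for faultyarticles.txt
rendered, failed = render_all(articles, fake_render, log)
print("Count:", rendered)
print("Failing count:", failed)
```

Catching every `Exception` hides real bugs as well as bad input, so the log is worth checking after each run; narrowing the `except` clause would be the cleaner fix once the actual error type is known.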