David,

We're working on exactly the same thing now :-)
I'll ask Chris to email the list once we get past it. I think the problem is the mixture of different encodings (latin-1 and UTF-8) in the Spanish Wikipedia and the way our code handles it. For some reason Python's print (at times) wants to default to ascii, even after we explicitly tell it to use UTF-8.

-Sean

On Fri, Oct 30, 2009 at 4:50 AM, David Reyes Samblas Martinez <da...@tuxbrain.com> wrote:
>
> Hi,
> I'm trying to generate the files for a Spanish Wikipedia on the WR.
> After successfully compiling the source from git and solving some
> annoyances with UTF-8 encoding in Python, the error was something like this:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in
> position....: ordinal not in range(128)
> This was solved by changing the default encoding from "ascii" to "utf8" in the
> /usr/lib/python2.6/site.py file.
> After this I was able to run the following command without errors:
> make DESTDIR=image WORKDIR=work XML_FILES=xml-file-samples/eswiki-latest-pages-articles.xml index parse render combine
>
> Everything seemed fine for a few hours (about 6-7 h) of parsing the
> 700000 articles in Spanish, but then ... the horror:
> Count: 380000
> Traceback (most recent call last):
>   File "./ArticleParser.py", line 224, in <module>
>     main()
>   File "./ArticleParser.py", line 172, in main
>     process_article_text(title.encode('utf-8'), f.read(length), newf)
>   File "./ArticleParser.py", line 218, in process_article_text
>     newf.write(text + '\n')
> IOError: [Errno 32] Broken pipe
> make[1]: *** [parse] Error 1
> make[1]: se sale del directorio
> `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
> make: *** [parse] Error 2
>
> I have relaunched the process in the (faint) hope that it was a
> temporary fault, but if anyone has a clue it would be helpful.
>
> BTW, I am documenting this whole process to write a step-by-step howto on
> how to put Wikipedia in other languages on the WikiReader.
>
> David Reyes Samblas Martinez
> http://www.tuxbrain.com
> Open ultraportable & embedded solutions
> Openmoko, Openpandora, Arduino
> Hey, watch out!!! There's a linux in your pocket!!!
>
> _______________________________________________
> Openmoko community mailing list
> community@lists.openmoko.org
> http://lists.openmoko.org/mailman/listinfo/community
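
For anyone hitting the same 'ascii' codec error: the following is only a minimal sketch of one way to cope with input that mixes UTF-8 and latin-1, without editing site.py. It is not taken from ArticleParser.py; the names decode_mixed and write_utf8 are hypothetical, and the "try UTF-8 first, then latin-1" rule is a heuristic that can misread latin-1 text that happens to be valid UTF-8. It does not address the Broken pipe error, which may have a different cause.

```python
# -*- coding: utf-8 -*-
# Sketch only: helper names are hypothetical, not part of the wikireader tools.
import io


def decode_mixed(raw):
    """Decode bytes as UTF-8, falling back to latin-1 if that fails."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # latin-1 maps every byte value to a code point, so this cannot raise.
        return raw.decode('latin-1')


def write_utf8(stream, text):
    """Encode explicitly so the output never depends on the default codec."""
    stream.write(text.encode('utf-8') + b'\n')


if __name__ == '__main__':
    out = io.BytesIO()  # stands in for a file or pipe opened in binary mode
    samples = (
        u'Espa\xf1a'.encode('utf-8'),    # title stored as UTF-8
        u'Espa\xf1a'.encode('latin-1'),  # same title stored as latin-1
    )
    for raw in samples:
        write_utf8(out, decode_mixed(raw))
    # Both samples end up as identical UTF-8 byte strings in the output.
    print(repr(out.getvalue()))
```

The idea is to decode once at the input boundary and encode once at the output boundary, so that nothing in between relies on the interpreter's default encoding.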