Re: [wikireader]Error on parsing the spanish wikipedia
On Fri, Oct 30, 2009 at 4:50 AM, David Reyes Samblas Martinez da...@tuxbrain.com wrote:

> Hi, I'm trying to generate the files for a Spanish Wikipedia on the WR. I compiled the source from git successfully and solved some annoyances with UTF-8 encoding in Python. The error was something like this:
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position: ordinal not in range(128)
>
> I solved this by changing the default encoding from ascii to utf8 in the /usr/lib/python2.6/site.py file. After this I was able to run the instruction:
>
> make DESTDIR=image WORKDIR=work XML_FILES=xml-file-samples/eswiki-latest-pages-articles.xml index parse render combine
>
> Everything seemed fine for a few hours (about 6-7 h) of parsing the 70 articles in Spanish, but then... the horror:
>
> Count: 38
> Traceback (most recent call last):
>   File "./ArticleParser.py", line 224, in <module>
>     main()
>   File "./ArticleParser.py", line 172, in main
>     process_article_text(title.encode('utf-8'), f.read(length), newf)
>   File "./ArticleParser.py", line 218, in process_article_text
>     newf.write(text + '\n')
> IOError: [Errno 32] Broken pipe
> make[1]: *** [parse] Error 1
> make[1]: se sale del directorio `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
> make: *** [parse] Error 2

OK, that's fixed now. Chris already checked in the code. Our build worked fine. We need to do a few more tweaks and then we can post a (super) early test image. Give us until early this coming week.

-Sean

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community
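A note on the traceback in this message: "IOError: [Errno 32] Broken pipe" generally means the process reading the other end of a pipe has gone away, so the write itself is usually a symptom and the real failure is in the reader. The sketch below is not taken from ArticleParser.py. It only reproduces the error class on a POSIX system, using a hypothetical helper named write_to_exited_reader and a throwaway child Python process that exits without reading its input.

```python
import errno
import subprocess
import sys


def write_to_exited_reader():
    """Write to a pipe whose reader has already exited; return the errno seen.

    Hypothetical helper for illustration only (assumes a POSIX system).
    """
    # The child stands in for whatever consumes the parser's output.
    # It exits immediately without reading its stdin.
    child = subprocess.Popen([sys.executable, "-c", "pass"],
                             stdin=subprocess.PIPE)
    child.wait()
    try:
        child.stdin.write(b"article text\n")
        child.stdin.flush()  # the buffered bytes reach the dead pipe here
    except (IOError, OSError) as exc:
        return exc.errno
    finally:
        try:
            child.stdin.close()
        except (IOError, OSError):
            pass  # closing may retry the flush and fail the same way
    return None


if __name__ == "__main__":
    print(write_to_exited_reader() == errno.EPIPE)
```

If a build stops this way, the output of the process on the reading side of the pipe is usually the more informative place to look.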
Re: [wikireader]Error on parsing the spanish wikipedia
Are you uploading these changes to git? Can I take a look?

David Reyes Samblas Martinez
http://www.tuxbrain.com
Open ultraportable embedded solutions
Openmoko, Openpandora, Arduino
Hey, watch out!!! There's a linux in your pocket!!!

2009/10/30 Sean Moss-Pultz s...@openmoko.com:
> OK, that's fixed now. Chris already checked in the code. Our build worked fine.
Re: [wikireader]Error on parsing the spanish wikipedia
On Fri, Oct 30, 2009 at 4:22 PM, David Reyes Samblas Martinez da...@tuxbrain.com wrote:
> Are you uploading these changes to git? Can I take a look?

Btw, is there any plan to implement image rendering? If so, any time estimate?

Best regards,
Laszlo
Re: [wikireader]Error on parsing the spanish wikipedia
Just one thing I noticed: in all the faulty articles, the title starts with the ~ symbol.

Regards,
David Reyes Samblas Martinez
Re: [wikireader]Error on parsing the spanish wikipedia
2009/10/30 Laszlo KREKACS laszlo.krekacs.l...@gmail.com:
> Btw, is there any plan to implement image rendering? If so, any time estimate?

Some kind of image renderer must already be implemented, because the keyboard and the erase-history dialog are images. Am I wrong?
Re: [wikireader]Error on parsing the spanish wikipedia
On Fri, Oct 30, 2009 at 11:22 PM, David Reyes Samblas Martinez da...@tuxbrain.com wrote:
> Are you uploading these changes to git? Can I take a look?

Yes. The latest commit fixes it. Have a look here: http://github.com/wikireader/wikireader

Sean
Re: [wikireader]Error on parsing the spanish wikipedia
On Fri, Oct 30, 2009 at 11:29 PM, Laszlo KREKACS laszlo.krekacs.l...@gmail.com wrote:
> Btw, is there any plan to implement image rendering?

Math (images) is on our roadmap, hopefully before the end of this year. The screen is only 1-bit, so anything else would look kinda funny.

-Sean
Re: [wikireader]Error on parsing the spanish wikipedia
On Sat, Oct 31, 2009 at 2:46 AM, David Reyes Samblas Martinez da...@tuxbrain.com wrote:
> Just one thing I noticed: in all the faulty articles, the title starts with the ~ symbol.

David,

No, that's not a problem. That character gets removed in a later build stage. We had to add it because of an integer conversion issue with SQLite: it was automatically converting article titles like 1984 into integers (not strings) when storing them in the database. SQLite, BTW, claims this is a feature.

Sean
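The conversion Sean describes matches SQLite's documented column type affinity: text that looks like a number is stored as a number when the column has NUMERIC (or INTEGER) affinity. The sketch below illustrates that behaviour and why a non-numeric prefix such as ~ avoids it. The table name, column name, and the stored_type helper are hypothetical; the actual WikiReader schema is not shown in this thread.

```python
import sqlite3


def stored_type(title, column_type):
    """Insert a title into a one-column table and report its storage class.

    Hypothetical helper; 'articles' and 'title' are illustrative names only.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # column_type is a fixed keyword chosen by the caller, not user input.
        conn.execute("CREATE TABLE articles (title %s)" % column_type)
        conn.execute("INSERT INTO articles (title) VALUES (?)", (title,))
        row = conn.execute("SELECT typeof(title) FROM articles").fetchone()
        return row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    for title, column_type in [("1984", "NUMERIC"),
                               ("~1984", "NUMERIC"),
                               ("1984", "TEXT")]:
        print(title, column_type, stored_type(title, column_type))
```

Under these assumptions the title 1984 ends up as an integer in a NUMERIC column, while ~1984 stays text. Declaring the column as TEXT would be another way to keep the title a string, although whether that was an option for the project is not stated here.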
Re: [wikireader]Error on parsing the spanish wikipedia
David,

We're working on exactly the same thing now :-) I'll ask Chris to email the list once we get past it.

I think the problem is with the mixture of different encodings (Latin-1 and UTF-8) in the Spanish Wikipedia and the way our code handles it. For some reason Python's print (at times) wants to default to ASCII, even after we explicitly tell it to use UTF-8.

-Sean

On Fri, Oct 30, 2009 at 4:50 AM, David Reyes Samblas Martinez da...@tuxbrain.com wrote:
> I have relaunched the process in the (faint) hope that it was a temporary fault, but if anyone has a clue it would be helpful.
>
> BTW, I'm documenting this whole process to write a step-by-step howto on putting Wikipedia in other languages on the WikiReader.
>
> David Reyes Samblas Martinez
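For input that mixes UTF-8 and Latin-1, one common approach is to decode each chunk explicitly, trying UTF-8 first and falling back to Latin-1, and then to encode explicitly on output so nothing depends on the interpreter's default codec. The sketch below is an illustration of that idea only, not the fix that was committed to the WikiReader tools. It is written for Python 3 (the thread used Python 2.6), and decode_mixed and encode_line are hypothetical names.

```python
def decode_mixed(raw):
    """Decode bytes that may be UTF-8 or Latin-1 into text.

    UTF-8 is tried first because it rejects most non-UTF-8 input.
    Latin-1 maps every byte to a character, so the fallback never fails,
    but it can yield wrong characters if the data used some other encoding.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def encode_line(text):
    """Encode one output line as UTF-8 bytes, independent of any default."""
    return (text + "\n").encode("utf-8")


if __name__ == "__main__":
    for raw in (b"Espa\xc3\xb1a", b"Espa\xf1a"):
        print(repr(encode_line(decode_mixed(raw))))
```

Handling the encoding at the boundaries like this avoids editing site.py, which changes behaviour for every Python program on the machine.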
Re: [wikireader]Error on parsing the spanish wikipedia
Great! :) Good to see you are working on this! Please count on me for any testing that needs doing. I will try to take a look at the code myself to kill the bug, but I have neither the time nor the expertise, so no promises :P

David Reyes Samblas Martinez

2009/10/30 Sean Moss-Pultz s...@openmoko.com:
> We're working on exactly the same thing now :-) I'll ask Chris to email the list once we get past it.
Re: [wikireader]Error on parsing the spanish wikipedia
On Thu, Oct 29, 2009 at 6:54 PM, David Reyes Samblas Martinez da...@tuxbrain.com wrote:
> Great! :) Good to see you are working on this! Please count on me for any testing that needs doing. I will try to take a look at the code myself to kill the bug, but I have neither the time nor the expertise, so no promises :P

I haven't seen the code, but if you don't feel like fixing it now, you can add a try/catch around the block that processes each page, so that you have a wiki to play with while the error gets fixed.
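The suggestion above might look roughly like the following sketch, which wraps the per-article step so that one bad article is recorded and skipped instead of ending a run that takes hours. The names process_all and process_one are hypothetical, and the real ArticleParser.py loop is not shown in this thread, so treat this as an outline of the idea only.

```python
def process_all(articles, process_one):
    """Process (title, text) pairs, skipping any article that raises.

    Returns the number of articles processed and a list of
    (title, error description) pairs for the ones that failed.
    """
    done = 0
    failures = []
    for title, text in articles:
        try:
            process_one(title, text)
            done += 1
        except Exception as exc:
            # Record the failure and carry on with the next article.
            failures.append((title, repr(exc)))
    return done, failures


if __name__ == "__main__":
    def demo(title, text):
        # Stand-in for the real per-article work; fails on empty text.
        if not text:
            raise ValueError("empty article")

    print(process_all([("A", "x"), ("B", ""), ("C", "y")], demo))
```

One caveat: this helps with errors tied to a single article. If the output pipe itself is broken, every later write would fail as well, so the failure list would simply fill up with the same error.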
Re: [wikireader]Error on parsing the spanish wikipedia
On Fri, Oct 30, 2009 at 7:54 AM, David Reyes Samblas Martinez da...@tuxbrain.com wrote:
> Great! :) Good to see you are working on this! Please count on me for any testing that needs doing. I will try to take a look at the code myself to kill the bug, but I have neither the time nor the expertise, so no promises :P

We'll get it working. Just give us a bit of time. And it would be super helpful if you could help test / QA. Thanks a lot for the offer!

-Sean
Re: [wikireader]Error on parsing the spanish wikipedia
On Fri, Oct 30, 2009 at 7:58 AM, Nelson Castillo arhu...@freaks-unidos.net wrote:
> I haven't seen the code, but if you don't feel like fixing it now, you can add a try/catch around the block that processes each page, so that you have a wiki to play with while the error gets fixed.

Yeah, we're trying exactly that, Nelson. It's just a long process to render all this stuff. We actually have 9 quad-core systems running in parallel now, each with at least 6 GB of RAM :-)

-Sean