Re: [wikireader]Error on parsing the spanish wikipedia
On Sat, Oct 31, 2009 at 2:46 AM, David Reyes Samblas Martinez wrote:
> Just one thing I realized: in all the faulty articles, the title starts
> with the "~" symbol.

David

No, that's not a problem. That character gets removed in a later build stage. We had to add it because of an integer conversion issue with SQLite. It was automatically converting article titles like "1984" into integers (not strings) and storing them that way in the database. SQLite, BTW, claims this is a "feature".

Sean

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community
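The conversion Sean describes matches SQLite's type affinity rules: when a column has INTEGER or NUMERIC affinity, text that looks like a number is stored as a number. The sketch below reproduces that with Python's sqlite3 module. The table name `articles`, the column name `title`, the column types and both helper functions are assumptions for illustration, not the WikiReader schema or build code, and it is written for current Python 3 rather than the Python 2.6 used in the thread.

```python
import sqlite3


def roundtrip(column_type, title):
    """Store one title in a throwaway in-memory table and read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        # column_type is interpolated only because this demo passes fixed,
        # trusted values; the schema itself is hypothetical.
        conn.execute("CREATE TABLE articles (title %s)" % column_type)
        conn.execute("INSERT INTO articles (title) VALUES (?)", (title,))
        (stored,) = conn.execute("SELECT title FROM articles").fetchone()
        return stored
    finally:
        conn.close()


def strip_marker(title):
    """Drop a single leading "~", as a later build stage is said to do."""
    return title[1:] if title.startswith("~") else title


if __name__ == "__main__":
    cases = [("INTEGER", "1984"), ("INTEGER", "~1984"), ("TEXT", "1984")]
    for column_type, title in cases:
        stored = roundtrip(column_type, title)
        print(column_type, repr(title), "->", repr(stored), type(stored).__name__)
```

Under these assumptions, the "~" prefix keeps the title from looking numeric, so it comes back as a string. A column declared as TEXT also keeps the string unchanged; the thread does not say how the project's table was declared.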
Re: [wikireader]Error on parsing the spanish wikipedia
On Fri, Oct 30, 2009 at 11:29 PM, Laszlo KREKACS wrote:
> On Fri, Oct 30, 2009 at 4:22 PM, David Reyes Samblas Martinez wrote:
>> Are you uploading these changes to git? Can I take a look?
>
> Btw, is there any plan to implement image rendering?

Math (as images) is on our roadmap, hopefully before the end of this year. The screen is only 1-bit, so anything else would look kind of funny.

-Sean
Re: [wikireader]Error on parsing the spanish wikipedia
On Fri, Oct 30, 2009 at 11:22 PM, David Reyes Samblas Martinez wrote:
> Are you uploading these changes to git? Can I take a look?

Yes. The latest commit fixes it. Have a look here:
http://github.com/wikireader/wikireader

Sean
Re: [wikireader]Error on parsing the spanish wikipedia
2009/10/30 Laszlo KREKACS:
> On Fri, Oct 30, 2009 at 4:22 PM, David Reyes Samblas Martinez wrote:
>> Are you uploading these changes to git? Can I take a look?
>
> Btw, is there any plan to implement image rendering?
>
> If so, any time estimate?
>
> Best regards,
> Laszlo

Some kind of image renderer must already be implemented, because the keyboard and the erase-history dialog are images. Or am I wrong?
Re: [wikireader]Error on parsing the spanish wikipedia
Just one thing I realized: in all the faulty articles, the title starts with the "~" symbol.

Regards,

David Reyes Samblas Martinez
http://www.tuxbrain.com
Open ultraportable & embedded solutions
Openmoko, Openpandora, Arduino
Hey, watch out!!! There's a linux in your pocket!!!

2009/10/30 David Reyes Samblas Martinez:
> Are you uploading these changes to git? Can I take a look?
>
> 2009/10/30 Sean Moss-Pultz:
>> OK, that's fixed now. Chris already checked in the code. Our build
>> worked fine. We need to do a few more tweaks and then we can post a
>> (super) early test image. Give us until early this coming week.
>>
>> -Sean
Re: [wikireader]Error on parsing the spanish wikipedia
On Fri, Oct 30, 2009 at 4:22 PM, David Reyes Samblas Martinez wrote:
> Are you uploading these changes to git? Can I take a look?

Btw, is there any plan to implement image rendering?

If so, any time estimate?

Best regards,
Laszlo
Re: [wikireader]Error on parsing the spanish wikipedia
Are you uploading these changes to git? Can I take a look?

David Reyes Samblas Martinez
http://www.tuxbrain.com
Open ultraportable & embedded solutions
Openmoko, Openpandora, Arduino
Hey, watch out!!! There's a linux in your pocket!!!

2009/10/30 Sean Moss-Pultz:
> OK, that's fixed now. Chris already checked in the code. Our build
> worked fine. We need to do a few more tweaks and then we can post a
> (super) early test image. Give us until early this coming week.
>
> -Sean
Re: [wikireader]Error on parsing the spanish wikipedia
On Fri, Oct 30, 2009 at 4:50 AM, David Reyes Samblas Martinez wrote:
> Hi, I'm trying to generate the files for a Spanish Wikipedia on the WR.
> After successfully compiling the source from git, I had to solve some
> annoyances with UTF-8 encoding in Python. The error was something like this:
>
>   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position: ordinal not in range(128)
>
> This was solved by changing the default encoding from "ascii" to "utf8" in
> the /usr/lib/python2.6/site.py file. After this I was able to run the
> following command:
>
>   make DESTDIR=image WORKDIR=work XML_FILES=xml-file-samples/eswiki-latest-pages-articles.xml index parse render combine
>
> Everything seemed fine for a few hours (about 6-7 h) of parsing the
> 70 articles in Spanish, but then... the horror:
>
>   Count: 38
>   Traceback (most recent call last):
>     File "./ArticleParser.py", line 224, in <module>
>       main()
>     File "./ArticleParser.py", line 172, in main
>       process_article_text(title.encode('utf-8'), f.read(length), newf)
>     File "./ArticleParser.py", line 218, in process_article_text
>       newf.write(text + '\n')
>   IOError: [Errno 32] Broken pipe
>   make[1]: *** [parse] Error 1
>   make[1]: se sale del directorio `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
>   make: *** [parse] Error 2

OK, that's fixed now. Chris already checked in the code. Our build worked fine. We need to do a few more tweaks and then we can post a (super) early test image. Give us until early this coming week.

-Sean
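For context on the traceback quoted in this message: errno 32 (EPIPE) is what a writer gets when the reading end of a pipe has already been closed, typically because the process consuming the output has exited. The thread does not say what was reading from `newf` or why it went away, so the sketch below only reproduces the error class on a POSIX system with a bare `os.pipe()`. `write_after_reader_closed` is a hypothetical helper. In Python 3 the error surfaces as `BrokenPipeError`, a subclass of `OSError`; the Python 2.6 in the thread reports it as `IOError`.

```python
import errno
import os


def write_after_reader_closed():
    """Write to a pipe whose read end is already closed; return the errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # simulate the consumer going away
    try:
        os.write(write_fd, b"article text\n")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    return None  # only reached if the write unexpectedly succeeded


if __name__ == "__main__":
    print(write_after_reader_closed() == errno.EPIPE)
```

When a parser dies this way, the more informative error is usually in the output of whatever process was on the other end of the pipe.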
Re: [wikireader]Error on parsing the spanish wikipedia
On Fri, Oct 30, 2009 at 7:58 AM, Nelson Castillo wrote:
> On Thu, Oct 29, 2009 at 6:54 PM, David Reyes Samblas Martinez wrote:
>> Great! :) Good to see you are working on this! Please count on me for
>> any testing that needs to be done. I will try to take a look at the code
>> myself to kill the bug, but I have neither the time nor the expertise, so
>> no promises :P
>
> I haven't seen the code, but if you don't feel like fixing it now you
> can add a try/catch around the block that processes each page, so that
> you have a wiki to play with while the error is being fixed.

Yeah, we're trying exactly that, Nelson. It's just a long process to render all this stuff. We actually have 9 quad-core systems running in parallel now, each with at least 6 GB of RAM :-)

-Sean
Re: [wikireader]Error on parsing the spanish wikipedia
On Fri, Oct 30, 2009 at 7:54 AM, David Reyes Samblas Martinez wrote:
> Great! :) Good to see you are working on this! Please count on me for
> any testing that needs to be done. I will try to take a look at the code
> myself to kill the bug, but I have neither the time nor the expertise, so
> no promises :P

We'll get it working. Just give us a bit of time. And it would be super helpful if you could help test / QA. Thanks a lot for the offer!

-Sean
Re: [wikireader]Error on parsing the spanish wikipedia
On Thu, Oct 29, 2009 at 6:54 PM, David Reyes Samblas Martinez wrote:
> Great! :) Good to see you are working on this! Please count on me for
> any testing that needs to be done. I will try to take a look at the code
> myself to kill the bug, but I have neither the time nor the expertise, so
> no promises :P

I haven't seen the code, but if you don't feel like fixing it now you can add a try/catch around the block that processes each page, so that you have a wiki to play with while the error is being fixed.
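Nelson's suggestion can be sketched roughly as follows (in Python the construct is try/except). `process_all`, `process_one` and the `(title, text)` pairs are hypothetical stand-ins, not the actual ArticleParser.py interface.

```python
def process_all(articles, process_one):
    """Process every (title, text) pair, skipping the ones that fail.

    Returns the number of articles processed and a list of
    (title, error description) pairs for the ones that raised.
    """
    done = 0
    failed = []
    for title, text in articles:
        try:
            process_one(title, text)
            done += 1
        except Exception as exc:
            # Record the failure and keep going instead of aborting the run.
            failed.append((title, repr(exc)))
    return done, failed


if __name__ == "__main__":
    def demo_process(title, text):
        # Stand-in for the real per-article work; fails on one title.
        if title == "broken":
            raise ValueError("cannot parse")

    sample = [("Madrid", "..."), ("broken", "..."), ("Sevilla", "...")]
    done, failed = process_all(sample, demo_process)
    print(done, failed)
```

Collecting the failed titles rather than silently dropping them keeps a list of articles to retry once the underlying bug is fixed.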
Re: [wikireader]Error on parsing the spanish wikipedia
Great! :) Good to see you are working on this! Please count on me for any testing that needs to be done. I will try to take a look at the code myself to kill the bug, but I have neither the time nor the expertise, so no promises :P

David Reyes Samblas Martinez
http://www.tuxbrain.com
Open ultraportable & embedded solutions
Openmoko, Openpandora, Arduino
Hey, watch out!!! There's a linux in your pocket!!!

2009/10/30 Sean Moss-Pultz:
> David
>
> We're working on exactly the same thing now :-)
>
> I'll ask Chris to email the list once we get past it. I think the
> problem is with the mixture of different encodings (Latin-1 and
> UTF-8) in the Spanish Wikipedia and the way our code handles this.
> For some reason Python's print (at times) wants to default to ASCII,
> even after we explicitly tell it to use UTF-8.
>
> -Sean
Re: [wikireader]Error on parsing the spanish wikipedia
David

We're working on exactly the same thing now :-)

I'll ask Chris to email the list once we get past it. I think the problem is with the mixture of different encodings (Latin-1 and UTF-8) in the Spanish Wikipedia and the way our code handles this. For some reason Python's print (at times) wants to default to ASCII, even after we explicitly tell it to use UTF-8.

-Sean

On Fri, Oct 30, 2009 at 4:50 AM, David Reyes Samblas Martinez wrote:
>
> Hi, I'm trying to generate the files for a Spanish Wikipedia on the WR.
> After successfully compiling the source from git, I had to solve some
> annoyances with UTF-8 encoding in Python. The error was something like this:
>
>   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position: ordinal not in range(128)
>
> This was solved by changing the default encoding from "ascii" to "utf8" in
> the /usr/lib/python2.6/site.py file. After this I was able to run the
> following command:
>
>   make DESTDIR=image WORKDIR=work XML_FILES=xml-file-samples/eswiki-latest-pages-articles.xml index parse render combine
>
> Everything seemed fine for a few hours (about 6-7 h) of parsing the
> 70 articles in Spanish, but then... the horror:
>
>   Count: 38
>   Traceback (most recent call last):
>     File "./ArticleParser.py", line 224, in <module>
>       main()
>     File "./ArticleParser.py", line 172, in main
>       process_article_text(title.encode('utf-8'), f.read(length), newf)
>     File "./ArticleParser.py", line 218, in process_article_text
>       newf.write(text + '\n')
>   IOError: [Errno 32] Broken pipe
>   make[1]: *** [parse] Error 1
>   make[1]: se sale del directorio `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
>   make: *** [parse] Error 2
>
> I have relaunched the process with the (faint) hope that it was a
> temporary fault, but if anyone has a clue it would be helpful.
>
> BTW, I'm documenting this whole process to write a step-by-step howto on
> putting Wikipedia in other languages on the WikiReader.
>
> David Reyes Samblas Martinez
> http://www.tuxbrain.com
> Open ultraportable & embedded solutions
> Openmoko, Openpandora, Arduino
> Hey, watch out!!! There's a linux in your pocket!!!
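One common way to cope with input that mixes UTF-8 and Latin-1, as Sean suspects here, is to decode bytes explicitly (UTF-8 first, Latin-1 as a fallback) and to encode explicitly when writing, so that nothing depends on the interpreter's default codec. This is a sketch of that general technique in Python 3, not the fix that was later checked in; `decode_mixed` and `write_utf8` are hypothetical names.

```python
import io


def decode_mixed(raw):
    """Decode bytes as UTF-8, falling back to Latin-1 if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail.
        return raw.decode("latin-1")


def write_utf8(stream, text):
    """Write text to a binary stream, always encoded as UTF-8."""
    stream.write(text.encode("utf-8") + b"\n")


if __name__ == "__main__":
    out = io.BytesIO()
    for raw in ("España".encode("utf-8"), "España".encode("latin-1")):
        write_utf8(out, decode_mixed(raw))
    print(out.getvalue().decode("utf-8"))
```

The fallback is a heuristic: a Latin-1 byte sequence that happens to be valid UTF-8 will be decoded as UTF-8. The 0xc3 byte in the reported error is the lead byte UTF-8 uses for accented Latin letters such as "ñ", which is consistent with UTF-8 text being pushed through the ASCII codec.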