Re: [wikireader]Error on parsing the spanish wikipedia

2009-10-30 Thread Sean Moss-Pultz
On Sat, Oct 31, 2009 at 2:46 AM, David Reyes Samblas Martinez
 wrote:
> just a thing I realized: the titles of all the faulty articles start
> with the "~" symbol

David

No, that's not a problem. That character gets removed in a later build
stage. We had to add it because of an integer conversion issue with
SQLite. It was automatically converting article titles like "1984"
into integers (not strings) and storing them in the database that way.

SQLite, BTW, claims this is a "feature".

Sean
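
To illustrate the SQLite behavior described above, here is a minimal
sketch using Python's standard sqlite3 module. The table and column
names are made up for illustration, and the NUMERIC column type is an
assumption: the thread does not show the WikiReader schema. SQLite
applies a "type affinity" per column, so digit-only text stored in a
column with numeric affinity is converted to an integer, while a title
with a leading "~" stays text.

```python
import sqlite3

# Hypothetical schema: the real WikiReader tables are not shown in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numeric_titles (title NUMERIC)")
conn.execute("CREATE TABLE text_titles (title TEXT)")


def stored_type(table, title):
    """Insert a title as a string and report the storage class SQLite chose."""
    conn.execute("DELETE FROM " + table)
    conn.execute("INSERT INTO " + table + " (title) VALUES (?)", (title,))
    row = conn.execute("SELECT typeof(title) FROM " + table).fetchone()
    return row[0]


# Digit-only text in a NUMERIC-affinity column is converted to an integer.
print(stored_type("numeric_titles", "1984"))
# A leading "~" makes the text non-numeric, so it is stored as text.
print(stored_type("numeric_titles", "~1984"))
# A TEXT-affinity column keeps the string as text without any prefix.
print(stored_type("text_titles", "1984"))
```

Declaring the column as TEXT is another way to keep titles as strings;
whether that was an option for the WikiReader tools is not stated here.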

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community


Re: [wikireader]Error on parsing the spanish wikipedia

2009-10-30 Thread Sean Moss-Pultz
On Fri, Oct 30, 2009 at 11:29 PM, Laszlo KREKACS
 wrote:
> On Fri, Oct 30, 2009 at 4:22 PM, David Reyes Samblas Martinez
>  wrote:
>> Are you uploading these changes to git? Can I take a look?
>
> Btw, is there any plan to implement image rendering?

Math (rendered as images) is on our roadmap, hopefully before the end
of this year. The screen is only 1-bit, so anything else would look
kinda funny.

 -Sean
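
A small sketch of why a 1-bit screen limits image rendering: each
pixel can only be on or off, so shades of gray collapse into two
values. The threshold approach and the function name are assumptions
for illustration only; the thread does not describe how the WikiReader
actually draws images.

```python
def to_one_bit(gray_row, threshold=128):
    """Map 8-bit grayscale pixels (0-255) to 1-bit pixels (0 or 1)."""
    return [1 if pixel >= threshold else 0 for pixel in gray_row]


# A smooth gradient from black to white...
gradient = [0, 32, 64, 96, 128, 160, 192, 224, 255]
# ...keeps only a single hard edge once reduced to one bit per pixel.
print(to_one_bit(gradient))
```

Line art such as rendered formulas survives this reduction far better
than photographs do.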



Re: [wikireader]Error on parsing the spanish wikipedia

2009-10-30 Thread Sean Moss-Pultz
On Fri, Oct 30, 2009 at 11:22 PM, David Reyes Samblas Martinez
 wrote:
> Are you uploading these changes to git? Can I take a look?

Yes. The latest commit fixes it. Have a look here:

  http://github.com/wikireader/wikireader

Sean



Re: [wikireader]Error on parsing the spanish wikipedia

2009-10-30 Thread David Reyes Samblas Martinez
2009/10/30 Laszlo KREKACS :
> On Fri, Oct 30, 2009 at 4:22 PM, David Reyes Samblas Martinez
>  wrote:
>> Are you uploading these changes to git? Can I take a look?
>
> Btw, is there any plan to implement image rendering?
>
> If so, any time estimate?
>
> Best regards,
>  Laszlo
Some kind of image renderer must already be implemented, because the
keyboard and the erase-history dialog are images. Am I wrong?



Re: [wikireader]Error on parsing the spanish wikipedia

2009-10-30 Thread David Reyes Samblas Martinez
Just a thing I realized: the titles of all the faulty articles start
with the "~" symbol.
Regards,
David Reyes Samblas Martinez
http://www.tuxbrain.com
Open ultraportable & embedded solutions
Openmoko, Openpandora,  Arduino
Hey, watch out!!! There's a linux in your pocket!!!






Re: [wikireader]Error on parsing the spanish wikipedia

2009-10-30 Thread Laszlo KREKACS
On Fri, Oct 30, 2009 at 4:22 PM, David Reyes Samblas Martinez
 wrote:
> Are you uploading these changes to git? Can I take a look?

Btw, is there any plan to implement image rendering?

If so, any time estimate?

Best regards,
 Laszlo



Re: [wikireader]Error on parsing the spanish wikipedia

2009-10-30 Thread David Reyes Samblas Martinez
Are you uploading these changes to git? Can I take a look?

David Reyes Samblas Martinez
http://www.tuxbrain.com
Open ultraportable & embedded solutions
Openmoko, Openpandora,  Arduino
Hey, watch out!!! There's a linux in your pocket!!!




2009/10/30 Sean Moss-Pultz :
> OK that's fixed now. Chris already checked in the code. Our build
> worked fine. We need to do a few more tweaks and then we can post a
> (super) early test image. Give us until early this coming week.
>
>  -Sean


Re: [wikireader]Error on parsing the spanish wikipedia

2009-10-30 Thread Sean Moss-Pultz
On Fri, Oct 30, 2009 at 4:50 AM, David Reyes Samblas Martinez
 wrote:
> Hi, I'm trying to generate the file for a Spanish Wikipedia on the WR.
> After successfully compiling the source from git, I ran into some
> annoyances with UTF-8 encoding in Python. The error was something like
> this:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in
> position: ordinal not in range(128)
> I solved this by changing the default encoding from "ascii" to "utf8"
> in the /usr/lib/python2.6/site.py file.
> After this I was able to run the command:
> make DESTDIR=image WORKDIR=work
> XML_FILES=xml-file-samples/eswiki-latest-pages-articles.xml index
> parse render combine
>
> Everything seemed fine for a few hours (about 6-7 h) of parsing the
> 70 articles in Spanish, but then ... the horror:
> Count: 38
> Traceback (most recent call last):
>  File "./ArticleParser.py", line 224, in <module>
>    main()
>  File "./ArticleParser.py", line 172, in main
>    process_article_text(title.encode('utf-8'),  f.read(length), newf)
>  File "./ArticleParser.py", line 218, in process_article_text
>    newf.write(text + '\n')
> IOError: [Errno 32] Broken pipe
> make[1]: *** [parse] Error 1
> make[1]: Leaving directory
> `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
> make: *** [parse] Error 2

OK that's fixed now. Chris already checked in the code. Our build
worked fine. We need to do a few more tweaks and then we can post a
(super) early test image. Give us until early this coming week.

  -Sean
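
For context on the "IOError: [Errno 32] Broken pipe" in the quoted
traceback: on POSIX systems this error (EPIPE) is raised when a
process writes to a pipe whose reading end has already been closed, so
it usually means the process on the other end of the pipe exited
first. The thread does not say what the actual fix was; the sketch
below only reproduces the error itself with a bare os.pipe() in
Python 3, where the exception is spelled BrokenPipeError.

```python
import errno
import os


def write_to_closed_pipe():
    """Write to a pipe whose reader is gone and return the resulting errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # simulate the downstream reader exiting early
    try:
        os.write(write_fd, b"article text\n")
    except BrokenPipeError as error:
        return error.errno
    finally:
        os.close(write_fd)
    return None


# True on POSIX systems: the write fails with EPIPE.
print(write_to_closed_pipe() == errno.EPIPE)
```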



Re: [wikireader]Error on parsing the spanish wikipedia

2009-10-29 Thread Sean Moss-Pultz
On Fri, Oct 30, 2009 at 7:58 AM, Nelson Castillo
 wrote:
> On Thu, Oct 29, 2009 at 6:54 PM, David Reyes Samblas Martinez
>  wrote:
>> Great! :) Good to see you are working on this! Please count on me for
>> any testing to be done. I will try to take a look at the code myself
>> to kill the bug, but I have neither the time nor the expertise, so no
>> promises :P
>
> I haven't seen the code but if you don't feel like fixing it now you
> can add a try/catch on the block that is processing each page so that
> you have a wiki to play with while the error is fixed.

Yeah, we're trying exactly that, Nelson. It's just a long process to
render all this stuff. We actually have 9 quad-core systems running in
parallel now, each with at least 6 GB of RAM :-)

  -Sean



Re: [wikireader]Error on parsing the spanish wikipedia

2009-10-29 Thread Sean Moss-Pultz
On Fri, Oct 30, 2009 at 7:54 AM, David Reyes Samblas Martinez
 wrote:
> Great! :) Good to see you are working on this! Please count on me for
> any testing to be done. I will try to take a look at the code myself
> to kill the bug, but I have neither the time nor the expertise, so no
> promises :P

We'll get it working. Just give us a bit of time. And it would be
super helpful if you could help test / QA. Thanks a lot for the offer!

  -Sean



Re: [wikireader]Error on parsing the spanish wikipedia

2009-10-29 Thread Nelson Castillo
On Thu, Oct 29, 2009 at 6:54 PM, David Reyes Samblas Martinez
 wrote:
> Great! :) Good to see you are working on this! Please count on me for
> any testing to be done. I will try to take a look at the code myself
> to kill the bug, but I have neither the time nor the expertise, so no
> promises :P

I haven't seen the code but if you don't feel like fixing it now you
can add a try/catch on the block that is processing each page so that
you have a wiki to play with while the error is fixed.
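
A minimal sketch of that suggestion in Python, where the construct is
spelled try/except. The function names and the sample data are
hypothetical; this is not the code in ArticleParser.py.

```python
def process_all(articles, process_one):
    """Process every (title, text) pair, skipping the ones that raise."""
    rendered = []
    failed = []
    for title, text in articles:
        try:
            rendered.append(process_one(title, text))
        except Exception as error:
            # Record the failure and keep going instead of aborting the run.
            failed.append((title, repr(error)))
    return rendered, failed


def fake_process(title, text):
    """Stand-in for the real per-article work; fails on one title."""
    if title == "bad":
        raise ValueError("cannot parse article")
    return title + ": " + text


rendered, failed = process_all(
    [("a", "first"), ("bad", "broken"), ("b", "second")], fake_process
)
print(rendered)
print(failed)
```

Keeping the list of failed titles gives you something to revisit once
the underlying bug is fixed.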



Re: [wikireader]Error on parsing the spanish wikipedia

2009-10-29 Thread David Reyes Samblas Martinez
Great! :) Good to see you are working on this! Please count on me for
any testing to be done. I will try to take a look at the code myself
to kill the bug, but I have neither the time nor the expertise, so no
promises :P
David Reyes Samblas Martinez
http://www.tuxbrain.com
Open ultraportable & embedded solutions
Openmoko, Openpandora,  Arduino
Hey, watch out!!! There's a linux in your pocket!!!




2009/10/30 Sean Moss-Pultz :
> David
>
> We're working on exactly the same thing now :-)
>
> I'll ask Chris to email the list once we get past it. I think the
> problem is with the mixture of different encodings (Latin-1 and
> UTF-8) in the Spanish Wikipedia and the way our code is handling this.
> For some reason Python's print (at times) wants to default to ASCII,
> even after we explicitly tell it to use UTF-8.
>
>  -Sean


Re: [wikireader]Error on parsing the spanish wikipedia

2009-10-29 Thread Sean Moss-Pultz
David

We're working on exactly the same thing now :-)

I'll ask Chris to email the list once we get past it. I think the
problem is with the mixture of different encodings (Latin-1 and
UTF-8) in the Spanish Wikipedia and the way our code is handling this.
For some reason Python's print (at times) wants to default to ASCII,
even after we explicitly tell it to use UTF-8.

  -Sean
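
The mixed-encoding problem described above can be sketched as follows.
This is an illustration under assumptions, written in Python 3 (the
thread used Python 2.6) with a hypothetical helper name; it is not the
fix that went into the WikiReader tools. It decodes bytes as UTF-8
first, falls back to Latin-1, and writes through a stream with an
explicit encoding instead of relying on the interpreter's default.

```python
import io


def decode_mixed(raw):
    """Decode bytes as UTF-8, falling back to Latin-1 if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1")


# In UTF-8, "ñ" is the two bytes 0xc3 0xb1; the ASCII codec rejects 0xc3.
utf8_title = "España".encode("utf-8")
latin1_title = "España".encode("latin-1")
print(decode_mixed(utf8_title) == decode_mixed(latin1_title))

# Write with an explicit encoding rather than the default one.
buffer = io.BytesIO()
writer = io.TextIOWrapper(buffer, encoding="utf-8", newline="\n")
writer.write(decode_mixed(latin1_title) + "\n")
writer.flush()
print(buffer.getvalue())
```

Declaring the encoding at each I/O boundary avoids having to change
the interpreter-wide default in site.py.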


On Fri, Oct 30, 2009 at 4:50 AM, David Reyes Samblas Martinez
 wrote:
>
> Hi, I'm trying to generate the file for a Spanish Wikipedia on the WR.
> After successfully compiling the source from git, I ran into some
> annoyances with UTF-8 encoding in Python. The error was something like
> this:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in
> position: ordinal not in range(128)
> I solved this by changing the default encoding from "ascii" to "utf8"
> in the /usr/lib/python2.6/site.py file.
> After this I was able to run the command:
> make DESTDIR=image WORKDIR=work
> XML_FILES=xml-file-samples/eswiki-latest-pages-articles.xml index
> parse render combine
>
> Everything seemed fine for a few hours (about 6-7 h) of parsing the
> 70 articles in Spanish, but then ... the horror:
> Count: 38
> Traceback (most recent call last):
>  File "./ArticleParser.py", line 224, in <module>
>    main()
>  File "./ArticleParser.py", line 172, in main
>    process_article_text(title.encode('utf-8'),  f.read(length), newf)
>  File "./ArticleParser.py", line 218, in process_article_text
>    newf.write(text + '\n')
> IOError: [Errno 32] Broken pipe
> make[1]: *** [parse] Error 1
> make[1]: Leaving directory
> `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
> make: *** [parse] Error 2
>
> I have relaunched the process in the (faint) hope that it was a
> temporary fault, but if anyone has a clue, it would be helpful.
>
> BTW, I'm documenting this whole process to write a step-by-step howto
> on putting Wikipedia in other languages on the WikiReader.
>
>
>
> David Reyes Samblas Martinez
> http://www.tuxbrain.com
> Open ultraportable & embedded solutions
> Openmoko, Openpandora,  Arduino
> Hey, watch out!!! There's a linux in your pocket!!!