Re: [Wikireader] Error on processing the German Wikipedia

2009-11-20 Thread Tilman Baumann
Can you reproduce this with a neutral locale?
 export LC_ALL=C
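
For scripted builds, the same neutral locale can be applied to a single command without changing the calling shell. This is a minimal Python 3 sketch, assuming the build is started from a wrapper script; the helper name run_with_c_locale is made up for illustration and is not part of the wikireader tools.

```python
import os
import subprocess
import sys


def run_with_c_locale(cmd):
    """Run cmd with LC_ALL=C and return its standard output as text."""
    env = dict(os.environ)   # copy, so the caller's environment is untouched
    env["LC_ALL"] = "C"      # same effect as `export LC_ALL=C` for this command
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Demonstration only: ask a child process which LC_ALL it sees.
    probe = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
    print(run_with_c_locale(probe))
```

In practice the command list would hold the make invocation from this thread instead of the probe.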

I'm trying the same thing at the moment. I had a lot of hiccups with various
causes, among them missing tools and not enough memory.
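
Missing tools can be caught before a long run starts. The sketch below is only an illustration in Python 3: the tool list is an assumption (make and awk appear in the logs in this thread, python is implied by the .py scripts), the real build may need more, and missing_tools is a hypothetical helper name.

```python
import shutil


def missing_tools(names):
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed list, see the note above; extend it to match the real build.
    missing = missing_tools(["make", "awk", "python"])
    print("missing:", ", ".join(missing) if missing else "none")
```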

This is where I'm currently stuck with the German Wikipedia:

Count: 823000
Count: 824000
Count: 825000
Count: 826000
Count: 827000
Count: 828000
Count: 829000
Count: 830000
Count: 831000
Count: 832000
Count: 833000
Traceback (most recent call last):
  File "./ArticleParser.py", line 203, in <module>
    main()
  File "./ArticleParser.py", line 168, in main
    process_article_text(title.encode('utf-8'),  f.read(length), newf)
  File "./ArticleParser.py", line 197, in process_article_text
    newf.write(text + '\n')
IOError: [Errno 32] Broken pipe
make[1]: *** [parse] Error 1
make[1]: Leaving directory `/home/tilli/wikireader/host-tools/offline-renderer'
make: *** [parse] Error 2

I suppose it failed somewhere in PARSER_COMMAND
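
Errno 32 (EPIPE) is raised in the writer when the process reading the other end of the pipe has already gone away, so the traceback shows the symptom rather than the cause. Here is a minimal Python 3 sketch of how a writer could report the reader's exit status. It is not the ArticleParser.py code; feed_child is a hypothetical helper, and Python 3 raises BrokenPipeError where the Python 2 log above shows IOError.

```python
import subprocess
import sys


def feed_child(cmd, lines):
    """Write lines to cmd's stdin; return (lines_written, exit_status)."""
    child = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    written = 0
    try:
        for line in lines:
            child.stdin.write(line.encode("utf-8") + b"\n")
            child.stdin.flush()
            written += 1
    except BrokenPipeError:
        # The reader is gone; its exit status is more telling than errno 32.
        pass
    finally:
        try:
            child.stdin.close()
        except BrokenPipeError:
            pass
    return written, child.wait()


if __name__ == "__main__":
    # Stand-in for a crashing reader: exits with status 3 without reading.
    reader = [sys.executable, "-c", "import sys; sys.exit(3)"]
    count, status = feed_child(reader, ("x" * 100 for _ in range(200000)))
    print("wrote %d lines, reader exited with status %d" % (count, status))
```

A non-zero status (or a negative one, meaning the reader was killed by a signal) points at the downstream command rather than at the writer.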


Before that, the following steps completed without errors:
make
make DESTDIR=image WORKDIR=work XML_FILES=dewiki-20091028-pages-articles.xml index


David Reyes Samblas Martinez wrote:
 After the success with the Spanish Wikipedia (the indexing part is still
 pending), I started working on the German Wikipedia:
 http://download.wikipedia.org/dewiki/latest/dewiki-latest-pages-meta-current.xml.bz2

 but it fails at the first step with the following error:

 # make DESTDIR=image WORKDIR=work XML_FILES=dewiki-latest-pages-meta-current.xml index parse render combine

 awk: cmd. line:1: fatal: cannot open file `work/counts.text' for reading (No such file or directory)
 cd host-tools/offline-renderer && make index \
   XML_FILES=/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/dewiki-latest-pages-meta-current.xml RENDER_BLOCK=0 \
   WORKDIR=/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work DESTDIR=/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/image
 make[1]: Entering directory `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
 ./ArticleIndex.py \
   --article-index=/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/articles.db \
   --article-offsets=/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/offsets.db \
   --article-counts=/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/counts.text \
   --prefix=/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/image/pedia /OE/Proyectos/tuxbrain/productos/wikireader/wikireader/dewiki-latest-pages-meta-current.xml
 Traceback (most recent call last):
   File "./ArticleIndex.py", line 611, in <module>
     main()
   File "./ArticleIndex.py", line 172, in main
     limit = processor.process(f, limit)
   File "/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer/FileScanner.py", line 141, in process
     if '#' == body[0] and 'redirect' == body[1:9].lower():
 IndexError: string index out of range
 Flushing databases
 Writing: files
 Time: 0s
 Writing: articles
 Time: 0s
 Writing: offsets
 Time: 0s
 Loading: articles
 Time: 0s
 Loading: offsets and files
 Time: 0s
 make[1]: *** [index] Error 1
 make[1]: Leaving directory `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
 make: *** [index] Error 2
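
The IndexError in the log above is what body[0] raises when body is an empty string, so an article with empty text is one plausible trigger. That is an inference from the quoted source line, not something confirmed in this thread. The following is a minimal sketch of a check that tolerates empty input; the function name is_redirect is hypothetical and not taken from FileScanner.py.

```python
def is_redirect(body):
    """Return True if the article text starts with '#redirect' (any case)."""
    # Slicing past the end of a string yields a shorter string instead of
    # raising, so an empty body just compares unequal.
    return body[:9].lower() == "#redirect"


if __name__ == "__main__":
    for sample in ["", "#REDIRECT [[Berlin]]", "Plain article text"]:
        print(repr(sample), is_redirect(sample))
```

For non-empty input this gives the same answer as the original two-part test, because the first character and the next eight are compared together.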

 Regards

 David Reyes Samblas Martinez
 http://www.tuxbrain.com
 Open ultraportable  embedded solutions
 Openmoko, Openpandora,  Arduino
 Hey, watch out!!! There's a linux in your pocket!!!

 ___
 Openmoko community mailing list
 community@lists.openmoko.org
 http://lists.openmoko.org/mailman/listinfo/community





Re: [Wikireader] Error on processing the German Wikipedia

2009-11-20 Thread David Reyes Samblas Martinez
Well, the Spanish one gave me the same error before, but now it works. I'm
parsing the de Wikipedia right now (Count: 173000); let's see what
happens :)

Note: I'm parsing the 2009-Nov-11 dump:
http://download.wikipedia.org/dewiki/latest/dewiki-latest-pages-articles.xml.bz2

Regards

David Reyes Samblas Martinez






Re: [Wikireader] Error on processing the German Wikipedia

2009-11-20 Thread Tilman Baumann

David Reyes Samblas Martinez wrote:
 Well, the Spanish one gave me the same error before, but now it works.
Any idea what solved it? Or is it just random and will go away if I try
again? :)

 I'm parsing the de Wikipedia right now (Count: 173000); let's see what
 happens :)

I would definitely be interested in the results...



Re: [Wikireader] Error on processing the German Wikipedia

2009-11-20 Thread David Reyes Samblas Martinez
Don't hold your breath :( It's failing at Count: 832000.

David Reyes Samblas Martinez





Re: [Wikireader] Error on processing the German Wikipedia

2009-11-20 Thread Tilman Baumann

David Reyes Samblas Martinez wrote:
 Don't hold your breath :( failing at Count: 832000

The same error as mine?


Re: [Wikireader] Error on processing the German Wikipedia

2009-11-20 Thread David Reyes Samblas Martinez
yes :(
David Reyes Samblas Martinez
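
Both runs reported in this thread stop at nearly the same progress counter (833000 and 832000), which hints at a particular stretch of the dump rather than a random failure; that is an inference, not a confirmed diagnosis. Here is a small Python 3 sketch for pulling the last counter out of a saved build log so that runs can be compared. It assumes the "Count: N" line format shown in the logs, and last_count is an illustrative helper name.

```python
import re

# One progress line per match, e.g. "Count: 833000".
COUNT_LINE = re.compile(r"^\s*Count:\s*(\d+)\s*$")


def last_count(log_text):
    """Return the last 'Count: N' value in log_text, or None if there is none."""
    last = None
    for line in log_text.splitlines():
        match = COUNT_LINE.match(line)
        if match:
            last = int(match.group(1))
    return last


if __name__ == "__main__":
    sample = ("Count: 831000\nCount: 832000\nCount: 833000\n"
              "Traceback (most recent call last):\n")
    print(last_count(sample))
```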



