Can you reproduce this with a neutral locale?

    export LC_ALL=C

I'm trying the same at the moment. I had a lot of hiccups with several causes, among them missing tools and not enough memory.
This is currently where I'm stuck with the German Wikipedia:

Count: 823000
Count: 824000
Count: 825000
Count: 826000
Count: 827000
Count: 828000
Count: 829000
Count: 830000
Count: 831000
Count: 832000
Count: 833000
Traceback (most recent call last):
  File "./ArticleParser.py", line 203, in <module>
    main()
  File "./ArticleParser.py", line 168, in main
    process_article_text(title.encode('utf-8'), f.read(length), newf)
  File "./ArticleParser.py", line 197, in process_article_text
    newf.write(text + '\n')
IOError: [Errno 32] Broken pipe
make[1]: *** [parse] Error 1
make[1]: Leaving directory `/home/tilli/wikireader/host-tools/offline-renderer'
make: *** [parse] Error 2

I suppose it failed somewhere in PARSER_COMMAND.

Before that, the following steps went through without fail:

make
make DESTDIR=image WORKDIR=work XML_FILES=dewiki-20091028-pages-articles.xml index

David Reyes Samblas Martinez wrote:
> After the "success" of the spanish wikipedia pending to resolve the
> indexing part, I was starting to work on the german wikipedia
> http://download.wikipedia.org/dewiki/latest/dewiki-latest-pages-meta-current.xml.bz2
>
> but it fails at first step with the following error
>
> #make DESTDIR=image WORKDIR=work XML_FILES=dewiki-latest-pages-meta-current.xml index parse render combine
>
> awk: línea ord.:1: fatal: no se puede abrir el fichero `work/counts.text' para lectura (No existe el fichero ó directorio)
> cd host-tools/offline-renderer && make index \
>   XML_FILES="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/dewiki-latest-pages-meta-current.xml" RENDER_BLOCK="0" \
>   WORKDIR="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work" DESTDIR="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/image"
> make[1]: se ingresa al directorio `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
> ./ArticleIndex.py \
>   --article-index="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/articles.db" \
>   --article-offsets="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/offsets.db" \
>   --article-counts="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/work/counts.text" \
>   --prefix="/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/image/pedia" /OE/Proyectos/tuxbrain/productos/wikireader/wikireader/dewiki-latest-pages-meta-current.xml
> Traceback (most recent call last):
>   File "./ArticleIndex.py", line 611, in <module>
>     main()
>   File "./ArticleIndex.py", line 172, in main
>     limit = processor.process(f, limit)
>   File "/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer/FileScanner.py", line 141, in process
>     if '#' == body[0] and 'redirect' == body[1:9].lower():
> IndexError: string index out of range
> Flushing databases
> Writing: files
> Time: 0s
> Writing: articles
> Time: 0s
> Writing: offsets
> Time: 0s
> Loading: articles
> Time: 0s
> Loading: offsets and files
> Time: 0s
> make[1]: *** [index] Error 1
> make[1]: se sale del directorio `/OE/Proyectos/tuxbrain/productos/wikireader/wikireader/host-tools/offline-renderer'
> make: *** [index] Error 2
>
> Regards
>
> David Reyes Samblas Martinez
> http://www.tuxbrain.com
> Open ultraportable & embedded solutions
> Openmoko, Openpandora, Arduino
> Hey, watch out!!! There's a linux in your pocket!!!
>
> _______________________________________________
> Openmoko community mailing list
> community@lists.openmoko.org
> http://lists.openmoko.org/mailman/listinfo/community
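
Regarding the IndexError in the quoted traceback: the failing line indexes body[0], which raises "string index out of range" when body is an empty string, so my guess is that an article with an empty body reaches the redirect test. The slice body[1:9] is harmless on short strings; only body[0] is the problem. Below is a minimal sketch of a guard, assuming body is a plain string. The helper name is_redirect is made up for illustration and does not exist in FileScanner.py, and I have not checked how the surrounding code should treat an empty article.

```python
def is_redirect(body):
    """Return True if the article body begins with '#redirect' (any case).

    Hypothetical helper, not part of FileScanner.py. str.startswith()
    returns False for an empty string, so an empty body does not raise
    IndexError the way body[0] does.
    """
    return body.startswith('#') and body[1:9].lower() == 'redirect'


if __name__ == '__main__':
    # An empty body, a redirect and a normal article.
    for sample in ['', '#REDIRECT [[Berlin]]', 'Plain article text']:
        print(repr(sample), is_redirect(sample))
```

If that is indeed the cause, the condition on line 141 of FileScanner.py could be rewritten along the same lines, but that is untested against the real dump.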