Felix Zimmermann wrote:
Hi,

I use the ArcSegmentCreator to convert Heritrix Arcs to Nutch Segments.

1. What does "Ignoring position: xxxxx" mean during conversion? I get a
lot of these errors/infos.

The ARC format is a series of appended gzip members. This program looks for the gzip magic number (the bytes 0x1f 0x8b) as a starting point. Sometimes it finds those magic bytes where there isn't actually a gzip header, and it ignores that position.
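A minimal Python sketch of that idea, assuming the ARC file is a plain concatenation of gzip members. This is not ArcSegmentCreator's actual code, and the function name `scan_gzip_members` is hypothetical: at each magic-number hit it tries to decode one gzip member, and any offset that does not decode cleanly is recorded as ignored, analogous to the "Ignoring position" messages.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member header


def scan_gzip_members(data: bytes):
    """Scan concatenated gzip members, skipping false magic-number hits.

    Returns (records, ignored): records is a list of (offset, payload)
    for members that decoded cleanly; ignored lists offsets where the
    magic bytes appeared but no valid gzip member followed.
    """
    records, ignored = [], []
    pos = data.find(GZIP_MAGIC)
    while pos != -1:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        try:
            payload = d.decompress(data[pos:])
            ok = d.eof  # False if the member was truncated
        except zlib.error:
            ok = False  # bad header or corrupt deflate stream
        if not ok:
            ignored.append(pos)
            pos = data.find(GZIP_MAGIC, pos + 1)
            continue
        records.append((pos, payload))
        # Bytes after this member's trailer end up in unused_data,
        # so the member's length tells us where to resume scanning.
        member_len = len(data) - pos - len(d.unused_data)
        pos = data.find(GZIP_MAGIC, pos + member_len)
    return records, ignored


# Usage: two real members with stray magic bytes between them.
m1 = gzip.compress(b"rec1")
junk = b"junk" + GZIP_MAGIC + b"xx"  # magic bytes, but not a gzip header
m2 = gzip.compress(b"rec2")
records, ignored = scan_gzip_members(m1 + junk + m2)
```

Both real records are recovered, and only the stray magic bytes inside `junk` end up in `ignored`. Trying a full decode at every candidate is simple but can be slow on large files.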


2. The ARC file is about 25 MB (compressed), and converting it into
segments has been running for ~30 minutes. Is this normal, or are there
(config) options to get better performance, apart from buying faster
hardware ;-) ?

Yup, this program is slow. I never had time to write an optimized version of it, just a working one. :)

Dennis


Thanks!
Felix.
