Felix Zimmermann wrote:
Hi, I use ArcSegmentCreator to convert Heritrix ARCs into Nutch segments.
1. What does "Ignoring position: xxxxx" mean during conversion? I get a lot of these errors/info messages.
The ARC format is a series of appended gzip members. This program looks for the gzip magic number to find a starting point. Sometimes it finds the magic number bytes somewhere that isn't actually a gzip header, and it ignores that position.
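Here is a minimal Java sketch of the idea described above. It is not ArcSegmentCreator's actual code: the class `GzipMemberScan` and its methods are hypothetical, and only `java.util.zip` from the JDK is used. The sketch scans for the two magic bytes `0x1f 0x8b`. A hit counts as a record start only if a gzip member really decodes from that offset. Otherwise it prints an "Ignoring position" line, because those two bytes can also occur inside compressed data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipMemberScan {
    // Scan for the gzip magic number (0x1f 0x8b). A hit only counts as a
    // record start if gzip data actually decodes from that offset; otherwise
    // the two bytes just happened to appear inside compressed data.
    static List<Integer> findMembers(byte[] data) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + 1 < data.length; i++) {
            boolean magic = (data[i] & 0xff) == 0x1f && (data[i + 1] & 0xff) == 0x8b;
            if (!magic) continue;
            if (decodesAsGzip(data, i)) {
                starts.add(i);
            } else {
                System.out.println("Ignoring position: " + i);
            }
        }
        return starts;
    }

    // True if the bytes from offset decode as gzip (header, deflate data and
    // CRC/size trailer all check out); false on any IOException.
    static boolean decodesAsGzip(byte[] data, int offset) {
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(data, offset, data.length - offset))) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // drain; we only care whether decoding succeeds
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Hypothetical helper: build one gzip member (stand-in for one ARC record).
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] a = gzip("first record");
        byte[] b = gzip("second record");
        // Two members with a stray magic number (not a valid header) between them.
        byte[] arc = new byte[a.length + 3 + b.length];
        System.arraycopy(a, 0, arc, 0, a.length);
        arc[a.length] = 0x1f;
        arc[a.length + 1] = (byte) 0x8b;
        arc[a.length + 2] = 0x00;
        System.arraycopy(b, 0, arc, a.length + 3, b.length);
        System.out.println("Record offsets: " + findMembers(arc));
    }
}
```

Running `main` should report the two real member offsets. It should also print at least one "Ignoring position" line for the stray bytes, whose compression-method byte is not 8 (deflate). This naive version re-decodes from every hit, so it is quadratic in the worst case. A faster reader would decode one member and jump straight past it.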
2. The ARC file is about 25 MB (compressed), and the conversion into segments has been running for ~30 minutes. Is this normal, or are there configuration options to get more performance, apart from buying faster hardware ;-)?
Yup, this program is slow. I never had time to write an optimized version of it, just a working one. :)
Dennis
Thanks! Felix.
