Hey all,

Has anyone else found that the PDF parser uses so many resources that it slows down the whole parsing process? I ran the fetch with the -noParsing option (thanks John!), then ran the parser on the fetched documents with the PDF parser enabled. Parsing was quite slow: only about 5 pages/second. When I disabled the PDF parser and ran the parser again on the same documents, I got over 30 pages/second. Both runs were on the same machine, a 2.66 GHz P4 with 512 MB of RAM. The iowait is 0%, so I don't think it is thrashing or using much swap. Is the PDF parser just really CPU intensive? What does everyone else do? 5 pages/second is not really acceptable, but it would be great to be able to parse PDFs.

Thanks,

Luke


Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
