Hi,

I have read a lot of mails about time-consuming PDF parsing and tried a few solutions myself. My example PDF file has 181 pages in 1.5 MB (mostly text, almost no graphics).

- With pdfbox.org's toolkit, it took 17m32s to parse and read its content.
- After installing Ghostscript and using ps2text / ps2ascii, parsing failed after page 54 and 2m51s because of irregular fonts.
- After installing XPDF and using its tool pdftotext, parsing completed in 7-10 seconds.

My machine is a Celeron 1700 running VMware Workstation 3.2 (128 MB assigned) with SuSE Linux 7.3.

I will parse my PDF files with xpdf, using something like

    Runtime.getRuntime().exec("pdftotext -nopgbrk -raw "+pdfFileName+" "+txtFileName);

Paul

P.S. Look at http://www.jguru.com/faq/view.jsp?EID=1074237 for links and tips.
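
For anyone who wants to try the pdftotext call from Java, here is a slightly more defensive sketch of the one-liner above. It is only an illustration, not a tested production solution. It assumes Java 7 or later (for ProcessBuilder.inheritIO) and that pdftotext is on the PATH. The class name PdfToText and the methods buildCommand and convert are made up for this example. Passing each argument as its own list element avoids trouble with file names that contain spaces, and waiting for the process lets the caller check the exit code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; class and method names are illustrative only. */
final class PdfToText {

    /**
     * Builds the pdftotext argument list. Each argument is a separate
     * element, so file names containing spaces are passed through intact.
     */
    static List<String> buildCommand(String pdfFileName, String txtFileName) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pdftotext");   // assumption: the xpdf tool is on the PATH
        cmd.add("-nopgbrk");    // same options as in the one-liner above
        cmd.add("-raw");
        cmd.add(pdfFileName);
        cmd.add(txtFileName);
        return cmd;
    }

    /** Runs pdftotext, waits for it to finish, and returns its exit code. */
    static int convert(String pdfFileName, String txtFileName)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(pdfFileName, txtFileName));
        // Send the child's stdout/stderr to our own console so that it
        // cannot block on a full pipe buffer that nobody reads.
        pb.inheritIO();
        Process process = pb.start();
        return process.waitFor();
    }
}
```

A call would then look like `int exitCode = PdfToText.convert(pdfFileName, txtFileName);`, treating a nonzero exit code as a failed conversion.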