Hi,
I have read a lot of mails about time-consuming PDF parsing and tried
a few solutions myself. My example PDF file has 181 pages in 1.5 MB
(mostly text, almost no graphics).
- with pdfbox.org's toolkit it took 17m32s to parse and read its content
- after installing Ghostscript and using ps2text / ps2ascii, parsing
failed after page 54 and 2m51s because of irregular fonts
- after installing XPDF and using its pdftotext tool, parsing completed
in 7-10 seconds

My machine is a Celeron 1700 with VMware Workstation 3.2 (128 MB
assigned) and SuSE Linux 7.3.

I will parse my PDF files with xpdf, using something like
Runtime.getRuntime().exec("pdftotext -nopgbrk -raw " + pdfFileName + " " + txtFileName);
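
The call above passes one command string, which breaks if a file name
contains spaces, and it does not wait for pdftotext to finish. A slightly
more robust variant could look like the sketch below. It assumes Java 7
or later and that pdftotext is on the PATH; the class name PdfToText and
its two methods are only illustrative names, not part of xpdf or any
library.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class PdfToText {
    // Builds the argument list. Each argument is a separate element, so
    // file names containing spaces reach pdftotext intact.
    static List<String> buildCommand(String pdfFileName, String txtFileName) {
        return Arrays.asList("pdftotext", "-nopgbrk", "-raw", pdfFileName, txtFileName);
    }

    // Runs pdftotext, waits for it to finish, and returns its exit code.
    // By convention an exit code of 0 means the conversion succeeded.
    static int convert(String pdfFileName, String txtFileName)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(pdfFileName, txtFileName));
        // Let the child write to our stdout/stderr so it cannot block
        // on a full, unread pipe.
        pb.inheritIO();
        Process process = pb.start();
        return process.waitFor();
    }
}
```

Calling PdfToText.convert(pdfFileName, txtFileName) and checking for a
non-zero result gives a simple way to notice failed conversions.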


Paul

P.S. See http://www.jguru.com/faq/view.jsp?EID=1074237 for links and tips.
