According to Frank Guangxin Liu:
> This afternoon, I noticed htdig didn't do anything except
> running on a pdf file. The file is about
> 700k, ~80 pages of text. I tried run pdftotext on this
> file and it took about a minute to produce a 6M text file.
> Both xpdf and acroread can open this file almost immediately.
> I am wondering why it took the whole afternoon
> to parse this one file. "top" shows it uses 90% of CPU.
> Is there anything we can do to speed up ""?
> If any of you want to re-produce this, I can send you
> the pdf file.
> After this file, I keep checking how htdig runs, it seems
> to me it almost always takes more than an hour to 
> a pdf file. This really is unacceptable.
> By the way, I switch to use from acroread
> this weekend after reading the FAQ.

The parser script is interpreted Perl, so it's not going to
be super efficient.  However, more than one hour to parse an 80 page
document seems unusually long.  I don't have PDFs that large, but
on my system a 2 page PDF gets parsed in under a second.  I have a 200
MHz AMD-K6 with 64 MB RAM, running Linux kernel 2.0.36 and Perl 5.004.
How does that compare to what you have?  Have you noticed any difference
if you run the script directly on one of these PDFs, instead of running
it from htdig?  If you let me know where I could fetch a copy of this PDF,
I'll try it out on my system.
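
For comparing the stand-alone run against the run from htdig, a small
timing helper along these lines might be enough.  This is only a rough
sketch: the name time_command is made up for illustration, the file names
in the usage comment are placeholders, and it assumes a Python 3
interpreter is available on the machine.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command to completion and return (exit_code, elapsed_seconds).

    argv is the command and its arguments as a list of strings.
    Output is discarded so that terminal I/O does not skew the timing.
    """
    start = time.perf_counter()
    result = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    elapsed = time.perf_counter() - start
    return result.returncode, elapsed


# Hypothetical usage: time the plain text conversion on its own.
#   code, secs = time_command(["pdftotext", "sample.pdf", "sample.txt"])
#   print("exit %d after %.1f s" % (code, secs))
```

If the conversion alone finishes in about a minute but the whole parse
takes an hour, the time is being lost after the text has been extracted
rather than in pdftotext itself.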

