This afternoon I noticed that htdig did nothing but run
parse_doc.pl on a single PDF file. The file is about
700k, ~80 pages of text. I tried running pdftotext on this
file, and it took about a minute to produce a 6M text file.
Both xpdf and acroread can open this file almost immediately.
I am wondering why it took parse_doc.pl the whole afternoon
to parse this one file. "top" shows it using 90% of the CPU.
Is there anything we can do to speed up parse_doc.pl?
If any of you want to reproduce this, I can send you
the PDF file.
Since then I have kept checking how htdig runs, and it seems
to me that parse_doc.pl almost always takes more than an hour
to parse a PDF file. This really is unacceptable.
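
For anyone who wants to compare timings on their own files, here is a
rough sketch of how a single conversion step could be timed with a
cutoff. time_command is a made-up helper name, not part of htdig, and
the 300-second default is an arbitrary assumption.

```python
import subprocess
import time


def time_command(cmd, timeout_s=300):
    """Run cmd (a list of arguments) and return (seconds, exit_code).

    exit_code is None if the command was still running after
    timeout_s seconds and had to be killed.
    """
    start = time.perf_counter()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # discard output, only timing matters
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        code = result.returncode
    except subprocess.TimeoutExpired:
        code = None
    return time.perf_counter() - start, code
```

Calling time_command(["pdftotext", "file.pdf", "out.txt"]) would time
the plain pdftotext run described above. The same call can wrap
parse_doc.pl, but its arguments depend on the local htdig
configuration, so I have left them out.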

By the way, I switched from acroread to parse_doc.pl
this weekend after reading the FAQ.

Frank

