Hi,

First, the symptoms: I was running some tests on sites with many PDFs, and the Fetcher gradually slowed down until it became stuck. This was repeatable. A thread dump showed all threads waiting somewhere in PDFBox code (which is used by parse-pdf). In an email exchange, the author of PDFBox (Ben Litchfield) confirmed that there was a problem in the latest official release of PDFBox that could result in such behaviour.

If you have experienced such problems, the fix is to use the latest CVS version of PDFBox, where this problem is believed to be resolved.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
