James Hardwick created TIKA-1462:
------------------------------------

             Summary: PDFont consumes all heap space
                 Key: TIKA-1462
                 URL: https://issues.apache.org/jira/browse/TIKA-1462
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.6
            Reporter: James Hardwick
            Priority: Critical

See https://issues.apache.org/jira/browse/PDFBOX-2200 for more details.

In short, PDFont does not release its resources and eventually accumulates enough objects to consume all available heap space. We are encountering this in production environments, where it crashes our Solr server when ingesting large numbers of PDF documents.

The fix is reportedly in the 2.0.0 release of PDFBox, but that version has been outstanding for so long that I'd suggest implementing the workaround proposed in the PDFBox issue.
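
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use PDFBox or Tika. It assumes the leak behaves like a process-wide static cache that gains entries with every parsed document, and that the workaround amounts to clearing that cache after each document; the report itself does not spell out either detail. The class FontCacheSketch and the methods parseDocument, clearResources and ingest are hypothetical names chosen for illustration, not the actual PDFBox or Tika API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a font class that keeps a static, never-evicted cache.
final class FontCacheSketch {
    // Assumption: a process-wide cache shared by every parse call.
    static final Map<String, byte[]> CACHE = new ConcurrentHashMap<>();

    // Simulates parsing one document: each document adds its own cache entry.
    static void parseDocument(String docId) {
        CACHE.put(docId + "/font0", new byte[1024]);
    }

    // Assumed workaround: drop everything the static cache is holding.
    static void clearResources() {
        CACHE.clear();
    }

    // Ingests n documents and returns how many cache entries are still
    // retained afterwards. With clearAfterEach == false the count grows
    // linearly with n, which is the heap-exhaustion pattern in this sketch.
    static int ingest(int n, boolean clearAfterEach) {
        for (int i = 0; i < n; i++) {
            try {
                parseDocument("doc" + i);
            } finally {
                // Clear in finally so a failed parse cannot leave entries behind.
                if (clearAfterEach) {
                    clearResources();
                }
            }
        }
        return CACHE.size();
    }
}
```

In this sketch, ingest(100, false) leaves 100 entries in the cache, while ingest(100, true) on an empty cache leaves none. Clearing in a finally block is a design choice so that documents that fail to parse do not contribute to the leak either.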