Nicolas M created PDFBOX-4019:
---------------------------------
Summary: Expected 'Page' but found COSName{Font} in PDPageTree
Key: PDFBOX-4019
URL: https://issues.apache.org/jira/browse/PDFBOX-4019
Project: PDFBox
Issue Type: Improvement
Components: PDModel, Text extraction
Affects Versions: 2.0.8
Environment: Debian 9 / macOS (not OS-related)
Reporter: Nicolas M
Attachments: Sterlite Technologies.pdf
Hello,
I have a PDF document that produces the following stack trace:
{code:java}
INFO: OpenType Layout tables used in font FreeSans are not implemented in PDFBox and will be ignored
Exception in thread "Thread-1" java.lang.IllegalStateException: Expected 'Page' but found COSName{Font}
    at org.apache.pdfbox.pdmodel.PDPageTree.sanitizeType(PDPageTree.java:227)
    at org.apache.pdfbox.pdmodel.PDPageTree.access$300(PDPageTree.java:38)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:189)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:153)
    at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:314)
    at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
    at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:227)
{code}
I found a similar problem reported on the pdfbox-users mailing list:
https://mail-archives.apache.org/mod_mbox/pdfbox-users/201610.mbox/%[email protected]%3E
So I understand that the problem comes from the PDF itself, but given that some
readers recover from it, are there any plans to add similar recovery handling to
PDFBox too?
Thanks
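One possible recovery strategy, sketched here only as an illustration and not as PDFBox's actual behavior or API: walk the /Kids tree leniently, descend into /Pages nodes, accept /Page leaves (or untyped leaves), and skip anything else, such as the /Font dictionary from the stack trace, with a warning instead of throwing. The class LenientPageTree, its nested Node type (a hypothetical stand-in for a COS dictionary) and the method collectPages are all assumptions made for this sketch. It also guards against cycles in /Kids, which malformed files can contain.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Illustrative sketch of lenient page-tree traversal; hypothetical, not PDFBox code. */
public class LenientPageTree {

    /** Hypothetical stand-in for a page-tree dictionary: a /Type name plus /Kids. */
    public static final class Node {
        public final String id;        // label used only in messages
        public final String type;      // value of /Type, or null if absent
        public final List<Node> kids;  // entries of /Kids, empty for leaves

        public Node(String id, String type, List<Node> kids) {
            this.id = id;
            this.type = type;
            this.kids = kids;
        }
    }

    /** Collects page leaves in document order, recording a warning for each skipped entry. */
    public static List<Node> collectPages(Node root, List<String> warnings) {
        List<Node> pages = new ArrayList<>();
        Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(root, pages, warnings, visited);
        return pages;
    }

    private static void walk(Node node, List<Node> pages, List<String> warnings, Set<Node> visited) {
        // Guard against malformed files whose /Kids arrays form a cycle.
        if (!visited.add(node)) {
            warnings.add("Skipping " + node.id + ": already visited (cycle in /Kids)");
            return;
        }
        // Intermediate node: /Type /Pages, or no /Type but a non-empty /Kids.
        boolean intermediate = "Pages".equals(node.type)
                || (node.type == null && !node.kids.isEmpty());
        if (intermediate) {
            for (Node kid : node.kids) {
                walk(kid, pages, warnings, visited);
            }
        } else if (node.type == null || "Page".equals(node.type)) {
            // Leaf with /Type /Page, or with no /Type at all: treat it as a page.
            pages.add(node);
        } else {
            // E.g. a /Font dictionary wrongly listed in /Kids: skip it instead of throwing.
            warnings.add("Skipping " + node.id + ": expected 'Page' but found " + node.type);
        }
    }

    public static void main(String[] args) {
        Node p1 = new Node("p1", "Page", List.of());
        Node font = new Node("obj12", "Font", List.of());
        Node p2 = new Node("p2", null, List.of());
        Node root = new Node("root", "Pages", List.of(p1, font, p2));

        List<String> warnings = new ArrayList<>();
        for (Node page : collectPages(root, warnings)) {
            System.out.println("page " + page.id);
        }
        warnings.forEach(System.out::println);
    }
}
```

Running main prints "page p1" and "page p2", followed by one warning for obj12. Recording a warning for every skipped entry keeps the recovery visible, since silently dropping entries could also hide genuine damage in the file.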