[
https://issues.apache.org/jira/browse/PDFBOX-533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758737#action_12758737
]
Navendu Garg commented on PDFBOX-533:
-------------------------------------
Mel,
I tried to use PDFTextStripper2. However, it produces the following
INFO messages and then fails with an exception:
INFO: unsupported/disabled operation: BDC
Sep 23, 2009 10:35:54 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: g
Sep 23, 2009 10:35:54 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.pdfbox.encoding.EncodingManager.<clinit>(EncodingManager.java:38)
    at org.apache.pdfbox.pdmodel.font.PDFont.getEncoding(PDFont.java:518)
    at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:438)
    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:343)
    at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:66)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:516)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:229)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:188)
    at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
    at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
    at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
    at org.apache.pdfbox.util.TestPDFTextStripperPerf.main(TestPDFTextStripperPerf.java:27)
Caused by: java.lang.NullPointerException
    at java.io.Reader.<init>(Reader.java:61)
    at java.io.InputStreamReader.<init>(InputStreamReader.java:55)
    at org.apache.pdfbox.encoding.Encoding.loadGlyphList(Encoding.java:98)
    at org.apache.pdfbox.encoding.Encoding.<clinit>(Encoding.java:58)
    ... 12 more
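The "Caused by" section points at the likely root cause: Encoding.loadGlyphList handed a null stream to InputStreamReader, and java.io.Reader's constructor throws NullPointerException for a null lock object. A null stream is what a classpath lookup such as ClassLoader.getResourceAsStream returns when the resource is missing (for example, if the glyph list file was not copied into the build output). The sketch below assumes the glyph list is loaded as a classpath resource; the class name and resource path are hypothetical and are not PDFBox's actual code. It reproduces the NullPointerException and shows a defensive check that would turn it into a clearer error.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class GlyphListLoaderSketch {

    // Hypothetical resource path; the real path used by PDFBox is not shown in the trace.
    public static final String GLYPH_LIST = "hypothetical/glyphlist.txt";

    // Opens a classpath resource, failing with a descriptive IOException
    // instead of letting a null stream reach InputStreamReader.
    public static BufferedReader openResource(String name) throws IOException {
        InputStream in = GlyphListLoaderSketch.class.getClassLoader().getResourceAsStream(name);
        if (in == null) {
            // Without this check, new InputStreamReader(null) throws the same
            // NullPointerException from java.io.Reader.<init> seen in the trace.
            throw new IOException("Resource not found on classpath: " + name);
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        // Reproduce the failure mode: a null stream passed to InputStreamReader.
        try {
            new InputStreamReader((InputStream) null);
        } catch (NullPointerException e) {
            System.out.println("NPE from InputStreamReader(null)");
        }
        // With the null check, a missing resource yields a readable message instead.
        try {
            openResource(GLYPH_LIST);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the check fires, the usual fix is a classpath issue (making sure the resource files end up in the jar or classes directory) rather than a code change.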
> PDFTextStripper.writeCharacters is called nowhere in the class
> ---------------------------------------------------------------
>
> Key: PDFBOX-533
> URL: https://issues.apache.org/jira/browse/PDFBOX-533
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 0.8.0-incubator
> Reporter: Navendu Garg
> Attachments: TestPDFTextStripperPerf.java
>
>
> It seems the writeCharacters method is not called anywhere in the PDFTextStripper
> class. This makes it impossible to handle each character's TextPosition, or
> line separators, in a subclass, because the processLineSeparator method no
> longer exists and writeLineSeparator is only called when the actual writing happens.
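The complaint is about a template-method contract: a subclass overrides a per-character hook and expects the base class to call it. A minimal sketch of that contract follows. ToyTextStripper and RecordingStripper are hypothetical toy classes, not PDFBox's PDFTextStripper, and a plain String stands in for TextPosition. The subclass only sees the text runs because the base class's writeText actually calls writeCharacters; if the base class never made that call, as reported here, the override would never run and the recorded list would stay empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WriteCharactersHookSketch {

    // Toy base class (hypothetical, NOT PDFBox's PDFTextStripper): walks the
    // lines of a page and calls overridable hooks for each text run and line break.
    public static class ToyTextStripper {
        protected final StringBuilder out = new StringBuilder();

        public String writeText(List<String> lines) {
            for (int i = 0; i < lines.size(); i++) {
                writeCharacters(lines.get(i)); // hook a subclass can intercept
                if (i < lines.size() - 1) {
                    writeLineSeparator();      // hook for line breaks
                }
            }
            return out.toString();
        }

        // Default behaviour: append the text run (String stands in for TextPosition).
        protected void writeCharacters(String text) {
            out.append(text);
        }

        protected void writeLineSeparator() {
            out.append('\n');
        }
    }

    // Subclass that records every text run and counts line separators.
    // This only works because the base class calls both hooks.
    public static class RecordingStripper extends ToyTextStripper {
        public final List<String> seen = new ArrayList<>();
        public int separators = 0;

        @Override
        protected void writeCharacters(String text) {
            seen.add(text);
            super.writeCharacters(text);
        }

        @Override
        protected void writeLineSeparator() {
            separators++;
            super.writeLineSeparator();
        }
    }

    public static void main(String[] args) {
        RecordingStripper s = new RecordingStripper();
        String text = s.writeText(Arrays.asList("Hello", "World"));
        System.out.println(text);
        System.out.println(s.seen + " separators=" + s.separators);
    }
}
```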