[ https://issues.apache.org/jira/browse/PDFBOX-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr closed PDFBOX-3966.
-----------------------------------
    Resolution: Duplicate

Fixed in PDFBOX-3950 after finding that sometimes we get meaningful content (see the second attachment there).

> Operator not found in resources
> -------------------------------
>
>                 Key: PDFBOX-3966
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3966
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.7
>            Reporter: Jorge Spinsanti
>
> I got an exception while extracting HTML from a PDF. The source PDF is not available.
> {code}
> Main cause:
> org.apache.tika.exception.TikaException: Unable to extract PDF content
>       at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:139)
>       at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:167)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       ....
> Caused by: java.io.IOException: name for 'gs' operator not found in resources: /R8
>       at org.apache.pdfbox.contentstream.operator.state.SetGraphicsStateParameters.process(SetGraphicsStateParameters.java:54)
>       at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838)
>       at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)
>       at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)
>       at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
>       at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
>       at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
>       at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
>       at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
>       at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
>       at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
>       ... 27 more
> {code}
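
The resolution note says that pages with this problem sometimes still carry meaningful content, which suggests treating an unresolved 'gs' name as recoverable instead of fatal. The sketch below only illustrates that idea and is not the PDFBOX-3950 change, which this message does not show. It uses no PDFBox classes: the class GsLookupSketch, the extract method, the token list, and the map standing in for the page's graphics-state resources are all hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from PDFBox or Tika.
final class GsLookupSketch {

    /**
     * Walks a simplified token list. A "/Name" token followed by "gs" is
     * treated as a graphics-state lookup; every other token is treated as
     * text to extract.
     *
     * @param extGStates stand-in for the page's graphics-state resources
     * @param lenient    if true, a missing name is recorded and skipped;
     *                   if false, it aborts extraction with an IOException
     * @param warnings   receives one message per missing name in lenient mode
     */
    static String extract(List<String> tokens, Map<String, String> extGStates,
                          boolean lenient, List<String> warnings) throws IOException {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            boolean isGsOperand = token.startsWith("/")
                    && i + 1 < tokens.size()
                    && "gs".equals(tokens.get(i + 1));
            if (isGsOperand) {
                i++; // consume the "gs" operator as well
                String name = token.substring(1);
                if (!extGStates.containsKey(name)) {
                    String message =
                            "name for 'gs' operator not found in resources: " + token;
                    if (!lenient) {
                        throw new IOException(message);
                    }
                    warnings.add(message); // keep going; later text may still be usable
                }
                continue;
            }
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(token);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Hello", "/R8", "gs", "world");
        Map<String, String> resources = new HashMap<>(); // deliberately has no R8 entry
        List<String> warnings = new ArrayList<>();
        try {
            System.out.println(extract(tokens, resources, true, warnings));
            System.out.println(warnings);
            extract(tokens, resources, false, warnings);
        } catch (IOException e) {
            System.out.println("strict mode failed: " + e.getMessage());
        }
    }
}
```

In this toy model, strict mode reproduces the failure pattern from the trace above (one unresolved name aborts the whole extraction), while lenient mode returns the surrounding text and leaves the missing name in the warnings list for the caller to inspect.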