[ https://issues.apache.org/jira/browse/TIKA-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless reassigned TIKA-778:
---------------------------------------
Assignee: Michael McCandless
> NullPointerException in tika-app, parsing PDF content
> -----------------------------------------------------
>
> Key: TIKA-778
> URL: https://issues.apache.org/jira/browse/TIKA-778
> Project: Tika
> Issue Type: Bug
> Components: gui, parser
> Affects Versions: 1.0
> Reporter: Bastian Mathes
> Assignee: Michael McCandless
>
> I am trying to extract text from some PDF files with the Tika app. In version
> 0.10 the error
> ERROR - Error: Could not parse predefined CMAP file for '--UCS2'
> is printed on the command line, but text extraction works and the output is
> correct. In version 1.0 I get the same error message on the command line, but
> I also receive an exception and no text is extracted:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@62bc36ff
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:320)
> at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:279)
> at org.apache.tika.gui.TikaGUI.actionPerformed(TikaGUI.java:238)
> at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1995)
> at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2318)
> at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:387)
> at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:242)
> at javax.swing.AbstractButton.doClick(AbstractButton.java:357)
> at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:809)
> at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:850)
> at java.awt.Component.processMouseEvent(Component.java:6288)
> at javax.swing.JComponent.processMouseEvent(JComponent.java:3267)
> at java.awt.Component.processEvent(Component.java:6053)
> at java.awt.Container.processEvent(Container.java:2041)
> at java.awt.Component.dispatchEventImpl(Component.java:4651)
> at java.awt.Container.dispatchEventImpl(Container.java:2099)
> at java.awt.Component.dispatchEvent(Component.java:4481)
> at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4577)
> at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4238)
> at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4168)
> at java.awt.Container.dispatchEventImpl(Container.java:2085)
> at java.awt.Window.dispatchEventImpl(Window.java:2478)
> at java.awt.Component.dispatchEvent(Component.java:4481)
> at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:643)
> at java.awt.EventQueue.access$000(EventQueue.java:84)
> at java.awt.EventQueue$1.run(EventQueue.java:602)
> at java.awt.EventQueue$1.run(EventQueue.java:600)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
> at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:98)
> at java.awt.EventQueue$2.run(EventQueue.java:616)
> at java.awt.EventQueue$2.run(EventQueue.java:614)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
> at java.awt.EventQueue.dispatchEvent(EventQueue.java:613)
> at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:269)
> at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:184)
> at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:174)
> at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:169)
> at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:161)
> at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)
> Caused by: java.lang.NullPointerException
> at com.sun.org.apache.xml.internal.serializer.ToHTMLStream.endElement(ToHTMLStream.java:907)
> at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endElement(TransformerHandlerImpl.java:273)
> at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
> at org.apache.tika.gui.TikaGUI$2.endElement(TikaGUI.java:519)
> at org.apache.tika.sax.TeeContentHandler.endElement(TeeContentHandler.java:94)
> at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
> at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
> at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
> at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
> at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
> at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
> at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:216)
> at org.apache.tika.parser.pdf.PDF2XHTML.endDocument(PDF2XHTML.java:112)
> at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:323)
> at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:61)
> at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:96)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> ... 43 more
> I tried the same PDF files with both versions (I can switch back and forth
> between 0.10 and 1.0, and the behavior is consistent), and it looks like the
> exact same PDFBox version is bundled in both tika-app-0.10.jar and
> tika-app-1.0.jar. It would be great if version 1.0 could handle these files
> the way 0.10 does. Sorry, I cannot share the PDF.
--
This message is automatically generated by JIRA.