[ https://issues.apache.org/jira/browse/TIKA-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved TIKA-778.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1
    
> NullPointerException in tika-app, parsing PDF content
> -----------------------------------------------------
>
>                 Key: TIKA-778
>                 URL: https://issues.apache.org/jira/browse/TIKA-778
>             Project: Tika
>          Issue Type: Bug
>          Components: gui, parser
>    Affects Versions: 1.0
>            Reporter: Bastian Mathes
>            Assignee: Michael McCandless
>             Fix For: 1.1
>
>
> I am trying to extract text from some PDF files with the Tika app. In version 0.10,
> the error
> ERROR - Error: Could not parse predefined CMAP file for '--UCS2'
> is printed on the command line, but text extraction works and the output is correct.
> In version 1.0 I get the same error message on the command line, but I also
> get an exception and no text is extracted:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@62bc36ff
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:320)
>       at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:279)
>       at org.apache.tika.gui.TikaGUI.actionPerformed(TikaGUI.java:238)
>       at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1995)
>       at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2318)
>       at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:387)
>       at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:242)
>       at javax.swing.AbstractButton.doClick(AbstractButton.java:357)
>       at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:809)
>       at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:850)
>       at java.awt.Component.processMouseEvent(Component.java:6288)
>       at javax.swing.JComponent.processMouseEvent(JComponent.java:3267)
>       at java.awt.Component.processEvent(Component.java:6053)
>       at java.awt.Container.processEvent(Container.java:2041)
>       at java.awt.Component.dispatchEventImpl(Component.java:4651)
>       at java.awt.Container.dispatchEventImpl(Container.java:2099)
>       at java.awt.Component.dispatchEvent(Component.java:4481)
>       at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4577)
>       at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4238)
>       at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4168)
>       at java.awt.Container.dispatchEventImpl(Container.java:2085)
>       at java.awt.Window.dispatchEventImpl(Window.java:2478)
>       at java.awt.Component.dispatchEvent(Component.java:4481)
>       at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:643)
>       at java.awt.EventQueue.access$000(EventQueue.java:84)
>       at java.awt.EventQueue$1.run(EventQueue.java:602)
>       at java.awt.EventQueue$1.run(EventQueue.java:600)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
>       at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:98)
>       at java.awt.EventQueue$2.run(EventQueue.java:616)
>       at java.awt.EventQueue$2.run(EventQueue.java:614)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
>       at java.awt.EventQueue.dispatchEvent(EventQueue.java:613)
>       at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:269)
>       at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:184)
>       at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:174)
>       at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:169)
>       at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:161)
>       at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)
> Caused by: java.lang.NullPointerException
>       at com.sun.org.apache.xml.internal.serializer.ToHTMLStream.endElement(ToHTMLStream.java:907)
>       at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endElement(TransformerHandlerImpl.java:273)
>       at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>       at org.apache.tika.gui.TikaGUI$2.endElement(TikaGUI.java:519)
>       at org.apache.tika.sax.TeeContentHandler.endElement(TeeContentHandler.java:94)
>       at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>       at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
>       at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>       at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>       at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>       at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
>       at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:216)
>       at org.apache.tika.parser.pdf.PDF2XHTML.endDocument(PDF2XHTML.java:112)
>       at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:323)
>       at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:61)
>       at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:96)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>       ... 43 more
> I tried the same PDF files with both versions (I can switch back and forth
> between 0.10 and 1.0, and the behavior is reproducible), and it looks like the
> exact same PDFBox version is bundled in tika-app-0.10.jar and
> tika-app-1.0.jar. It would be great if version 1.0 could do what 0.10 can.
> Sorry that I cannot provide the PDF.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
