[ https://issues.apache.org/jira/browse/TIKA-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163301#comment-13163301 ]
Arthur Meneau commented on TIKA-802:
------------------------------------

I have copied and pasted below the Java method I've been using to try to detect iWork content and extract metadata. I commented out the AutoDetectParser line, but I have tested iWork detection with AutoDetectParser instead of ForkParser and have seen the same NullPointerException in both cases.

It looks like you were using the Tika 1.1 beta client; I'm running Tika 1.0 and using Tika as a library instead of a client. Has anything major changed between 1.0 and 1.1 beta?

Thanks for your quick reply!
-Arthur Meneau

    public static Metadata getMetadata(File f) {
        FileInputStream fis = null;
        ToXMLContentHandler contentHandler = new ToXMLContentHandler();
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();
        // AutoDetectParser parser = new AutoDetectParser();
        ForkParser parser = new ForkParser();
        parser.setJavaCommand("/usr/local/java6/bin/java -Xmx64m");
        try {
            fis = new FileInputStream(f);
            parser.parse(fis, contentHandler, metadata, context);
        } catch (java.io.FileNotFoundException e) {
            if (f != null)
                logger.error("file " + f.toString() + " could not be found, exception: " + e, e);
            else
                logger.error("file could not be found, exception: " + e, e);
            return null;
        } catch (Throwable e) {
            logger.error("Exception while analyzing file\n"
                    + "CAUTION: metadata may still have useful content in it!\n"
                    + "Exception: " + e, e);
        } finally {
            if (fis != null) {
                try {
                    fis.close();
                } catch (java.io.IOException e) {
                    logger.error("input stream could not be closed: " + e, e);
                }
            }
        }

        // Fall back to plain type detection if the parser set no Content-Type.
        String contentType = null;
        if (metadata.get("Content-Type") == null) {
            Tika tikaDetect = new Tika();
            try {
                contentType = tikaDetect.detect(f);
            } catch (Exception e) {
                logger.error("problem with detection: " + e, e);
            }
        }

        if (contentType != null)
            logger.error("contentType: " + contentType);
        if (contentHandler != null)
            logger.error("content handler: " + contentHandler.toString());
        if (metadata != null)
            logger.error("metadata: " + metadata.toString());
        return metadata;
    }

> NullPointerException when parsing iWork files
> ----------------------------------------------
>
>                 Key: TIKA-802
>                 URL: https://issues.apache.org/jira/browse/TIKA-802
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Java 6, Mac OS X 10.6, Keynote 5.1.1, Numbers 2.1, Pages 4.1
>            Reporter: Arthur Meneau
>              Labels: NullPointerException, iWork, parse
>         Attachments: testKeynote.key, testNumbers.numbers, testPages.pages
>
>
> Excerpt from mailing list:
> I am having trouble parsing iWork documents with Tika 1.0. These documents
> are being saved with the appropriate versions specified by Tika's API
> (Keynote 5.1.1, Numbers 2.1, Pages 4.1). I have copied and pasted the error I
> am receiving below. How can I get iWork documents to parse correctly?
> Mailing list thread:
> http://mail-archives.apache.org/mod_mbox/tika-user/201112.mbox/%3C8E630733-FD82-4A16-89BB-74488A1F7C9F%40xetus.com%3E

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira