[ 
https://issues.apache.org/jira/browse/TIKA-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163301#comment-13163301
 ] 

Arthur Meneau commented on TIKA-802:
------------------------------------

I have copied and pasted the Java method I've been using to try to detect iWork 
content and extract metadata.  The AutoDetectParser line is commented out, but I 
have also tested iWork detection with AutoDetectParser in place of ForkParser, 
and the same NullPointerException is produced in both cases.

It looks like you were using the Tika 1.1 beta client. I'm running Tika 1.0 and 
using Tika as a library rather than through the client. Has anything major 
changed between 1.0 and 1.1 beta?

Thanks for your quick reply!
-Arthur Meneau 

        public static Metadata getMetadata(File f) {

                FileInputStream     fis            = null;

                ToXMLContentHandler contentHandler = new ToXMLContentHandler();
                Metadata            metadata       = new Metadata();
                ParseContext        context        = new ParseContext();
//              AutoDetectParser    parser         = new AutoDetectParser();
                ForkParser          parser         = new ForkParser();

                parser.setJavaCommand("/usr/local/java6/bin/java -Xmx64m");

                try {
                        fis = new FileInputStream(f);
                        parser.parse(fis, contentHandler, metadata, context);

                } catch (java.io.FileNotFoundException e) {
                        if (f != null)
                                logger.error("file " + f.toString() + " could not be found, exception: " + e, e);
                        else
                                logger.error("file could not be found, exception: " + e, e);

                        return null;
                } catch (Throwable e) {
                        logger.error("Exception while analyzing file\n" +
                                "CAUTION: metadata may still have useful content in it!\n" +
                                "Exception: " + e, e);
                } finally {
                        if (fis != null) {
                                try {
                                        fis.close();
                                } catch (java.io.IOException e) {
                                        logger.error("input stream could not be closed: " + e, e);
                                }
                        }
                }

                String contentType = null;
                if (metadata.get("Content-Type") == null) {
                        Tika tikaDetect = new Tika();

                        try {
                                contentType = tikaDetect.detect(f);
                        } catch (Exception e) {
                                logger.error("problem with detection: " + e, e);
                        }
                }

                if (contentType != null)
                        logger.error("contentType: " + contentType);
                if (contentHandler != null)
                        logger.error("content handler: " + contentHandler.toString());
                if (metadata != null)
                        logger.error("metadata: " + metadata.toString());

                return metadata;
        }
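
One way to rule out the test files themselves, independent of Tika, is to check that each one opens as a ZIP archive and to look at its entry names. The sketch below uses only java.util.zip. The class name IWorkZipCheck and the entry names index.apxl and index.xml are assumptions made for illustration (they reflect the usual layout of iWork '09 packages as far as I know), not something confirmed in this issue.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic helper; not part of Tika.
final class IWorkZipCheck {

    // Assumption: entry names an iWork '09 package is expected to contain.
    private static final String[] CONTENT_ENTRIES = {"index.apxl", "index.xml"};

    /** Lists all entry names; throws IOException if the file is not a readable ZIP. */
    static List<String> listEntries(File f) throws IOException {
        ZipFile zip = new ZipFile(f);
        try {
            List<String> names = new ArrayList<String>();
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
            return names;
        } finally {
            zip.close();
        }
    }

    /** Returns true if any of the assumed content entry names is present. */
    static boolean hasContentEntry(List<String> names) {
        for (String expected : CONTENT_ENTRIES) {
            if (names.contains(expected)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            List<String> names = listEntries(new File(path));
            System.out.println(path + ": entries=" + names
                    + ", content entry found=" + hasContentEntry(names));
        }
    }
}
```

Running it against testKeynote.key, testNumbers.numbers and testPages.pages would show whether the files are plain ZIP archives at all. If one of them fails to open as a ZIP, or has none of the assumed entries, that narrows the problem to the input rather than to the parser call above.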

> NullPointerException when parsing iWork files 
> ----------------------------------------------
>
>                 Key: TIKA-802
>                 URL: https://issues.apache.org/jira/browse/TIKA-802
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Java 6, Mac OS X 10.6, Keynote 5.1.1, Numbers 2.1, Pages 4.1
>            Reporter: Arthur Meneau
>              Labels: NullPointerException, iWork, parse
>         Attachments: testKeynote.key, testNumbers.numbers, testPages.pages
>
>
> Excerpt from mailing list:
> I am having trouble parsing iWork documents with Tika 1.0.  These documents 
> are being saved with the appropriate versions specified by Tika's API 
> (Keynote 5.1.1, Numbers 2.1, Pages 4.1).  I have copied and pasted the error 
> I am receiving below. How can I get iWork documents to parse correctly?
> Mailing list thread:
> http://mail-archives.apache.org/mod_mbox/tika-user/201112.mbox/%3C8E630733-FD82-4A16-89BB-74488A1F7C9F%40xetus.com%3E
