[ https://issues.apache.org/jira/browse/PDFBOX-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Anderson updated PDFBOX-2318:
------------------------------------
    Description: 
I discovered this bug while syncing both Tika and PDFBox to their current 
SNAPSHOT builds.

It came to light when running Tika's 
JpegParserTest.testJPEGEmptyEXIFDateTime() JUnit test case: the test file 
contains the property photoshop:LegacyIPTCDigest, which is not defined in 
the PhotoshopSchema.

As a result, a null Type is created in DomXmpParser.parseDescriptionRoot(), 
which leads to the NPE.  The solution in my patch is to default to text for 
any undefined type.  It may also be beneficial to log a warning about such 
types so that the schema files can be amended.  (This patch does not add 
LegacyIPTCDigest to the schema.)
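
The fallback described above could look roughly like the following. This is 
a minimal, self-contained sketch and not the code from the attached patch: 
the class UndefinedTypeFallback, the PropertyType enum, and the define() and 
resolveType() methods are hypothetical names made up for illustration, and 
none of them are XmpBox APIs. The example type mapping is also only 
illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical stand-in for a schema's property-to-type lookup.
// None of these names come from XmpBox.
public class UndefinedTypeFallback {

    // Illustrative subset of property types.
    public enum PropertyType { TEXT, DATE, INTEGER }

    private static final Logger LOG =
            Logger.getLogger(UndefinedTypeFallback.class.getName());

    private final Map<String, PropertyType> definedTypes = new HashMap<>();

    // Registers a property the schema knows about.
    public void define(String qualifiedName, PropertyType type) {
        definedTypes.put(qualifiedName, type);
    }

    // Returns the declared type, or TEXT (with a warning) when the
    // property is not defined, instead of handing back null.
    public PropertyType resolveType(String qualifiedName) {
        PropertyType type = definedTypes.get(qualifiedName);
        if (type == null) {
            LOG.warning("No type defined for property '" + qualifiedName
                    + "'; defaulting to TEXT");
            return PropertyType.TEXT;
        }
        return type;
    }

    public static void main(String[] args) {
        UndefinedTypeFallback schema = new UndefinedTypeFallback();
        schema.define("photoshop:DateCreated", PropertyType.DATE);

        // Defined property: the declared type is returned.
        System.out.println(schema.resolveType("photoshop:DateCreated"));
        // Undefined property: a warning is logged and TEXT is returned.
        System.out.println(schema.resolveType("photoshop:LegacyIPTCDigest"));
    }
}
```

The warning gives a trail for amending the schema later, while the TEXT 
default keeps the caller from dereferencing a null type.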

Relates to the work done in Tika under TIKA-1285.


  was:
I discovered this bug while syncing both Tika and PDFBox to their current 
SNAPSHOT builds.

It came to light when running Tika's 
JpegParserTest.testJPEGEmptyEXIFDateTime() JUnit test case: the test file 
contains the property photoshop:LegacyIPTCDigest, which is not defined in 
the PhotoshopSchema.

As a result, a null Type is created in DomXmpParser.parseDescriptionRoot(), 
which leads to the NPE.  The solution in my patch is to default to text for 
any undefined type.  It may also be beneficial to log a warning about such 
types so that the schema files can be amended.  (This patch does not add 
LegacyIPTCDigest to the schema.)



> NPE in new DomXmpParser when no type is found
> ---------------------------------------------
>
>                 Key: PDFBOX-2318
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2318
>             Project: PDFBox
>          Issue Type: Bug
>          Components: XmpBox
>    Affects Versions: 2.0.0
>            Reporter: Jeremy Anderson
>              Labels: patch
>         Attachments: PDFBOX-2318.patch, jsr170-1.0.pdf
>
>
> I discovered this bug while syncing both Tika and PDFBox to their current 
> SNAPSHOT builds.
> It came to light when running Tika's 
> JpegParserTest.testJPEGEmptyEXIFDateTime() JUnit test case: the test file 
> contains the property photoshop:LegacyIPTCDigest, which is not defined in 
> the PhotoshopSchema.
> As a result, a null Type is created in 
> DomXmpParser.parseDescriptionRoot(), which leads to the NPE.  The solution 
> in my patch is to default to text for any undefined type.  It may also be 
> beneficial to log a warning about such types so that the schema files can 
> be amended.  (This patch does not add LegacyIPTCDigest to the schema.)
> Relates to the work done in Tika under TIKA-1285.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
