[ 
https://issues.apache.org/jira/browse/TIKA-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077578#comment-14077578
 ] 

Nick Burch commented on TIKA-1369:
----------------------------------

Please send the pull request to the main GitHub repo - 
https://github.com/apache/tika/ - or post a patch here.

Please see the Contributing to Apache Tika page - 
http://tika.apache.org/contribute.html - for more on the various supported ways 
to build / test / contribute enhancements and fixes!

> Date parsing and thread safety in ImageMetadataExtractor
> --------------------------------------------------------
>
>                 Key: TIKA-1369
>                 URL: https://issues.apache.org/jira/browse/TIKA-1369
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5
>         Environment: OS X 10.9.4 Java 7_60
>            Reporter: John Gibson
>            Priority: Critical
>
> The {{ImageMetadataExtractor}} uses a static instance of 
> {{SimpleDateFormat}}.  This is not thread safe.
> {code:title=ImageMetadataExtractor.java}
>     static class ExifHandler implements DirectoryHandler {
>         private static final SimpleDateFormat DATE_UNSPECIFIED_TZ = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
>         ...
>         public void handleDateTags(Directory directory, Metadata metadata)
>                 throws MetadataException {
>             // Date/Time Original overrides value from ExifDirectory.TAG_DATETIME
>             Date original = null;
>             if (directory.containsTag(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL)) {
>                 original = directory.getDate(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL);
>                 // Unless we have GPS time we don't know the time zone so date must be set
>                 // as ISO 8601 datetime without timezone suffix (no Z or +/-)
>                 if (original != null) {
>                     String datetimeNoTimeZone = DATE_UNSPECIFIED_TZ.format(original); // Same time zone as Metadata Extractor uses
>                     metadata.set(TikaCoreProperties.CREATED, datetimeNoTimeZone);
>                     metadata.set(Metadata.ORIGINAL_DATE, datetimeNoTimeZone);
>                 }
>             }
>        ...
> {code}
> This is not the first time that {{SimpleDateFormat}} has caused problems: see 
> TIKA-495 and TIKA-864. In the discussion there, the idea of using alternative 
> thread-safe (and faster) formatters from either Joda-Time or Commons Lang was 
> dismissed because they would add too many dependencies. Given that Tika 
> already has a fairly long list of dependencies for parsing content, adding 
> one more JAR to make sure things don't break is probably a good idea.
> In addition, because no timezone or locale is specified by either Tika's 
> formatter or the call to com.drew.metadata.Directory, it can wreak havoc 
> during randomized testing. Given that the timezone is unknown, why not just 
> default it to UTC and let the caller guess the timezone? As it stands, I have 
> to reparse all of the dates into UTC to get stable behavior across timezones.
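
For illustration only, here is a minimal sketch of one dependency-free way to 
address both points raised in the description: a per-thread formatter with an 
explicit locale and an explicit UTC zone. The class name SafeExifDateFormat 
and the method formatNoTimeZone are hypothetical and are not part of Tika. 
The sketch also assumes the Date being formatted was itself interpreted as 
UTC; how com.drew.metadata.Directory produces that Date is outside its scope. 
It sticks to Java 7 syntax to match the reported environment.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not Tika code: shows the ThreadLocal pattern only.
final class SafeExifDateFormat {

    // SimpleDateFormat keeps mutable internal state, so each thread gets
    // its own instance instead of sharing one static formatter.
    private static final ThreadLocal<SimpleDateFormat> DATE_UNSPECIFIED_TZ =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    // Explicit locale and zone (assumption: UTC), so the output
                    // does not depend on the JVM defaults.
                    SimpleDateFormat format =
                            new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT);
                    format.setTimeZone(TimeZone.getTimeZone("UTC"));
                    return format;
                }
            };

    private SafeExifDateFormat() {
    }

    // Formats the date as ISO 8601 without a timezone suffix.
    static String formatNoTimeZone(Date date) {
        return DATE_UNSPECIFIED_TZ.get().format(date);
    }

    public static void main(String[] args) {
        // The epoch formatted in UTC.
        System.out.println(formatNoTimeZone(new Date(0L)));
    }
}
```

Creating a new SimpleDateFormat on every call would also be safe and is 
simpler; the ThreadLocal only avoids repeated construction on hot paths. 
Note that formatting in UTC changes the emitted wall-clock value unless the 
Date was also parsed as UTC, so the two ends need to agree.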



--
This message was sent by Atlassian JIRA
(v6.2#6252)
