[ https://issues.apache.org/jira/browse/TIKA-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160234#comment-14160234 ]
ASF GitHub Bot commented on TIKA-1369:
--------------------------------------

GitHub user vilmospapp opened a pull request:

    https://github.com/apache/tika/pull/17

    TIKA-1369 Avoid ThreadLocal usage from Memory Leak

    Hi @chrismattmann ,

    Based on our discussion from https://github.com/apache/tika/pull/15
    I've added the ThreadLocal clean up part, so theoretically it won't
    suffer from the scenario that @grossws mentioned.

    Cheers,
    Vilmos

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vilmospapp/tika TIKA-1369-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tika/pull/17.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17

----
commit f95fad94619946ef1d4fe7cf407deab6317ad2fd
Author: Vilmos Papp <papp.gyorgy.vil...@gmail.com>
Date:   2014-10-06T12:10:14Z

    TIKA-1369 Avoid ThreadLocal usage from Memory Leak

----

> Date parsing and thread safety in ImageMetadataExtractor
> --------------------------------------------------------
>
>                 Key: TIKA-1369
>                 URL: https://issues.apache.org/jira/browse/TIKA-1369
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5
>         Environment: OS X 10.9.4 Java 7_60
>            Reporter: John Gibson
>            Assignee: Chris A. Mattmann
>            Priority: Critical
>             Fix For: 1.7
>
>
> The {{ImageMetadataExtractor}} uses a static instance of
> {{SimpleDateFormat}}. This is not thread safe.
> {code:title=ImageMetadataExtractor.java}
>     static class ExifHandler implements DirectoryHandler {
>         private static final SimpleDateFormat DATE_UNSPECIFIED_TZ = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
> ...
>         public void handleDateTags(Directory directory, Metadata metadata)
>                 throws MetadataException {
>             // Date/Time Original overrides value from ExifDirectory.TAG_DATETIME
>             Date original = null;
>             if (directory.containsTag(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL)) {
>                 original = directory.getDate(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL);
>                 // Unless we have GPS time we don't know the time zone so date must be set
>                 // as ISO 8601 datetime without timezone suffix (no Z or +/-)
>                 if (original != null) {
>                     String datetimeNoTimeZone = DATE_UNSPECIFIED_TZ.format(original); // Same time zone as Metadata Extractor uses
>                     metadata.set(TikaCoreProperties.CREATED, datetimeNoTimeZone);
>                     metadata.set(Metadata.ORIGINAL_DATE, datetimeNoTimeZone);
>                 }
>             }
> ...
> {code}
> This is not the first time that SDF has caused problems: TIKA-495, TIKA-864.
> In the discussion there, the idea of using alternative thread-safe (and
> faster) formatters from either Joda time or Commons Lang was dismissed
> because they would add too many dependencies. Given that Tika already has a
> fairly large laundry list of dependencies to parse content, adding one more
> JAR to make sure things don't break is probably a good idea.
> In addition, because no timezone or locale is specified by either Tika's
> formatter or the call to com.drew.metadata.Directory, it can wreak havoc
> during randomized testing. Given that the timezone is unknown, why not just
> default it to UTC and let the caller guess the timezone? As it stands I have
> to reparse all of the dates into UTC to get stable behavior across timezones.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
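
The issue quoted above comes down to one {{SimpleDateFormat}} instance being shared across threads. What follows is only a minimal sketch of the per-thread approach with clean up that the pull request describes, not the code from the pull request itself. The class name ThreadSafeExifDates and its methods are hypothetical and do not exist in Tika. Pinning the formatter to UTC and Locale.ROOT is an assumption taken from the reporter's suggestion, not from any merged change.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper (not part of Tika): one SimpleDateFormat per thread. */
final class ThreadSafeExifDates {

    // Each thread lazily creates its own formatter, so no instance is shared.
    private static final ThreadLocal<SimpleDateFormat> DATE_UNSPECIFIED_TZ =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    // Assumption: fixed locale and UTC, so the output does not
                    // depend on the JVM's default locale or time zone.
                    SimpleDateFormat format =
                            new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT);
                    format.setTimeZone(TimeZone.getTimeZone("UTC"));
                    return format;
                }
            };

    private ThreadSafeExifDates() {
    }

    /** Formats a date as ISO 8601 without a time zone suffix. */
    static String formatNoTimeZone(Date date) {
        return DATE_UNSPECIFIED_TZ.get().format(date);
    }

    /** Removes the calling thread's formatter; a later call recreates it. */
    static void cleanUp() {
        DATE_UNSPECIFIED_TZ.remove();
    }
}
```

A caller running on a pooled thread (for example in a servlet container) would typically call cleanUp() in a finally block once parsing is done, so that the formatter does not stay attached to a long-lived thread.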