[jira] [Commented] (TIKA-1369) Date parsing and thread safety in ImageMetadataExtractor

2014-08-26 Thread Vilmos Papp (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110592#comment-14110592
 ] 

Vilmos Papp commented on TIKA-1369:
---

Hi Nick,

In ImageMetadataExtrctorTest there are a test cases where new ExifHanlders 
created like:
{code}
ExifSubIFDDirectory exif = mock(ExifSubIFDDirectory.class);
 
when(exif.containsTag(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL)).thenReturn(true);
GregorianCalendar calendar = new 
GregorianCalendar(TimeZone.getDefault(), Locale.getDefault());
calendar.setTimeInMillis(0);
calendar.set(2000, 0, 1, 0, 0, 0);
 
when(exif.getDate(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL)).thenReturn(
calendar.getTime()); // jvm default timezone as in Metadata 
Extractor
 Metadata metadata = new Metadata();
 
 new ImageMetadataExtractor.ExifHandler().handle(exif, metadata);
{code}

What happens if we cache ExifHandler like this:
{code}
ImageMetadataExtractor.ExifHandler cachedHandler = new 
ImageMetadataExtractor.ExifHandler();
{code}

And use it like this (possibly several times for different metadata):
{code}
cachedHandler.handle(exif, metadata);
{code}

This seems to me a valid scenario, and as caching an object for later usage 
should be fine, perhaps the original thread safety issue could occur, but I 
should create a test case for it to prove it or prove that it not occurs.

What do you think?

> Date parsing and thread safety in ImageMetadataExtractor
> 
>
> Key: TIKA-1369
> URL: https://issues.apache.org/jira/browse/TIKA-1369
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
> Environment: OS X 10.9.4 Java 7_60
>Reporter: John Gibson
>Priority: Critical
>
> The {{ImageMetadataExtractor}} uses a static instance of 
> {{SimpleDateFormat}}.  This is not thread safe.
> {code:title=ImageMetadataExtractor.java}
> static class ExifHandler implements DirectoryHandler {
> private static final SimpleDateFormat DATE_UNSPECIFIED_TZ = new 
> SimpleDateFormat("-MM-dd'T'HH:mm:ss");
> ...
> public void handleDateTags(Directory directory, Metadata metadata)
> throws MetadataException {
> // Date/Time Original overrides value from 
> ExifDirectory.TAG_DATETIME
> Date original = null;
> if 
> (directory.containsTag(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL)) {
> original = 
> directory.getDate(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL);
> // Unless we have GPS time we don't know the time zone so 
> date must be set
> // as ISO 8601 datetime without timezone suffix (no Z or +/-)
> if (original != null) {
> String datetimeNoTimeZone = 
> DATE_UNSPECIFIED_TZ.format(original); // Same time zone as Metadata Extractor 
> uses
> metadata.set(TikaCoreProperties.CREATED, 
> datetimeNoTimeZone);
> metadata.set(Metadata.ORIGINAL_DATE, datetimeNoTimeZone);
> }
> }
>...
> {code}
> This is not the first time that SDF has caused problems: TIKA-495, TIKA-864. 
> In the discussion there the idea of using alternative thread-safe (and 
> faster) formatters from either Joda time or Commons Lang were dismissed 
> because they would add too many dependencies. Given that Tika already has a 
> fairly large laundry list of dependencies to parse content, adding one more 
> JAR to make sure things don't break is probably a good idea.
> In addition, because no timezone or locale are specified by either Tika's 
> formatter or the call to com.drew.metadata.Directory it can wreak havok 
> during randomized testing. Given that the timezone is unknown, why not just 
> default it to UTC and let the caller guess the timezone? As it stands I have 
> to reparse all of the dates into UTC to get stable behavior across timezones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1369) Date parsing and thread safety in ImageMetadataExtractor

2014-08-26 Thread Vilmos Papp (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110444#comment-14110444
 ] 

Vilmos Papp commented on TIKA-1369:
---

Hi Nick,

Yes, that should fix it, I think.

> Date parsing and thread safety in ImageMetadataExtractor
> 
>
> Key: TIKA-1369
> URL: https://issues.apache.org/jira/browse/TIKA-1369
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
> Environment: OS X 10.9.4 Java 7_60
>Reporter: John Gibson
>Priority: Critical
>
> The {{ImageMetadataExtractor}} uses a static instance of 
> {{SimpleDateFormat}}.  This is not thread safe.
> {code:title=ImageMetadataExtractor.java}
> static class ExifHandler implements DirectoryHandler {
> private static final SimpleDateFormat DATE_UNSPECIFIED_TZ = new 
> SimpleDateFormat("-MM-dd'T'HH:mm:ss");
> ...
> public void handleDateTags(Directory directory, Metadata metadata)
> throws MetadataException {
> // Date/Time Original overrides value from 
> ExifDirectory.TAG_DATETIME
> Date original = null;
> if 
> (directory.containsTag(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL)) {
> original = 
> directory.getDate(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL);
> // Unless we have GPS time we don't know the time zone so 
> date must be set
> // as ISO 8601 datetime without timezone suffix (no Z or +/-)
> if (original != null) {
> String datetimeNoTimeZone = 
> DATE_UNSPECIFIED_TZ.format(original); // Same time zone as Metadata Extractor 
> uses
> metadata.set(TikaCoreProperties.CREATED, 
> datetimeNoTimeZone);
> metadata.set(Metadata.ORIGINAL_DATE, datetimeNoTimeZone);
> }
> }
>...
> {code}
> This is not the first time that SDF has caused problems: TIKA-495, TIKA-864. 
> In the discussion there the idea of using alternative thread-safe (and 
> faster) formatters from either Joda time or Commons Lang were dismissed 
> because they would add too many dependencies. Given that Tika already has a 
> fairly large laundry list of dependencies to parse content, adding one more 
> JAR to make sure things don't break is probably a good idea.
> In addition, because no timezone or locale are specified by either Tika's 
> formatter or the call to com.drew.metadata.Directory it can wreak havok 
> during randomized testing. Given that the timezone is unknown, why not just 
> default it to UTC and let the caller guess the timezone? As it stands I have 
> to reparse all of the dates into UTC to get stable behavior across timezones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1369) Date parsing and thread safety in ImageMetadataExtractor

2014-08-25 Thread Vilmos Papp (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108981#comment-14108981
 ] 

Vilmos Papp commented on TIKA-1369:
---

Hi Nick,

Did you manage to take a look at my fix?

Regards,
Vilmos

> Date parsing and thread safety in ImageMetadataExtractor
> 
>
> Key: TIKA-1369
> URL: https://issues.apache.org/jira/browse/TIKA-1369
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
> Environment: OS X 10.9.4 Java 7_60
>Reporter: John Gibson
>Priority: Critical
>
> The {{ImageMetadataExtractor}} uses a static instance of 
> {{SimpleDateFormat}}.  This is not thread safe.
> {code:title=ImageMetadataExtractor.java}
> static class ExifHandler implements DirectoryHandler {
> private static final SimpleDateFormat DATE_UNSPECIFIED_TZ = new 
> SimpleDateFormat("-MM-dd'T'HH:mm:ss");
> ...
> public void handleDateTags(Directory directory, Metadata metadata)
> throws MetadataException {
> // Date/Time Original overrides value from 
> ExifDirectory.TAG_DATETIME
> Date original = null;
> if 
> (directory.containsTag(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL)) {
> original = 
> directory.getDate(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL);
> // Unless we have GPS time we don't know the time zone so 
> date must be set
> // as ISO 8601 datetime without timezone suffix (no Z or +/-)
> if (original != null) {
> String datetimeNoTimeZone = 
> DATE_UNSPECIFIED_TZ.format(original); // Same time zone as Metadata Extractor 
> uses
> metadata.set(TikaCoreProperties.CREATED, 
> datetimeNoTimeZone);
> metadata.set(Metadata.ORIGINAL_DATE, datetimeNoTimeZone);
> }
> }
>...
> {code}
> This is not the first time that SDF has caused problems: TIKA-495, TIKA-864. 
> In the discussion there the idea of using alternative thread-safe (and 
> faster) formatters from either Joda time or Commons Lang were dismissed 
> because they would add too many dependencies. Given that Tika already has a 
> fairly large laundry list of dependencies to parse content, adding one more 
> JAR to make sure things don't break is probably a good idea.
> In addition, because no timezone or locale are specified by either Tika's 
> formatter or the call to com.drew.metadata.Directory it can wreak havok 
> during randomized testing. Given that the timezone is unknown, why not just 
> default it to UTC and let the caller guess the timezone? As it stands I have 
> to reparse all of the dates into UTC to get stable behavior across timezones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1369) Date parsing and thread safety in ImageMetadataExtractor

2014-07-29 Thread Vilmos Papp (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077587#comment-14077587
 ] 

Vilmos Papp commented on TIKA-1369:
---

Hi Nick,

Thanks, for the quick answer. I prefer pull request over attachments of patches.

Cheers,
Vilmos

> Date parsing and thread safety in ImageMetadataExtractor
> 
>
> Key: TIKA-1369
> URL: https://issues.apache.org/jira/browse/TIKA-1369
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
> Environment: OS X 10.9.4 Java 7_60
>Reporter: John Gibson
>Priority: Critical
>
> The {{ImageMetadataExtractor}} uses a static instance of 
> {{SimpleDateFormat}}.  This is not thread safe.
> {code:title=ImageMetadataExtractor.java}
> static class ExifHandler implements DirectoryHandler {
> private static final SimpleDateFormat DATE_UNSPECIFIED_TZ = new 
> SimpleDateFormat("-MM-dd'T'HH:mm:ss");
> ...
> public void handleDateTags(Directory directory, Metadata metadata)
> throws MetadataException {
> // Date/Time Original overrides value from 
> ExifDirectory.TAG_DATETIME
> Date original = null;
> if 
> (directory.containsTag(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL)) {
> original = 
> directory.getDate(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL);
> // Unless we have GPS time we don't know the time zone so 
> date must be set
> // as ISO 8601 datetime without timezone suffix (no Z or +/-)
> if (original != null) {
> String datetimeNoTimeZone = 
> DATE_UNSPECIFIED_TZ.format(original); // Same time zone as Metadata Extractor 
> uses
> metadata.set(TikaCoreProperties.CREATED, 
> datetimeNoTimeZone);
> metadata.set(Metadata.ORIGINAL_DATE, datetimeNoTimeZone);
> }
> }
>...
> {code}
> This is not the first time that SDF has caused problems: TIKA-495, TIKA-864. 
> In the discussion there the idea of using alternative thread-safe (and 
> faster) formatters from either Joda time or Commons Lang were dismissed 
> because they would add too many dependencies. Given that Tika already has a 
> fairly large laundry list of dependencies to parse content, adding one more 
> JAR to make sure things don't break is probably a good idea.
> In addition, because no timezone or locale are specified by either Tika's 
> formatter or the call to com.drew.metadata.Directory it can wreak havok 
> during randomized testing. Given that the timezone is unknown, why not just 
> default it to UTC and let the caller guess the timezone? As it stands I have 
> to reparse all of the dates into UTC to get stable behavior across timezones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1369) Date parsing and thread safety in ImageMetadataExtractor

2014-07-29 Thread Vilmos Papp (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077561#comment-14077561
 ] 

Vilmos Papp commented on TIKA-1369:
---

Hi,

I've sent a pull request on github to fix this: 
https://github.com/chrismattmann/tika/pull/1, I hope I sent it to the proper 
person, if not, where should I send it?

Regards,
Vilmos

> Date parsing and thread safety in ImageMetadataExtractor
> 
>
> Key: TIKA-1369
> URL: https://issues.apache.org/jira/browse/TIKA-1369
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.5
> Environment: OS X 10.9.4 Java 7_60
>Reporter: John Gibson
>Priority: Critical
>
> The {{ImageMetadataExtractor}} uses a static instance of 
> {{SimpleDateFormat}}.  This is not thread safe.
> {code:title=ImageMetadataExtractor.java}
> static class ExifHandler implements DirectoryHandler {
> private static final SimpleDateFormat DATE_UNSPECIFIED_TZ = new 
> SimpleDateFormat("-MM-dd'T'HH:mm:ss");
> ...
> public void handleDateTags(Directory directory, Metadata metadata)
> throws MetadataException {
> // Date/Time Original overrides value from 
> ExifDirectory.TAG_DATETIME
> Date original = null;
> if 
> (directory.containsTag(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL)) {
> original = 
> directory.getDate(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL);
> // Unless we have GPS time we don't know the time zone so 
> date must be set
> // as ISO 8601 datetime without timezone suffix (no Z or +/-)
> if (original != null) {
> String datetimeNoTimeZone = 
> DATE_UNSPECIFIED_TZ.format(original); // Same time zone as Metadata Extractor 
> uses
> metadata.set(TikaCoreProperties.CREATED, 
> datetimeNoTimeZone);
> metadata.set(Metadata.ORIGINAL_DATE, datetimeNoTimeZone);
> }
> }
>...
> {code}
> This is not the first time that SDF has caused problems: TIKA-495, TIKA-864. 
> In the discussion there the idea of using alternative thread-safe (and 
> faster) formatters from either Joda time or Commons Lang were dismissed 
> because they would add too many dependencies. Given that Tika already has a 
> fairly large laundry list of dependencies to parse content, adding one more 
> JAR to make sure things don't break is probably a good idea.
> In addition, because no timezone or locale are specified by either Tika's 
> formatter or the call to com.drew.metadata.Directory it can wreak havok 
> during randomized testing. Given that the timezone is unknown, why not just 
> default it to UTC and let the caller guess the timezone? As it stands I have 
> to reparse all of the dates into UTC to get stable behavior across timezones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)