[ 
https://issues.apache.org/jira/browse/TIKA-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604516#comment-16604516
 ] 

Uwe Schindler commented on TIKA-2722:
-------------------------------------

[~dsmiley]: I think this is a bug in Java 11. I know there were some changes 
to how time zones are formatted. According to the docs, time zones are now 
printed according to the selected locale or, if none is given, the default 
one. This is fine in most cases, but it seems to affect locales whose digits 
are different (non-ASCII). Previously, time zones that have no name (numeric 
only) seem to have been printed in ASCII digits. Nevertheless, only the time 
zone is printed with locale-dependent digits, not the date itself (reason: 
toString does not use a date formatter; for compatibility reasons it just 
concatenates integers to format the date).

Did you send Rory O'Donnell a note? He can speed up the assignment of the JDK 
issue ID.

IMHO: Tika should stop using java.util.Date and switch to the java.time APIs, 
maybe starting by using Instant instead of Date.
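
To illustrate the suggestion, here is a minimal sketch of rendering a Date 
for machine consumption through Instant and DateTimeFormatter.ISO_INSTANT 
instead of Date.toString(). The class IsoDates and the helper toIsoString 
are hypothetical names chosen for this example; they are not part of Tika, 
and this is not the change that was applied there. The sample timestamp is 
the one from the issue description, converted to UTC.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

// Hypothetical example class; not part of Tika.
final class IsoDates {

    // Hypothetical helper: formats a Date as an ISO-8601 instant in UTC.
    // ISO_INSTANT does not consult the default locale for digits or zone names.
    static String toIsoString(Date date) {
        return DateTimeFormatter.ISO_INSTANT.format(date.toInstant());
    }

    public static void main(String[] args) {
        // Reproduce the environment from the issue: default locale "ar-EG".
        Locale.setDefault(Locale.forLanguageTag("ar-EG"));

        // Thu Nov 13 18:35:51 GMT+05:00 2008, expressed in UTC.
        Date created = Date.from(Instant.parse("2008-11-13T13:35:51Z"));

        // Date.toString() output may vary with JDK version and locale,
        // so it is printed here only for comparison.
        System.out.println("Date.toString(): " + created);
        System.out.println("ISO_INSTANT:     " + toIsoString(created));
    }
}
```

The ISO string also parses back with Instant.parse, so it round-trips 
regardless of the default locale.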

> Don't call Date.toString (Possible issue with JDK 11)
> -----------------------------------------------------
>
>                 Key: TIKA-2722
>                 URL: https://issues.apache.org/jira/browse/TIKA-2722
>             Project: Tika
>          Issue Type: Bug
>         Environment: Tika 1.18, JDK 11 with locale set to "ar-EG".  
>            Reporter: David Smiley
>            Priority: Major
>
> I'm troubleshooting [a test failure in Apache 
> Lucene/Solr|https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/22799/] 
> "extracting" contrib that occurs in JDK 11 with locale "ar-EG".  JDK 8 & 9 
> pass; I don't know about JDK 10. It has to do with extracting date metadata 
> from a PDF, particularly the created date but perhaps others too.
> I stepped through the code into Tika and I think I've found out where the 
> troublesome code is.  First note PDFParser line 271: {{addMetadata(metadata, 
> "created", info.getCreationDate());}}.  That addMetadata overload variant 
> will call toString on a Date.  IMO that's asking for trouble since the output 
> of that is Locale-dependent.  I think that's okay to show to a user but not 
> for machine-to-machine information exchange.  In the case of the test, it 
> yielded this odd looking date string:
> Thu Nov 13 18:35:51 GMT+٠٥:٠٠ 2008
> I pasted that in and it looks consistent with what I see in IntelliJ and in 
> Jenkins logs; hopefully will post correctly to JIRA.  The odd part is the 
> hour & minutes relative to GMT.  I won't be certain until after I click 
> "Create".
> Perhaps this problem is also indicative of a JDK 11 bug?  Nevertheless I 
> think Tika should avoid calling Date.toString().



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
