[ https://issues.apache.org/jira/browse/TIKA-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763933#comment-17763933 ]
ASF GitHub Bot commented on TIKA-4126:
--------------------------------------

patrickdalla commented on code in PR #1329:
URL: https://github.com/apache/tika/pull/1329#discussion_r1322066704

##########
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-xmp-commons/src/test/java/org/apache/tika/parser/xmp/JempboxExtractorTest.java:
##########
@@ -134,4 +134,14 @@ public void testMaxXMPMMHistory() throws Exception {
         }
     }
 
+    @Test
+    public void testModifiedTZ() throws Exception {
+        Metadata m = new Metadata();
+        JempboxExtractor ex = new JempboxExtractor(m);
+        try (InputStream is = getResourceAsStream("/test-documents/testXMP.xmp")) {
+            ex.parse(is);
+        }
+        assertEquals("2014-03-04T22:50:41Z", m.get(XMPMM.HISTORY_WHEN));

Review Comment:
   How can I download a tika-app snapshot that includes this patch, so I can test it on my PDF file?

> PDF XMP ModifyDate extracted without TimeZone info
> --------------------------------------------------
>
>                 Key: TIKA-4126
>                 URL: https://issues.apache.org/jira/browse/TIKA-4126
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.4.0, 2.7.0, 2.8.0, 2.9.0
>            Reporter: Patrick Dalla Bernardina
>            Assignee: Tim Allison
>            Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I ran:
> {{[root@localhost Downloads]# java -jar tika-app-2.9.0.jar sobreavisoEditado3.pdf | grep xmp}}
> which returned the time labeled as UTC:
> WARN [main] 07:42:34,238 org.apache.pdfbox.pdmodel.font.PDType1Font Using fallback font LiberationSans for base font Symbol
> WARN [main] 07:42:34,241 org.apache.pdfbox.pdmodel.font.PDType1Font Using fallback font LiberationSans for base font ZapfDingbats
> <meta name="xmp:ModifyDate" content="2023-09-06T13:35:38Z"/>
> <meta name="xmp:MetadataDate" content="2023-09-06T13:35:38Z"/>
> <meta name="xmpTPg:NPages" content="11"/>
>
> Whereas running:
> {{java -jar pdfbox-app-2.0.29.jar ExtractXMP -console sobreavisoEditado3.pdf}}
> returned the correct value, including the timezone offset (-04:00):
> <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta
> xmlns:x="adobe:ns:meta/"><rdf:RDF
> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><rdf:Description
> rdf:about=""
> xmp:ModifyDate="2023-09-06T13:35:38-04:00"
> xmlns:xmp="http://ns.adobe.com/xap/1.0/"><xmp:MetadataDate>2023-09-06T13:35:38-04:00</xmp:MetadataDate></rdf:Description></rdf:RDF></x:xmpmeta><?xpacket
> end="w"?>
>
> So the metadata string had its timezone info stripped and was labeled "Z"
> (UTC) without the hour of day being shifted to UTC. The correct UTC value
> for 13:35:38-04:00 would be 17:35:38Z.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
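
To make the reported discrepancy concrete, here is a minimal sketch using only `java.time` from the JDK. It is not Tika's or Jempbox's code: the class name `XmpDateSketch` and both method names are hypothetical, chosen purely for illustration. It contrasts a correct conversion to UTC with the behavior described in the issue, where the offset is dropped and "Z" is appended without shifting the hour.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration class; not part of Tika, Jempbox, or PDFBox.
class XmpDateSketch {

    // Correct: keep the same instant and express it in UTC.
    static String toUtc(String xmpDate) {
        OffsetDateTime parsed = OffsetDateTime.parse(xmpDate);
        return parsed.withOffsetSameInstant(ZoneOffset.UTC)
                .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    // Reproduces the symptom from the report: the offset is discarded and
    // the local wall-clock time is labeled as UTC without any hour shift.
    static String dropOffset(String xmpDate) {
        OffsetDateTime parsed = OffsetDateTime.parse(xmpDate);
        return parsed.toLocalDateTime()
                .format(DateTimeFormatter.ISO_LOCAL_DATE_TIME) + "Z";
    }

    public static void main(String[] args) {
        String modifyDate = "2023-09-06T13:35:38-04:00"; // value from the XMP packet
        System.out.println(toUtc(modifyDate));      // prints 2023-09-06T17:35:38Z
        System.out.println(dropOffset(modifyDate)); // prints 2023-09-06T13:35:38Z
    }
}
```

The second output matches what tika-app 2.9.0 printed for `xmp:ModifyDate` in the report, which is four hours off from the instant recorded in the file.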