[ 
https://issues.apache.org/jira/browse/TIKA-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763933#comment-17763933
 ] 

ASF GitHub Bot commented on TIKA-4126:
--------------------------------------

patrickdalla commented on code in PR #1329:
URL: https://github.com/apache/tika/pull/1329#discussion_r1322066704


##########
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-xmp-commons/src/test/java/org/apache/tika/parser/xmp/JempboxExtractorTest.java:
##########
@@ -134,4 +134,14 @@ public void testMaxXMPMMHistory() throws Exception {
         }
     }
 
+    @Test
+    public void testModifiedTZ() throws Exception {
+        Metadata m = new Metadata();
+        JempboxExtractor ex = new JempboxExtractor(m);
+        try (InputStream is = getResourceAsStream("/test-documents/testXMP.xmp")) {
+            ex.parse(is);
+        }
+        assertEquals("2014-03-04T22:50:41Z", m.get(XMPMM.HISTORY_WHEN));

Review Comment:
   How can I download a Tika app snapshot that includes this patch, so I can 
test it on my PDF file?





> PDF XMP ModifyDate extracted without TimeZone info
> --------------------------------------------------
>
>                 Key: TIKA-4126
>                 URL: https://issues.apache.org/jira/browse/TIKA-4126
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.4.0, 2.7.0, 2.8.0, 2.9.0
>            Reporter: Patrick Dalla Bernardina
>            Assignee: Tim Allison
>            Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I ran:
> {{[root@localhost Downloads]# java -jar tika-app-2.9.0.jar sobreavisoEditado3.pdf | grep xmp}}
> which returned the dates marked as UTC:
> _WARN [main] 07:42:34,238 org.apache.pdfbox.pdmodel.font.PDType1Font Using 
> fallback font LiberationSans for base font Symbol_
> _WARN [main] 07:42:34,241 org.apache.pdfbox.pdmodel.font.PDType1Font Using 
> fallback font LiberationSans for base font ZapfDingbats_
> _<meta name="xmp:ModifyDate" content="2023-09-06T13:35:38Z"/>_
> _<meta name="xmp:MetadataDate" content="2023-09-06T13:35:38Z"/>_
> _<meta name="xmpTPg:NPages" content="11"/>_
>  
>  
> Whereas running:
> {{java -jar pdfbox-app-2.0.29.jar ExtractXMP -console sobreavisoEditado3.pdf}}
> returned the correct values, including the timezone offset (-04:00):
> <?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
> <x:xmpmeta xmlns:x="adobe:ns:meta/">
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
> <rdf:Description rdf:about="" xmp:ModifyDate="2023-09-06T13:35:38-04:00" xmlns:xmp="http://ns.adobe.com/xap/1.0/">
> <xmp:MetadataDate>2023-09-06T13:35:38-04:00</xmp:MetadataDate>
> </rdf:Description></rdf:RDF></x:xmpmeta>
> <?xpacket end="w"?>
>  
> _So the timezone info was stripped from the metadata string, without any 
> corresponding hour-of-day shift to UTC._
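
The mismatch described above can be checked with plain java.time. This is a minimal sketch, not the code used by Tika or Jempbox; the class name XmpDateCheck and the method toUtc are hypothetical and exist only for illustration. It assumes the XMP value is a full ISO 8601 date-time with an explicit offset, as in the PDFBox output quoted above.

```java
import java.time.OffsetDateTime;

// Hypothetical helper, for illustration only; not part of Tika or Jempbox.
final class XmpDateCheck {

    // Parses an ISO 8601 date-time with an explicit offset and returns the
    // same instant rendered in UTC (with a trailing "Z").
    static String toUtc(String xmpDate) {
        return OffsetDateTime.parse(xmpDate).toInstant().toString();
    }

    public static void main(String[] args) {
        // Value reported by PDFBox in the issue description.
        String fromPdfBox = "2023-09-06T13:35:38-04:00";
        // Value reported by tika-app 2.9.0 in the issue description.
        String fromTika = "2023-09-06T13:35:38Z";

        String expected = toUtc(fromPdfBox);
        System.out.println("expected UTC value: " + expected);
        System.out.println("matches Tika output: " + expected.equals(fromTika));
    }
}
```

If the offset were applied rather than dropped, the UTC value would be four hours later than the one Tika printed, which is the discrepancy the report describes.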



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
