[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document with half hour timezone
[ https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139126#comment-14139126 ]

Tilman Hausherr commented on PDFBOX-2356:
-----------------------------------------

You can get the snapshot version here in a few hours:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/preflight/1.8.8-SNAPSHOT/
(look at the date)

> Error Validating PDF Archive Document with half hour timezone
> -------------------------------------------------------------
>
>                 Key: PDFBOX-2356
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2356
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Preflight
>    Affects Versions: 1.8.4, 1.8.5, 1.8.6, 1.8.7, 1.8.8, 2.0.0
>            Reporter: Cetra Free
>            Assignee: Tilman Hausherr
>             Fix For: 1.8.8, 2.0.0
>
>         Attachments: pdfafile.pdf
>
>
> When trying to validate a PDF archive file (attached to this ticket) we get
> the following error:
> {code}
> 7.2 - Error on MetaData, ModificationDate present in the document catalog
> dictionary doesn't match with XMP information
> {code}
> This is because the Modification Date in the Dictionary is parsed
> differently from the XMP Metadata. The XMP Metadata is correct, but the Date
> from the Dictionary appends an extra 30 minutes.
> The following is the raw COSObject from the PDF File
> {code}
> COSString{D:20140917122850+09'30'}
> {code}
> The Long value should be *1410922730000*
> The *org.apache.pdfbox.util.DateConverter* *parseDate* method returns the
> Date with Long *1410924530000* which is 30 minutes ahead.
> XMP Modification Date is parsed differently and returns the correct date.
> This means that validation will fail for PDF Archives.
> My suggestion would be to refactor the parseDate function to use the Standard
> Java library.
> Here's an example class which will be compatible with the PDF Specification:
> {code}
> // Requires java.text.ParseException, java.text.SimpleDateFormat and
> // java.util.{Arrays, Calendar, Date, HashMap, Map}
> static class DateParser {
>     // Maps the length of a date pattern to the formatter for that pattern
>     private Map<Integer, SimpleDateFormat> formats =
>         new HashMap<Integer, SimpleDateFormat>();
>
>     public DateParser() {
>         String expr = "";
>
>         // Build one formatter per prefix: yyyy, yyyyMM, ..., yyyyMMddHHmmssZ
>         for (String part : Arrays.asList("yyyy", "MM", "dd", "HH", "mm", "ss", "Z")) {
>             expr = expr + part;
>             formats.put(expr.length(), new SimpleDateFormat(expr));
>         }
>     }
>
>     public Calendar parseDate(String expr) {
>         try {
>             // Strip the "D:" prefix and the apostrophes of the PDF date syntax
>             expr = expr.replace("D:", "").replace("'", "").replace("Z", "+");
>             Date date = formats.get(Math.min(expr.length(), 15)).parse(expr);
>
>             Calendar calendar = Calendar.getInstance();
>             calendar.setTime(date);
>
>             return calendar;
>         } catch (ParseException e) {
>             return null;
>         }
>     }
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
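The arithmetic behind the reported 30-minute discrepancy can be checked independently of PDFBox. The following is a minimal sketch, assuming Java 8's `java.time` is available; the class `PdfDateCheck` and its `parse` method are hypothetical names used only for illustration and are not part of PDFBox. It parses the date string from the ticket once with the real `+09'30'` offset and once with the offset truncated to `+09'00'`, which yields the two values 30 minutes apart.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not a PDFBox class.
final class PdfDateCheck {
    // Pattern for the cleaned-up form "yyyyMMddHHmmss+HHmm"
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuuMMddHHmmssxx");

    /** Parses a full PDF date string such as D:20140917122850+09'30'. */
    static Instant parse(String pdfDate) {
        // Strip the "D:" prefix and the apostrophes around the offset minutes
        String cleaned = pdfDate.replace("D:", "").replace("'", "");
        return OffsetDateTime.parse(cleaned, FORMAT).toInstant();
    }

    public static void main(String[] args) {
        Instant correct = parse("D:20140917122850+09'30'");
        // Same local time, but with the half hour of the offset dropped
        Instant truncated = parse("D:20140917122850+09'00'");

        System.out.println(correct);                 // 2014-09-17T02:58:50Z
        System.out.println(correct.toEpochMilli());  // 1410922730000
        System.out.println(truncated.toEpochMilli() - correct.toEpochMilli()); // 1800000
    }
}
```

The sketch only handles the full form with seconds and an explicit numeric offset; the shorter forms allowed by the PDF date syntax are deliberately left out.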
[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document with half hour timezone
[ https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139114#comment-14139114 ]

ASF subversion and git services commented on PDFBOX-2356:
---------------------------------------------------------

Commit 1626017 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1626017 ]

PDFBOX-2356: add tests for part hour timezones
[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document with half hour timezone
[ https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139113#comment-14139113 ]

ASF subversion and git services commented on PDFBOX-2356:
---------------------------------------------------------

Commit 1626016 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1626016 ]

PDFBOX-2356: add tests for part hour timezones
[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document with half hour timezone
[ https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139110#comment-14139110 ]

ASF subversion and git services commented on PDFBOX-2356:
---------------------------------------------------------

Commit 1626015 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1626015 ]

PDFBOX-2356: correct handling of part hour timezones
[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document with half hour timezone
[ https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139109#comment-14139109 ]

ASF subversion and git services commented on PDFBOX-2356:
---------------------------------------------------------

Commit 1626014 from [~tilman] in branch 'pdfbox/branches/1.8'
[ https://svn.apache.org/r1626014 ]

PDFBOX-2356: correct handling of part hour timezones
[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document with half hour timezone
[ https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139105#comment-14139105 ]

Tilman Hausherr commented on PDFBOX-2356:
-----------------------------------------

I will also add tests for the timezone "Australia/Adelaide" in your honour :-)
[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document with half hour timezone
[ https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138591#comment-14138591 ]

Cetra Free commented on PDFBOX-2356:
------------------------------------

Well, it's actually Adelaide/Australia! Which is +9:30 or +10:30 depending on the time of the year (http://en.wikipedia.org/wiki/Australian_Time).

There is also a +8:45 time zone just to make things confusing.
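The part-hour offsets mentioned in this comment can be looked up from the JDK's own time zone data. This is a small sketch, assuming Java 8's `java.time` and the tz database IDs `Australia/Adelaide` and `Australia/Eucla`; the class `PartHourZones` and the method `offsetAt` are hypothetical names, not PDFBox API.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper for illustration only; not a PDFBox class.
final class PartHourZones {
    /** Returns the UTC offset in effect for the given zone at the given instant. */
    static ZoneOffset offsetAt(String zoneId, String isoInstant) {
        return ZoneId.of(zoneId).getRules().getOffset(Instant.parse(isoInstant));
    }

    public static void main(String[] args) {
        // Standard time in Adelaide (the date of the file in this ticket)
        System.out.println(offsetAt("Australia/Adelaide", "2014-09-17T02:58:50Z")); // +09:30
        // Daylight saving time in Adelaide (southern summer)
        System.out.println(offsetAt("Australia/Adelaide", "2014-01-15T00:00:00Z")); // +10:30
        // The 45-minute zone
        System.out.println(offsetAt("Australia/Eucla", "2014-09-17T02:58:50Z"));    // +08:45
    }
}
```

Any date parser that stores offsets in whole hours cannot represent any of these three values exactly.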
[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document
[ https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138580#comment-14138580 ]

Tilman Hausherr commented on PDFBOX-2356:
-----------------------------------------

1) My change does work with your file, but I will expand the tests tonight and then commit the changes and they'll be available in the snapshot and in the next release in a few months.

2) Your issue indicates that PDFBox preflight has never been used in India.
http://geography.about.com/od/culturalgeography/a/offsettimezones.htm
[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document
[ https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138197#comment-14138197 ]

Cetra Free commented on PDFBOX-2356:
------------------------------------

I'm just using the code from here: http://pdfbox.apache.org/cookbook/pdfavalidation.html

{code}
ValidationResult result = null;

FileDataSource fd = new FileDataSource(args[0]);
PreflightParser parser = new PreflightParser(fd);
try {
    /* Parse the PDF file with PreflightParser that inherits from the NonSequentialParser.
     * Some additional controls are present to check a set of PDF/A requirements.
     * (Stream length consistency, EOL after some Keyword...)
     */
    parser.parse();

    /* Once the syntax validation is done,
     * the parser can provide a PreflightDocument
     * (that inherits from PDDocument)
     * This document process the end of PDF/A validation.
     */
    PreflightDocument document = parser.getPreflightDocument();
    document.validate();

    // Get validation result
    result = document.getResult();
    document.close();
} catch (SyntaxValidationException e) {
    /* the parse method can throw a SyntaxValidationException
     * if the PDF file can't be parsed.
     * In this case, the exception contains an instance of ValidationResult
     */
    result = e.getResult();
}

// display validation result
if (result.isValid()) {
    System.out.println("The file " + args[0] + " is a valid PDF/A-1b file");
} else {
    System.out.println("The file " + args[0] + " is not valid, error(s) :");
    for (ValidationError error : result.getErrorsList()) {
        System.out.println(error.getErrorCode() + " : " + error.getDetails());
    }
}
{code}
[jira] [Commented] (PDFBOX-2356) Error Validating PDF Archive Document
[ https://issues.apache.org/jira/browse/PDFBOX-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137490#comment-14137490 ]

Tilman Hausherr commented on PDFBOX-2356:
-----------------------------------------

Are you building from source? If yes, please try this:

{code}
private static void adjustTimeZoneNicely(GregorianCalendar cal, TimeZone tz)
{
    cal.setTimeZone(tz);
    int offset = (cal.get(Calendar.ZONE_OFFSET) + cal.get(Calendar.DST_OFFSET)) / MILLIS_PER_MINUTE;
    cal.add(Calendar.MINUTE, -offset);
}
{code}

If no, please post a minimal code or command line that you used to check your file (I never use preflight) and I'll test it.
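The snippet in this comment computes the calendar's offset in minutes. Why that granularity matters for a zone such as +09:30 can be shown with plain integer arithmetic. The sketch below is an illustration under stated assumptions: the class `OffsetUnits`, its method, and the locally defined constants are hypothetical (in the snippet above `MILLIS_PER_MINUTE` is referenced but not shown), and the thread does not say how the offset was computed before the fix.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper for illustration only; not a PDFBox class.
final class OffsetUnits {
    // Assumed values; defined locally because the snippet above does not show them
    static final int MILLIS_PER_MINUTE = 60 * 1000;
    static final int MILLIS_PER_HOUR = 60 * MILLIS_PER_MINUTE;

    /** Total UTC offset (zone plus DST) at the calendar's current time, in milliseconds. */
    static int totalOffsetMillis(Calendar cal) {
        return cal.get(Calendar.ZONE_OFFSET) + cal.get(Calendar.DST_OFFSET);
    }

    public static void main(String[] args) {
        // A fixed custom zone with a half-hour offset and no daylight saving
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT+09:30"));
        int millis = totalOffsetMillis(cal);

        System.out.println(millis / MILLIS_PER_MINUTE); // 570
        System.out.println(millis / MILLIS_PER_HOUR);   // 9 (integer division drops the 30 minutes)
    }
}
```

Working in minutes keeps the full 570-minute offset, whereas any whole-hour representation would leave a 30-minute error of the size reported in this issue.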