[ https://issues.apache.org/jira/browse/TIKA-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995135#comment-13995135 ]

Tim Allison commented on TIKA-1233:
-----------------------------------

[~lfcnassif], please reopen if you are still finding problems on your test set with trunk.

> PDFBox can throw StringIndexOutOfBoundsException on some dates
> --------------------------------------------------------------
>
>                 Key: TIKA-1233
>                 URL: https://issues.apache.org/jira/browse/TIKA-1233
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Tim Allison
>            Priority: Trivial
>              Labels: easyfix
>             Fix For: 1.6
>
>
> PDFBOX's date parser can throw a StringIndexOutOfBoundsException if a date
> string for parsing is empty or contains only spaces. A few of my test pdfs
> have this "feature."
> Until PDFBOX-1803 is resolved, we can add an extra catch to prevent this from
> causing problems in TIKA
> {noformat}
> @@ -171,6 +171,9 @@
>              addMetadata(metadata, TikaCoreProperties.CREATED, info.getCreationDate());
>          } catch (IOException e) {
>              // Invalid date format, just ignore
> +        } catch (StringIndexOutOfBoundsException e){
> +            //remove after PDFBOX-1883 is fixed
> +            // Invalid date format, just ignore
>          }
>          try {
>              Calendar modified = info.getModificationDate();
> @@ -178,6 +181,9 @@
>              addMetadata(metadata, TikaCoreProperties.MODIFIED, modified);
>          } catch (IOException e) {
>              // Invalid date format, just ignore
> +        } catch (StringIndexOutOfBoundsException e){
> +            //remove after PDFBOX-1883 is fixed
> +            // Invalid date format, just ignore
>          }
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
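
For anyone who wants to see the failure mode and the extra catch in isolation, here is a minimal, self-contained sketch. It does not use PDFBox at all: `fragileParse` is a hypothetical stand-in that, like the parser described in the issue, slices the string without checking its length, and `safeParse` is a hypothetical wrapper that mirrors the two catch blocks in the patch. The `D:YYYY...` input shape is only an assumption for illustration.

```java
import java.io.IOException;
import java.util.Calendar;
import java.util.GregorianCalendar;

class SafeDateExample {

    // Hypothetical stand-in for a fragile date parser; this is NOT the PDFBox API.
    // It assumes input shaped like "D:YYYY..." and performs no length check.
    static Calendar fragileParse(String raw) throws IOException {
        String trimmed = raw.trim();
        // substring throws StringIndexOutOfBoundsException when fewer than
        // six characters remain, e.g. for an empty or all-space string.
        String year = trimmed.substring(2, 6);
        try {
            return new GregorianCalendar(Integer.parseInt(year), Calendar.JANUARY, 1);
        } catch (NumberFormatException e) {
            throw new IOException("Invalid date format: " + raw, e);
        }
    }

    // Mirrors the shape of the patch: treat both exception types as
    // "invalid date, just ignore" and report no date (null).
    static Calendar safeParse(String raw) {
        try {
            return fragileParse(raw);
        } catch (IOException e) {
            return null; // invalid date format, just ignore
        } catch (StringIndexOutOfBoundsException e) {
            return null; // empty or whitespace-only date string
        }
    }

    public static void main(String[] args) {
        System.out.println("blank input yields null: " + (safeParse("   ") == null));
        System.out.println("valid input yields a date: " + (safeParse("D:20140512") != null));
    }
}
```

Catching the specific unchecked exception, rather than a blanket `RuntimeException`, keeps the workaround narrow, so it is easy to remove once the underlying parser is fixed.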