[ https://issues.apache.org/jira/browse/TIKA-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383880#comment-16383880 ]
Nick Burch commented on TIKA-2597:
----------------------------------

Trying to fully re-implement the Windows case-insensitivity rules doesn't sound that much fun... Unless someone can find a small library / JRE system function that does it for us?

Otherwise, Microsoft have been doing some work recently to fix various Windows bugs and limitations around case-sensitivity. You might find it easier to just turn on per-directory case sensitivity for your extraction directories! Details from a few days ago at https://blogs.msdn.microsoft.com/commandline/2018/02/28/per-directory-case-sensitivity-and-wsl/

> Attachment Extraction Case Sensitivity
> --------------------------------------
>
>                 Key: TIKA-2597
>                 URL: https://issues.apache.org/jira/browse/TIKA-2597
>             Project: Tika
>          Issue Type: Bug
>          Components: app
>    Affects Versions: 1.17
>         Environment: windows
>            Reporter: Todd Dixon
>            Priority: Major
>
> Using the --extract option on a PDF with embedded files, I am seeing that not
> all of the attachments are extracted. Several of the embedded files have the
> same name. Names that are exactly the same are handled by adding a suffix of
> (1) etc. However, when two names differ only by case, the parser does not
> add the suffix and so overwrites the file on disk. Example:
> FW Letter,.msg
> FW letter.msg
> Will result in only one attachment extracted. Would it be possible to update
> the filename comparison to account for Windows file systems, which see those
> two files as the same name?
> Thanks!
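
For illustration only, here is a rough sketch of the kind of comparison the issue asks for. It is not Tika's actual code: the class name `CaseInsensitiveNamer`, the method `uniqueName`, and the exact suffix format are all hypothetical. It also only approximates the Windows rules by folding names with `toLowerCase(Locale.ROOT)`, so it will not match NTFS behaviour for every Unicode character.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Tika: hands out file names that stay
// distinct even on a file system that ignores case.
class CaseInsensitiveNamer {
    // Lower-cased form of every name handed out so far.
    private final Set<String> used = new HashSet<>();

    // Returns the requested name, or the name with a "(n)" suffix before the
    // extension if an earlier name differs from it only by case (or not at all).
    String uniqueName(String requested) {
        int dot = requested.lastIndexOf('.');
        String base = dot > 0 ? requested.substring(0, dot) : requested;
        String ext = dot > 0 ? requested.substring(dot) : "";
        String candidate = requested;
        int n = 0;
        // Set.add returns false when the folded name is already taken.
        // Locale.ROOT folding is an approximation of the Windows rules.
        while (!used.add(candidate.toLowerCase(Locale.ROOT))) {
            n++;
            candidate = base + "(" + n + ")" + ext;
        }
        return candidate;
    }
}
```

With this sketch, asking for `FW Letter.msg` and then `FW letter.msg` yields `FW Letter.msg` and `FW letter(1).msg`, so the second attachment would no longer overwrite the first.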