[ 
https://issues.apache.org/jira/browse/TIKA-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383880#comment-16383880
 ] 

Nick Burch commented on TIKA-2597:
----------------------------------

Fully re-implementing the Windows case-insensitivity rules doesn't sound like 
much fun... unless someone can find a small library or JRE system function 
that does it for us?
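
For illustration only, here is a rough sketch of the cheap approximation, as 
opposed to the full Windows rules: fold each name to a single case and compare 
the folded forms before writing. The class name CaseInsensitiveNamer, the 
fold() helper and the " (n)" suffix format are all made up for this example and 
are not existing Tika code. The Locale.ROOT folding is an assumption too; NTFS 
uses its own upcase table, so some non-ASCII names may be judged differently.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not part of Tika): hands out names that stay unique when case is ignored. */
final class CaseInsensitiveNamer {
    // Case-folded forms of every name handed out so far.
    private final Set<String> used = new HashSet<>();

    // Approximate case folding only; this does NOT reproduce the NTFS upcase table.
    static String fold(String name) {
        return name.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    /** Returns the name unchanged if it is free, otherwise adds " (n)" before the extension. */
    String uniqueName(String name) {
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        String ext = dot > 0 ? name.substring(dot) : "";
        String candidate = name;
        int n = 1;
        // Set.add returns false when the folded name is already taken.
        while (!used.add(fold(candidate))) {
            candidate = base + " (" + n++ + ")" + ext;
        }
        return candidate;
    }
}
```

With one namer per extraction directory, asking for "Report.msg" and then 
"report.msg" would give back "Report.msg" and "report (1).msg", so the second 
file no longer overwrites the first on a case-insensitive file system.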

Otherwise, Microsoft have recently been working on various Windows bugs and 
limitations around case sensitivity. You might find it easier to just turn 
on per-directory case sensitivity for your extraction directories! Details 
from a few days ago at 
https://blogs.msdn.microsoft.com/commandline/2018/02/28/per-directory-case-sensitivity-and-wsl/

> Attachment Extraction Case Sensitivity
> --------------------------------------
>
>                 Key: TIKA-2597
>                 URL: https://issues.apache.org/jira/browse/TIKA-2597
>             Project: Tika
>          Issue Type: Bug
>          Components: app
>    Affects Versions: 1.17
>         Environment: windows
>            Reporter: Todd Dixon
>            Priority: Major
>
> Using the --extract option on a PDF with embedded files, I am seeing that 
> not all of the attachments are extracted.  Several of the embedded files 
> have the same name.  Names that are exactly the same are handled by adding 
> a suffix of (1) etc.  However, when there is a similar name that differs 
> in case, the parser does not add the suffix and so overwrites the file on 
> disk.  Example:
> FW Letter,.msg
> FW letter.msg
> This results in only one attachment being extracted.  Would it be possible 
> to update the filename comparison to account for Windows file systems, 
> which see those two files as the same name?
> Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
