[ https://issues.apache.org/jira/browse/TIKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-4447:
------------------------------
    Attachment: screenshot-1.png

> eml attachement duplicate filename on extract
> ---------------------------------------------
>
>                 Key: TIKA-4447
>                 URL: https://issues.apache.org/jira/browse/TIKA-4447
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 3.2.0
>            Reporter: Gregory Lepore
>            Priority: Minor
>         Attachments: 12.eml, screenshot-1.png
>
>
> Not sure if this is a bug or something wrong with the source files. I'm
> extracting and analyzing attachments from a huge set of eml files (originally
> in pst format). However, attachments are getting the filename doubled on
> extraction. For example, for the attached eml file I get:
>
> java -jar /media/lepore/Work/tika/tika.jar --extract 12.eml
> Extracting 'rtf-body.rtfrtf-body.rtf' (application/rtf) to ./cc9d8ebd-b93c-4235-b766-79b0aa841ef2-rtf-body.rtfrtf-body.rtf
> Extracting '03-005 ACF GA Plan1.doc03-005 ACF GA Plan1.doc' (application/msword) to ./0220432f-6dcc-4beb-b659-66be0fe0f60f-03-005 ACF GA Plan1.doc03-005 ACF GA Plan1.doc
> Extracting 'Talking Point1 1-17.docTalking Point1 1-17.doc' (application/msword) to ./24bbaeab-448e-4d47-8b6d-ee9651156f89-Talking Point1 1-17.docTalking Point1 1-17.doc
>
> All of the extracted file names are doubled. In the eml file I see:
>
> Content-Type: application/msword
> Content-Transfer-Encoding: base64
> Content-Disposition: attachment;
>     filename*=utf-8''Talking%20Point1%201-17.doc;
>     filename="Talking Point1 1-17.doc"
>
> perhaps the doubled filename here is contributing to the problem?
> Extracting the files with pffexport doesn't double the filename, but ripmime
> has trouble, and munpack also has trouble.
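
For reference, the Content-Disposition header quoted in the report carries the same name twice: once in the extended `filename*` form (charset, language, percent-encoded text, as defined in RFC 2231) and once as a plain quoted `filename`. RFC 6266, which was written for HTTP, recommends picking `filename*` and ignoring `filename` when both are present. The sketch below applies that rule to the header from the report. It is only an illustration under that assumption, not Tika's code and not a confirmed diagnosis of this issue; `split_params` and `attachment_filename` are hypothetical helper names, and escaped quotes inside quoted strings and RFC 2231 continuations (`filename*0*`) are deliberately not handled.

```python
from urllib.parse import unquote


def split_params(header_value):
    """Split a header value on ';' while leaving double-quoted strings intact."""
    parts, current, in_quotes = [], [], False
    for ch in header_value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ';' and not in_quotes:
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current).strip())
    return [p for p in parts if p]


def attachment_filename(header_value):
    """Return a single filename, preferring filename* over filename."""
    plain = extended = None
    # The first element is the disposition type ("attachment"), so skip it.
    for part in split_params(header_value)[1:]:
        name, sep, value = part.partition('=')
        if not sep:
            continue
        name = name.strip().lower()
        value = value.strip()
        if name == 'filename*' and value.count("'") >= 2:
            # Extended form: charset'language'percent-encoded-text
            charset, _language, encoded = value.split("'", 2)
            extended = unquote(encoded, encoding=charset or 'utf-8')
        elif name == 'filename':
            plain = value.strip('"')
    # Choose one value; never concatenate the two.
    return extended if extended is not None else plain


# Header value as it appears in the report, including the folded lines.
header = (
    "attachment;\n"
    "\tfilename*=utf-8''Talking%20Point1%201-17.doc;\n"
    "\tfilename=\"Talking Point1 1-17.doc\""
)
print(attachment_filename(header))
```

Selecting one of the two parameters, rather than appending every parameter whose name starts with `filename`, yields the single name `Talking Point1 1-17.doc` for this header.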