[
https://issues.apache.org/jira/browse/TIKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4447:
------------------------------
Attachment: screenshot-1.png
> eml attachment duplicate filename on extract
> --------------------------------------------
>
> Key: TIKA-4447
> URL: https://issues.apache.org/jira/browse/TIKA-4447
> Project: Tika
> Issue Type: Bug
> Affects Versions: 3.2.0
> Reporter: Gregory Lepore
> Priority: Minor
> Attachments: 12.eml, screenshot-1.png
>
>
> Not sure if this is a bug or something wrong with the source files. I'm
> extracting and analyzing attachments from a huge set of eml files (originally
> in pst format). However, attachments are getting the filename doubled on
> extraction. For example, for the attached eml file I get:
> java -jar /media/lepore/Work/tika/tika.jar --extract 12.eml
> Extracting 'rtf-body.rtfrtf-body.rtf' (application/rtf) to ./cc9d8ebd-b93c-4235-b766-79b0aa841ef2-rtf-body.rtfrtf-body.rtf
> Extracting '03-005 ACF GA Plan1.doc03-005 ACF GA Plan1.doc' (application/msword) to ./0220432f-6dcc-4beb-b659-66be0fe0f60f-03-005 ACF GA Plan1.doc03-005 ACF GA Plan1.doc
> Extracting 'Talking Point1 1-17.docTalking Point1 1-17.doc' (application/msword) to ./24bbaeab-448e-4d47-8b6d-ee9651156f89-Talking Point1 1-17.docTalking Point1 1-17.doc
> All of the extracted file names are doubled. In the eml file I see:
> Content-Type: application/msword
> Content-Transfer-Encoding: base64
> Content-Disposition: attachment;
>     filename*=utf-8''Talking%20Point1%201-17.doc;
>     filename="Talking Point1 1-17.doc"
> Perhaps having both a filename* and a filename parameter here is
> contributing to the problem?
> Extracting the files with pffexport doesn't double the filename, but ripmime
> and munpack both have trouble.
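
To illustrate how a header like the one quoted above could yield a doubled
name, here is a minimal Python sketch. It is not Tika's or mime4j's actual
code; the function names (parse_disposition_params, pick_filename,
naive_join) are hypothetical, and the parameter splitting is deliberately
simplified (it would mishandle a quoted value containing ';'). It contrasts
choosing one of the two parameters (RFC 6266 recommends preferring filename*
for HTTP, which is assumed here to be a sensible rule for mail as well) with
a faulty merge that treats both parameters as pieces of one value.

```python
from urllib.parse import unquote

# The Content-Disposition value from the report, unfolded onto one line.
HEADER = (
    "attachment; "
    "filename*=utf-8''Talking%20Point1%201-17.doc; "
    'filename="Talking Point1 1-17.doc"'
)


def parse_disposition_params(header_value):
    """Return the parameters as a list of (name, value) pairs.

    Simplified on purpose: splits on ';' and strips surrounding quotes.
    """
    params = []
    for part in header_value.split(";")[1:]:
        if "=" not in part:
            continue
        name, value = part.split("=", 1)
        params.append((name.strip().lower(), value.strip().strip('"')))
    return params


def decode_ext_value(value):
    """Decode an extended value of the form charset'language'percent-encoded."""
    charset, _language, encoded = value.split("'", 2)
    return unquote(encoded, encoding=charset or "utf-8")


def pick_filename(header_value):
    """Choose a single name: filename* if present, otherwise filename."""
    plain = extended = None
    for name, value in parse_disposition_params(header_value):
        if name == "filename*" and extended is None:
            extended = decode_ext_value(value)
        elif name == "filename" and plain is None:
            plain = value
    return extended if extended is not None else plain


def naive_join(header_value):
    """Hypothetical faulty merge: concatenate every filename-like parameter."""
    pieces = []
    for name, value in parse_disposition_params(header_value):
        if name == "filename*":
            pieces.append(decode_ext_value(value))
        elif name == "filename":
            pieces.append(value)
    return "".join(pieces)


print(pick_filename(HEADER))
print(naive_join(HEADER))
```

The second call reproduces the doubled form seen in the extraction output,
which would be consistent with the two parameters being merged somewhere in
the parsing chain, though that remains a guess until the parser is checked
against the attached 12.eml.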
--
This message was sent by Atlassian Jira
(v8.20.10#820010)