[jira] [Updated] (TIKA-4447) eml attachment duplicate filename on extract

Tim Allison (Jira) Wed, 02 Jul 2025 09:28:11 -0700


     [ https://issues.apache.org/jira/browse/TIKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Tim Allison updated TIKA-4447:
------------------------------
    Attachment: screenshot-1.png

> eml attachment duplicate filename on extract
> --------------------------------------------
>
>                 Key: TIKA-4447
>                 URL: https://issues.apache.org/jira/browse/TIKA-4447
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 3.2.0
>            Reporter: Gregory Lepore
>            Priority: Minor
>         Attachments: 12.eml, screenshot-1.png
>
>
> I'm not sure whether this is a bug in Tika or a problem with the source files.
> I'm extracting and analyzing attachments from a large set of eml files
> (originally in PST format), and the attachment filenames are doubled on
> extraction. For example, for the attached 12.eml file I get:
> java -jar /media/lepore/Work/tika/tika.jar --extract 12.eml
> Extracting 'rtf-body.rtfrtf-body.rtf' (application/rtf) to ./cc9d8ebd-b93c-4235-b766-79b0aa841ef2-rtf-body.rtfrtf-body.rtf
> Extracting '03-005 ACF GA Plan1.doc03-005 ACF GA Plan1.doc' (application/msword) to ./0220432f-6dcc-4beb-b659-66be0fe0f60f-03-005 ACF GA Plan1.doc03-005 ACF GA Plan1.doc
> Extracting 'Talking Point1 1-17.docTalking Point1 1-17.doc' (application/msword) to ./24bbaeab-448e-4d47-8b6d-ee9651156f89-Talking Point1 1-17.docTalking Point1 1-17.doc
> All of the extracted file names are doubled. In the eml file I see:
> Content-Type: application/msword
> Content-Transfer-Encoding: base64
> Content-Disposition: attachment; 
>         filename*=utf-8''Talking%20Point1%201-17.doc;
>         filename="Talking Point1 1-17.doc"
> Perhaps the filename appearing twice here (once as filename* and once as
> filename) is contributing to the problem?
> Extracting the files with pffexport does not double the filenames, but ripmime
> and munpack also have trouble with them.
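The reporter's guess about the repeated parameter can be illustrated with a small Java sketch. It assumes, purely hypothetically, that a parser appends the values of both filename* (the RFC 2231 extended form, charset'language'percent-encoded-text) and filename instead of choosing one; nothing here is taken from Tika's source, and the class and method names (DispositionFilename, naiveConcat, preferred) are invented for illustration. The preferred() variant picks filename* when both are present, which is what RFC 6266 recommends for HTTP Content-Disposition headers. The parameter splitter is deliberately simplified and does not handle semicolons inside quoted strings.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not Apache Tika's implementation.
public class DispositionFilename {

    // The header from the report, including the folded continuation lines.
    static final String HEADER = "attachment;\n"
            + "        filename*=utf-8''Talking%20Point1%201-17.doc;\n"
            + "        filename=\"Talking Point1 1-17.doc\"";

    // Splits "type; a=b; c=\"d\"" into an insertion-ordered parameter map.
    static Map<String, String> params(String header) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String part : header.split(";")) {
            int eq = part.indexOf('=');
            if (eq < 0) continue; // skips the bare disposition type ("attachment")
            String name = part.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = part.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1); // strip quotes
            }
            out.put(name, value);
        }
        return out;
    }

    // Decodes an RFC 2231 extended value such as utf-8''Talking%20Point1.doc.
    static String decodeExtended(String value) {
        int first = value.indexOf('\'');
        int second = first < 0 ? -1 : value.indexOf('\'', first + 1);
        if (second < 0) return value; // not in extended form; return unchanged
        Charset cs = Charset.forName(value.substring(0, first));
        // URLDecoder treats '+' as a space, so protect literal plus signs first.
        String encoded = value.substring(second + 1).replace("+", "%2B");
        return URLDecoder.decode(encoded, cs);
    }

    // Hypothetical buggy behavior: appends every filename variant it finds.
    static String naiveConcat(String header) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params(header).entrySet()) {
            if (e.getKey().equals("filename*")) sb.append(decodeExtended(e.getValue()));
            else if (e.getKey().equals("filename")) sb.append(e.getValue());
        }
        return sb.toString();
    }

    // Picks filename* when present, otherwise falls back to filename.
    static String preferred(String header) {
        Map<String, String> p = params(header);
        if (p.containsKey("filename*")) return decodeExtended(p.get("filename*"));
        return p.get("filename");
    }

    public static void main(String[] args) {
        System.out.println(naiveConcat(HEADER)); // Talking Point1 1-17.docTalking Point1 1-17.doc
        System.out.println(preferred(HEADER));   // Talking Point1 1-17.doc
    }
}
```

If the doubling really does come from combining both parameters, the concatenated output reproduces the pattern seen in the extraction log above; whether Tika's MIME parsing actually behaves this way would need to be confirmed against its source.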



--
This message was sent by Atlassian Jira
(v8.20.10#820010)