[ 
https://issues.apache.org/jira/browse/TIKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17987942#comment-17987942
 ] 

Tim Allison commented on TIKA-4447:
-----------------------------------

That looks like a bug, or at least an area for improvement, in the upstream 
Apache James Mime4j library. When I open the eml in Thunderbird, it shows the 
correct names.

 !screenshot-1.png! 

I'm going to try to generate a minimal reproducer and will then open an issue 
with mime4j.

Thank you, [~g...@rhobard.com]!

> eml attachment duplicate filename on extract
> --------------------------------------------
>
>                 Key: TIKA-4447
>                 URL: https://issues.apache.org/jira/browse/TIKA-4447
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 3.2.0
>            Reporter: Gregory Lepore
>            Priority: Minor
>         Attachments: 12.eml, screenshot-1.png
>
>
> Not sure if this is a bug or something wrong with the source files. I'm 
> extracting and analyzing attachments from a huge set of eml files (originally 
> in pst format). However, attachments are getting the filename doubled on 
> extraction. For example, for the attached eml file I get:
>
> java -jar /media/lepore/Work/tika/tika.jar --extract  12.eml 
> Extracting 'rtf-body.rtfrtf-body.rtf' (application/rtf) to ./cc9d8ebd-b93c-4235-b766-79b0aa841ef2-rtf-body.rtfrtf-body.rtf
> Extracting '03-005 ACF GA Plan1.doc03-005 ACF GA Plan1.doc' (application/msword) to ./0220432f-6dcc-4beb-b659-66be0fe0f60f-03-005 ACF GA Plan1.doc03-005 ACF GA Plan1.doc
> Extracting 'Talking Point1 1-17.docTalking Point1 1-17.doc' (application/msword) to ./24bbaeab-448e-4d47-8b6d-ee9651156f89-Talking Point1 1-17.docTalking Point1 1-17.doc
>
> All of the extracted file names are doubled. In the eml file I see:
>
> Content-Type: application/msword
> Content-Transfer-Encoding: base64
> Content-Disposition: attachment; 
>         filename*=utf-8''Talking%20Point1%201-17.doc;
>         filename="Talking Point1 1-17.doc"
>
> Perhaps the doubled filename here is contributing to the problem?
> Extracting the files with pffexport doesn't double the filename, but ripmime 
> has trouble, and munpack also has trouble.
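
For illustration, the header quoted above carries the same name twice: once as an RFC 2231 extended parameter (filename*) and once as a plain quoted string (filename). RFC 6266, which was written for HTTP, recommends that a recipient seeing both should use filename* and ignore filename; that reading appears consistent with the names Thunderbird shows. The Python sketch below applies that rule. It is not how Tika or mime4j parse the header, the function names are hypothetical, and it assumes a single filename* parameter with no RFC 2231 continuations (filename*0*, filename*1*, ...).

```python
import re
from urllib.parse import unquote_to_bytes

# Matches one ";name=value" parameter. The value is either a quoted
# string (with backslash escapes) or a bare token running up to the next ';'.
_PARAM = re.compile(r';\s*([^\s=;]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^;]*)')


def parse_params(header_value):
    """Return the parameters of a Content-Disposition value as a dict."""
    # Unfold continuation lines (a line break followed by whitespace).
    unfolded = re.sub(r'\r?\n[ \t]+', ' ', header_value)
    params = {}
    for name, raw in _PARAM.findall(unfolded):
        raw = raw.strip()
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Drop the surrounding quotes and undo backslash escapes.
            raw = re.sub(r'\\(.)', r'\1', raw[1:-1])
        params[name.lower()] = raw
    return params


def decode_extended(value):
    """Decode an extended value of the form charset'language'pct-encoded."""
    charset, _, rest = value.partition("'")
    _language, _, encoded = rest.partition("'")
    # Assumption: fall back to UTF-8 when no charset is given.
    return unquote_to_bytes(encoded).decode(charset or "utf-8", errors="replace")


def pick_filename(header_value):
    """Prefer filename* over filename; never join the two values."""
    params = parse_params(header_value)
    if "filename*" in params:
        return decode_extended(params["filename*"])
    return params.get("filename")


header = (
    "attachment; \n"
    "        filename*=utf-8''Talking%20Point1%201-17.doc;\n"
    "        filename=\"Talking Point1 1-17.doc\""
)
print(pick_filename(header))  # Talking Point1 1-17.doc
```

Keeping the two parameters under separate dictionary keys is the point of the sketch: a parser that treats filename* as a segment of filename, rather than as an alternative to it, would plausibly produce the doubled names reported here.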



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
