[ https://issues.apache.org/jira/browse/TIKA-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-3968:
------------------------------
Description:
I'm hearing from several users who have contacted me privately that
Microsoft -has changed their basic behavior- for files attached to at least
docx files (possibly pptx and xlsx?). Rather than storing the original file
name, the docx pairs each attachment with an EMF image. The filename that a
human sees in the application is spelled/painted out in the EMF file, but does
NOT exist anywhere in the XML.
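For illustration, here is a minimal Python sketch of how a parser might recover such a name. It is not Tika's implementation. It assumes the common VML markup in which a w:object element contains a v:imagedata (the EMF icon) and an o:OLEObject (the attachment), each referring to a package part by relationship id. It also assumes the label is drawn with EMR_EXTTEXTOUTW records, and it reads the string offsets as laid out in the [MS-EMF] specification. The helper names emf_text_runs and guess_embedded_names are hypothetical. Other variants (EMF+ DrawString records, ANSI EMR_EXTTEXTOUTA records, WMF icons, pptx/xlsx part layouts) are not handled.

```python
# Hypothetical sketch: assumes VML <w:object> markup and EMR_EXTTEXTOUTW labels.
import posixpath
import struct
import xml.etree.ElementTree as ET
import zipfile

EMR_EOF = 14
EMR_EXTTEXTOUTW = 84  # MS-EMF record type for Unicode text output

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "v": "urn:schemas-microsoft-com:vml",
    "o": "urn:schemas-microsoft-com:office:office",
}
R_ID = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id"
REL = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def emf_text_runs(data: bytes) -> list[str]:
    """Return the strings drawn by EMR_EXTTEXTOUTW records, in record order."""
    runs = []
    pos = 0
    while pos + 8 <= len(data):
        rec_type, rec_size = struct.unpack_from("<II", data, pos)
        if rec_size < 8 or pos + rec_size > len(data):
            break  # malformed record: stop instead of looping or over-reading
        if rec_type == EMR_EXTTEXTOUTW and rec_size >= 56:
            # Bounds(16) + iGraphicsMode + exScale + eyScale + Reference(8)
            # put EmrText.Chars at +44 and EmrText.offString at +48.
            n_chars, off_string = struct.unpack_from("<II", data, pos + 44)
            start = pos + off_string
            end = start + 2 * n_chars  # UTF-16LE: 2 bytes per code unit
            if off_string >= 8 and end <= pos + rec_size:
                runs.append(data[start:end].decode("utf-16-le", errors="replace"))
        if rec_type == EMR_EOF:
            break
        pos += rec_size
    return runs


def _part_paths(zf: zipfile.ZipFile) -> dict[str, str]:
    """Map relationship ids used by word/document.xml to zip part names."""
    root = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
    paths = {}
    for rel in root.iter(REL):
        if rel.get("TargetMode") == "External":
            continue
        target = rel.get("Target", "")
        if target.startswith("/"):
            paths[rel.get("Id")] = target.lstrip("/")
        else:  # relative targets resolve against the word/ folder
            paths[rel.get("Id")] = posixpath.normpath(posixpath.join("word", target))
    return paths


def guess_embedded_names(docx) -> dict[str, str]:
    """Map each embedded OLE part to the label painted in its EMF icon."""
    names = {}
    with zipfile.ZipFile(docx) as zf:  # accepts a path or a binary file object
        paths = _part_paths(zf)
        doc = ET.fromstring(zf.read("word/document.xml"))
        for obj in doc.iter("{%s}object" % NS["w"]):
            img = obj.find(".//v:imagedata", NS)
            ole = obj.find(".//o:OLEObject", NS)
            if img is None or ole is None:
                continue
            img_path = paths.get(img.get(R_ID))
            ole_path = paths.get(ole.get(R_ID))
            if not img_path or not ole_path or not img_path.lower().endswith(".emf"):
                continue
            try:
                runs = emf_text_runs(zf.read(img_path))
            except KeyError:  # relationship points at a missing part
                continue
            label = "".join(runs).strip()
            if label:
                names[ole_path] = label
    return names
```

Joining the runs with no separator is a heuristic: a long label may be wrapped across several text records, but an icon EMF could also contain unrelated text, so a real implementation would likely need more checks.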
I'm attaching an example file.
While fixing this issue, I've noticed that some of our fairly old docx files
use this technique, so it's not clear that this is new; I just happen to be
hearing about it from several people.
was:
I'm hearing from several users who have contacted me privately that
Microsoft has changed their basic behavior for files attached to at least docx
files (possibly pptx and xlsx?). Rather than storing the original file name,
the docx pairs each attachment with an EMF image. The filename that a human
sees in the application is spelled/painted out in the EMF file, but does NOT
exist anywhere in the XML.
I'm attaching an example file.
> Reconstruct embedded file names from associated emf files within docx files
> ---------------------------------------------------------------------------
>
> Key: TIKA-3968
> URL: https://issues.apache.org/jira/browse/TIKA-3968
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
> Attachments: Microsoft_Word_Document.docx,
> image-2023-02-06-15-46-05-678.png, image-2023-02-06-15-58-20-443.png,
> image1-1.emf, image1-2.emf, image1.emf, image2.emf, image3.emf,
> oleObject1.bin, oleObject2.bin, testWORD has attachment.docx
--
This message was sent by Atlassian Jira
(v8.20.10#820010)