Tim Allison created TIKA-4617:
---------------------------------

             Summary: Fix embedded stream translator to avoid changing file name
                 Key: TIKA-4617
                 URL: https://issues.apache.org/jira/browse/TIKA-4617
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


In 3.x, while reviewing the results in preparation for the next release, I 
noticed that embedded OLE file names differed between the last release and the 
current branch_3x. 

The cause of this difference is that we're now applying the stream translators 
to embedded OLE objects when we run digesting. I had copied and pasted the 
stream translation code from tika-app, or maybe tika-server's /unpack, and had 
not thought clearly enough about second-order consequences.

During the digesting phase, the stream translators are modifying the embedded 
file name.

We should fix this so that digesting doesn't modify the metadata at all, aside 
from adding digests.
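
To illustrate the intended behavior, here is a minimal sketch, not the actual 
Tika code. It assumes a plain Map as a stand-in for Tika's Metadata object and a 
hypothetical StreamTranslator interface; the key strings and all class and 
method names are illustrative only. The idea is to hand the translator a scratch 
copy of the metadata, so that the only change to the caller's metadata is the 
added digest.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class DigestWithoutSideEffects {

    // Illustrative key names (assumptions for this sketch).
    static final String NAME_KEY = "resourceName";
    static final String DIGEST_KEY = "X-TIKA:digest:SHA256";

    /** Hypothetical stand-in for a stream translator that may rewrite metadata. */
    interface StreamTranslator {
        byte[] translate(byte[] input, Map<String, String> metadata);
    }

    /** Returns the lowercase hex SHA-256 of the given bytes. */
    static String sha256Hex(byte[] bytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(bytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Digests the translated bytes. The translator only ever sees a scratch
     * copy of the metadata, so any renaming it does is discarded.
     */
    static void digest(byte[] embedded, Map<String, String> metadata,
                       StreamTranslator translator) {
        Map<String, String> scratch = new HashMap<>(metadata);
        byte[] translated = translator.translate(embedded, scratch);
        // The digest is the only entry written back to the caller's metadata.
        metadata.put(DIGEST_KEY, sha256Hex(translated));
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(NAME_KEY, "embedded.bin");

        // A translator that renames the file, as described in this issue.
        StreamTranslator renaming = (bytes, md) -> {
            md.put(NAME_KEY, "renamed.ole");
            return bytes;
        };

        digest(new byte[] {1, 2, 3}, metadata, renaming);
        System.out.println(metadata.get(NAME_KEY)); // prints "embedded.bin"
    }
}
```

Copying the metadata is the simplest way to isolate the side effects; the real 
fix might instead skip the translators' metadata updates during digesting.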

Looking at some of the other diffs, I think this causes quite a few second- 
and third-order problems. Once we fix this, I _think_ we'll have addressed most 
of the issues in the 3.x diffs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
