manish-003 opened a new pull request, #2386:
URL: https://github.com/apache/tika/pull/2386

   This PR addresses 
[TIKA-4474](https://issues.apache.org/jira/projects/TIKA/issues/TIKA-4474)
   
   These changes force spooling for OOXML files. 
   A similar discussion took place in 
[TIKA-4459](https://issues.apache.org/jira/projects/TIKA/issues/TIKA-4459)
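
For readers unfamiliar with the term, here is a minimal sketch of what spooling means in general: the incoming stream is copied to a temporary file first, so the zip container can be opened with random access through its central directory instead of being read sequentially. It uses only JDK classes. The class and method names (`SpoolingSketch`, `spoolToTempFile`, `listEntries`, `sampleZip`) are hypothetical, and this is not the code in this PR, which goes through Tika's own stream handling.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; not taken from the Tika code base.
public class SpoolingSketch {

    /** Copies the whole stream to a temporary file so it can be opened with random access. */
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("ooxml-spool-", ".zip");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /** Spools the stream, opens the file-backed zip, and returns its entry names. */
    static List<String> listEntries(InputStream in) throws IOException {
        Path tmp = spoolToTempFile(in);
        try (ZipFile zip = new ZipFile(tmp.toFile())) {
            List<String> names = new ArrayList<>();
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
            return names;
        } finally {
            // The ZipFile is already closed here, so the temp file can be removed.
            Files.deleteIfExists(tmp);
        }
    }

    /** Builds a tiny in-memory zip standing in for an OOXML package (assumed sample data). */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("[Content_Types].xml"));
            out.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("word/document.xml"));
            out.write("<document/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<String> names = listEntries(new ByteArrayInputStream(sampleZip()));
        System.out.println(names);
    }
}
```

The trade-off in this sketch is one extra copy to disk in exchange for a file-backed container, which avoids holding the whole package in memory while reading it.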
   
   ## Performance analysis
   
   I ran the test on the OOXML files I had on hand. Those files weren't all that 
big, as you can infer from the size axis, yet the performance difference is 
significant. The contrast is more pronounced for larger files.
   
[OOXMLExtractPerfTest.java](https://github.com/user-attachments/files/23387032/OOXMLExtractPerfTest.java)
   
   <img width="800" height="600" alt="non_spooling_perf_chart" src="https://github.com/user-attachments/assets/2a12965c-51ae-4e03-8bff-314590f15a9f" />
   <img width="800" height="600" alt="spooling_perf_chart" src="https://github.com/user-attachments/assets/68c98cbc-d4d0-4fd0-97bf-91ccb5f9b0ec" />
   
   I'll attach the script and text outputs on the issue page.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

