+1
I just finished the run against 38k documents. We're getting more attachments
from doc files, and ~251 ppt files are no longer throwing exceptions.
I did discover a potential multithreading issue in ppts, but so far I can only
reproduce it with tika-app in batch mode when I run against files sorted
by mime type (all ppts at once). I can also reproduce it in 3.13 with the same
setup, so it doesn't appear to be a regression.
I can't reproduce it yet in JUnit. I'll open an issue on our tracker for that.
Cheers,
Tim