The govdocs file has 1290 MACRO (JavaScript) "attachments" with Tika 1.26-SNAPSHOT and 930 with Tika 1.25. I have no idea why the more recent version of Tika finds more macros, but the file does contain "attachments", broadly speaking.
I'll look into the NPEs. If those are a Java bug, I don't think they are a blocker.

Still working on the OpenOffice document issues... LIBRE_OFFICE-45041-0.ods is showing some weird behavior.

On Tue, Mar 23, 2021 at 2:58 PM Tilman Hausherr <[email protected]> wrote:
>
> On 23.03.2021 at 17:31, Tim Allison wrote:
> > Reports are available here:
> > https://corpora.tika.apache.org/base/reports/1_25_v_1_26.tgz
>
>
> govdocs1/966/966679.pdf
>
> claims to have 360 attachments more than last time. I don't see a single
> attachment, and when I run tika-app with "--extract" I get nothing???
>
>
> There are also some NPEs for BMP files, seems to be a java bug.
>
>
> Tilman
>
