I bumped the maximum recursion depth recently. When I temporarily reverted it to a max depth of 10, I got 653 attachments, which matches neither the 1.25 nor the 1.26-SNAPSHOT count and is smaller than both.
On Tue, Mar 23, 2021 at 3:51 PM Tim Allison <[email protected]> wrote:
>
> The govdocs file has 1290 MACRO (javascript) "attachments" with Tika
> 1.26-SNAPSHOT and 930 with Tika 1.25. I have no idea why there are
> more macros in the more recent version of Tika, but there are
> "attachments" broadly speaking.
>
> I'll look into the NPEs. If those are a Java bug, I don't think those
> are a blocker.
>
> Still working on the open office document issues...
> LIBRE_OFFICE-45041-0.ods is showing some weird behavior.
>
> On Tue, Mar 23, 2021 at 2:58 PM Tilman Hausherr <[email protected]> wrote:
> >
> > On 23.03.2021 at 17:31, Tim Allison wrote:
> > > Reports are available here:
> > > https://corpora.tika.apache.org/base/reports/1_25_v_1_26.tgz
> >
> >
> > govdocs1/966/966679.pdf
> >
> > claims to have 360 attachments more than last time. I don't see a single
> > attachment, and when I run tika-app with "--extract" I get nothing???
> >
> >
> > There are also some NPEs for BMP files, seems to be a java bug.
> >
> >
> > Tilman
> >
