I bumped the maximum recursion depth recently.  When I temporarily
reverted it to a max depth of 10, I got 653 attachments, which matches
neither 1.25 nor 1.26-SNAPSHOT and is smaller than both.

On Tue, Mar 23, 2021 at 3:51 PM Tim Allison <[email protected]> wrote:
>
> The govdocs file has 1290 MACRO (javascript) "attachments" with Tika
> 1.26-SNAPSHOT and 930 with Tika 1.25.  I have no idea why there are
> more macros in the more recent version of Tika, but there are
> "attachments" broadly speaking.
>
> I'll look into the NPEs.  If those are a Java bug, I don't think those
> are a blocker.
>
> Still working on the open office document issues...
> LIBRE_OFFICE-45041-0.ods is showing some weird behavior.
>
> On Tue, Mar 23, 2021 at 2:58 PM Tilman Hausherr <[email protected]> wrote:
> >
> > On 23.03.2021 at 17:31, Tim Allison wrote:
> > > Reports are available here:
> > > https://corpora.tika.apache.org/base/reports/1_25_v_1_26.tgz
> >
> >
> > govdocs1/966/966679.pdf
> >
> > claims to have 360 attachments more than last time. I don't see a single
> > attachment, and when I run tika-app with "--extract" I get nothing???
> >
> >
> > There are also some NPEs for BMP files, which seems to be a Java bug.
> >
> >
> > Tilman
> >
