On 25.07.2023 21:13, Tim Allison wrote:
New reports are here:
https://corpora.tika.apache.org/base/reports/tika-2.8.1-prerc1-b.tgz
The alignment in tika-eval is still not working as planned, but the content
looks ok...more work to do on tika-eval.
We're getting many more exceptions with gz, and I noticed that we're also
getting many fewer attachments in some OLE2 based Office files. Both items
were happening in the last reports I ran...as I look back.
Agreed. I always look at the TOP_10_MORE_IN_A column in the content
file, and the only occurrences where there's more look like something
got switched around in the same ppt file.
Tilman