On 28.07.20 at 23:51, Tim Allison wrote:
Reports are here:
https://corpora.tika.apache.org/base/reports/pdfbox-2.0.21-SNAPSHOT.tgz

Looks like extraction improved slightly.  I found a bug at the Tika level
that is creating a few more exceptions (will fix soon), but this is not a
problem for PDFBox.

I was able to turn back on our unit test that counted characters and
non-unicode mapped characters.

I'll look a bit tomorrow, but this looks good to me.
Do you have some time to rerun those tests using the latest SNAPSHOT?

Andreas


Again, many thanks to Maruan!  The processing speeds were, um, much, much
faster.

Best,

        Tim

On Tue, Jul 28, 2020 at 10:56 AM Andreas Lehmkuehler <andr...@lehmi.de>
wrote:

Yes, please

Thanks in advance!

On 28.07.20 at 12:45, Tim Allison wrote:
Yes, I can run these today.

On Tue, Jul 28, 2020 at 2:58 AM Andreas Lehmkuehler <andr...@lehmi.de>
wrote:

Hi,

is there any chance to run the PDFBox regression tests (2.0.20 vs. SNAPSHOT)
on our new box? Does anyone have the cycles to prepare something ready to
start?

If not, is there anything I can do to help? I'm planning to cut a new PDFBox
release soon.

Cheers
Andreas





