On 28.07.20 at 23:51, Tim Allison wrote:
Reports are here:
https://corpora.tika.apache.org/base/reports/pdfbox-2.0.21-SNAPSHOT.tgz
Looks like extraction improved slightly. I found a bug at the Tika level
that is creating a few more exceptions (will fix soon), but this is not a
problem for PDFBox.
I was able to re-enable our unit test that counts characters and
characters without a Unicode mapping.
I'll look a bit tomorrow, but this looks good to me.
Do you have some time to rerun those tests using the latest SNAPSHOT?
Andreas
Again, many thanks to Maruan! The processing speeds were, um, much, much
faster.
Best,
Tim
On Tue, Jul 28, 2020 at 10:56 AM Andreas Lehmkuehler <andr...@lehmi.de>
wrote:
Yes, please
Thanks in advance!
On 28.07.20 at 12:45, Tim Allison wrote:
Y. I can run these today
On Tue, Jul 28, 2020 at 2:58 AM Andreas Lehmkuehler <andr...@lehmi.de>
wrote:
Hi,
is there any chance to run the PDFBox regression tests (2.0.20 vs.
SNAPSHOT) on our new box? Does anyone have the cycles to prepare
something ready to start?
If not, is there anything I can do to help? I'm planning to cut a new
PDFBox release soon.
Cheers
Andreas