Bug#888917: ocrmypdf fails to run its testsuite

James R Barlow Thu, 15 Feb 2018 13:12:53 -0800

Tesseract 4 is now in Debian unstable. When it runs on a processor that
lacks the AVX2 extensions (introduced with Intel's Haswell
microarchitecture, around 2013), it falls back to a slower code path (SSE
or similar), which is so much slower that it regularly hits the test
timeout. (Some of these failures are less than graceful, and I will fix
that.)
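To check whether a particular machine falls into this category, one can
look for the `avx2` flag among the CPU feature flags. The following is a
minimal sketch, assuming a Linux host where `/proc/cpuinfo` lists those
flags on `flags` lines; `has_avx2` is a hypothetical helper, not part of
ocrmypdf or Tesseract:

```python
from pathlib import Path


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        # Lines look like "flags\t\t: fpu vme de ... avx avx2 ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


if __name__ == "__main__":
    # Assumption: Linux only; other platforms need a different probe.
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("AVX2 available:", has_avx2(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this sketch only handles Linux")
```

Matching whole tokens via `split()` keeps flags such as `avx512f` from
being mistaken for `avx2`.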

The reason I think this is the case is that ci-worker[01,02].debian.net and
Sean's laptop consistently fail, and they fail on different tests at
different times, always by hitting timeouts. My 2013 desktop, which has a
Haswell processor, and Matthias' new laptop are fine.

For the CI workers, I looked over every pass and failure back to Jan 30,
and every failing test log came from worker 01 or 02. It wouldn't be
surprising if the lowest-numbered boxes were also the oldest ones.
https://ci.debian.net/packages/o/ocrmypdf/unstable/amd64/
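The check above boils down to grouping CI results by worker and confirming
that every failure comes from the suspect hosts. A minimal sketch of that
tally, using made-up records (the worker names and outcomes below are
illustrative, not taken from the actual ci.debian.net history):

```python
from collections import defaultdict


def failures_by_worker(results):
    """Count failures per worker from (worker, passed) pairs."""
    counts = defaultdict(int)
    for worker, passed in results:
        if not passed:
            counts[worker] += 1
    return dict(counts)


# Illustrative records only; real ones would be read from the CI logs.
runs = [
    ("ci-worker01", False),
    ("ci-worker02", False),
    ("ci-worker03", True),
    ("ci-worker01", True),
]
failing = failures_by_worker(runs)
print(failing)  # {'ci-worker01': 1, 'ci-worker02': 1}
# True if no failure came from outside the suspect set.
print(set(failing) <= {"ci-worker01", "ci-worker02"})  # True
```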

To confirm, I compiled a version of Tesseract 4 with AVX2 disabled and ran
it with the "best quality" training data, and also with the fast training
data. Results were as follows (the ratios are what matter, not the
absolute times).

Tesseract 4, AVX2, best quality training data: 5s
Tesseract 4, AVX2 disabled, best quality training data: 32s
Tesseract 4, AVX2 disabled, fast training data: 10s
Tesseract 3.05: 4s
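Normalising these figures against the AVX2 build with best-quality data
makes the comparison explicit: disabling AVX2 makes the best-quality run
about 6.4 times slower, and switching to the fast data brings that down to
about 2 times. A small sketch of the arithmetic, using only the timings
listed above:

```python
def slowdowns(timings: dict[str, float], baseline: str) -> dict[str, float]:
    """Express each timing as a multiple of the baseline timing."""
    base = timings[baseline]
    return {name: seconds / base for name, seconds in timings.items()}


# Wall-clock times in seconds, as reported above.
TIMINGS = {
    "Tesseract 4, AVX2, best": 5,
    "Tesseract 4, no AVX2, best": 32,
    "Tesseract 4, no AVX2, fast": 10,
    "Tesseract 3.05": 4,
}

for name, ratio in slowdowns(TIMINGS, "Tesseract 4, AVX2, best").items():
    print(f"{name}: {ratio:.1f}x")
# Tesseract 4, AVX2, best: 1.0x
# Tesseract 4, no AVX2, best: 6.4x
# Tesseract 4, no AVX2, fast: 2.0x
# Tesseract 3.05: 0.8x
```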

So I will need to fix this because the test suite should be consistent even
if Tesseract isn't. I'll revise how the existing test cache works so that I
can ship precalculated OCR files with it.
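One possible shape for such a cache, offered only as an illustration and
not as a description of how ocrmypdf's test cache actually works: key each
OCR invocation on a hash of the input and the engine arguments, and return
a stored result when one exists, so a test never depends on how fast the
local Tesseract happens to be. `cache_key`, `CachedOCR` and the `run_ocr`
callback are hypothetical names.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def cache_key(input_bytes: bytes, args: Sequence[str]) -> str:
    """Hash the input file together with the OCR arguments."""
    h = hashlib.sha256()
    # Fixed-length digest of the input, so input and args cannot run together.
    h.update(hashlib.sha256(input_bytes).digest())
    # JSON-encode the argument list so that ["-l", "eng"] and ["-leng"] differ.
    h.update(json.dumps(list(args)).encode("utf-8"))
    return h.hexdigest()


class CachedOCR:
    """Serve precalculated OCR output if present; otherwise run OCR and store it."""

    def __init__(self, cache_dir: Path, run_ocr: Callable[[bytes, Sequence[str]], bytes]):
        self.cache_dir = cache_dir
        self.run_ocr = run_ocr  # hypothetical callback that invokes the real engine
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def __call__(self, input_bytes: bytes, args: Sequence[str]) -> bytes:
        path = self.cache_dir / f"{cache_key(input_bytes, args)}.out"
        if path.exists():
            return path.read_bytes()  # hit: the OCR engine is never invoked
        output = self.run_ocr(input_bytes, args)
        path.write_bytes(output)  # miss: store the result so it can be shipped
        return output


if __name__ == "__main__":
    calls = []

    def fake_ocr(data: bytes, args: Sequence[str]) -> bytes:
        # Stand-in for a slow engine; records each real invocation.
        calls.append(args)
        return b"recognized text"

    with tempfile.TemporaryDirectory() as tmp:
        ocr = CachedOCR(Path(tmp), fake_ocr)
        ocr(b"page image", ["-l", "eng"])
        ocr(b"page image", ["-l", "eng"])
        print("engine calls:", len(calls))  # engine calls: 1
```

Because the key covers the arguments as well as the input, changing a
language or option produces a cache miss rather than silently reusing
stale output.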

On Mon, 12 Feb 2018 at 18:24 Sean Whitton <spwhit...@spwhitton.name> wrote:

> control: retitle -1 Test suite failures
>
> Hello James,
>
> On Fri, Feb 02 2018, James R. Barlow wrote:
>
> > Do you think you could take a few minutes to identify which test is
> > taking this long and report it? This may be an upstream bug, if some
> > input triggers an infinite loop.
>
> I ran the test suite on one of Debian's machines, in an up-to-date
> Debian unstable chroot.  It took 100 minutes and there were many
> failures.  Some of the tests failed due to timeouts, and some of them
> failed for other reasons.  I'm attaching the full log.
>
> I see you have released 5.6.0, but from the release notes it seems
> likely there would be the same failures.
>
> Please let me know if you still need me to run individual tests and see
> how long they take.
>
> > I have my suspicions. My guess is that:
> >
> >     pytest tests/test_qpdf.py # will never finish
> >
> > and
> >
> >     pytest -n0 tests/test_qpdf.py # will fail in 15 seconds
> >
> > If so, you might have qpdf < 7.0.0 and upgrading to qpdf >= 7.0.0 will
> > fix it.
>
> We have qpdf 7.1.1 in Debian unstable right now, so this can't be it.
>
> --
> Sean Whitton
>
