Bug#888917: ocrmypdf fails to run it's testsuite
v6.0.0 should fix this issue, as it includes a cache that allows most OCR to be skipped.
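For readers unfamiliar with the idea, here is a minimal sketch of how a result cache can let a test suite skip repeated OCR. It is not OCRmyPDF's actual implementation: `cache_key`, `cached_ocr`, the `run_ocr` callback and the on-disk layout are all hypothetical names chosen for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def cache_key(image_bytes: bytes, args: Sequence[str]) -> str:
    """Hash the input image together with the OCR arguments."""
    h = hashlib.sha256()
    h.update(image_bytes)
    for arg in args:
        h.update(b"\0")  # separator so ["ab"] and ["a", "b"] hash differently
        h.update(arg.encode("utf-8"))
    return h.hexdigest()


def cached_ocr(
    image_bytes: bytes,
    args: Sequence[str],
    cache_dir: Path,
    run_ocr: Callable[[bytes, Sequence[str]], str],
) -> str:
    """Return cached OCR text if present; otherwise run OCR and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / (cache_key(image_bytes, args) + ".txt")
    if entry.exists():
        return entry.read_text(encoding="utf-8")
    result = run_ocr(image_bytes, args)  # the slow step (hypothetical callback)
    entry.write_text(result, encoding="utf-8")
    return result


if __name__ == "__main__":
    def fake_ocr(image: bytes, args: Sequence[str]) -> str:
        # Stand-in for a real OCR call, used only for this demonstration.
        print("running OCR")
        return "recognized text"

    with tempfile.TemporaryDirectory() as tmp:
        cached_ocr(b"image", ["-l", "eng"], Path(tmp), fake_ocr)  # runs OCR
        cached_ocr(b"image", ["-l", "eng"], Path(tmp), fake_ocr)  # cache hit
```

Because the key covers both the image and the arguments, cache entries computed ahead of time could in principle be shipped alongside the tests, so that slow machines never need to run the OCR step for those inputs.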
Bug#888917: ocrmypdf fails to run it's testsuite
control: tag -1 +upstream

Hello James,

That's some impressive detective work. Thank you for taking the time to
write up your conclusions.

I assume you don't need an upstream bug report, but let me know if you
want me to file one.

-- 
Sean Whitton
Bug#888917: ocrmypdf fails to run it's testsuite
Tesseract 4 is now in Debian unstable. When running on a processor that
lacks the AVX2 extensions (added with the Intel Haswell microarchitecture,
around 2013), it falls back on a slower version, in SSE or something
similar, which is so much slower that it regularly hits the timeout.
(Some of these failures are less than graceful and I will fix that.)

The reason I think this is the case is that ci-worker[01,02].debian.net
and Sean's laptop consistently fail, and they fail on different tests at
different times, all by hitting timeouts. My 2013 desktop, which has a
Haswell processor, and Matthias' new laptop are fine.

For the CI workers I looked over every pass and failure back to Jan 30,
and every test log that fails had worker 01 or 02. It wouldn't be
surprising for the lowest numbered boxes to be the oldest ones.

https://ci.debian.net/packages/o/ocrmypdf/unstable/amd64/

To confirm, I compiled a version of Tesseract 4 with AVX2 disabled and
using the "best quality" training set. Results were as follows (ratios
being relevant).

Tesseract 4, AVX2, best quality training data:           5s
Tesseract 4, AVX2 disabled, best quality training data: 32s
Tesseract 4, AVX2 disabled, fast training data:         10s
Tesseract 3.05:                                          4s

So I will need to fix this because the test suite should be consistent
even if Tesseract isn't. I'll revise how the existing test cache works
so that I can ship precalculated OCR files with it.

On Mon, 12 Feb 2018 at 18:24 Sean Whitton wrote:

> control: retitle -1 Test suite failures
>
> Hello James,
>
> On Fri, Feb 02 2018, James R. Barlow wrote:
>
> > Do you think you could take a few minutes to identify which test is
> > taking this long and report it? This may be an upstream bug, if some
> > input triggers an infinite loop.
>
> I ran the test suite on one of Debian's machines, in an up-to-date
> Debian unstable chroot. It took 100 minutes and there were many
> failures. Some of the tests failed due to timeouts, and some of them
> failed for other reasons. I'm attaching the full log.
>
> I see you have released 5.6.0, but from the release notes it seems
> likely there would be the same failures.
>
> Please let me know if you still need me to run individual tests and see
> how long they take.
>
> > I have my suspicions. My guess is that:
> >
> > pytest tests/test_qpdf.py # will never finish
> >
> > and
> >
> > pytest -n0 tests/test_qpdf.py # will fail in 15 seconds
> >
> > If so, you might have qpdf < 7.0.0 and upgrading to qpdf >= 7.0.0 will
> > fix it.
>
> We have qpdf 7.1.1 in Debian unstable right now, so this can't be it.
>
> --
> Sean Whitton
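The AVX2 explanation in the message above can be checked on a given Linux machine by looking at the CPU flags. The following is a minimal sketch, assuming an x86 system that exposes `/proc/cpuinfo`; the helper names `has_avx2` and `slowdown` are hypothetical, and the timings are the ones reported in the message.

```python
from pathlib import Path


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


def slowdown(baseline_s: float, measured_s: float) -> float:
    """Ratio of a measured run time to a baseline run time."""
    return measured_s / baseline_s


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("AVX2 available:", has_avx2(cpuinfo.read_text()))
    # Timings quoted in the message, relative to Tesseract 4 with AVX2 (5s).
    print("no AVX2, best data:", slowdown(5, 32))  # 6.4
    print("no AVX2, fast data:", slowdown(5, 10))  # 2.0
```

On the reported numbers, disabling AVX2 makes the "best quality" run about 6.4 times slower, which is consistent with tests on older CI workers hitting timeouts that newer machines never approach.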
Bug#888917: ocrmypdf fails to run it's testsuite
control: retitle -1 Test suite failures

Hello James,

On Fri, Feb 02 2018, James R. Barlow wrote:

> Do you think you could take a few minutes to identify which test is
> taking this long and report it? This may be an upstream bug, if some
> input triggers an infinite loop.

I ran the test suite on one of Debian's machines, in an up-to-date
Debian unstable chroot. It took 100 minutes and there were many
failures. Some of the tests failed due to timeouts, and some of them
failed for other reasons. I'm attaching the full log.

I see you have released 5.6.0, but from the release notes it seems
likely there would be the same failures.

Please let me know if you still need me to run individual tests and see
how long they take.

> I have my suspicions. My guess is that:
>
> pytest tests/test_qpdf.py # will never finish
>
> and
>
> pytest -n0 tests/test_qpdf.py # will fail in 15 seconds
>
> If so, you might have qpdf < 7.0.0 and upgrading to qpdf >= 7.0.0 will
> fix it.

We have qpdf 7.1.1 in Debian unstable right now, so this can't be it.

-- 
Sean Whitton

Attachment: ocrmypdf_5.5_tests.log (binary data)
Bug#888917: ocrmypdf fails to run it's testsuite
Hello,

On Fri, Feb 02 2018, James R Barlow wrote:

> Do you think you could take a few minutes to identify which test is
> taking this long and report it? This may be an upstream bug, if some
> input triggers an infinite loop.
>
> I have my suspicions. [...]

Thanks for the hints. I will attempt to reproduce the hang at some
point in the next few weeks, and report back, with an upstream bug
report if I can pinpoint the issue.

-- 
Sean Whitton
Bug#888917: ocrmypdf fails to run it's testsuite
control: tag -1 -moreinfo

Hello,

On Thu, Feb 01 2018, Matthias Klose wrote:

>> 4) Your implicit comment that I lied in the changelog and disabled
>> the test suite because I knew it would fail is entirely uncalled for.
>> Please do not treat fellow package maintainers like that.
>
> well, looking at the changelog is the way to see what is changed, and
> sometimes why. I didn't see that in 2). But yes, I was feeling that
> you didn't tell everything.

I'll add some more text to the 5.5-2 changelog entry in the next upload.

> Now reduced the severity. Feel free to close it if you think that any
> tests during the build is not necessary.

I'll attempt to reproduce the various failures you report in your most
recent message on a porterbox. I'll keep the bug open until then, at
least.

-- 
Sean Whitton
Bug#888917: ocrmypdf fails to run it's testsuite
Hello Sean,

On Wed, 31 Jan 2018 22:06:42 -0700 Sean Whitton wrote:

> I further suspect that the test suite took 30 seconds only because so
> many tests failed early. In recent upstream versions, the test suite
> has never finished running on my laptop after leaving it for multiple
> hours. When you run the test suite on a totally ordinary file system,
> please report how long it takes, and whether your laptop is very
> new/high spec.

Do you think you could take a few minutes to identify which test is
taking this long and report it? This may be an upstream bug, if some
input triggers an infinite loop.

I have my suspicions. My guess is that:

    pytest tests/test_qpdf.py # will never finish

and

    pytest -n0 tests/test_qpdf.py # will fail in 15 seconds

If so, you might have qpdf < 7.0.0 and upgrading to qpdf >= 7.0.0 will
fix it. But I'd appreciate it if you could confirm.

Thanks.
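One way to narrow down which test file hangs is to run each file on its own under a hard time limit. The following is a minimal sketch: `time_command` is a hypothetical helper, the 600-second limit is an arbitrary choice, and the `-n0` option assumes the pytest-xdist plugin is installed, as in the commands above.

```python
import subprocess
import sys
import time
from typing import Optional, Sequence, Tuple


def time_command(
    cmd: Sequence[str], timeout_s: float
) -> Tuple[Optional[int], float]:
    """Run cmd and return (exit code, elapsed seconds).

    The exit code is None if the command was killed for exceeding timeout_s.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        code: Optional[int] = proc.returncode
    except subprocess.TimeoutExpired:
        code = None
    return code, time.monotonic() - start


if __name__ == "__main__":
    # Usage: python time_tests.py tests/test_qpdf.py tests/test_main.py ...
    for test_file in sys.argv[1:]:
        code, elapsed = time_command(
            [sys.executable, "-m", "pytest", "-n0", test_file], timeout_s=600
        )
        status = "TIMEOUT" if code is None else f"exit {code}"
        print(f"{test_file}: {status} after {elapsed:.1f}s")
```

For runs that do finish, pytest's own `--durations=N` option reports the slowest tests, which may be enough without a wrapper like this.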
Bug#888917: ocrmypdf fails to run it's testsuite
Control: severity -1 important

On 01.02.2018 06:06, Sean Whitton wrote:
> control: tag -1 +moreinfo
>
> Dear Matthias,
>
> On Wed, Jan 31 2018, Matthias Klose wrote:
>
>> The recent changelog reads:
>>
>>   * Disable test suite at package build time.
>>     It now takes a prohibitively long time to run, so we are relying on
>>     autopkgtest instead.
>>
>> Sorry, but this is one of the most lame excuses I have ever seen. Trying to
>> run it on my laptop in unstable needs 30 seconds. However re-enabling it and
>> running it reveals
>>
>> === 122 failed, 24 passed, 4 skipped in 33.92 seconds
>>
>> these results are after adding tesseract-ocr qpdf unpaper as build
>> dependencies.
>
> Looking at the errors, I strongly suspect that this is because you are
> running the test suite on a tmpfs -- we have seen these permission
> errors before under those conditions. Could you try running the test
> suite on a totally ordinary file system, please?

I tried running these in a chroot created with debootstrap, and manually
entering the chroot.

[sid]
type=directory
description=debian (sid)
directory=/srv/chroot/sid
users=doko
groups=sbuild

so this should be on an ordinary file system? unless the testsuite uses /tmp
mounted on a tmpfs?

> I further suspect that the test suite took 30 seconds only because so
> many tests failed early. In recent upstream versions, the test suite
> has never finished running on my laptop after leaving it for multiple
> hours. When you run the test suite on a totally ordinary file system,
> please report how long it takes, and whether your laptop is very
> new/high spec.

well, it's a new one, two cores.

> I note that Policy does not require that a package be buildable under a
> tmpfs, and certainly does not require that its test suite run under a
> tmpfs.
>
>> doubting that the primary reason for this change was build time ...
>
> Several things:
>
> 1) I ran the test suite using deb-o-matic[1] before uploading. Needless
>    to say, I would not have uploaded had there been failures.

which only runs on amd64 afaik.

> 2) I should have mentioned in the changelog that another reason for this
>    change was to reduce the number of heavy build dependencies.
>
>    A further reason is that it reduces the amount of fragile code in
>    d/rules needed to get the test suite running -- upstream's test suite
>    is designed to be run on the installed package.
>
> 3) I am of the view that very heavy test suites are better run under
>    autopkgtest. We will soon have testing migration gating on
>    autopkgtest, and it is not clear to me that it makes sense for the
>    process of stitching the .deb to abort when a single integration test
>    fails.

I'm not sure about that. I dislike packages like the whole of KDE, which
disable testing during the build and then rebuild and run the tests in the
autopkg tests. It doesn't keep broken packages out of the archive.

>    (Ideally tests would be separated into those that should abort the
>    build and those that should not, but in the absence of this work
>    being done, it is reasonable not to run any of them.)

I'm a bit biased here, because I saw the autopkg test failures first in
launchpad:

http://autopkgtest.ubuntu.com/packages/o/ocrmypdf/bionic/amd64
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic/bionic/amd64/o/ocrmypdf/20180130_155249_1faa5@/log.gz

but yes, there seem to be fewer failures

> 4) Your implicit comment that I lied in the changelog and disabled the
>    test suite because I knew it would fail is entirely uncalled for.
>    Please do not treat fellow package maintainers like that.

well, looking at the changelog is the way to see what is changed, and
sometimes why. I didn't see that in 2). But yes, I was feeling that you
didn't tell everything.

Now reduced the severity. Feel free to close it if you think that any
tests during the build is not necessary.
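The tmpfs question raised in this exchange can be answered directly by checking which file system backs the directories involved. The following is a minimal sketch, assuming a Linux system with `/proc/mounts`; the helper `fs_type_for` is hypothetical and ignores escaped characters in mount points. It also assumes that the tests write under the system temporary directory, which is where pytest places its temporary directories by default.

```python
import os
import tempfile
from typing import Optional


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the file system type of the mount that contains path.

    mounts_text is in /proc/mounts format: device, mount point, type, ...
    The longest matching mount point wins; for equal mount points the
    later entry wins, since it shadows the earlier one.
    """
    best_mount = ""
    best_type = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        covers = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point + "/")
        )
        if covers and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts", encoding="utf-8") as f:
        mounts = f.read()
    # Check both the temporary directory and the build/source directory.
    for directory in (tempfile.gettempdir(), os.getcwd()):
        real = os.path.realpath(directory)
        print(f"{real}: {fs_type_for(real, mounts)}")
```

If the temporary directory reports `tmpfs` while the source tree does not, the tests may still be writing to a tmpfs even though the chroot itself lives on an ordinary file system.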
Bug#888917: ocrmypdf fails to run it's testsuite
Hello James,

On Wed, Jan 31 2018, James R Barlow wrote:

> Upstream here.

Thanks for the info.

> The reason the suite fails like that is that mandatory-for-testing
> dependencies were also removed.
>
> The test suite runs on Travis CI in 10-12 minutes. On Debian CI, 15
> minutes. For comparison ffmpeg, another compute intensive CLI program,
> takes 10 minutes.
>
> This is an OCR program and OCR takes a long time. There are
> opportunities to speed up testing on my end but no low hanging fruit
> without removing tests. I've done the obvious: use all cores, use
> caches and dummies where possible. Some OCR on the fly is essential
> because Tesseract is complex enough that output is not identical
> across platforms.
>
> Preserving the dynamically created tests/cache/ folder between test
> runs, if possible in Debian CI, would speed it up a lot.

Unfortunately not possible.

> I could mark a subset of essential tests for packagers so that Debian
> CI can specify it only wants those. There's a number of tests that are
> very unlikely to pass upstream testing (macOS and Ubuntu) then somehow
> fail downstream in Debian.

Just to be clear, this bug is about the tests run during the package
build, which is completely independent of Debian CI (in our
terminology, "autopkgtest" refers to Debian CI).

-- 
Sean Whitton
Bug#888917: ocrmypdf fails to run it's testsuite
control: tag -1 +moreinfo

Dear Matthias,

On Wed, Jan 31 2018, Matthias Klose wrote:

> The recent changelog reads:
>
>   * Disable test suite at package build time.
>     It now takes a prohibitively long time to run, so we are relying on
>     autopkgtest instead.
>
> Sorry, but this is one of the most lame excuses I have ever seen. Trying to
> run it on my laptop in unstable needs 30 seconds. However re-enabling it and
> running it reveals
>
> === 122 failed, 24 passed, 4 skipped in 33.92 seconds
>
> these results are after adding tesseract-ocr qpdf unpaper as build
> dependencies.

Looking at the errors, I strongly suspect that this is because you are
running the test suite on a tmpfs -- we have seen these permission
errors before under those conditions. Could you try running the test
suite on a totally ordinary file system, please?

I further suspect that the test suite took 30 seconds only because so
many tests failed early. In recent upstream versions, the test suite
has never finished running on my laptop after leaving it for multiple
hours. When you run the test suite on a totally ordinary file system,
please report how long it takes, and whether your laptop is very
new/high spec.

I note that Policy does not require that a package be buildable under a
tmpfs, and certainly does not require that its test suite run under a
tmpfs.

> doubting that the primary reason for this change was build time ...

Several things:

1) I ran the test suite using deb-o-matic[1] before uploading. Needless
   to say, I would not have uploaded had there been failures.

2) I should have mentioned in the changelog that another reason for this
   change was to reduce the number of heavy build dependencies.

   A further reason is that it reduces the amount of fragile code in
   d/rules needed to get the test suite running -- upstream's test suite
   is designed to be run on the installed package.

3) I am of the view that very heavy test suites are better run under
   autopkgtest. We will soon have testing migration gating on
   autopkgtest, and it is not clear to me that it makes sense for the
   process of stitching the .deb to abort when a single integration test
   fails.

   (Ideally tests would be separated into those that should abort the
   build and those that should not, but in the absence of this work
   being done, it is reasonable not to run any of them.)

4) Your implicit comment that I lied in the changelog and disabled the
   test suite because I knew it would fail is entirely uncalled for.
   Please do not treat fellow package maintainers like that.

[1] http://debomatic-amd64.debian.net/

-- 
Sean Whitton
Bug#888917: ocrmypdf fails to run it's testsuite
Upstream here.

The reason the suite fails like that is that mandatory-for-testing
dependencies were also removed.

The test suite runs on Travis CI in 10-12 minutes. On Debian CI, 15
minutes. For comparison ffmpeg, another compute intensive CLI program,
takes 10 minutes.

This is an OCR program and OCR takes a long time. There are
opportunities to speed up testing on my end but no low hanging fruit
without removing tests. I've done the obvious: use all cores, use
caches and dummies where possible. Some OCR on the fly is essential
because Tesseract is complex enough that output is not identical
across platforms.

Preserving the dynamically created tests/cache/ folder between test
runs, if possible in Debian CI, would speed it up a lot.

I could mark a subset of essential tests for packagers so that Debian
CI can specify it only wants those. There's a number of tests that are
very unlikely to pass upstream testing (macOS and Ubuntu) then somehow
fail downstream in Debian.
Bug#888917: ocrmypdf fails to run it's testsuite
Package: src:ocrmypdf
Version: 5.5-2
Severity: serious
Tags: sid buster

The recent changelog reads:

  * Disable test suite at package build time.
    It now takes a prohibitively long time to run, so we are relying on
    autopkgtest instead.

Sorry, but this is one of the most lame excuses I have ever seen. Trying to run
it on my laptop in unstable needs 30 seconds. However re-enabling it and
running it reveals

=== 122 failed, 24 passed, 4 skipped in 33.92 seconds

these results are after adding tesseract-ocr qpdf unpaper as build
dependencies.

doubting that the primary reason for this change was build time ...

dpkg-buildpackage: info: source package ocrmypdf
dpkg-buildpackage: info: source version 5.5-2
dpkg-buildpackage: info: source distribution unstable
dpkg-buildpackage: info: source changed by Sean Whitton
dpkg-source --before-build ocrmypdf-5.5
dpkg-buildpackage: info: host architecture amd64
dpkg-source: info: using options from ocrmypdf-5.5/debian/source/options: --single-debian-patch --auto-commit --extend-diff-ignore=\.git_archival\.txt
fakeroot debian/rules clean
dh clean --with python3,sphinxdoc --buildsystem=pybuild
dh_auto_clean -O--buildsystem=pybuild
I: pybuild base:184: python3.6 setup.py clean
Skipping external program tests because of --force
running clean
removing '/home/packages/tmp/ocrmypdf-5.5/.pybuild/pythonX.Y_3.6/build' (and everything under it)
'build/bdist.linux-amd64' does not exist -- can't clean it
'build/scripts-3.6' does not exist -- can't clean it
dh_clean -O--buildsystem=pybuild
debian/rules build
dh build --with python3,sphinxdoc --buildsystem=pybuild
dh_update_autotools_config -O--buildsystem=pybuild
dh_autoreconf -O--buildsystem=pybuild
dh_auto_configure -O--buildsystem=pybuild
I: pybuild base:184: python3.6 setup.py config
Skipping external program tests because of --force
running config
debian/rules override_dh_auto_build
make[1]: Entering directory '/home/packages/tmp/ocrmypdf-5.5'
mkdir -p debian/.debhelper
cp -R ocrmypdf debian/.debhelper
sed -i debian/.debhelper/ocrmypdf/__init__.py -e \
  "s|^__version__ =.*|__version__ = \"5.5\"|"
PYTHONPATH=debian/.debhelper sphinx-build docs html
Running Sphinx v1.6.6
making output directory...
loading pickled environment... not yet created
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 10 source files that are out of date
updating environment: 10 added, 0 changed, 0 removed
reading sources... [ 10%] advanced
reading sources... [ 20%] batch
reading sources... [ 30%] cookbook
reading sources... [ 40%] errors
reading sources... [ 50%] index
reading sources... [ 60%] installation
reading sources... [ 70%] introduction
reading sources... [ 80%] languages
reading sources... [ 90%] release_notes
reading sources... [100%] security
/home/packages/tmp/ocrmypdf-5.5/docs/installation.rst:2: WARNING: Duplicate explicit target name: "docker".
/home/packages/tmp/ocrmypdf-5.5/docs/introduction.rst:108: WARNING: Unknown target name: "using ocrmypdf online".
looking for now-outdated files... none found
pickling environment... done
checking consistency... /home/packages/tmp/ocrmypdf-5.5/docs/installation.rst: WARNING: document isn't included in any toctree
done
preparing documents... done
writing output... [ 10%] advanced
writing output... [ 20%] batch
writing output... [ 30%] cookbook
writing output... [ 40%] errors
writing output... [ 50%] index
writing output... [ 60%] installation
writing output... [ 70%] introduction
writing output... [ 80%] languages
writing output... [ 90%] release_notes
writing output... [100%] security
generating indices... genindex
writing additional pages... search
copying images... [100%] bitmap_vs_svg.svg
copying static files... WARNING: html_static_path entry '/home/packages/tmp/ocrmypdf-5.5/docs/_static' does not exist
done
copying extra files... done
dumping search index in English (code: en) ... done
dumping object inventory... done
build succeeded, 4 warnings.
dh_auto_build -O--buildsystem=pybuild
I: pybuild base:184: /usr/bin/python3 setup.py build
Skipping external program tests because of --force
running build
running build_py
creating /home/packages/tmp/ocrmypdf-5.5/.pybuild/pythonX.Y_3.6/build/ocrmypdf
copying ocrmypdf/_unicodefun.py -> /home/packages/tmp/ocrmypdf-5.5/.pybuild/pythonX.Y_3.6/build/ocrmypdf
copying ocrmypdf/__main__.py -> /home/packages/tmp/ocrmypdf-5.5/.pybuild/pythonX.Y_3.6/build/ocrmypdf
copying ocrmypdf/pdfa.py -> /home/packages/tmp/ocrmypdf-5.5/.pybuild/pythonX.Y_3.6/build/ocrmypdf
copying ocrmypdf/leptonica.py -> /home/packages/tmp/ocrmypdf-5.5/.pybuild/pythonX.Y_3.6/build/ocrmypdf
copying ocrmypdf/__init__.py -> /home/packages/tmp/ocrmypdf-5.5/.pybuild/pythonX.Y_3.6/build/ocrmypdf
copying ocrmypdf/hocrtransform.py -> /home/packages/tmp/ocrmypdf-5.5/.pybuild/pythonX.Y_3.6/build/ocrmypdf
copying ocrmypdf/helpers.py ->