Bug#813562: Test suite failure
On Sun, Feb 21, 2016 at 12:46:52PM +, James R Barlow wrote:
> Great news. 4.0.2 is ready now.

Sweet. Package build in progress.

> I did find while updating my Docker image that Debian stretch's
> version of Ghostscript (gs 9.16~dfsg-2.1) produces error messages and
> blank pages on JPEG 2000 images. It's fixed in Sid, but the fix hasn't
> moved downstream yet.

Thanks for letting me know -- I've added a dependency bound on the
version in Sid, so that ocrmypdf won't migrate to stretch until
ghostscript does.

> Thanks again for doing this.

No, thank you for your help with the process. I'm very grateful to be
able to use ocrmypdf as part of my effort to work paperlessly. It works
really well combined with Recoll desktop search.

-- 
Sean Whitton
Bug#813562: Test suite failure
Great news. 4.0.2 is ready now.

I did find while updating my Docker image that Debian stretch's version of Ghostscript (gs 9.16~dfsg-2.1) produces error messages and blank pages on JPEG 2000 images. It's fixed in Sid, but the fix hasn't moved downstream yet.

Thanks again for doing this.
Bug#813562: Test suite failure
Hello,

On Sat, Feb 20, 2016 at 03:27:00AM +, James R Barlow wrote:
> Thanks for your help. Output order is due to multiprocessing.

No problem.

> That nailed it. tesseract 3.04.01 changed its output when asked to
> determine page orientation. It's an improvement, but it breaks parsing.
>
> I will throw together a patch to make the appropriate distinctions.

I thought you might appreciate knowing that version 4.0.2rc1 builds fine
in a clean Debian Sid chroot, and the test suite passes as part of the
package build.

I'm looking forward to 4.0.2! (Release candidates are not generally
uploaded to Debian.)

-- 
Sean Whitton
Bug#813562: Test suite failure
Thanks for your help. Output order is due to multiprocessing.

That nailed it. tesseract 3.04.01 changed its output when asked to determine page orientation. It's an improvement, but it breaks parsing.

I will throw together a patch to make the appropriate distinctions.

$ tess-3.04.01 -psm 0 tests/resources/linn-west.jpg stdout
Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 29.34
Script: Latin
Script confidence: 45.33

$ tess-3.04.00 -psm 0 tests/resources/linn-west.jpg stdout
Orientation: 3
Orientation in degrees: 90
Orientation confidence: 29.34
Script: 1
Script confidence: 45.33

On Fri, Feb 19, 2016 at 16:28 Sean Whitton wrote:
> Hello,
>
> On Fri, Feb 19, 2016 at 10:45:51PM +, James R Barlow wrote:
> > In any case, could you try running this:
> > ocrmypdf --rotate-pages tests/resources/cardinal.pdf out.pdf
> >
> > In cardinal.pdf the same page is rotated in each cardinal direction. out.pdf
> > should have all pages facing up. Is this the case? The output will also give
> > information on rotation status:
> > INFO - 1: page is facing ⇧, confidence 18.69
> > INFO - 3: page is facing ⇩, confidence 21.86 - correcting rotation
> > INFO - 4: page is facing ⇦, confidence 20.71 - correcting rotation
> > INFO - 2: page is facing ⇨, confidence 21.63 - correcting rotation
> > INFO - 3: rotating image layer 180 degrees
> > INFO - 2: rotating image layer 90 degrees
> > INFO - 4: rotating image layer 270 degrees
>
> No, it gets it wrong. Result attached, and the output:
>
> ,
> | root@artemis:/build/ocrmypdf-4.0.1# ocrmypdf --rotate-pages tests/resources/cardinal.pdf out.pdf
> | INFO -1: page is facing ⇧, confidence 18.69
> | INFO -2: page is facing ⇦, confidence 21.63 - correcting rotation
> | INFO -3: page is facing ⇩, confidence 21.86 - correcting rotation
> | INFO -4: page is facing ⇨, confidence 20.71 - correcting rotation
> | INFO -2: rotating image layer 270 degrees
> | INFO -3: rotating image layer 180 degrees
> | INFO -4: rotating image layer 90 degrees
> `
>
> (note that the order it processes the pages in is different to your
> example)
>
> > It would also help to try in python3:
> >
> > >>> import ocrmypdf.leptonica as lp
> > >>> lp.getLeptonicaVersion()
> >
> > ...to see if there's anything unusual about how debian sid is reporting the
> > leptonica version.
>
> ,
> | root@artemis:/build/ocrmypdf-4.0.1# cd /usr/lib/python3/dist-packages
> | root@artemis:/usr/lib/python3/dist-packages# python3
> | Python 3.5.1+ (default, Jan 13 2016, 15:09:18)
> | [GCC 5.3.1 20160101] on linux
> | Type "help", "copyright", "credits" or "license" for more information.
> | >>> import ocrmypdf.leptonica as lp
> | >>> lp.getLeptonicaVersion()
> | 'leptonica-1.73'
> `
>
> --
> Sean Whitton
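For readers following along, here is a minimal sketch of a parser that copes with both output formats shown in the two tess runs above. It is not the patch that went into OCRmyPDF; the function names parse_osd and rotation_needed are hypothetical, and it rests on the assumption (suggested by the two samples for the same image, but not confirmed in this thread) that the old "Orientation in degrees" field and the new "Rotate" field carry the same value.

```python
def parse_osd(text):
    """Split 'Key: value' lines of orientation output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if sep:  # ignore lines that have no colon
            fields[key.strip()] = value.strip()
    return fields


def rotation_needed(text):
    """Return (degrees, confidence) for either output format.

    Assumption: when a 'Rotate' field is present (3.04.01 sample) it is the
    value to use; otherwise fall back to 'Orientation in degrees'
    (3.04.00 sample).
    """
    fields = parse_osd(text)
    if 'Rotate' in fields:
        degrees = int(fields['Rotate'])
    else:
        degrees = int(fields['Orientation in degrees'])
    confidence = float(fields['Orientation confidence'])
    return degrees, confidence


# The two samples quoted in this message
NEW_OUTPUT = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 29.34
Script: Latin
Script confidence: 45.33"""

OLD_OUTPUT = """Orientation: 3
Orientation in degrees: 90
Orientation confidence: 29.34
Script: 1
Script confidence: 45.33"""

print(rotation_needed(NEW_OUTPUT))
print(rotation_needed(OLD_OUTPUT))
```

Keying on the presence of the "Rotate" field rather than on the tesseract version string avoids a second version lookup, at the cost of relying on the field names staying stable.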
Bug#813562: Test suite failure
I ran into a similar failure because leptonica 1.71 has an integer overflow bug in the function pixCorrelationBinary, which I use only in the test suite to check whether some output PDFs visually resemble an expected reference PDF. I rewrote that function in Python for the older versions. The relevant code is ocrmypdf.leptonica.Pix.correlation_binary.

I added a test that only exercises pixCorrelationBinary (test_monochrome_correlation), and this one passed.

I checked that the tests can pass in the Docker version (they are slightly broken for an unrelated reason), which is Debian stretch, with leptonica 1.73 (good version) and the same set of libraries as yours. The one difference is tesseract 3.04.01 vs .00, but I compiled tesseract 3.04.01 and found that made no difference.

In any case, could you try running this:

ocrmypdf --rotate-pages tests/resources/cardinal.pdf out.pdf

In cardinal.pdf the same page is rotated in each cardinal direction. out.pdf should have all pages facing up. Is this the case? The output will also give information on rotation status:

INFO - 1: page is facing ⇧, confidence 18.69
INFO - 3: page is facing ⇩, confidence 21.86 - correcting rotation
INFO - 4: page is facing ⇦, confidence 20.71 - correcting rotation
INFO - 2: page is facing ⇨, confidence 21.63 - correcting rotation
INFO - 3: rotating image layer 180 degrees
INFO - 2: rotating image layer 90 degrees
INFO - 4: rotating image layer 270 degrees

That would help establish whether something is actually wrong or the test case is somehow at fault.

It would also help to try in python3:

>>> import ocrmypdf.leptonica as lp
>>> lp.getLeptonicaVersion()

...to see if there's anything unusual about how debian sid is reporting the leptonica version.
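To illustrate the kind of pure-Python replacement for pixCorrelationBinary mentioned in this message, here is a small sketch. It is not the code in ocrmypdf.leptonica.Pix.correlation_binary; it assumes the correlation is defined as (|1 AND 2|)^2 / (|1| * |2|) over foreground pixels, which is my understanding of Leptonica's definition, and it represents images as plain lists of 0/1 rows purely for illustration.

```python
def correlation_binary(pix1, pix2):
    """Foreground correlation of two equal-size binary images.

    pix1, pix2: lists of rows, each row a list of 0/1 ints (illustrative
    representation, not a Leptonica Pix). Returns a float in [0.0, 1.0].
    """
    count1 = count2 = count_and = 0
    for row1, row2 in zip(pix1, pix2):
        for a, b in zip(row1, row2):
            count1 += a          # foreground pixels in image 1
            count2 += b          # foreground pixels in image 2
            count_and += a & b   # pixels set in both images
    if count1 == 0 or count2 == 0:
        return 0.0
    # Python ints have arbitrary precision, so these products cannot
    # overflow the way fixed-width C integers can on large pages.
    return (count_and * count_and) / (count1 * count2)


same = [[1, 0], [0, 1]]
disjoint = [[0, 1], [1, 0]]
print(correlation_binary(same, same))
print(correlation_binary(same, disjoint))
```

A result near 1.0 means the foregrounds coincide, which is what the test suite's threshold of 0.80 checks for after rotation correction.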
Bug#813562: Test suite failure
Hello,

On Fri, Feb 19, 2016 at 07:11:32AM +, James R Barlow wrote:
> What version of leptonica is installed?
> tesseract --version will report this.

From within my Sid chroot:

root@artemis:/build/ocrmypdf-4.0.1# tesseract --version
tesseract 3.04.01
 leptonica-1.73
  libgif 5.1.2 : libjpeg 6b (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0

> Also what's the file name for liblept?

The Debian liblept package provides:

/usr/lib/liblept.so.5
/usr/lib/liblept.so.5.0.0

-- 
Sean Whitton
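As a rough illustration of how a version string such as the 'leptonica-1.73' reported above could be used to decide whether a workaround for the 1.71 overflow bug is needed, here is a sketch. The function names are hypothetical, and the cut-off is an assumption: this thread only says that 1.71 is affected and 1.73 is good, so treating everything up to and including 1.71 as affected (and 1.72 as unaffected) is a guess.

```python
import re


def parse_leptonica_version(version_string):
    """Turn a string like 'leptonica-1.73' into a (major, minor) tuple."""
    match = re.match(r'\s*leptonica-(\d+)\.(\d+)', version_string)
    if match is None:
        raise ValueError('unrecognised version string: %r' % version_string)
    return int(match.group(1)), int(match.group(2))


def needs_python_correlation(version_string):
    """Guess whether the pure-Python correlation fallback should be used.

    Assumption: versions up to and including 1.71 have the overflow bug.
    """
    return parse_leptonica_version(version_string) <= (1, 71)


print(parse_leptonica_version('leptonica-1.73'))
print(needs_python_correlation('leptonica-1.73'))
```

Comparing tuples of integers rather than raw strings keeps the ordering correct for versions such as 1.100 versus 1.73.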
Bug#813562: Test suite failure
I have seen a similar problem. What version of leptonica is installed? tesseract --version will report this. Also what's the file name for liblept?
Bug#813562: Test suite failure
Dear James,

OCRmyPDF's test suite is currently failing under a freshly-installed
Debian Sid chroot. I've attached the output to this e-mail.

Since the test suite worked on yesterday's version of Debian Sid, I
think that this must be due to a bug introduced in a new version of one
of the dependencies. That means it's my job to figure out what the
problem is, and it is unlikely to be a bug in OCRmyPDF for you to fix.
I'm e-mailing you just in case the problem is obvious to you from
reading the output.

Thanks.

-- 
Sean Whitton

= test session starts ==
platform linux -- Python 3.4.4, pytest-2.8.7, py-1.4.31, pluggy-0.3.1
rootdir: /build/ocrmypdf-4.0.1, inifile: pytest.ini
collected 44 items

tests/test_hocrtransform.py .
tests/test_main.py ...FF..
tests/test_pageinfo.py

=== FAILURES ===
test_autorotate[hocr] _

spoof_tesseract_cache = {'BIBINPUTS': '/home/swhitton/doc:/home/swhitton/doc/papers:', 'BROWSER': 'iceweasel', 'BUILDRESULTGID': '1000', 'BUILDRESULTUID': '1000', ...}
renderer = 'hocr'

    @pytest.mark.parametrize('renderer', [
        'hocr',
        'tesseract',
        ])
    def test_autorotate(spoof_tesseract_cache, renderer):
        import ocrmypdf.ghostscript as ghostscript
        import logging

        gslog = logging.getLogger()

        # cardinal.pdf contains four copies of an image rotated in each cardinal
        # direction - these ones are "burned in" not tagged with /Rotate
        out = check_ocrmypdf('cardinal.pdf', 'test_autorotate_%s.pdf' % renderer,
                             '-r', '-v', '1', env=spoof_tesseract_cache)
        for n in range(1, 4+1):
            correlation = check_monochrome_correlation(
                reference_pdf=_infile('cardinal.pdf'),
                reference_pageno=1,
                test_pdf=out,
                test_pageno=n)
>           assert correlation > 0.80
E           assert 0.01808749884366989 > 0.8

tests/test_main.py:310: AssertionError
- Captured stdout call -
/build/ocrmypdf-4.0.1/.pybuild/pythonX.Y_3.4/build/tests/output/main/cardinal.pdf.ref0001.png
/build/ocrmypdf-4.0.1/.pybuild/pythonX.Y_3.4/build/tests/output/main/cardinal.pdf.ref0001.png

__ test_autorotate[tesseract] __

spoof_tesseract_cache = {'BIBINPUTS': '/home/swhitton/doc:/home/swhitton/doc/papers:', 'BROWSER': 'iceweasel', 'BUILDRESULTGID': '1000', 'BUILDRESULTUID': '1000', ...}
renderer = 'tesseract'

    @pytest.mark.parametrize('renderer', [
        'hocr',
        'tesseract',
        ])
    def test_autorotate(spoof_tesseract_cache, renderer):
        import ocrmypdf.ghostscript as ghostscript
        import logging

        gslog = logging.getLogger()

        # cardinal.pdf contains four copies of an image rotated in each cardinal
        # direction - these ones are "burned in" not tagged with /Rotate
        out = check_ocrmypdf('cardinal.pdf', 'test_autorotate_%s.pdf' % renderer,
                             '-r', '-v', '1', env=spoof_tesseract_cache)
        for n in range(1, 4+1):
            correlation = check_monochrome_correlation(
                reference_pdf=_infile('cardinal.pdf'),
                reference_pageno=1,
                test_pdf=out,
                test_pageno=n)
>           assert correlation > 0.80
E           assert 0.01808749884366989 > 0.8

tests/test_main.py:310: AssertionError
- Captured stdout call -
/build/ocrmypdf-4.0.1/.pybuild/pythonX.Y_3.4/build/tests/output/main/cardinal.pdf.ref0001.png
/build/ocrmypdf-4.0.1/.pybuild/pythonX.Y_3.4/build/tests/output/main/cardinal.pdf.ref0001.png

2 failed, 42 passed in 667.14 seconds =