Bug#1009680: ghostscript breaks ocrmypdf autopkgtest: seemingly multiple issues

Paul Gevers Thu, 14 Apr 2022 02:15:20 -0700

Source: ghostscript, ocrmypdf
Control: found -1 ghostscript/9.56.0~dfsg-1
Control: found -1 ocrmypdf/13.4.0+dfsg-1
Severity: serious
Tags: sid bookworm
User: debian...@lists.debian.org
Usertags: breaks needs-update


Dear maintainer(s),

With a recent upload of ghostscript, the autopkgtest of ocrmypdf fails in testing when it is run with the ghostscript binary packages from unstable. It passes when run with only packages from testing. In tabular form:


                       pass            fail
ghostscript            from testing    9.56.0~dfsg-1
ocrmypdf               from testing    13.4.0+dfsg-1
all others             from testing    from testing

I copied some of the output at the bottom of this report.

Currently this regression is blocking the migration of ghostscript to testing [1]. Due to the nature of this issue, I filed this bug report against both packages. Can you please investigate the situation and reassign the bug to the right package?


More information about this bug and the reason for filing it can be found on
https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation

Paul

[1] https://qa.debian.org/excuses.php?package=ghostscript

https://ci.debian.net/data/autopkgtest/testing/amd64/o/ocrmypdf/20818050/log.gz

=================================== FAILURES ===================================
________________________________ test_force_ocr ________________________________

resources = PosixPath('/tmp/autopkgtest-lxc.zdbcipww/downtmp/build.V8r/src/tests/resources')

outpdf = PosixPath('/tmp/pytest-of-debci/pytest-0/test_force_ocr0/out.pdf')

    def test_force_ocr(resources, outpdf):
        out = check_ocrmypdf(
            resources / 'graph_ocred.pdf',
            outpdf,
            '-f',
            '--plugin',
            'tests/plugins/tesseract_cache.py',
        )
        pdfinfo = PdfInfo(out)

>       assert pdfinfo[0].has_text

E       assert False

E + where False = <PageInfo pageno=0 7.573333333333333333333333333"x6.16" rotation=0 dpi=400.000000x400.000000 has_text=False>.has_text


tests/test_main.py:83: AssertionError

----------------------------- Captured stderr call -----------------------------


Scanning contents:   0%|          | 0/1 [00:00<?, ?page/s]
Scanning contents: 100%|██████████| 1/1 [00:00<00:00, 62.30page/s]

OCR:   0%|          | 0.0/1.0 [00:00<?, ?page/s]
OCR:  50%|█████     | 0.5/1.0 [00:02<00:02,  5.47s/page]
OCR: 100%|██████████| 1.0/1.0 [00:02<00:00,  2.75s/page]

PDF/A conversion:   0%|          | 0/1 [00:00<?, ?page/s]

Recompressing JPEGs: 0image [00:00, ?image/s]

Deflating JPEGs:   0%|          | 0/1 [00:00<?, ?image/s]
Deflating JPEGs: 100%|██████████| 1/1 [00:00<00:00, 74.34image/s]

JBIG2: 0item [00:00, ?item/s]

------------------------------ Captured log call -------------------------------

INFO     ocrmypdf._pipeline:_pipeline.py:275 page already has text! - rasterizing text and running OCR anyway

INFO     ocrmypdf._sync:_sync.py:301 Postprocessing...

WARNING  ocrmypdf._pipeline:_pipeline.py:776 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.

INFO     ocrmypdf.optimize:optimize.py:665 Optimize ratio: 1.52 savings: 34.1%

INFO     ocrmypdf._sync:_sync.py:399 Output file is a PDF/A-2B (as expected)

WARNING ocrmypdf._validation:_validation.py:381 The output file size is 2.45× larger than the input file.

Possible reasons for this include:
The argument --force-ocr was issued, causing transcoding.

The optional dependency 'jbig2' was not found, so some image optimizations could not be attempted.

PDF/A conversion was enabled. (Try `--output-type pdf`.)
Plugins were used.

--------------------------- Captured stderr teardown ---------------------------


PDF/A conversion: 100%|██████████| 1/1 [00:01<00:00,  1.20s/page]

________________________________ test_skip_ocr _________________________________

resources = PosixPath('/tmp/autopkgtest-lxc.zdbcipww/downtmp/build.V8r/src/tests/resources')

outpdf = PosixPath('/tmp/pytest-of-debci/pytest-0/test_skip_ocr0/out.pdf')

    def test_skip_ocr(resources, outpdf):
        out = check_ocrmypdf(
            resources / 'graph_ocred.pdf',
            outpdf,
            '-s',
            '--plugin',
            'tests/plugins/tesseract_cache.py',
        )
        pdfinfo = PdfInfo(out)

>       assert pdfinfo[0].has_text

E       assert False

E + where False = <PageInfo pageno=0 7.573333333333333333333333333"x6.16" rotation=0 dpi=150.000000x150.000000 has_text=False>.has_text


tests/test_main.py:95: AssertionError

----------------------------- Captured stderr call -----------------------------


Scanning contents:   0%|          | 0/1 [00:00<?, ?page/s]
Scanning contents: 100%|██████████| 1/1 [00:00<00:00, 70.71page/s]

OCR:   0%|          | 0.0/1.0 [00:00<?, ?page/s]
OCR: 100%|██████████| 1.0/1.0 [00:00<00:00, 47.12page/s]

PDF/A conversion:   0%|          | 0/1 [00:00<?, ?page/s]

Recompressing JPEGs: 0image [00:00, ?image/s]

Deflating JPEGs:   0%|          | 0/1 [00:00<?, ?image/s]
Deflating JPEGs: 100%|██████████| 1/1 [00:00<00:00, 235.24image/s]

JBIG2: 0item [00:00, ?item/s]

------------------------------ Captured log call -------------------------------

INFO     ocrmypdf._pipeline:_pipeline.py:287 skipping all processing on this page

INFO     ocrmypdf._sync:_sync.py:301 Postprocessing...

INFO     ocrmypdf._sync:_sync.py:399 Output file is a PDF/A-2B (as expected)

--------------------------- Captured stderr teardown ---------------------------


PDF/A conversion: 100%|██████████| 1/1 [00:00<00:00,  4.16page/s]

________________________________ test_redo_ocr _________________________________

resources = PosixPath('/tmp/autopkgtest-lxc.zdbcipww/downtmp/build.V8r/src/tests/resources')

outpdf = PosixPath('/tmp/pytest-of-debci/pytest-0/test_redo_ocr0/out.pdf')

    def test_redo_ocr(resources, outpdf):
        in_ = resources / 'graph_ocred.pdf'
        before = PdfInfo(in_, detailed_analysis=True)
        out = outpdf
        out = check_ocrmypdf(in_, out, '--redo-ocr')
        after = PdfInfo(out, detailed_analysis=True)

>       assert before[0].has_text and after[0].has_text

E       assert (True and False)

E + where True = <PageInfo pageno=0 7.573333333333333333333333333"x6.16" rotation=0 dpi=150.000000x150.000000 has_text=True>.has_text
E + and False = <PageInfo pageno=0 7.573333333333333333333333333"x6.16" rotation=0 dpi=150.000000x150.000000 has_text=False>.has_text


tests/test_main.py:104: AssertionError

----------------------------- Captured stderr call -----------------------------


Scanning contents:   0%|          | 0/1 [00:00<?, ?page/s]
Scanning contents: 100%|██████████| 1/1 [00:00<00:00, 20.63page/s]

OCR:   0%|          | 0.0/1.0 [00:00<?, ?page/s]
OCR:  50%|█████     | 0.5/1.0 [00:04<00:04,  8.64s/page]
OCR: 100%|██████████| 1.0/1.0 [00:04<00:00,  4.35s/page]

PDF/A conversion:   0%|          | 0/1 [00:00<?, ?page/s]

Recompressing JPEGs: 0image [00:00, ?image/s]

Deflating JPEGs:   0%|          | 0/1 [00:00<?, ?image/s]
Deflating JPEGs: 100%|██████████| 1/1 [00:00<00:00, 254.88image/s]

JBIG2: 0item [00:00, ?item/s]

------------------------------ Captured log call -------------------------------

INFO     ocrmypdf._pipeline:_pipeline.py:284 redoing OCR
INFO     ocrmypdf._sync:_sync.py:301 Postprocessing...

ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 GPL Ghostscript 9.56.0 (2022-03-29)

Copyright (C) 2022 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 1.
Page 1

The following warnings were encountered at least once while processing this file:

        number uses illegal exponent form

ERROR    ocrmypdf._exec.ghostscript:ghostscript.py:277 This file had errors that were repaired or ignored.
ERROR    ocrmypdf._exec.ghostscript:ghostscript.py:277 The file was produced by:
ERROR    ocrmypdf._exec.ghostscript:ghostscript.py:277 >>>> GPL Ghostscript 9.15 <<<<
ERROR    ocrmypdf._exec.ghostscript:ghostscript.py:277 Please notify the author of the software that produced this
ERROR    ocrmypdf._exec.ghostscript:ghostscript.py:277 file that it does not conform to Adobe's published PDF
ERROR    ocrmypdf._exec.ghostscript:ghostscript.py:277 specification.

INFO     ocrmypdf._sync:_sync.py:399 Output file is a PDF/A-2B (as expected)

--------------------------- Captured stderr teardown ---------------------------


PDF/A conversion: 100%|██████████| 1/1 [00:00<00:00,  3.91page/s]

=========================== short test summary info ============================

FAILED tests/test_main.py::test_force_ocr - assert False
FAILED tests/test_main.py::test_skip_ocr - assert False
FAILED tests/test_main.py::test_redo_ocr - assert (True and False)

======= 3 failed, 274 passed, 37 skipped, 4 xfailed in 397.41s (0:06:37) =======

autopkgtest [08:17:33]: test test-suite
