--- Begin Message ---
Source: ghostscript, ocrmypdf
Control: found -1 ghostscript/9.56.0~dfsg-1
Control: found -1 ocrmypdf/13.4.0+dfsg-1
Severity: serious
Tags: sid bookworm
User: debian...@lists.debian.org
Usertags: breaks needs-update
Dear maintainer(s),
With a recent upload of ghostscript the autopkgtest of ocrmypdf fails in
testing when that autopkgtest is run with the binary packages of
ghostscript from unstable. It passes when run with only packages from
testing. In tabular form:
                       pass            fail
ghostscript            from testing    9.56.0~dfsg-1
ocrmypdf               from testing    13.4.0+dfsg-1
all others             from testing    from testing
I copied some of the output at the bottom of this report.
Currently this regression is blocking the migration of ghostscript to
testing [1]. Due to the nature of this issue, I filed this bug report
against both packages. Can you please investigate the situation and
reassign the bug to the right package?
More information about this bug and the reason for filing it can be found on
https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation
Paul
[1] https://qa.debian.org/excuses.php?package=ghostscript
https://ci.debian.net/data/autopkgtest/testing/amd64/o/ocrmypdf/20818050/log.gz
=================================== FAILURES ===================================
________________________________ test_force_ocr ________________________________
resources = PosixPath('/tmp/autopkgtest-lxc.zdbcipww/downtmp/build.V8r/src/tests/resources')
outpdf = PosixPath('/tmp/pytest-of-debci/pytest-0/test_force_ocr0/out.pdf')
    def test_force_ocr(resources, outpdf):
        out = check_ocrmypdf(
            resources / 'graph_ocred.pdf',
            outpdf,
            '-f',
            '--plugin',
            'tests/plugins/tesseract_cache.py',
        )
        pdfinfo = PdfInfo(out)
        assert pdfinfo[0].has_text
E       assert False
E        +  where False = <PageInfo pageno=0 7.573333333333333333333333333"x6.16" rotation=0 dpi=400.000000x400.000000 has_text=False>.has_text
tests/test_main.py:83: AssertionError
----------------------------- Captured stderr call -----------------------------
Scanning contents: 0%| | 0/1 [00:00<?, ?page/s]
Scanning contents: 100%|██████████| 1/1 [00:00<00:00, 62.30page/s]
OCR: 0%| | 0.0/1.0 [00:00<?, ?page/s]
OCR: 50%|█████ | 0.5/1.0 [00:02<00:02, 5.47s/page]
OCR: 100%|██████████| 1.0/1.0 [00:02<00:00, 2.75s/page]
PDF/A conversion: 0%| | 0/1 [00:00<?, ?page/s]
Recompressing JPEGs: 0image [00:00, ?image/s]
Recompressing JPEGs: 0image [00:00, ?image/s]
Deflating JPEGs: 0%| | 0/1 [00:00<?, ?image/s]
Deflating JPEGs: 100%|██████████| 1/1 [00:00<00:00, 74.34image/s]
JBIG2: 0item [00:00, ?item/s]
JBIG2: 0item [00:00, ?item/s]
------------------------------ Captured log call -------------------------------
INFO ocrmypdf._pipeline:_pipeline.py:275 page already has text! -
rasterizing text and running OCR anyway
INFO ocrmypdf._sync:_sync.py:301 Postprocessing...
WARNING ocrmypdf._pipeline:_pipeline.py:776 Some input metadata could
not be copied because it is not permitted in PDF/A. You may wish to
examine the output PDF's XMP metadata.
INFO ocrmypdf.optimize:optimize.py:665 Optimize ratio: 1.52 savings:
34.1%
INFO ocrmypdf._sync:_sync.py:399 Output file is a PDF/A-2B (as expected)
WARNING ocrmypdf._validation:_validation.py:381 The output file size is
2.45× larger than the input file.
Possible reasons for this include:
The argument --force-ocr was issued, causing transcoding.
The optional dependency 'jbig2' was not found, so some image
optimizations could not be attempted.
PDF/A conversion was enabled. (Try `--output-type pdf`.)
Plugins were used.
--------------------------- Captured stderr teardown ---------------------------
PDF/A conversion: 100%|██████████| 1/1 [00:01<00:00, 1.20s/page]
________________________________ test_skip_ocr _________________________________
resources = PosixPath('/tmp/autopkgtest-lxc.zdbcipww/downtmp/build.V8r/src/tests/resources')
outpdf = PosixPath('/tmp/pytest-of-debci/pytest-0/test_skip_ocr0/out.pdf')
    def test_skip_ocr(resources, outpdf):
        out = check_ocrmypdf(
            resources / 'graph_ocred.pdf',
            outpdf,
            '-s',
            '--plugin',
            'tests/plugins/tesseract_cache.py',
        )
        pdfinfo = PdfInfo(out)
        assert pdfinfo[0].has_text
E       assert False
E        +  where False = <PageInfo pageno=0 7.573333333333333333333333333"x6.16" rotation=0 dpi=150.000000x150.000000 has_text=False>.has_text
tests/test_main.py:95: AssertionError
----------------------------- Captured stderr call -----------------------------
Scanning contents: 0%| | 0/1 [00:00<?, ?page/s]
Scanning contents: 100%|██████████| 1/1 [00:00<00:00, 70.71page/s]
OCR: 0%| | 0.0/1.0 [00:00<?, ?page/s]
OCR: 100%|██████████| 1.0/1.0 [00:00<00:00, 47.12page/s]
PDF/A conversion: 0%| | 0/1 [00:00<?, ?page/s]
Recompressing JPEGs: 0image [00:00, ?image/s]
Recompressing JPEGs: 0image [00:00, ?image/s]
Deflating JPEGs: 0%| | 0/1 [00:00<?, ?image/s]
Deflating JPEGs: 100%|██████████| 1/1 [00:00<00:00, 235.24image/s]
JBIG2: 0item [00:00, ?item/s]
JBIG2: 0item [00:00, ?item/s]
------------------------------ Captured log call -------------------------------
INFO ocrmypdf._pipeline:_pipeline.py:287 skipping all processing on
this page
INFO ocrmypdf._sync:_sync.py:301 Postprocessing...
WARNING ocrmypdf._pipeline:_pipeline.py:776 Some input metadata could
not be copied because it is not permitted in PDF/A. You may wish to
examine the output PDF's XMP metadata.
INFO ocrmypdf.optimize:optimize.py:665 Optimize ratio: 1.14 savings:
12.6%
INFO ocrmypdf._sync:_sync.py:399 Output file is a PDF/A-2B (as expected)
--------------------------- Captured stderr teardown ---------------------------
PDF/A conversion: 100%|██████████| 1/1 [00:00<00:00, 4.16page/s]
________________________________ test_redo_ocr _________________________________
resources = PosixPath('/tmp/autopkgtest-lxc.zdbcipww/downtmp/build.V8r/src/tests/resources')
outpdf = PosixPath('/tmp/pytest-of-debci/pytest-0/test_redo_ocr0/out.pdf')
    def test_redo_ocr(resources, outpdf):
        in_ = resources / 'graph_ocred.pdf'
        before = PdfInfo(in_, detailed_analysis=True)
        out = outpdf
        out = check_ocrmypdf(in_, out, '--redo-ocr')
        after = PdfInfo(out, detailed_analysis=True)
        assert before[0].has_text and after[0].has_text
E       assert (True and False)
E        +  where True = <PageInfo pageno=0 7.573333333333333333333333333"x6.16" rotation=0 dpi=150.000000x150.000000 has_text=True>.has_text
E        +  and   False = <PageInfo pageno=0 7.573333333333333333333333333"x6.16" rotation=0 dpi=150.000000x150.000000 has_text=False>.has_text
tests/test_main.py:104: AssertionError
----------------------------- Captured stderr call -----------------------------
Scanning contents: 0%| | 0/1 [00:00<?, ?page/s]
Scanning contents: 100%|██████████| 1/1 [00:00<00:00, 20.63page/s]
OCR: 0%| | 0.0/1.0 [00:00<?, ?page/s]
OCR: 50%|█████ | 0.5/1.0 [00:04<00:04, 8.64s/page]
OCR: 100%|██████████| 1.0/1.0 [00:04<00:00, 4.35s/page]
PDF/A conversion: 0%| | 0/1 [00:00<?, ?page/s]
Recompressing JPEGs: 0image [00:00, ?image/s]
Recompressing JPEGs: 0image [00:00, ?image/s]
Deflating JPEGs: 0%| | 0/1 [00:00<?, ?image/s]
Deflating JPEGs: 100%|██████████| 1/1 [00:00<00:00, 254.88image/s]
JBIG2: 0item [00:00, ?item/s]
JBIG2: 0item [00:00, ?item/s]
------------------------------ Captured log call -------------------------------
INFO ocrmypdf._pipeline:_pipeline.py:284 redoing OCR
INFO ocrmypdf._sync:_sync.py:301 Postprocessing...
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 GPL Ghostscript 9.56.0 (2022-03-29)
Copyright (C) 2022 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 1.
Page 1
The following warnings were encountered at least once while processing this file:
number uses illegal exponent form
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 This file had errors that were repaired or ignored.
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 The file was produced by:
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 >>>> GPL Ghostscript 9.15 <<<<
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 Please notify the author of the software that produced this
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 file that it does not conform to Adobe's published PDF
ERROR ocrmypdf._exec.ghostscript:ghostscript.py:277 specification.
WARNING ocrmypdf._pipeline:_pipeline.py:776 Some input metadata could
not be copied because it is not permitted in PDF/A. You may wish to
examine the output PDF's XMP metadata.
INFO ocrmypdf.optimize:optimize.py:665 Optimize ratio: 1.14 savings:
12.6%
INFO ocrmypdf._sync:_sync.py:399 Output file is a PDF/A-2B (as expected)
--------------------------- Captured stderr teardown ---------------------------
PDF/A conversion: 100%|██████████| 1/1 [00:00<00:00, 3.91page/s]
=========================== short test summary info ============================
FAILED tests/test_main.py::test_force_ocr - assert False
FAILED tests/test_main.py::test_skip_ocr - assert False
FAILED tests/test_main.py::test_redo_ocr - assert (True and False)
======= 3 failed, 274 passed, 37 skipped, 4 xfailed in 397.41s (0:06:37) =======
autopkgtest [08:17:33]: test test-suite
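For triage, the failing test IDs can be pulled out of a log like the one above with a few lines of Python. This is only a rough sketch: the helper name failed_tests is made up for illustration, and the regular expression assumes the "FAILED path::name - message" shape of the short test summary lines shown above.

```python
import re

# Matches pytest short-summary lines such as
# "FAILED tests/test_main.py::test_force_ocr - assert False".
FAILED_RE = re.compile(r"^FAILED\s+(\S+)::(\S+)(?:\s+-\s+(.*))?$")


def failed_tests(log_text):
    """Return (file, test name, message) for each FAILED summary line.

    Hypothetical helper, not part of pytest, autopkgtest or ocrmypdf.
    """
    results = []
    for line in log_text.splitlines():
        match = FAILED_RE.match(line.strip())
        if match:
            results.append((match.group(1), match.group(2), match.group(3) or ""))
    return results


# The three summary lines from the log in this report.
sample = """\
FAILED tests/test_main.py::test_force_ocr - assert False
FAILED tests/test_main.py::test_skip_ocr - assert False
FAILED tests/test_main.py::test_redo_ocr - assert (True and False)
"""

for path, name, message in failed_tests(sample):
    print(f"{path}: {name} ({message})")
```

Lines that do not start with FAILED are ignored, so the same helper can be fed the full decompressed log.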
--- End Message ---