Bug#1009680: ghostscript breaks ocrmypdf autopkgtest: seemingly multiple issues

2022-04-20 Thread Nilesh Patra
Hi,

On Thu, 14 Apr 2022 14:23:43 -0700 Sean Whitton wrote:
> control: reassign -1 ghostscript
> control: forwarded -1 https://bugs.ghostscript.com/show_bug.cgi?id=705187
> control: retitle -1 Ghostscript 9.56 removes hidden (e.g. OCR) text layers when refrying with NEWPDF=true
> 
> Hello,
> 
> On Thu 14 Apr 2022 at 11:13AM +02, Paul Gevers wrote:
> 
> > With a recent upload of ghostscript the autopkgtest of ocrmypdf fails in
> > testing when that autopkgtest is run with the binary packages of
> > ghostscript from unstable. It passes when run with only packages from
> > testing. In tabular form:
> >
> >              pass            fail
> > ghostscript  from testing    9.56.0~dfsg-1
> > ocrmypdf     from testing    13.4.0+dfsg-1
> > all others   from testing    from testing
> >
> > I copied some of the output at the bottom of this report.
> >
> > Currently this regression is blocking the migration of ghostscript to
> > testing [1]. Due to the nature of this issue, I filed this bug report
> > against both packages. Can you please investigate the situation and
> > reassign the bug to the right package?
> >
> > More information about this bug and the reason for filing it can be found on
> > https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation
> 
> It's a regression in Ghostscript.
> 
> OCRmyPDF has made a release including a workaround but the test suite
> for that fails, so I can't upload it yet.  But in any case this bug is
> not one in OCRmyPDF.

It looks like this has been resolved upstream [2], as I see in the ticket [1].
Jonas, could you please consider uploading a fix?

I personally use ghostscript and would very much like to see this bug
fixed.

[1]: https://bugs.ghostscript.com/show_bug.cgi?id=705187
[2]: http://git.ghostscript.com/?p=ghostpdl.git;h=fa895673a942caefb81efe1c922407a46d6780c9

Best, Nilesh




Bug#1009680: ghostscript breaks ocrmypdf autopkgtest: seemingly multiple issues

2022-04-14 Thread Sean Whitton
control: reassign -1 ghostscript
control: forwarded -1 https://bugs.ghostscript.com/show_bug.cgi?id=705187
control: retitle -1 Ghostscript 9.56 removes hidden (e.g. OCR) text layers when refrying with NEWPDF=true

Hello,

On Thu 14 Apr 2022 at 11:13AM +02, Paul Gevers wrote:

> With a recent upload of ghostscript the autopkgtest of ocrmypdf fails in
> testing when that autopkgtest is run with the binary packages of
> ghostscript from unstable. It passes when run with only packages from
> testing. In tabular form:
>
>              pass            fail
> ghostscript  from testing    9.56.0~dfsg-1
> ocrmypdf     from testing    13.4.0+dfsg-1
> all others   from testing    from testing
>
> I copied some of the output at the bottom of this report.
>
> Currently this regression is blocking the migration of ghostscript to
> testing [1]. Due to the nature of this issue, I filed this bug report
> against both packages. Can you please investigate the situation and
> reassign the bug to the right package?
>
> More information about this bug and the reason for filing it can be found on
> https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation

It's a regression in Ghostscript.

OCRmyPDF has made a release including a workaround but the test suite
for that fails, so I can't upload it yet.  But in any case this bug is
not one in OCRmyPDF.

-- 
Sean Whitton




Bug#1009680: ghostscript breaks ocrmypdf autopkgtest: seemingly multiple issues

2022-04-14 Thread james
Ghostscript 9.56.0 introduced a serious bug from ocrmypdf’s perspective.
Upgrading to ocrmypdf 13.4.2 would work, as would a newer Ghostscript if one
has been released.
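
To make the two options above concrete, here is a minimal sketch of a version-gated workaround. It is not taken from ocrmypdf and the thread does not say how 13.4.2 works around the bug; the helper names (parse_gs_version, refry_args) and the AFFECTED set are hypothetical. It assumes that only Ghostscript 9.56.0 is affected and that passing -dNEWPDF=false (the switch that selects the older PDF interpreter in 9.56) avoids the problem. The function only builds a command line and does not run Ghostscript.

```python
import re

# Assumption based on the thread: only the 9.56.0 release is affected.
AFFECTED = {(9, 56, 0)}


def parse_gs_version(text):
    """Parse a dotted version such as '9.56.0' or '9.56.0~dfsg-1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised Ghostscript version: {text!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. '9.27') is treated as 0.
    return (int(major), int(minor), int(patch or 0))


def refry_args(version_text, infile, outfile):
    """Build a pdfwrite command line (hypothetical helper, runs nothing)."""
    args = ["gs", "-sDEVICE=pdfwrite", "-o", outfile]
    if parse_gs_version(version_text) in AFFECTED:
        # Fall back to the older PDF interpreter on the affected release.
        args.append("-dNEWPDF=false")
    args.append(infile)
    return args
```

The resulting list could be handed to subprocess.run() on a system where gs is installed; whether this actually preserves hidden text layers would need to be checked against the upstream ticket.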

> On Apr 14, 2022, at 02:15, Paul Gevers wrote:
> 
> Source: ghostscript, ocrmypdf
> Control: found -1 ghostscript/9.56.0~dfsg-1
> Control: found -1 ocrmypdf/13.4.0+dfsg-1
> Severity: serious
> Tags: sid bookworm
> User: debian...@lists.debian.org
> Usertags: breaks needs-update
> 
> Dear maintainer(s),
> 
> With a recent upload of ghostscript the autopkgtest of ocrmypdf fails in 
> testing when that autopkgtest is run with the binary packages of ghostscript 
> from unstable. It passes when run with only packages from testing. In tabular 
> form:
> 
>              pass            fail
> ghostscript  from testing    9.56.0~dfsg-1
> ocrmypdf     from testing    13.4.0+dfsg-1
> all others   from testing    from testing
> 
> I copied some of the output at the bottom of this report.
> 
> Currently this regression is blocking the migration of ghostscript to testing 
> [1]. Due to the nature of this issue, I filed this bug report against both 
> packages. Can you please investigate the situation and reassign the bug to 
> the right package?
> 
> More information about this bug and the reason for filing it can be found on
> https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation
> 
> Paul
> 
> [1] https://qa.debian.org/excuses.php?package=ghostscript
> 
> https://ci.debian.net/data/autopkgtest/testing/amd64/o/ocrmypdf/20818050/log.gz
> 
> === FAILURES ===
> ___ test_force_ocr ___
> 
> resources = PosixPath('/tmp/autopkgtest-lxc.zdbcipww/downtmp/build.V8r/src/tests/resources')
> outpdf = PosixPath('/tmp/pytest-of-debci/pytest-0/test_force_ocr0/out.pdf')
> 
>     def test_force_ocr(resources, outpdf):
>         out = check_ocrmypdf(
>             resources / 'graph_ocred.pdf',
>             outpdf,
>             '-f',
>             '--plugin',
>             'tests/plugins/tesseract_cache.py',
>         )
>         pdfinfo = PdfInfo(out)
> >       assert pdfinfo[0].has_text
> E       assert False
> E        +  where False = 7.573"x6.16" rotation=0 dpi=400.00x400.00 has_text=False>.has_text
> 
> tests/test_main.py:83: AssertionError
> --- Captured stderr call ---
> 
> Scanning contents:   0%|  | 0/1 [00:00
> Scanning contents: 100%|██| 1/1 [00:00<00:00, 62.30page/s]
> 
> OCR:   0%|  | 0.0/1.0 [00:00
> OCR:  50%|█ | 0.5/1.0 [00:02<00:02,  5.47s/page]
> OCR: 100%|██| 1.0/1.0 [00:02<00:00,  2.75s/page]
> 
> PDF/A conversion:   0%|  | 0/1 [00:00
> 
> Recompressing JPEGs: 0image [00:00, ?image/s]
> Recompressing JPEGs: 0image [00:00, ?image/s]
> 
> Deflating JPEGs:   0%|  | 0/1 [00:00
> Deflating JPEGs: 100%|██| 1/1 [00:00<00:00, 74.34image/s]
> 
> JBIG2: 0item [00:00, ?item/s]
> JBIG2: 0item [00:00, ?item/s]
> --- Captured log call ---
> INFO ocrmypdf._pipeline:_pipeline.py:275 page already has text! - rasterizing text and running OCR anyway
> INFO ocrmypdf._sync:_sync.py:301 Postprocessing...
> WARNING  ocrmypdf._pipeline:_pipeline.py:776 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
> INFO ocrmypdf.optimize:optimize.py:665 Optimize ratio: 1.52 savings: 34.1%
> INFO ocrmypdf._sync:_sync.py:399 Output file is a PDF/A-2B (as expected)
> WARNING  ocrmypdf._validation:_validation.py:381 The output file size is 2.45× larger than the input file.
> Possible reasons for this include:
> The argument --force-ocr was issued, causing transcoding.
> The optional dependency 'jbig2' was not found, so some image optimizations could not be attempted.
> PDF/A conversion was enabled. (Try `--output-type pdf`.)
> Plugins were used.
> --- Captured stderr teardown ---
> 
> PDF/A conversion: 100%|██| 1/1 [00:01<00:00,  1.20s/page]
> ___ test_skip_ocr ___
> 
> resources = PosixPath('/tmp/autopkgtest-lxc.zdbcipww/downtmp/build.V8r/src/tests/resources')
> outpdf = PosixPath('/tmp/pytest-of-debci/pytest-0/test_skip_ocr0/out.pdf')
> 
>     def test_skip_ocr(resources, outpdf):
>         out = check_ocrmypdf(
>             resources / 'graph_ocred.pdf',
>             outpdf,
>             '-s',
>             '--plugin',
>             'tests/plugins/tesseract_cache.py',
>         )
>         pdfinfo = PdfInfo(out)
> >       assert pdfinfo[0].has_text
> E       assert False
> E        +  where False = 7.573"x6.16" rotation=0 dpi=150.00x150.00

Bug#1009680: ghostscript breaks ocrmypdf autopkgtest: seemingly multiple issues

2022-04-14 Thread Paul Gevers

Source: ghostscript, ocrmypdf
Control: found -1 ghostscript/9.56.0~dfsg-1
Control: found -1 ocrmypdf/13.4.0+dfsg-1
Severity: serious
Tags: sid bookworm
User: debian...@lists.debian.org
Usertags: breaks needs-update

Dear maintainer(s),

With a recent upload of ghostscript the autopkgtest of ocrmypdf fails in 
testing when that autopkgtest is run with the binary packages of 
ghostscript from unstable. It passes when run with only packages from 
testing. In tabular form:


             pass            fail
ghostscript  from testing    9.56.0~dfsg-1
ocrmypdf     from testing    13.4.0+dfsg-1
all others   from testing    from testing

I copied some of the output at the bottom of this report.

Currently this regression is blocking the migration of ghostscript to 
testing [1]. Due to the nature of this issue, I filed this bug report 
against both packages. Can you please investigate the situation and 
reassign the bug to the right package?


More information about this bug and the reason for filing it can be found on
https://wiki.debian.org/ContinuousIntegration/RegressionEmailInformation

Paul

[1] https://qa.debian.org/excuses.php?package=ghostscript

https://ci.debian.net/data/autopkgtest/testing/amd64/o/ocrmypdf/20818050/log.gz

=== FAILURES ===
___ test_force_ocr ___

resources = PosixPath('/tmp/autopkgtest-lxc.zdbcipww/downtmp/build.V8r/src/tests/resources')
outpdf = PosixPath('/tmp/pytest-of-debci/pytest-0/test_force_ocr0/out.pdf')

    def test_force_ocr(resources, outpdf):
        out = check_ocrmypdf(
            resources / 'graph_ocred.pdf',
            outpdf,
            '-f',
            '--plugin',
            'tests/plugins/tesseract_cache.py',
        )
        pdfinfo = PdfInfo(out)
>       assert pdfinfo[0].has_text
E       assert False
E        +  where False = 7.573"x6.16" rotation=0 dpi=400.00x400.00 has_text=False>.has_text

tests/test_main.py:83: AssertionError
--- Captured stderr call ---

Scanning contents:   0%|  | 0/1 [00:00
--- Captured log call ---
INFO ocrmypdf._pipeline:_pipeline.py:275 page already has text! - rasterizing text and running OCR anyway
INFO ocrmypdf._sync:_sync.py:301 Postprocessing...
WARNING  ocrmypdf._pipeline:_pipeline.py:776 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
INFO ocrmypdf.optimize:optimize.py:665 Optimize ratio: 1.52 savings: 34.1%
INFO ocrmypdf._sync:_sync.py:399 Output file is a PDF/A-2B (as expected)
WARNING  ocrmypdf._validation:_validation.py:381 The output file size is 2.45× larger than the input file.
Possible reasons for this include:
The argument --force-ocr was issued, causing transcoding.
The optional dependency 'jbig2' was not found, so some image optimizations could not be attempted.
PDF/A conversion was enabled. (Try `--output-type pdf`.)
Plugins were used.
--- Captured stderr teardown ---

PDF/A conversion: 100%|██| 1/1 [00:01<00:00,  1.20s/page]
___ test_skip_ocr ___


resources = PosixPath('/tmp/autopkgtest-lxc.zdbcipww/downtmp/build.V8r/src/tests/resources')
outpdf = PosixPath('/tmp/pytest-of-debci/pytest-0/test_skip_ocr0/out.pdf')

    def test_skip_ocr(resources, outpdf):
        out = check_ocrmypdf(
            resources / 'graph_ocred.pdf',
            outpdf,
            '-s',
            '--plugin',
            'tests/plugins/tesseract_cache.py',
        )
        pdfinfo = PdfInfo(out)
>       assert pdfinfo[0].has_text
E       assert False
E        +  where False = 7.573"x6.16" rotation=0 dpi=150.00x150.00 has_text=False>.has_text

tests/test_main.py:95: AssertionError
--- Captured stderr call ---

Scanning contents:   0%|  | 0/1 [00:00
--- Captured log call ---
INFO ocrmypdf._pipeline:_pipeline.py:287 skipping all processing on this page
INFO ocrmypdf._sync:_sync.py:301 Postprocessing...
WARNING  ocrmypdf._pipeline:_pipeline.py:776 Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
INFO ocrmypdf.optimize:optimize.py:665 Optimize ratio: 1.14 savings: 12.6%
INFO ocrmypdf._sync:_sync.py:399 Output file is a PDF/A-2B (as expected)
--- Captured stderr teardown ---

PDF/A conversion: 100%|██| 1/1 [00:00<00:00,  4.16page/s]
___ test_redo_ocr ___