[Bug 1687308] Re: ocrmypdf program and man page disagree about options
This bug was fixed in the package ocrmypdf - 5.4-1

---
ocrmypdf (5.4-1) unstable; urgency=medium

  * New upstream release.
  * Drop Testsuite: field.
    See Lintian tag unnecessary-testsuite-autopkgtest-header.
  * Bump standards version to 4.1.1 (no changes required).

 -- Sean Whitton  Sat, 14 Oct 2017 10:46:45 -0700

** Changed in: ocrmypdf (Ubuntu)
       Status: Expired => Fix Released

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1687308

Title:
  ocrmypdf program and man page disagree about options

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ocrmypdf/+bug/1687308/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1687308] Re: ocrmypdf program and man page disagree about options
[Expired for ocrmypdf (Ubuntu) because there has been no activity for 60 days.]

** Changed in: ocrmypdf (Ubuntu)
       Status: Incomplete => Expired
Re: [Bug 1687308] Re: ocrmypdf program and man page disagree about options
Now that you point it out I see what you mean. I looked at the image with GIMP and see that the resolution is ... not so great. I tried to sharpen the text up but, not being too skilled with GIMP, didn't succeed.

Thanks for your help. I'll try to get a better image and try again.
Re: [Bug 1687308] Re: ocrmypdf program and man page disagree about options
That scan is quite low resolution, so it is hard to say how well any OCR will work. I'd expect better than garbage, but a lot of errors.

The DPI is quite significant for checking whether a group of pixels is noise or a glyph. It implies the minimum font size. 72 or 96 is a good guess for screenshots (or 200 for a retina screen).

One possibility is that ocrmypdf fails to encode Cyrillic under the current settings and available system fonts. If you have problems with all Cyrillic images (even high quality scans), you could try adding the arguments --pdf-renderer=tesseract --output-type=pdf. That seems to work better for non-Latin languages.

If you want to install the latest version instead of the Ubuntu version, you could use the --sidecar argument to see what text is being found, to discern whether the issue is PDF encoding or the image itself.

Aside: The "just print" feature would not have been helpful here even if it worked.
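To make the remark that the DPI "implies the minimum font size" concrete, here is a small arithmetic sketch. It only converts a pixel height into typographic points (72 points per inch) for a given DPI; the 12-pixel glyph height is a made-up figure for illustration, and nothing here claims to reproduce how Tesseract or ocrmypdf use the value internally.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def pixels_to_points(height_px: float, dpi: float) -> float:
    """Physical height in points of something height_px pixels tall at the given DPI."""
    return height_px / dpi * POINTS_PER_INCH


# A hypothetical glyph 12 pixels tall, interpreted at different DPI values.
for dpi in (64, 96, 300):
    print(f"{dpi:>3} DPI: {pixels_to_points(12, dpi):.2f} pt")
```

The same pixels read as ordinary body text at 64 or 96 DPI but as tiny specks at 300 DPI, which is why a randomly chosen value can change what is treated as a glyph and what as noise.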
Re: [Bug 1687308] Re: ocrmypdf program and man page disagree about options
Sorry for the delay.

I'm trying to translate the text in the attached image to English. I have loaded the Tesseract RUS language, and executing

$ ocrmypdf -l rus --image-dpi 64 111684498_large_2.jpg 111684498_large_2.pdf

completes with the following messages:

   INFO - Input file is not a PDF, checking if it is an image...
   INFO - Input file is an image
   INFO - Input image has no ICC profile, assuming sRGB
   INFO - Image seems valid. Try converting to PDF...
   INFO - Successfully converted to PDF, processing...
WARNING -1: [tesseract] unsure about page orientation
   INFO - Output file is a PDF/A-2B (as expected)

But Google Translate produces garbage.

I was hoping to see what was being done by ocrmypdf to see if I could figure out what might be the cause.

BTW - I chose the DPI randomly - how significant is this parameter?

** Attachment added: "111684498_large_2.jpg"
   https://bugs.launchpad.net/bugs/1687308/+attachment/404/+files/111684498_large_2.jpg

** Attachment added: "111684498_large_2.pdf"
   https://bugs.launchpad.net/bugs/1687308/+attachment/405/+files/111684498_large_2.pdf
Re: [Bug 1687308] Re: ocrmypdf program and man page disagree about options
The code makes decisions at runtime based on the input file, so an argument to skip executing all intermediates doesn't give an accurate picture of what will happen. There is a --flowchart argument that produces an SVG file showing the processing path, which helps development a lot, but it's probably not helpful to anyone else.

What sort of use did you have for it?
Re: [Bug 1687308] Re: ocrmypdf program and man page disagree about options
That's unfortunate! Any reason why you removed the options?
Re: [Bug 1687308] Re: ocrmypdf program and man page disagree about options
In upstream I removed both of these arguments. I suggest patching them out of Ubuntu as well.

On Thu, May 25, 2017 at 12:41 Andreas Moog wrote:

> I agree that there is a bug with the -n option. From what I understand,
> it should only simulate the commands, not actually execute them. But the
> final check on the output.pdf seems to be unconditionally called, even
> if -n is used. That's why you get an error.
>
> You have to use the verbose option to see more of what ocrmypdf does.
> Like in my example, -n --verbose 2 will tell you what tasks would be
> run.
>
> What do you expect -n to be doing?
>
> --
> You received this bug notification because you are subscribed to Ubuntu.
> https://bugs.launchpad.net/bugs/1687308
>
> Title:
>   ocrmypdf program and man page disagree about options
>
> Status in ocrmypdf package in Ubuntu:
>   Incomplete
>
> Bug description:
>   The man page for ocrmypdf claims there is a "--just-print" option but
>   the program rejects this. Also the man page claims the "-n" does the
>   same. It doesn't. The option is accepted but nothing obvious happens.
>
>   ProblemType: Bug
>   DistroRelease: Ubuntu 17.04
>   Package: ocrmypdf 4.3.5-2
>   ProcVersionSignature: Ubuntu 4.10.0-20.22-generic 4.10.8
>   Uname: Linux 4.10.0-20-generic x86_64
>   ApportVersion: 2.20.4-0ubuntu4
>   Architecture: amd64
>   CurrentDesktop: Unity:Unity7
>   Date: Sun Apr 30 13:55:46 2017
>   EcryptfsInUse: Yes
>   InstallationDate: Installed on 2015-05-31 (699 days ago)
>   InstallationMedia: Ubuntu 14.04.2 LTS "Trusty Tahr" - Release amd64 (20150218.1)
>   PackageArchitecture: all
>   ProcEnviron:
>    LANGUAGE=en_US
>    PATH=(custom, no user)
>    XDG_RUNTIME_DIR=
>    LANG=en_US.UTF-8
>    SHELL=/bin/bash
>   SourcePackage: ocrmypdf
>   UpgradeStatus: Upgraded to zesty on 2017-04-28 (1 days ago)
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/ocrmypdf/+bug/1687308/+subscriptions
[Bug 1687308] Re: ocrmypdf program and man page disagree about options
I agree that there is a bug with the -n option. From what I understand, it should only simulate the commands, not actually execute them. But the final check on the output.pdf seems to be unconditionally called, even if -n is used. That's why you get an error.

You have to use the verbose option to see more of what ocrmypdf does. Like in my example, -n --verbose 2 will tell you what tasks would be run.

What do you expect -n to be doing?
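The failure mode described here (a final check on the output file that runs even in a dry run) can be illustrated with a toy pipeline. This is only a sketch under stated assumptions: run_pipeline, its parameters and the guard_final_check switch are hypothetical names invented for the example, and none of this is ocrmypdf's actual code.

```python
import os
import tempfile


def run_pipeline(output_file: str, just_print: bool, guard_final_check: bool) -> str:
    """Toy stand-in for an OCR pipeline (hypothetical, not ocrmypdf's code)."""
    if just_print:
        # Dry run: describe the work, but write nothing.
        print(f"would write OCR result to {output_file}")
    else:
        with open(output_file, "wb") as f:
            f.write(b"%PDF-1.5\n")

    # Guarded variant: a dry run produced no file, so there is nothing to check.
    if just_print and guard_final_check:
        return "final check skipped (dry run)"

    # Final check: opening the output fails if the file was never written.
    with open(output_file, "rb") as f:
        f.read(5)
    return "final check ran"


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "output.pdf")
    try:
        run_pipeline(target, just_print=True, guard_final_check=False)
    except FileNotFoundError as exc:
        print("unguarded dry run failed:", type(exc).__name__)
    print(run_pipeline(target, just_print=True, guard_final_check=True))
```

In the unguarded call the check tries to open a file that the dry run never created, which is the same shape of error as the reported FileNotFoundError for output.pdf.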
[Bug 1687308] Re: ocrmypdf program and man page disagree about options
Sorry for the misspelling! So when I try the correct option I get

$ ocrmypdf --just_print input.pdf output.pdf
Traceback (most recent call last):
  File "/usr/bin/ocrmypdf", line 11, in <module>
    load_entry_point('ocrmypdf==4.3.5', 'console_scripts', 'ocrmypdf')()
  File "/usr/lib/python3/dist-packages/ocrmypdf/__main__.py", line 1521, in run_pipeline
    pdfa_info = file_claims_pdfa(options.output_file)
  File "/usr/lib/python3/dist-packages/ocrmypdf/pdfa.py", line 131, in file_claims_pdfa
    pdf = pypdf.PdfFileReader(filename)
  File "/usr/lib/python3/dist-packages/PyPDF2/pdf.py", line 1081, in __init__
    fileobj = open(stream, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'output.pdf'
$

which I don't understand at all - output.pdf is the output file - it shouldn't exist! Or if it does, it should either be overwritten, a warning printed, or a confirmation requested. The -n option does the same.

However - if I use "output.pdf" as the output file AND it exists AND it is a PDF file (but not a PDF/A) I get

$ file *.pdf
input.pdf:  PDF document, version 1.7
output.pdf: PDF document, version 1.7
$ ocrmypdf --just_print input.pdf output.pdf
WARNING - Output file is okay but is not PDF/A (seems to be No XMP metadata)

(note: Both input.pdf and output.pdf are the same)

BUT - if output.pdf is a PDF/A-2B file I get

$ file *.pdf
input.pdf:  PDF document, version 1.7
output.pdf: PDF document, version 1.5
$ ocrmypdf --just_print input.pdf output.pdf
   INFO - Output file is a PDF/A-2B (as expected)

None of which is what I expected from the man page description

   -n, --just_print
          Don't actually run any commands; just print the pipeline.

Something isn't right!

BTW: the -n and --just_print options aren't listed in the SYNOPSIS section of the man page.
[Bug 1687308] Re: ocrmypdf program and man page disagree about options
The option is "--just_print" (underscore instead of dash). And it seems to be working correctly for me, see http://paste.ubuntu.com/24656479/

What exactly is the error message you are getting?

** Changed in: ocrmypdf (Ubuntu)
   Importance: Undecided => Low

** Changed in: ocrmypdf (Ubuntu)
       Status: New => Incomplete
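As an illustration of why "--just-print" can be rejected while "--just_print" is accepted, the sketch below uses Python's standard argparse module, where each spelling is a separate option string unless both are registered. Whether ocrmypdf defined its options this way is an assumption; the "demo" parser and build_parser are hypothetical and exist only for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that accepts the dash and underscore spellings alike."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument(
        "-n", "--just-print", "--just_print",
        dest="just_print",
        action="store_true",
        help="don't actually run any commands; just print the pipeline",
    )
    return parser


parser = build_parser()
for spelling in ("-n", "--just-print", "--just_print"):
    # Every registered spelling sets the same just_print attribute.
    print(spelling, parser.parse_args([spelling]).just_print)
```

Registering only "--just_print" would leave the dashed form unrecognized, which matches the mismatch between the program and the man page reported in this bug.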
[Bug 1687308] Re: ocrmypdf program and man page disagree about options
** Tags added: manpage