[Bug 1527935] Re: Load PDF with more than 1000 pages
My "fix" is to load and OCR half of the file and save it, then scan the remainder and save it, then piece the two together using PDF Arranger. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1527935 Title: Load PDF with more than 1000 pages To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/gscan2pdf/+bug/1527935/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1527935] Re: Load PDF with more than 1000 pages
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: gscan2pdf (Ubuntu)
   Status: New => Confirmed
[Bug 1527935] Re: Load PDF with more than 1000 pages
I've just fixed this in the upcoming version. However, it exposed a fundamental problem with the program architecture. Each page is stored as a temporary file. Perl does not close the file handles for temporary files until they are out of scope, so after around 1000 pages, my machine runs out of file handles. The medium-term solution is to store the data in an SQLite database, rather than in lots of temporary files, but that will require major surgery.
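As a rough illustration of the SQLite idea mentioned above, here is a minimal sketch in Python (gscan2pdf itself is written in Perl, so this is not the program's code). The table name, column names, and function names are all hypothetical. The point is only that one database connection can hold any number of pages, where one open temporary file per page cannot.

```python
import sqlite3


def open_page_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a page store. The schema here is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page ("
        "number INTEGER PRIMARY KEY, image BLOB NOT NULL)"
    )
    return conn


def save_page(conn: sqlite3.Connection, number: int, image_bytes: bytes) -> None:
    """Store one page image as a BLOB, replacing any existing page with that number."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO page (number, image) VALUES (?, ?)",
            (number, image_bytes),
        )


def load_page(conn: sqlite3.Connection, number: int) -> bytes:
    """Return the stored image bytes for a page, or raise KeyError if it is missing."""
    row = conn.execute(
        "SELECT image FROM page WHERE number = ?", (number,)
    ).fetchone()
    if row is None:
        raise KeyError(number)
    return row[0]


def page_count(conn: sqlite3.Connection) -> int:
    """Return how many pages are currently stored."""
    return conn.execute("SELECT COUNT(*) FROM page").fetchone()[0]
```

With this layout the number of open file handles stays constant however many pages are loaded, at the cost of reading image data back out of the database whenever a page is displayed or processed.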
Re: [Bug 1527935] Re: Load PDF with more than 1000 pages
On 21 December 2015 at 09:51, Tim Ritberg <1527...@bugs.launchpad.net> wrote:
> x-000.ppm - x-4576.ppm
> x-002.pbm - x-4577.pbm

Ah. OK. In that case I should be able to fix the problem. Just have to create a test case first.
[Bug 1527935] Re: Load PDF with more than 1000 pages
BTW, these two mask images are shown in the overview on the left.
[Bug 1527935] Re: Load PDF with more than 1000 pages
x-000.ppm - x-4576.ppm
x-002.pbm - x-4577.pbm
[Bug 1527935] Re: Load PDF with more than 1000 pages
Evidently every page has the image itself, and two masks. How are the images named/numbered?
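To make the "one image plus two masks per page" observation concrete, here is a small Python sketch that groups extracted files by page. It assumes the `x-NNN.ppm` / `x-NNN.pbm` naming reported in this thread and, as a further assumption, that every page contributes exactly three consecutively numbered files. The function names are hypothetical and this is not gscan2pdf's own code.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches names such as x-000.ppm or x-4577.pbm; the index may have any number of digits.
_NAME = re.compile(r"^x-(\d+)\.(ppm|pbm)$")


def image_index(filename: str) -> int:
    """Return the numeric index embedded in an extracted image file name."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return int(match.group(1))


def group_by_page(filenames: Iterable[str], images_per_page: int = 3) -> Dict[int, List[str]]:
    """Map 1-based page numbers to their files.

    Assumes each page yields `images_per_page` consecutive indices.
    """
    pages: Dict[int, List[str]] = defaultdict(list)
    for name in sorted(filenames, key=image_index):
        pages[image_index(name) // images_per_page + 1].append(name)
    return dict(pages)
```

Under these assumptions, indices 0 to 4577 map onto pages 1 to 1526, which is consistent with the file counts and page count reported in this thread.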
[Bug 1527935] Re: Load PDF with more than 1000 pages
The result is 1526 pbm-files and 3052 ppm-files.
Re: [Bug 1527935] Re: Load PDF with more than 1000 pages
On 20 December 2015 at 22:53, Tim Ritberg <1527...@bugs.launchpad.net> wrote:
> -Feature request 82 (Scanning documents that are 1000 pages or more)
> Seems to be, that there was/is a limit to 1000.

I wrote that. It was about scanning and has nothing to do with importing a PDF.

> The import with two steps didn't work. Now I have 2000 Images loaded.

What about

pdfimages -f 1 -l 1526 "/media/user/BigDocument.pdf" x

?
[Bug 1527935] Re: Load PDF with more than 1000 pages
Take a look at this: https://sourceforge.net/p/gscan2pdf/mailman/message/32904652/

-Feature request 82 (Scanning documents that are 1000 pages or more)

It seems that there was (or is) a limit of 1000.

The two-step import didn't work. Now I have 2000 images loaded.
[Bug 1527935] Re: Load PDF with more than 1000 pages
Just out of interest, how many files are produced by the following command, and what are they named?

pdfimages -f 1 -l 1526 "/media/user/BigDocument.pdf" x
[Bug 1527935] Re: Load PDF with more than 1000 pages
If there are 30 images in the first 10 pages, then I would expect this behaviour.
[Bug 1527935] Re: Load PDF with more than 1000 pages
gscan2pdf uses pdfimages to extract the images from the PDF. It looks to me as though pdfimages can only write 1000 images per call. I assume that the workaround would be to extract the images in batches of 1000.

Please confirm that, having imported pages 1-1000 in the first step, you can then import pages 1001-1526 in a second step. If this works, I will:

a. raise a bug against pdfimages
b. code the workaround into gscan2pdf so that it does this internally.
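The batching workaround described above could be sketched roughly as follows, in Python for illustration only (gscan2pdf is written in Perl). The `-f` and `-l` options are the ones already used in this thread. The function names, the batch size parameter, and the per-batch output prefix are hypothetical, and the premise that pdfimages stops at 1000 images per call was still unconfirmed at this point.

```python
from typing import Iterator, List, Tuple


def page_batches(total_pages: int, batch_size: int = 1000) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) page ranges that together cover 1..total_pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def pdfimages_commands(pdf_path: str, total_pages: int, batch_size: int = 1000) -> List[List[str]]:
    """Build one pdfimages argument list per batch. Nothing is executed here."""
    return [
        # A distinct output prefix per batch (an assumption of this sketch)
        # keeps files from different calls from overwriting each other.
        ["pdfimages", "-f", str(first), "-l", str(last), pdf_path, f"x{first}"]
        for first, last in page_batches(total_pages, batch_size)
    ]
```

Each argument list could then be passed to `subprocess.run`. Note that batching by page only caps the number of images per call if each page holds a single image.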
[Bug 1527935] Re: Load PDF with more than 1000 pages
I found another strange thing with this document. I tried to load the first 10 pages, but the first 30 pages were loaded:

DEBUG - import_files queued /media/user/BigDocument.pdf
INFO - Getting info for /media/user/BigDocument.pdf
INFO - Format: 'PDF document, version 1.5'
INFO - pdfinfo "/media/user/BigDocument.pdf"
INFO - Creator: LuraDocument PDF Compressor Server 5.7.69.47
Producer: LuraDocument PDF v2.47
CreationDate: Thu Aug 23 16:08:35 2012
ModDate: Thu Aug 23 16:08:35 2012
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 1526
Encrypted: no
Page size: 1177.56 x 1487.52 pts
Page rot: 0
File size: 118923732 bytes
Optimized: yes
PDF version: 1.5
INFO - 1526 pages
DEBUG - import_files queued /media/user/BigDocument.pdf
INFO - pdfimages -f 1 -l 10 "/media/user/BigDocument.pdf" x
DEBUG - import_files started /media/user/BigDocument.pdf
INFO - New page filename x-000.ppm, format Portable pixmap format (color)
INFO - New page filename /tmp/gscan2pdf-9L29/S_ptSkIwvO.png, format Portable Network Graphics
INFO - New page filename x-001.ppm, format Portable pixmap format (color)
INFO - Added /tmp/gscan2pdf-9L29/RaNwlrqSnG.png at page 1 with resolution 72
INFO - New page filename /tmp/gscan2pdf-9L29/XJdVAR63iK.png, format Portable Network Graphics
INFO - New page filename x-002.pbm, format Portable bitmap format (black and white)
INFO - Added /tmp/gscan2pdf-9L29/fQOr8JWFUj.png at page 2 with resolution 72
...
DEBUG - import_files finished /media/user/BigDocument.pdf
DEBUG - Started setting page_number_start from 1 to 31
DEBUG - Finished setting page_number_start from 1 to 31
[Bug 1527935] Re: Load PDF with more than 1000 pages
Here are the first and the last log lines:

INFO - 1526 pages
INFO - Added /tmp/gscan2pdf-R5U6/FHcsY_awxQ.png at page 998 with resolution 72
INFO - New page filename /tmp/gscan2pdf-R5U6/6N4XIcC8s3.png, format Portable Network Graphics
INFO - New page filename x-999.ppm, format Portable pixmap format (color)
INFO - Added /tmp/gscan2pdf-R5U6/H4IStYd_IF.png at page 999 with resolution 72
INFO - New page filename /tmp/gscan2pdf-R5U6/YOwFGtUKz0.png, format Portable Network Graphics
INFO - Added /tmp/gscan2pdf-R5U6/IiVBoSdJcP.png at page 1000 with resolution 72
DEBUG - import_files finished /media/user/BigDocument.pdf
DEBUG - Started setting page_number_start from 1 to 1001
DEBUG - Finished setting page_number_start from 1 to 1001

The loading dialog offered to load pages 1 to 1526. I pressed OK.
[Bug 1527935] Re: Load PDF with more than 1000 pages
I don't have a PDF with 1000 pages to test. Please start gscan2pdf from the command line with the --log option, reproduce the problem, quit, and post the log file:

gscan2pdf --log=log