Re: [sane-devel] scanimage / tesseract interoperability
Very much appreciated. -Jeff -- sane-devel mailing list: sane-devel@lists.alioth.debian.org http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/sane-devel Unsubscribe: Send mail with subject unsubscribe your_password to sane-devel-requ...@lists.alioth.debian.org
Re: [sane-devel] scanimage / tesseract interoperability
Revised so that printing filenames to stdout is optional and defaults to off. The new option is --batch-print. Please consider applying along with happy-batch-1.1.gz

names-to-stdout-1.4.diff.gz
Description: GNU Zip compressed data
Re: [sane-devel] scanimage / tesseract interoperability
Both of these patches (with some minor compiler-silencing modifications) have been committed to sane-backends. Thanks for your effort and your patience. Thanks to Olaf for reviewing the code when I was away.

allan

On Mon, Jul 14, 2014 at 3:57 PM, Jeff Breidenbach j...@jab.org wrote:
> Revised so printing filenames to stdout is optional and defaults to off. The new option is --batch-print. Please consider applying along with happy-batch-1.1.gz

--
The truth is an offense, but not a sin
Re: [sane-devel] scanimage / tesseract interoperability
Friendly reminder.
Re: [sane-devel] scanimage / tesseract interoperability
Is there anything more that I can do to help?
Re: [sane-devel] scanimage / tesseract interoperability
Thank you for the review, Olaf. I've incorporated both of your suggestions.

Jeff

On Thu, May 29, 2014 at 4:49 PM, Olaf Meeuwissen olaf.meeuwis...@avasys.jp wrote:
> Jeff Breidenbach writes:
> > Are these two patches on track for inclusion? What more can I do to help?
> > names-to-stdout-1.2.diff.gz
> > happy-batch-1.0.diff.gz
>
> I don't have commit privileges but both look mostly good to me.
>
> Re: names-to-stdout-1.2.diff.gz, you have
>
>   @@ -2283,19 +2291,27 @@ List of available devices:", prog_name);
>                {
>                  fprintf (stderr, "%s: sane_start: %s\n",
>                           prog_name, sane_strstatus (status));
>   -              fclose (stdout);
>   -              break;
>   +              if (ofp)
>   +                {
>   +                  fclose (ofp);
>   +                  ofp = NULL;
>   +                  break;
>   +                }
>                }
>
> but shouldn't that break statement stay out of the if-branch? Looks to me like your code has the potential to change program flow.
>
> Re: happy-batch-1.0.diff.gz, why
>
>   n >= batch_start_at + batch_increment
>
> and not just
>
>   n > batch_start_at
>
> Hope this helps,
> --
> Olaf Meeuwissen, LPIC-2 FLOSS Engineer -- AVASYS CORPORATION
> FSF Associate Member #1962 Help support software freedom
> http://www.fsf.org/jf?referrer=1962

happy-batch-1.1.gz
Description: GNU Zip compressed data

names-to-stdout-1.3.diff.gz
Description: GNU Zip compressed data
Re: [sane-devel] scanimage / tesseract interoperability
Jeff Breidenbach writes:
> Are these two patches on track for inclusion? What more can I do to help?
> names-to-stdout-1.2.diff.gz
> happy-batch-1.0.diff.gz

I don't have commit privileges but both look mostly good to me.

Re: names-to-stdout-1.2.diff.gz, you have

  @@ -2283,19 +2291,27 @@ List of available devices:", prog_name);
               {
                 fprintf (stderr, "%s: sane_start: %s\n",
                          prog_name, sane_strstatus (status));
  -              fclose (stdout);
  -              break;
  +              if (ofp)
  +                {
  +                  fclose (ofp);
  +                  ofp = NULL;
  +                  break;
  +                }
               }

but shouldn't that break statement stay out of the if-branch? Looks to me like your code has the potential to change program flow.

Re: happy-batch-1.0.diff.gz, why

  n >= batch_start_at + batch_increment

and not just

  n > batch_start_at

Hope this helps,
--
Olaf Meeuwissen, LPIC-2 FLOSS Engineer -- AVASYS CORPORATION
FSF Associate Member #1962 Help support software freedom
http://www.fsf.org/jf?referrer=1962
Re: [sane-devel] scanimage / tesseract interoperability
Implementation was a little intrusive because there is no recovery from calling freopen() on stdout. This preliminary patch follows the recommendations of the C FAQ and introduces an explicit stream variable. I've only done light testing.

http://c-faq.com/stdio/undofreopen.html

$ scanimage --batch 2>/dev/null | cat
out1.pnm
out2.pnm

Cheers,
Jeff

--- /tmp/orig/sane-backends-1.0.23/frontend/scanimage.c 2014-05-12 13:44:40.0 -0700
+++ sane-backends-1.0.23/frontend/scanimage.c 2014-05-12 14:17:18.0 -0700
@@ -1124,7 +1124,8 @@
 }
 
 static void
-write_pnm_header (SANE_Frame format, int width, int height, int depth)
+write_pnm_header (SANE_Frame format, int width, int height, int depth,
+                  FILE *ofp)
 {
   /* The netpbm-package does not define raw image data with maxval > 255. */
   /* But writing maxval 65535 for 16bit data gives at least a chance */
@@ -1135,20 +1136,20 @@
     case SANE_FRAME_GREEN:
     case SANE_FRAME_BLUE:
     case SANE_FRAME_RGB:
-      printf ("P6\n# SANE data follows\n%d %d\n%d\n", width, height,
+      fprintf (ofp, "P6\n# SANE data follows\n%d %d\n%d\n", width, height,
               (depth <= 8) ? 255 : 65535);
       break;
 
     default:
       if (depth == 1)
-        printf ("P4\n# SANE data follows\n%d %d\n", width, height);
+        fprintf (ofp, "P4\n# SANE data follows\n%d %d\n", width, height);
       else
-        printf ("P5\n# SANE data follows\n%d %d\n%d\n", width, height,
+        fprintf (ofp, "P5\n# SANE data follows\n%d %d\n%d\n", width, height,
                 (depth <= 8) ? 255 : 65535);
       break;
     }
 #ifdef __EMX__  /* OS2 - write in binary mode. */
-  _fsetmode (stdout, "b");
+  _fsetmode (ofp, "b");
 #endif
 }
 
@@ -1183,7 +1184,7 @@
 }
 
 static SANE_Status
-scan_it (void)
+scan_it (FILE *ofp)
 {
   int i, len, first_frame = 1, offset = 0, must_buffer = 0, hundred_percent;
   SANE_Byte min = 0xff, max = 0;
@@ -1273,10 +1274,10 @@
                     sanei_write_tiff_header (parm.format,
                                              parm.pixels_per_line, parm.lines,
                                              parm.depth, resolution_value,
-                                             icc_profile);
+                                             icc_profile, ofp);
                   else
                     write_pnm_header (parm.format, parm.pixels_per_line,
-                                      parm.lines, parm.depth);
+                                      parm.lines, parm.depth, ofp);
                 }
               break;
 
@@ -1397,7 +1398,7 @@
       else                      /* ! must_buffer */
         {
           if ((output_format == OUTPUT_TIFF) || (parm.depth != 16))
-            fwrite (buffer, 1, len, stdout);
+            fwrite (buffer, 1, len, ofp);
           else
             {
 #if !defined(WORDS_BIGENDIAN)
@@ -1408,7 +1409,7 @@
                 {
                   if (len > 0)
                     {
-                      fwrite (buffer, 1, 1, stdout);
+                      fwrite (buffer, 1, 1, ofp);
                       buffer[0] = (SANE_Byte) hang_over;
                       hang_over = -1;
                       start = 1;
@@ -1429,7 +1430,7 @@
                   len--;
                 }
 #endif
-              fwrite (buffer, 1, len, stdout);
+              fwrite (buffer, 1, len, ofp);
             }
         }
 
@@ -1453,10 +1454,10 @@
       if (output_format == OUTPUT_TIFF)
         sanei_write_tiff_header (parm.format, parm.pixels_per_line,
                                  image.height, parm.depth, resolution_value,
-                                 icc_profile);
+                                 icc_profile, ofp);
       else
         write_pnm_header (parm.format, parm.pixels_per_line,
-                          image.height, parm.depth);
+                          image.height, parm.depth, ofp);
 
 #if !defined(WORDS_BIGENDIAN)
       /* multibyte pnm file may need byte swap to LE */
@@ -1474,11 +1475,11 @@
         }
 #endif
 
-      fwrite (image.data, 1, image.height * image.width, stdout);
+      fwrite (image.data, 1, image.height * image.width, ofp);
     }
 
   /* flush the output buffer */
-  fflush( stdout );
+  fflush( ofp );
 
 cleanup:
   if (image.data)
@@ -1714,6 +1715,7 @@
   SANE_Status status;
   char *full_optstring;
   SANE_Int version_code;
+  FILE *ofp;
 
   atexit (scanimage_exit);
 
@@ -2236,12 +2238,15 @@
           format = "out%d.pnm";
         }
 
+      if (!batch)
+        ofp = stdout;
+
       if (batch)
         fprintf (stderr,
                  "Scanning %d pages, incrementing by %d, numbering from %d\n",
                  batch_count, batch_increment, batch_start_at);
-      else if(isatty(fileno(stdout))){
+      else if(isatty(fileno(ofp))){
         fprintf (stderr,"%s: output is not a file, exiting\n", prog_name);
         exit (1);
       }
 
@@ -2274,7 +2279,11 @@
                 {
                   fprintf (stderr, "Batch terminated, %d pages scanned\n",
                            (n - batch_increment));
-                  fclose (stdout);
+                  if (ofp)
+                    {
+                      fclose (ofp);
+                      ofp = NULL;
+                    }
                   break;        /* get out of this loop */
                 }
             }
@@ -2294,19 +2303,27 @@
             {
               fprintf (stderr, "%s: sane_start: %s\n",
                        prog_name, sane_strstatus (status));
-              fclose (stdout);
-              break;
+              if (ofp)
+                {
+                  fclose (ofp);
+                  ofp = NULL;
+                  break;
+                }
             }
 
+
           /* write to .part file while scanning is in progress */
-          if (batch && NULL == freopen (part_path, "w", stdout))
+          if (batch)
+            ofp = fopen (part_path, "w");
+
+          if (batch && ofp == NULL)
             {
               fprintf (stderr, "cannot open %s\n", part_path);
               sane_cancel (device);
               return SANE_STATUS_ACCESS_DENIED;
             }
 
-          status = scan_it ();
+          status = scan_it (ofp);
 
           if (batch)
             {
               fprintf (stderr, "Scanned page %d.",
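For readers skimming the patch above, here is a minimal sketch of the same idea in isolation: pass an explicit FILE * to the writers instead of redirecting stdout with freopen(), so stdout stays free for other output such as filenames. The helper names write_header and write_page are hypothetical and are not part of scanimage.c; the page is written as blank (all-zero) 8-bit gray pixels purely for illustration.

```c
#include <stdio.h>

/* Hypothetical helper, not from scanimage.c: writes a raw PGM (P5)
   header to whichever stream the caller passes in. */
static int
write_header (FILE *ofp, int width, int height)
{
  return fprintf (ofp, "P5\n%d %d\n255\n", width, height);
}

/* Hypothetical helper: writes one blank page to its own file through
   an explicit stream.  stdout is never reopened, so it remains usable
   by the caller afterwards.  Returns 0 on success, -1 on failure. */
static int
write_page (const char *path, int width, int height)
{
  FILE *ofp = fopen (path, "wb");
  if (ofp == NULL)
    return -1;

  int ok = write_header (ofp, width, height) > 0;

  /* Blank image data: width * height zero bytes. */
  for (int i = 0; ok && i < width * height; i++)
    if (fputc (0, ofp) == EOF)
      ok = 0;

  /* fclose can report a deferred write error, so check it too. */
  if (fclose (ofp) != 0)
    ok = 0;

  return ok ? 0 : -1;
}
```

A caller could invoke write_page ("out1.pnm", w, h) and then print the path on stdout, which would not be possible once stdout itself had been reopened onto the image file.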
Re: [sane-devel] scanimage / tesseract interoperability
Implemented and tested. Please consider for inclusion.

happy-batch-1.0.diff.gz
Description: GNU Zip compressed data
Re: [sane-devel] scanimage / tesseract interoperability
Testing found an error path with a double fclose. Tiny tweak to make that impossible.

-  if (0 != fclose(ofp))
+  if (!ofp || 0 != fclose(ofp))

names-to-stdout-1.2.diff.gz
Description: GNU Zip compressed data
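One way to rule out a double fclose in general is to funnel every close through a small helper that clears the pointer. The sketch below is an illustration only; close_once is a hypothetical name and is not in the patch. Note one design difference: the tweak above treats a NULL ofp as an error, while this helper treats an already-closed stream as a harmless no-op.

```c
#include <stdio.h>

/* Hypothetical helper: closes *fpp at most once.
   Returns 0 on success or if the stream was already closed,
   EOF if fclose itself reported an error. */
static int
close_once (FILE **fpp)
{
  int rc = 0;

  if (*fpp != NULL)
    {
      rc = fclose (*fpp);
      *fpp = NULL;      /* a later call sees NULL and does nothing */
    }
  return rc;
}
```

Calling close_once (&ofp) from several error paths is then safe regardless of which path ran first.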
Re: [sane-devel] scanimage / tesseract interoperability
Jeff Breidenbach writes:
> When I run scanimage on a Fujitsu S1500, the program is a little unhappy even after normal operation, note the return code. This is not great for pipelines. Should I attempt a fix? This is version 1.0.23-3ubuntu3 on the latest Ubuntu release. Sorry, I haven't yet figured out how to configure a vanilla build of SANE from git to talk to hardware.
>
> $ scanimage --batch
> Scanning -1 pages, incrementing by 1, numbering from 1
> Scanning page 1
> Scanned page 1. (scanner status = 5)
> Scanning page 2
> Scanned page 2. (scanner status = 5)
> Scanning page 3
> scanimage: sane_start: Document feeder out of documents
> $ echo $?
> 7

That's SANE_STATUS_NO_DOCS, which is to be expected for --batch scans. I'd say scanimage should return EXIT_SUCCESS for --batch scans if and only if it has successfully acquired at least one image. That is, starting a --batch scan without any originals in the feeder should still return SANE_STATUS_NO_DOCS.

Hope this helps,
--
Olaf Meeuwissen, LPIC-2 FLOSS Engineer -- AVASYS CORPORATION
FSF Associate Member #1962 Help support software freedom
http://www.fsf.org/jf?referrer=1962
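The rule proposed above can be sketched as a small pure function. This is an illustration under stated assumptions, not the code from happy-batch: batch_exit_code and the local STATUS_* constants are hypothetical stand-ins (the value 7 for "no documents" is taken from the transcript above), and the sketch takes the narrower reading that only the out-of-documents status is forgiven.

```c
#include <stdlib.h>

/* Local stand-ins for the SANE status values seen in the thread;
   a real frontend would use the constants from sane/sane.h. */
enum
{
  STATUS_GOOD = 0,
  STATUS_NO_DOCS = 7
};

/* Hypothetical helper: maps the final status of a --batch run to a
   process exit code.  Running out of documents counts as success
   only if at least one page was acquired; everything else is
   passed through unchanged. */
static int
batch_exit_code (int status, int pages_scanned)
{
  if (status == STATUS_NO_DOCS && pages_scanned > 0)
    return EXIT_SUCCESS;
  return status;
}
```

With this rule, the three-page session quoted above would exit 0, while a batch started with an empty feeder would still exit 7.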
Re: [sane-devel] scanimage / tesseract interoperability
When I run scanimage on a Fujitsu S1500, the program is a little unhappy even after normal operation; note the return code. This is not great for pipelines. Should I attempt a fix? This is version 1.0.23-3ubuntu3 on the latest Ubuntu release. Sorry, I haven't yet figured out how to configure a vanilla build of SANE from git to talk to hardware.

$ scanimage --batch
Scanning -1 pages, incrementing by 1, numbering from 1
Scanning page 1
Scanned page 1. (scanner status = 5)
Scanning page 2
Scanned page 2. (scanner status = 5)
Scanning page 3
scanimage: sane_start: Document feeder out of documents
$ echo $?
7

Second, I'm attaching a slightly newer version of the patch that emits filenames to stdout. No functional changes, I just tried to match the surrounding code style a little better. Hopefully I got the tab situation more or less correct. Does it look good?

$ scanimage --batch
Scanning -1 pages, incrementing by 1, numbering from 1
Scanning page 1
Scanned page 1. (scanner status = 5)
out1.pnm
Scanning page 2
Scanned page 2. (scanner status = 5)
out2.pnm
Scanning page 3
scanimage: sane_start: Document feeder out of documents
scanimage: sane_read: Operation was cancelled
Scanned page 3. (scanner status = 2)

names-to-stdout-1.1.diff.gz
Description: GNU Zip compressed data
Re: [sane-devel] scanimage / tesseract interoperability
Thank you, Allan. Tesseract will also accept image data directly on stdin, so single scan mode should work just fine. I think it is cleaner to use stdout as opposed to stderr for filenames. I will work on a patch.

There is one possible alternative. Scanimage could emit image data to stdout using one of the multi-image formats. For example, multi-image PNM is simply the concatenation of individual PNM files. Let me know if you prefer that instead.

Cheers,
Jeff
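To make the multi-image alternative concrete, here is a minimal sketch of writing several raw PGM (P5) images back to back on one stream. It is an illustration only, not something scanimage does; write_pgm and write_multi_pgm are hypothetical names, and the sketch assumes 8-bit gray images that all share the same dimensions.

```c
#include <stdio.h>

/* Hypothetical helper: writes one raw 8-bit PGM image (header plus
   width * height pixel bytes).  Returns 0 on success, -1 on failure. */
static int
write_pgm (FILE *ofp, int width, int height, const unsigned char *pixels)
{
  size_t n = (size_t) width * (size_t) height;

  if (fprintf (ofp, "P5\n%d %d\n255\n", width, height) < 0)
    return -1;
  return fwrite (pixels, 1, n, ofp) == n ? 0 : -1;
}

/* Hypothetical helper: writes `count` images stored consecutively in
   `pixels`.  The multi-image stream is nothing more than the
   individual files concatenated, with no extra framing in between. */
static int
write_multi_pgm (FILE *ofp, int count, int width, int height,
                 const unsigned char *pixels)
{
  size_t page = (size_t) width * (size_t) height;

  for (int i = 0; i < count; i++)
    if (write_pgm (ofp, width, height, pixels + (size_t) i * page) != 0)
      return -1;
  return 0;
}
```

A reader of such a stream parses one header, consumes exactly width * height bytes, and then expects either end of file or the next header.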
Re: [sane-devel] scanimage / tesseract interoperability
Jeffrey, gscan2pdf is terrific and I expect most users will prefer the friendly graphical user interface. I would love to compare notes with you on PDF generation nuances, and also coordinate with respect to future Tesseract releases.

It's also good to have the command line programs connect together nicely through pipes, in the unix tradition. The necessary modifications on the Tesseract side are almost ready, so it is a good time to look at the scanimage side.

Cheers,
Jeff
Re: [sane-devel] scanimage / tesseract interoperability
We would accept a patch which adds a new argument to scanimage which causes it to output the filename to stdout when it is in batch mode. In single scan mode, this will never work, because the image is printed on stdout.

Another option is to add the filename to stderr, which already has some meta info. Then tesseract (or an intermediate script) could parse and extract the image name. That would make your command lines more complicated, however.

allan

On Sun, May 11, 2014 at 7:50 PM, Jeff Breidenbach j...@jab.org wrote:
> Jeffrey, gscan2pdf is terrific and I expect most users will prefer the friendly graphical user interface. I would love to compare notes with you on PDF generation nuances, and also coordinate with respect to future Tesseract releases.
>
> It's also good to have the command line programs connect together nicely through pipes, in the unix tradition. The necessary modifications on the Tesseract side are almost ready, so it is a good time to look at the scanimage side.
>
> Cheers,
> Jeff

--
The truth is an offense, but not a sin
Re: [sane-devel] scanimage / tesseract interoperability
Thank you Simon, the --batch-script feature looks very flexible. It almost does the trick:

scanimage --batch --batch-script (echo) | tesseract - -

Unfortunately, it runs just a little too early. The script executes just before the temporary image file is renamed. Tesseract wants to know about the final filename after it exists.

My very first implementation attempt didn't work; the emitted filename somehow ended up in the image data. Is there a problem with scanimage writing to standard output?

if (new_interoperability_feature)
  fprintf(stdout, "%s\n", path);

I ultimately hope to make scanimage and tesseract work together nicely for everyone. So any suggestions or advice are appreciated.

Cheers,
Jeff