Package: fuzzyocr3
Version: 3.5.1-2
Severity: normal
Tags: patch
Since version 2.03, tesseract requires the input image file to have a .tif
extension. However, FuzzyOcr writes the TIFF image to prep.maketiff.out, so
tesseract fails to read it, as this log entry shows:
Exec : pnmtotiff -color -truecolor
Stdin : </tmp/.spamassassin15003RD32Twtmp/me.pnm
Stdout: >/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out
Stderr: >/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.err
Exec : /usr/bin/tesseract /tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out /tmp/.spamassassin15003RD32Twtmp/scanset.tesseract.out
Stdout: >/dev/null
Stderr: >/tmp/.spamassassin15003RD32Twtmp/scanset.tesseract.err
Elapsed [31574]: 0.233304 sec. (/usr/bin/tesseract: exit 31)
Unable to read output from "/tmp/.spamassassin15003RD32Twtmp/scanset.tesseract.out.txt" for scanset tesseract
Errors in Scanset "tesseract"
Return code: 7936, Error: Tesseract Open Source OCR Engine
name_to_image_type:Error:Unrecognized image type:/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out
IMAGE::read_header:Error:Can't read this image type:/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out
/usr/bin/tesseract:Error:Read of file failed:/tmp/.spamassassin15003RD32Twtmp/prep.maketiff.out
Signal_exit 31 ABORT. LocCode: 3 AbortCode: 3
Skipping scanset because of errors, trying next...
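The "Return code: 7936" line and the "exit 31" reported by the timer describe
the same failure. Assuming 7936 is a raw wait status as Perl's system()
leaves it in $?, the exit code sits in the high byte and any terminating
signal in the low 7 bits. A minimal Perl sketch of that decoding:

```perl
use strict;
use warnings;

# Raw status as printed in the FuzzyOcr log (assumed to be a wait status).
my $status = 7936;

# High byte holds the exit code: 7936 >> 8 == 31, matching "exit 31".
my $exit = $status >> 8;

# Low 7 bits hold the signal that killed the process (0 = none).
my $signal = $status & 127;

printf "exit=%d signal=%d\n", $exit, $signal;   # prints "exit=31 signal=0"
```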
I worked around this by making the maketiff preprocessor write its output to
prep.maketiff.tif instead, and by having the scanset code use that name as
the input for the next stage.
--- Preprocessor.pm.ORIG 2008-05-15 18:24:22.000000000 +0200
+++ Preprocessor.pm 2008-05-15 18:51:03.000000000 +0200
@@ -15,6 +15,9 @@ sub run {
my $tmpdir = FuzzyOcr::Config::get_tmpdir();
my $label = $self->{label};
my $output = "$tmpdir/prep.$label.out";
+ if ($label =~ /maketiff/) {
+ $output = "$tmpdir/prep.$label.tif";
+ }
my $stderr = ">$tmpdir/prep.$label.err";
my $stdin = undef;
--- Scanset.pm.ORIG 2008-05-15 18:56:11.000000000 +0200
+++ Scanset.pm 2008-05-15 19:03:26.000000000 +0200
@@ -63,7 +63,12 @@ sub run {
return ($retcode,@result);
}
# Input of next processor is output of last
- $input = "$tmpdir/prep.$plabel.out";
+ # Output name of maketiff is special!
+ if ($plabel =~ /maketiff/) {
+ $input = "$tmpdir/prep.$plabel.tif";
+ } else {
+ $input = "$tmpdir/prep.$plabel.out";
+ }
}
}
It is not the nicest solution, but it works :). The alternative would be to
remove the .tif filename extension requirement from tesseract; I'll leave
that discussion to you Debian developers. :)
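A possibly cleaner variant of the same workaround would keep the naming rule
in one place instead of duplicating it in Preprocessor.pm and Scanset.pm.
The helper below is hypothetical (prep_output_name does not exist in
FuzzyOcr) and assumes, like the patch, that only labels matching maketiff
need the .tif extension:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of FuzzyOcr: returns the output file name
# for a preprocessor label so that Preprocessor.pm and Scanset.pm agree.
sub prep_output_name {
    my ($tmpdir, $label) = @_;
    # Same rule as the patch: tesseract >= 2.03 wants a .tif extension,
    # so maketiff output gets .tif; all other preprocessors keep .out.
    my $ext = ($label =~ /maketiff/) ? 'tif' : 'out';
    return "$tmpdir/prep.$label.$ext";
}

print prep_output_name('/tmp/example', 'maketiff'), "\n";
# prints /tmp/example/prep.maketiff.tif
```

Preprocessor.pm would then set $output = prep_output_name($tmpdir, $label)
and Scanset.pm $input = prep_output_name($tmpdir, $plabel); where the shared
sub should live (for example FuzzyOcr::Config) is left open.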
-- System Information:
Debian Release: lenny/sid
APT prefers unstable
APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)
Kernel: Linux 2.6.23.14 (PREEMPT)
Locale: LANG=C, LC_CTYPE=en_US.ISO-8859-15 (charmap=ISO-8859-15)
Shell: /bin/sh linked to /bin/bash
Versions of packages fuzzyocr3 depends on:
ii gifsicle 1.49-1 Tool for manipulating GIF images
ii gocr 0.41-1+b1 A command line OCR
ii libmldbm-sync-perl 0.30-2 Perl module for safe concurrent ac
ii libstring-approx-perl 3.25-1+b1 Perl extension for approximate mat
ii libungif-bin 4.1.6-4 library for GIF images (transition
ii netpbm 2:10.0-11.1 Graphics conversion tools
ii ocrad 0.17-3 Optical Character Recognition prog
ii perl [libdigest-md5-perl] 5.10.0-10 Larry Wall's Practical Extraction
ii spamassassin 3.2.4-1 Perl-based spam filter using text
fuzzyocr3 recommends no packages.
-- no debconf information