[tesseract-ocr] Trouble with Apparently Simple Source Image

2024-02-12 Thread Rob
Hello,

I've run into some trouble using Tesseract OCR in a Python program doing 
some screen scraping. I can't quite wrap my head around why this one value 
is giving so much more trouble than the others on the same page, with the 
same contrast and font.

This is the image in question:
It has been scraped from a 1080p resolution screenshot, sliced into 
individual images for the values in a grid, scaled up by 10x, inverted 
(from white-on-black to this), thresholded, and passed to Tesseract. I have 
also tried various Gaussian and median blurs but those seem to just make 
other strings fail more.

I have tried most of the PSM options that make sense, and passed an allow 
list containing just numerals, $, comma, and decimal point. I've tried all 
the interpolation modes OpenCV has to offer. Tesseract just constantly 
chokes on this value.
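
For readers following along, the Tesseract side of the pipeline described above can be sketched roughly as follows. This is only a sketch: the file name `cell.png`, the helper name `build_tesseract_args`, and the choice of `--psm 7` (single text line) are assumptions for illustration, not the settings actually used here. `tessedit_char_whitelist` is the Tesseract variable for an allow list; as far as I know the LSTM engine in 4.0 ignored it and support came back in 4.1.

```python
import shlex

def build_tesseract_args(image_path, psm=7, allowed_chars="0123456789$,."):
    """Return an argv list for the tesseract CLI that prints the result to stdout."""
    return [
        "tesseract",
        image_path,
        "stdout",                      # write recognized text to stdout
        "--psm", str(psm),             # page segmentation mode
        "-c", f"tessedit_char_whitelist={allowed_chars}",
    ]

args = build_tesseract_args("cell.png")   # "cell.png" is a placeholder name
print(shlex.join(args))
```

Passing the list to `subprocess.run` avoids shell quoting problems with the `$` in the allow list.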

It's a little frustrating because the only OCR I've found that works with 
this value is an A9T9 model (I think) through the free API at ocr.space ( 
https://ocr.space/ocrapi#ocrengine2 ). Unfortunately there doesn't appear 
to be a way for me to run that locally, and the string seems like it should 
be simple for an OCR engine to read.

Any advice on poking Tesseract in the right way to read this, or some fancy 
filtering I could do to help make the image clearer for it?

Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/ae2ae7cd-6cd1-44ef-843e-ef10a35929c6n%40googlegroups.com.


[tesseract-ocr] Specify target file name patterns?

2023-05-08 Thread Rob Aaldijk
I may have missed it in the command-line parameters, but is there any way to 
specify the names of target OCR-ed PDF files instead of having a (Windows 
in my case) file copy of the original file?
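
If the OCR is run through the tesseract command line itself (an assumption; a front-end may behave differently), the second positional argument is the output base name, and the `pdf` config appends `.pdf` to it. A minimal sketch of applying a naming pattern, where `pdf_job` and the `{stem}_ocr` pattern are hypothetical names chosen for illustration:

```python
from pathlib import Path

def pdf_job(source, out_dir, pattern="{stem}_ocr"):
    """Build a tesseract argv that writes <out_dir>/<pattern>.pdf for one input image."""
    source = Path(source)
    # Tesseract appends ".pdf" itself, so the output base has no extension.
    out_base = Path(out_dir) / pattern.format(stem=source.stem)
    return ["tesseract", str(source), str(out_base), "pdf"]

print(pdf_job("scan_001.tif", "ocr_out"))
```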



[tesseract-ocr] tesstrain.sh no output

2018-12-03 Thread 'Rob' via tesseract-ocr
Hello,
 
I used tesstrain.sh with tessdata_best German (deu) and two installed fonts. 
I had some problems:
 
1. Which langdata do I need for this (LSTM or the normal one)? I built 
Tesseract and the training tools from source, but I do not have a langdata 
folder. Which files do I need?
 
2. In "Phase I: Generating training images" I receive the message "Stripped 
66 unrenderable words" (the number varies). What does this mean?
 
3. At the end it says Tesseract failed loading language 'eng', but I used 
deu, so I don't understand why this error occurs.
 
See my terminal input/output below (I forgot the Latin.unicharset):

 
src/training/tesstrain.sh --fonts_dir /usr/local/share/fonts --lang deu \
  --linedata_only --noextract_font_properties --langdata_dir ./langdata \
  --tessdata_dir ./tessdata \
  --fontlist "Desyrel" "Journal" --output_dir ~/tesstutorial/deueval



=== Starting training for language 'deu'

[So 2. Dez 22:54:19 CET 2018] /usr/local/bin/text2image --fonts_dir=/usr/local/share/fonts --font=Desyrel --outputbase=/tmp/font_tmp.yYD7WTtIyC/sample_text.txt --text=/tmp/font_tmp.yYD7WTtIyC/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.yYD7WTtIyC

Rendered page 0 to file /tmp/font_tmp.yYD7WTtIyC/sample_text.txt.tif

=== Phase I:  ===

Rendering using Desyrel

[So 2. Dez 22:54:22 CET 2018] /usr/local/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.yYD7WTtIyC --fonts_dir=/usr/local/share/fonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/deu-2018-12-02.1rr/deu.Desyrel.exp0 --max_pages=0 --font=Desyrel --text=./langdata/deu/deu.training_text

Rendering using Journal

[So 2. Dez 22:54:23 CET 2018] /usr/local/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.yYD7WTtIyC --fonts_dir=/usr/local/share/fonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/deu-2018-12-02.1rr/deu.Journal.exp0 --max_pages=0 --font=Journal --text=./langdata/deu/deu.training_text

Stripped 66 unrenderable words

Rendered page 0 to file /tmp/deu-2018-12-02.1rr/deu.Journal.exp0.tif

...

...

Stripped 72 unrenderable words

Rendered page 4969 to file /tmp/deu-2018-12-02.l2i/deu.Journal.exp0.tif

Rendered page 4937 to file /tmp/deu-2018-12-02.l2i/deu.Desyrel.exp0.tif

Stripped 3 unrenderable words

Rendered page 4970 to file /tmp/deu-2018-12-02.l2i/deu.Journal.exp0.tif

 

=== Phase UP: Generating unicharset and unichar properties files ===

[So 2. Dez 22:04:32 CET 2018] /usr/local/bin/unicharset_extractor --output_unicharset /tmp/deu-2018-12-02.l2i/deu.unicharset --norm_mode 1 /tmp/deu-2018-12-02.l2i/deu.Desyrel.exp0.box /tmp/deu-2018-12-02.l2i/deu.Journal.exp0.box

Extracting unicharset from box file /tmp/deu-2018-12-02.l2i/deu.Desyrel.exp0.box

Extracting unicharset from box file /tmp/deu-2018-12-02.l2i/deu.Journal.exp0.box

Wrote unicharset file /tmp/deu-2018-12-02.l2i/deu.unicharset

[So 2. Dez 22:06:19 CET 2018] /usr/local/bin/set_unicharset_properties -U /tmp/deu-2018-12-02.l2i/deu.unicharset -O /tmp/deu-2018-12-02.l2i/deu.unicharset -X /tmp/deu-2018-12-02.l2i/deu.xheights --script_dir=./langdata

Loaded unicharset of size 117 from file /tmp/deu-2018-12-02.l2i/deu.unicharset

Setting unichar properties

Setting script properties

Failed to load script unicharset from:./langdata/Latin.unicharset

Warning: properties incomplete for index 3 = M

...

...

Warning: properties incomplete for index 114 = "

Warning: properties incomplete for index 115 = i

Warning: properties incomplete for index 116 = €

Writing unicharset to file /tmp/deu-2018-12-02.l2i/deu.unicharset

=== Phase E: Generating lstmf files ===

Using TESSDATA_PREFIX=./tessdata

[So 2. Dez 22:06:21 CET 2018] /usr/local/bin/tesseract /tmp/deu-2018-12-02.l2i/deu.Desyrel.exp0.tif /tmp/deu-2018-12-02.l2i/deu.Desyrel.exp0 --psm 6 lstm.train

[So 2. Dez 22:06:21 CET 2018] /usr/local/bin/tesseract /tmp/deu-2018-12-02.l2i/deu.Journal.exp0.tif /tmp/deu-2018-12-02.l2i/deu.Journal.exp0 --psm 6 lstm.train

Error opening data file ./tessdata/eng.traineddata

Error opening data file ./tessdata/eng.traineddata

Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.

Failed loading language 'eng'

Tesseract couldn't load any languages!

Could not initialize tesseract.

Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.

Failed loading language 'eng'

Tesseract couldn't load any languages!

Could not initialize tesseract.

ERROR: /tmp/deu-2018-12-02.l2i/deu.Desyrel.exp0.lstmf does not exist or is not readable
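
Going by the log above, two files could not be opened during the run: `./langdata/Latin.unicharset` and `./tessdata/eng.traineddata`. The Phase E commands in the log call tesseract without `-l`, which would explain why it looks for 'eng' even though the training language is deu. A small pre-flight check along these lines could catch that before a long rendering run; `missing_training_files` is a hypothetical helper, and the list of paths is taken only from this log, so it may well be incomplete:

```python
from pathlib import Path

def missing_training_files(langdata_dir, tessdata_dir, lang="deu"):
    """Return the files this log shows being read that are not present on disk."""
    langdata, tessdata = Path(langdata_dir), Path(tessdata_dir)
    required = [
        langdata / lang / f"{lang}.training_text",   # text rendered in Phase I
        langdata / "Latin.unicharset",               # script properties (Phase UP)
        tessdata / "eng.traineddata",                # loaded by the Phase E tesseract calls
    ]
    return [str(p) for p in required if not p.is_file()]

for path in missing_training_files("./langdata", "./tessdata"):
    print("missing:", path)
```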




[tesseract-ocr] Training with Font Files

2018-11-28 Thread 'Rob' via tesseract-ocr
Hello,

I want to create a traineddata file based on a few different fonts. I'm 
using Tesseract 4.0 with LSTM.
What's the easiest way? Is there a tool to train Tesseract with font files 
directly (.ttf files), or do I have to create text images based on the font 
and then use those to train?
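
For context, Tesseract's training tools include `text2image`, which renders a text file into a page image plus box file using an installed font, so the images do not have to be made by hand. A sketch of building one such call per font; the font names and paths are placeholders, and `text2image_args` is a made-up helper name:

```python
def text2image_args(font, text_file, fonts_dir, out_dir, lang="eng"):
    """Build a text2image argv that renders text_file using the given font."""
    # Output files follow the <lang>.<fontname>.exp0 naming convention.
    base = f"{out_dir}/{lang}.{font.replace(' ', '')}.exp0"
    return [
        "text2image",
        f"--fonts_dir={fonts_dir}",
        f"--font={font}",
        f"--text={text_file}",
        f"--outputbase={base}",
    ]

for font in ["FreeSans", "FreeSerif"]:   # placeholder font names
    print(" ".join(text2image_args(font, "training_text.txt",
                                   "/usr/share/fonts", "out")))
```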

Thanks in advance.



[tesseract-ocr] Handwriting training

2018-11-26 Thread Rob
Hello everyone,

I am currently working on making a scanned fillable text document readable 
for the computer. This document can be filled in with computer writing as 
well as with handwriting. The quality of the scanned document is good 
enough and the font is not too small. I'm using Ubuntu 18.04, Python 3 and 
Tesseract 4.0.

What is the best way to recognize both types of writing (in particular 
handwriting)? Do you have some easy steps for me to achieve the training 
for this problem?
I found "https://github.com/OCR-D/ocrd-train"; it seems to make the 
training process a lot easier, right?

Thanks in advance and best wishes.




[tesseract-ocr] Configuring blacklists in windows 7

2015-04-12 Thread Rob
Hi All,
I'm using Tesseract 3.02.02 on a windows 7 computer, via gImageReader GUI 
front-end (so I don't have to go into the black stuff, ms-dos).
Works well, except... same problem as everyone else: character sequence fi 
and fl are replaced by unicode(?) characters 0xFB01 and 0xFB02, latin 
ligatures small fi and fl.

Solution in a few other threads is to put a blacklist in the config file, 
but I've tried and not succeeded. How do you actually do that in the 
windows operating system?

Firstly: There is no config file, as such. Tesseract is not installed, 
but has its files copied across to the directory:
C:\Users\rob\AppData\Local\Tesseract-OCR

Deeper down there are 3 more directories:

1. C:\Users\rob\AppData\Local\Tesseract-OCR\tessdata
which has the files:
eng.traineddata
eng.cube.fold
eng.cube.lm_
eng.cube.word-freq
eng.cube.size
eng.cube.nn
eng.cube.params
eng.cube.bigrams
eng.cube.lm
eng.tesseract_cube.nn
osd.traineddata

plus 2 directories:

2. C:\Users\rob\AppData\Local\Tesseract-OCR\tessdata\configs
which has the files:
ambigs.train
api_config
bigram
box.train
box.train.stderr
digits
hocr
inter
kannada
linebox
logfile
makebox
quiet
rebox
strokewidth
unlv

3. C:\Users\rob\AppData\Local\Tesseract-OCR\tessdata\tessconfigs
which has the files:
batch
batch.nochop
matdemo
msdemo
nobatch
segdemo


Is one of these the configuration file I need to edit?

Note also that the standard Windows editor is Notepad, which gives you the 
option to save text as ANSI, UTF-8, Unicode or Unicode big-endian. Which is 
the correct one to use? ANSI is the default, but it won't allow you to save 
the ligatures, so it must be one of the others. I've tried them all, editing 
existing files and adding new files. Always failed.


More info: I know nothing about programming, have no compiler on my 
computer. I downloaded working executables from sourceforge or github or 
googlecode or somewhere. Managed to get them going without too much fuss by 
following the instructions.
I never did any training of Tesseract - it came already trained, presumably.

But I can't find any simple configuration instructions to follow to get rid 
of the latin fi and fl ligatures by editing windows files. And I want to 
get rid of them - convert each to two standard english letters for saving 
the files as english text.
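
Independently of any Tesseract configuration, the two ligatures can also be replaced after the fact in the saved text. A minimal sketch in Python, assuming the output has been saved as UTF-8 text; `expand_ligatures` is a name made up for this example:

```python
LIGATURES = {
    "\ufb01": "fi",   # U+FB01 LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # U+FB02 LATIN SMALL LIGATURE FL
}

def expand_ligatures(text):
    """Replace the fi and fl ligature characters with plain two-letter sequences."""
    for ligature, letters in LIGATURES.items():
        text = text.replace(ligature, letters)
    return text

print(expand_ligatures("\ufb01nal over\ufb02ow"))   # prints "final overflow"
```

The same two replacements can be done with search-and-replace in any editor that handles UTF-8.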

Any help appreciated,
Regards,
Rob





Re: [tesseract-ocr] mftraining core dump - Illegal malloc request size on Ubuntu...

2014-05-19 Thread Rob Stewart
Thanks Nick!
  Regarding mftraining - I just couldn't see what was wrong, I must have 
gone a bit code blind there.

  Things are working now with a simple change to that one line...

mftraining -F font_properties -U unicharset.out -O unicharset.out2 \
  eng.FreeSans.exp0.tr
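
For anyone comparing with the script from the start of the thread: it appears that `-O` takes the output unicharset file name as its value, so in the failing line it had swallowed the `.tr` file name. A sketch of the last steps as an ordered command list that stops at the first failure; `run_steps` is a hypothetical helper, and using `unicharset.out` for shapeclustering as well is an assumption:

```python
import subprocess

TR_FILE = "eng.FreeSans.exp0.tr"

STEPS = [
    ["shapeclustering", "-F", "font_properties", "-U", "unicharset.out", TR_FILE],
    # -O is followed by its own file name, so TR_FILE stays a positional input.
    ["mftraining", "-F", "font_properties", "-U", "unicharset.out",
     "-O", "unicharset.out2", TR_FILE],
    ["cntraining", TR_FILE],
]

def run_steps(steps, runner=subprocess.check_call):
    """Run each command in order; check_call raises on the first non-zero exit."""
    for argv in steps:
        print("running:", " ".join(argv))
        runner(argv)

# run_steps(STEPS)   # uncomment to execute; needs the training tools on PATH
```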

  So it's onto testing to see what difference all this can make.

  Good idea about the make file.

Thanks once more!

-- 
Rob

-- 
-- 
Texthelp Ltd is a limited company registered in Belfast, N. Ireland with 
registration number NI31186 having its registered office and principal 
place of business at Lucas Exchange, 1 Orchard Way, Antrim, N. Ireland, 
BT41 2RU.



[tesseract-ocr] mftraining core dump - Illegal malloc request size on Ubuntu...

2014-05-16 Thread Rob Stewart
Hi!
  I've been trying to train tesseract and after a hard day getting all the 
dependencies downloaded and compiled I managed to get so far down the 
training documentation.

  I'm using Ubuntu 14.04LTS and I've downloaded and compiled leptonica-1.70.

  I ended up creating a shell script after compiling and installing 
tesseract and tesseract-training...

 Start of file (called commands.sh)...

#!/bin/bash

# Get a copy of Tesseract src code...
#   svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr-read-only
#
# Make a folder, let's call it 'training_text'
#   mkdir training_text
#   cd training_text
#
# Create a '1.txt' file containing the training text (try Project Gutenberg).
# Copy 'font_properties' from tesseract-ocr-read-only/training/langdata...
#   cp ../tesseract-ocr-read-only/training/langdata/font_properties .
#
# Run this commands file...
#   commands.sh

# Remove any previously generated files (you will get errors
# if this is the first time you run this, but it's OK)...

rm eng.FreeSans.exp0.box
rm eng.FreeSans.exp0.tif
rm eng.FreeSans.exp0.tr
rm eng.FreeSans.exp0.txt
rm shapetable
rm unicharset
rm unicharset.out

# Try to generate them again...

text2image --text=1.txt --outputbase=eng.FreeSans.exp0 --font='FreeSans' \
  --fonts_dir=/usr/share/fonts/truetype/freefont

tesseract eng.FreeSans.exp0.tif eng.FreeSans.exp0 box.train

unicharset_extractor eng.FreeSans.exp0.box

set_unicharset_properties -U unicharset -O unicharset.out \
  --script_dir=../tesseract-ocr-read-only/training/langdata

shapeclustering -F font_properties -U unicharset eng.FreeSans.exp0.tr
#shapeclustering -F font_properties -U unicharset.out eng.FreeSans.exp0.tr

mftraining -F font_properties -U unicharset -O eng.FreeSans.exp0.tr
#mftraining -F font_properties -U unicharset.out -O eng.FreeSans.exp0.tr

#cntraining eng.FreeSans.exp0.tr

 End of file

Once I get down to shapeclustering, I can't tell from the documentation 
which unicharset file to use: the first one produced, or the one produced by 
the 'set_unicharset_properties' command.

Either way mftraining usually fails. Sometimes a second attempt at 
running shapeclustering and mftraining outside of this shell file works, 
but almost every time I get the following error...

 Start of Error (mftraining)

Error: Illegal malloc request size!
Fatal error encountered! == NULL:Error:Assert failed:in file 
globaloc.cpp, line 75
./commands.sh: line 40: 20958 Segmentation fault  (core dumped) 
mftraining -F font_properties -U unicharset -O eng.FreeSans.exp0.tr

 End of Error

And even worse the cntraining command doesn't work at all...

 Start of Error (cntraining)

Error: Illegal short name for a feature!
Fatal error encountered! == NULL:Error:Assert failed:in file 
globaloc.cpp, line 75
Segmentation fault (core dumped)

 End of Error

  What am I doing wrong?
  Any help would be appreciated. Also I think adding this kind of shell 
script (or equivalent) to a 'fast start' for training could be useful.

Rob



Re: differences between IOS version and regular version

2014-01-04 Thread Rob Mathews
read my thread more carefully. i did recompile against tesseract 3.02

Typos courtesy of my iPhone

 On Jan 4, 2014, at 6:15 PM, Benjamin Sølberg benjamin.soelb...@gmail.com 
 wrote:
 
 Hi Robert
 
 You probably already know this, but your project uses an old version/snapshot 
 of tesseract.
 Just a heads up, as I was hoping that you were using the latest code :-)
 There has been at least one fix regarding the OS X version.
 
 Benjamin
 
 Den fredag den 3. januar 2014 21.20.27 UTC+1 skrev Robert Mathews:
 
 I recompiled against the latest tesseract and leptonica-1.69
 
 You can see the project I used to compile here: 
 https://github.com/robmathews/compile-tesseract
 
 Then, I updated the sample ios app to 
 - use tesseract 3.02 + leptonica-1.69
 - allow choosing a photo from the photo library
 
 and checked into this fork: https://github.com/robmathews/OCR-iOS-Example
 
 And that's all I know.
 



Re: Tessarect Version - Linux ?

2013-10-17 Thread Rob Townley
yum info tesseract
apt-cache show tesseract-ocr
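
The binary can also be asked directly; note the spelling `tesseract`. Below is a sketch that pulls the version number out of the first line of `tesseract --version` output. The sample string is an assumed example of that output, not taken from this thread, and older builds may print it to stderr or expect `-v` instead:

```python
import re

def parse_tesseract_version(output):
    """Return the version string from `tesseract --version` output, or None."""
    match = re.search(r"^tesseract\s+v?(\d+(?:\.\d+)*)", output, re.MULTILINE)
    return match.group(1) if match else None

sample = "tesseract 3.02.02\n leptonica-1.69\n"   # assumed sample output
print(parse_tesseract_version(sample))             # prints "3.02.02"
```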
 On Oct 17, 2013 5:40 AM, Sriram Varadharajan varadhuku...@gmail.com
wrote:

 I have tessarect installed in linux machine and wanted to find out what
 version it is. I tried using command line  tessarect --version and it does
 not give out the version.Please let me know if someone has encountered the
 same.

 Thanks





Re: Is there a variable for tuning character spacing?

2011-11-28 Thread Rob
Hi Merve,

Thank you for your reply!  I think my case is slightly different.  I
want to adjust the spacing threshold on *input* images, not the output
text.  In my case, I get *no* output, whereas you get output that is
spaced improperly.

I see your question here:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/cfffeed5da7ab757/08e70a97c50e32e7?lnk=gst&q=space+threshold#08e70a97c50e32e7

Your image said "apple" and tesseract produced "app le".

In my case, I get no output.

Here are the two images: http://imgur.com/a/KSeiW

The first produces no output, the second one produces "591".

Anyone else have a suggestion?

Thanks again,

Rob



On Nov 28, 8:17 am, Merve Temizer mervet2...@gmail.com wrote:
 I asked a similar question a while ago, and got a reply saying that,
 unfortunately, there is no such variable to tell tesseract the space
 threshold between characters.

 2011/11/27 Rob r...@wholewhale.com

  Greetings, is there a variable for tuning character spacing?

  I ran tesseract on an image with three characters and it gave no
  result.  Then I used photoshop to add space between the characters,
  and it came out perfectly.

  Since I'm new, I'm wondering, is there a simple setting I can adjust,
  or is this something that would require training?

  Thanks!

  Rob




Is there a variable for tuning character spacing?

2011-11-27 Thread Rob
Greetings, is there a variable for tuning character spacing?

I ran tesseract on an image with three characters and it gave no
result.  Then I used photoshop to add space between the characters,
and it came out perfectly.

Since I'm new, I'm wondering, is there a simple setting I can adjust,
or is this something that would require training?
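
In case no such setting exists, one workaround is to do in code what was done by hand in Photoshop: widen the blank gaps between characters before passing the image to Tesseract. The sketch below works on a tiny text-art bitmap just to show the idea; a real image would need the same column logic applied to pixel data, and `widen_gaps` is a hypothetical helper:

```python
def widen_gaps(rows, extra=2, ink="#", blank="."):
    """Insert `extra` blank columns into every blank gap between inked column runs."""
    width = len(rows[0])
    has_ink = [any(row[x] == ink for row in rows) for x in range(width)]
    out = ["" for _ in rows]
    for x in range(width):
        # A gap starts where a blank column directly follows an inked column.
        starts_gap = x > 0 and has_ink[x - 1] and not has_ink[x]
        # Only widen gaps that have ink further right (not the right margin).
        if starts_gap and any(has_ink[x:]):
            out = [line + blank * extra for line in out]
        out = [line + row[x] for line, row in zip(out, rows)]
    return out

glyphs = [
    "##.#.##",
    "#..#..#",
    "##.#.##",
]
for line in widen_gaps(glyphs):
    print(line)
```

This only helps when the characters are already separated by at least one blank column; touching characters would need a different approach.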

Thanks!

Rob



Re: tif to pdf

2011-03-08 Thread Rob Townley
On Wed, Mar 9, 2011 at 12:46 AM, Jeffrey Ratcliffe
jeffrey.ratcli...@gmail.com wrote:
 On 8 March 2011 20:25, UziTech tbri...@gmail.com wrote:
 is there an easy way to make the output a pdf or doc or format other
 than txt?

 I have built this functionality into gscan2pdf.

 Regards

 Jeff

If it is already in XML, then you may want to look at the package that
xmlresume uses to take XML and output to PDF, HTML, manpages, RTF...




Errors on startup after compiled in VS 2010 and Windows 7

2010-01-13 Thread Rob
I have successfully managed to compile tesseract in Visual Studio
2010, but the program hits an unhandled exception as soon as it
executes: "Unhandled exception at 0x00427be8 in cntraining.exe:
0xC005: Access violation reading location 0x."

I'm not sure if this has anything to do with Windows 7, but I haven't
been able to find anyone else having the same problem through a google
search.

Anyone have ideas on how to fix this?


Thank you




Re: Compillation in Visual Studio 2010

2010-01-12 Thread Rob
Thank you, that was very useful

On Jan 3, 10:49 pm, SURAJ suraj.supe...@gmail.com wrote:
 Hello all,

 I have tried to compile Tesseract 3.0 in Visual Studio 2010. The good news
 is that it compiles, but it needs two or three small changes in the code due
 to the new C++ specification that VS2010 follows for templates.

 I am using XP SP3 and VS2010 Team edition.

 My observations are:
 1. Due to the change in the template spec, you cannot pass NULL in template
 calls. To overcome this problem you need to typecast NULL.

 Fortunately all the changes are in one file only: scrollview.cpp in the
 Viewer project. Path: ..\tesseract\viewer

 Line 140 :

 Original : std::pair<ScrollView*, SVEventType> awaiting_list_any_window(NULL, SVET_ANY);
 New : std::pair<ScrollView*, SVEventType> awaiting_list_any_window((ScrollView*)NULL, SVET_ANY);

 Original : waiting_for_events[ea] = std::pair<SVSemaphore*, SVEvent*>(sem, NULL);
 New : waiting_for_events[ea] = std::pair<SVSemaphore*, SVEvent*>(sem, (SVEvent*)NULL);

 Line 430 :
 Original : std::pair<ScrollView*, SVEventType> ea(NULL, SVET_ANY);
 New : std::pair<ScrollView*, SVEventType> ea((ScrollView*)NULL, SVET_ANY);

 Line 433 :
 Original : waiting_for_events[ea] = std::pair<SVSemaphore*, SVEvent*>(sem, NULL);
 New : waiting_for_events[ea] = std::pair<SVSemaphore*, SVEvent*>(sem, (SVEvent*)NULL);

 I hope this information is useful for developers who want to use
 Visual Studio 2010.

 SURAJ




Compressing a sequence of spaces

2009-06-29 Thread Rob

Tesseract is compressing a sequence of spaces in an input TIFF into a
single space in the output text.  I want to preserve the original
spaces.

Tesseract 2.03
Debian 4 (2.6.18-5-686 kernel)
libtiff-tools
libtiff-dev

I'd appreciate any advice.
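
For later readers: newer Tesseract releases (3.04 onward, as far as I know; not the 2.03 mentioned above) have a `preserve_interword_spaces` variable intended for this. A sketch of passing it on the command line; the file names are placeholders and `spaced_ocr_args` is a made-up helper name:

```python
def spaced_ocr_args(image_path, out_base):
    """Build a tesseract argv that asks for runs of spaces to be kept in the output."""
    return [
        "tesseract", image_path, out_base,
        "-c", "preserve_interword_spaces=1",
    ]

print(" ".join(spaced_ocr_args("page.tif", "page")))
```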

Thanks,
Rob




Re: Simple and fast editor of box files (QT)

2009-05-12 Thread Rob H.

nice! Thanks.



Re: Simple and fast editor of box files (QT)

2009-05-08 Thread Rob H.

1. Ergonomically speaking, if you load a box file then the corresponding
image should be loaded... and vice versa.
I'm not aware of any reason that someone would want to load an image
without a box file... or vice versa.
Since Tesseract generates a box/txt file with the same name as the
image, your editor should try to load both the image+box file at the
same time by default.
If both files are not in the same directory (e.g. if you keep images
in one directory and box files in another), then display a file
browser window to have the user select the corresponding box or image.

2. The characters I want to use are not mapped to any known keyboard
layouts. So I can't type them directly.
The only option is to copy/paste which is more tedious than typing the
actual unicode hex value.
Maybe you could show both the character and its hex value in your pop-up
and use the TAB key to switch into a hex mode where the user would
type 4 hex digits?
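
The hex-entry idea in point 2 comes down to a small conversion. A sketch in Python (the editor itself is Qt, so this only illustrates the logic; `char_from_hex` is a made-up name):

```python
def char_from_hex(hex_digits):
    """Convert typed hex digits such as '24BA' into the corresponding character."""
    code_point = int(hex_digits, 16)          # raises ValueError on bad input
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range")
    return chr(code_point)

print(repr(char_from_hex("0041")))   # prints 'A'
```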



Re: Great tool for working with unicode

2009-05-04 Thread Rob H.

Copy and paste the following text into the basic notepad application.
It will show up as little boxes.
There's a good chance that your web browser doesn't have a unicode
enabled font, so most of the following characters will display as
garbage.

The following characters are: circled E, circled F, circled L, circled
L, circled U, circled P, circled S, circled S, circled T, circled U

ⒺⒻⓁⓁⓊⓅⓈⓈⓉⓊ

Or you can copy/paste those into the web app and view them:
http://rishida.net/scripts/uniview/uniview.php?codepoints=24BA 24BB
24C1 24C1 24CA 24C5 24C8 24C8 24C9 24CA
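
The code points in that URL can be checked locally as well. A short Python sketch using the standard `unicodedata` module; the list is copied from the URL above and `describe` is a made-up helper name:

```python
import unicodedata

code_points = "24BA 24BB 24C1 24C1 24CA 24C5 24C8 24C8 24C9 24CA".split()
text = "".join(chr(int(cp, 16)) for cp in code_points)

def describe(s):
    """Return (hex code point, Unicode name) pairs for each character in s."""
    return [(f"{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in s]

for cp, name in describe(text):
    print(cp, name)
```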


On May 3, 5:35 am, 74yrs old withblessi...@gmail.com wrote:
 Thanks. very good idea. will you please upload sample of little box?



 On Sun, May 3, 2009 at 9:21 AM, Rob H. hksny...@gmail.com wrote:

  I'm training Tess to recognize letters/numbers/symbols/etc. used for
  geometrical tolerancing and annotations (ASME Standard Y14.5)
  A lot of the characters used in the ASME standard are coming from all
  over the unicode tables (although the characters/words are from the
  English language).

  This is part of a data validation project and I'm using OCR as part of
  the process.
  Since OCR is not 100% accurate, some of the validation will need to be
  done by hand (hopefully as little as possible).
  If the person checking the annotation sees a little box (i.e. an
  unprintable character) then it will slow down their job.
  For the moment, I check unprintable characters using the webapp which
  I posted above.
  Once this goes into production, there will be a font (purchased or
  home-brewed) which can correctly draw all the letters/numbers/symbols/etc.

  On May 2, 7:04 am, 74yrs old withblessi...@gmail.com wrote:
   Hi Rob,
   I know about conversion.php which I am using for long time for Kannada
   project.
   Will you kindly explain, step by step, your experiment, with a sample if
   any? I wanted to have hands-on experience. BTW, which language were you
   training?
   Regards,
   sriranga(76yrs old)

   On Sat, May 2, 2009 at 6:37 AM, Rob H. hksny...@gmail.com wrote:

Also, I got this e-mail from someone named Albert
=
Hi Rob,

Reply to your ps

That doesn't make any sense to me.  You are asking for a set of glyphs
that can represent every Unicode character in existence.  Not
only would such a file be *HUGE* in size, but I can't see it as
serving any purpose to anyone (other than you, I guess)...

So you should stop looking for it.

-
Albert
=

Arial Unicode covers ~50K of the ~140K characters defined at
unicode.org. This font file is 22 MB.
Wouldn't a complete Unicode font be around 70 MB?

If you need a general text viewer which can legibly show documents
that contain any number of the valid ~140K characters,
then a complete font would be useful.

Great advice Albert...*roll eyes*... stop looking... how about
something a little more constructive?
maybe you know a strategy of mixing fonts to enable an application to
view all the possible unicode characters?



Re: Great tool for working with Unicode

2009-05-04 Thread Rob H.

Thanks for the reply Albert. I think I'll stop looking ... for a
silver bullet and create a strategy which covers my set of glyphs.
(maybe the pdf solution will work).

I thought Unicode did specify what a character looks like (on a basic
level), and then fonts were responsible for their interpretation
(which can be completely off).
For example, WingDings is vastly different from what Unicode shows
in their PDF renderings. I assumed that the characters drawn in those
Unicode files were a basic rendition of what each character should look
like.

Do you have any experience creating fonts? I might create one... it
doesn't have to be pretty... just needs to help the user accomplish
their task of comparing text extracted from the UI vs text extracted
from the model.



Re: Not able to use it

2009-05-04 Thread Rob H.

You probably need some language data. Check the downloads page again
for this.
Once you've unzipped your language, there should be a directory called
tessdata under which you will see files with file extensions like
DangAmbigs, inttemp, pffmtable, etc...

This tessdata directory would be located here (in the same
subdirectory as tesseract.exe):
\tesseract-2.03\tessdata

All languages you download, or create, will be placed in that
directory.
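
A quick way to check that a language is installed completely is to look
for its data files. A minimal Python sketch: the list of eight suffixes
is an assumption based on what the 2.x language packs normally contain,
and the function name is hypothetical.

```python
import os

# Assumed per-language data files in a Tesseract 2.x tessdata directory.
LANGUAGE_FILE_SUFFIXES = [
    "DangAmbigs", "freq-dawg", "inttemp", "normproto",
    "pffmtable", "unicharset", "user-words", "word-dawg",
]


def missing_language_files(tessdata_dir, lang="eng"):
    """Return the expected <lang>.<suffix> files that are absent from tessdata_dir."""
    missing = []
    for suffix in LANGUAGE_FILE_SUFFIXES:
        name = "%s.%s" % (lang, suffix)
        if not os.path.isfile(os.path.join(tessdata_dir, name)):
            missing.append(name)
    return missing
```

An empty list from `missing_language_files(r"\tesseract-2.03\tessdata")`
would suggest the language data is in place.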



Re: Tesseract 3.0

2009-05-04 Thread Rob H.

But seriously... I'm writing a fairly interesting application using
Tesseract for my client: Gulfstream Aerospace.
I have no problem testing 3.0, especially if I can get some
performance gains.



Re: Ehm Ehm

2009-05-04 Thread Rob H.

Start by reading through here:
http://code.google.com/p/tesseract-ocr/wiki/ReadMe

You probably need Visual Studio C++ Express (I think 2005 and 2008
will work).
You open the *.sln file and build the solution.




Re: Great tool for working with unicode

2009-05-02 Thread Rob H.

I'm training Tess to recognize letters/numbers/symbols/etc. used for
geometrical tolerancing and annotations (ASME Standard Y14.5)
A lot of the characters used in the ASME standard are coming from all
over the unicode tables (although the characters/words are from the
English language).

This is part of a data validation project and I'm using OCR as part of
the process.
Since OCR is not 100% accurate, some of the validation will need to be
done by hand (hopefully as little as possible).
If the person checking the annotation sees a little box (i.e. an
unprintable character) then it will slow down their job.
For the moment, I check unprintable characters using the webapp which
I posted above.
Once this goes into production, there will be a font (purchased or
home-brewed) which can correctly draw all the letters/numbers/symbols/etc.


On May 2, 7:04 am, 74yrs old withblessi...@gmail.com wrote:
 Hi Rob,
 I know about conversion.php which I am using for long time for Kannada
 project.
 Will you kindly explain, step by step, your experiment, with a sample if
 any? I wanted to have hands-on experience. BTW, which language were you
 training?
 Regards,
 sriranga(76yrs old)

 On Sat, May 2, 2009 at 6:37 AM, Rob H. hksny...@gmail.com wrote:

  Also, I got this e-mail from someone named Albert
  =
  Hi Rob,

  Reply to your ps

  That doesn't make any sense to me.  You are asking for a set of glyphs
  that can represent every Unicode character in existence.  Not
  only would such a file be *HUGE* in size, but I can't see it as
  serving any purpose to anyone (other than you, I guess)...

  So you should stop looking for it.

  -
  Albert
  =

  Arial Unicode covers ~50K of the ~140K characters defined at
  unicode.org. This font file is 22 MB.
  Wouldn't a complete Unicode font be around 70 MB?

  If you need a general text viewer which can legibly show documents
  that contain any number of the valid ~140K characters,
  then a complete font would be useful.

  Great advice Albert...*roll eyes*... stop looking... how about
  something a little more constructive?
  maybe you know a strategy of mixing fonts to enable an application to
  view all the possible unicode characters?



Re: Great tool for working with unicode

2009-05-01 Thread Rob H.

Well Tesseract 2.0 has support for unicode, but many times it can be
hard to understand the results of the OCR because the characters are
not printable in many fonts.

Typically in text editors (including Notepad++, UltraEdit, MS Word,
Notepad, etc.), an unrecognized character will be displayed as a
simple box. This is not readable.
So, to verify your results, especially while training, you need to
check how accurate the results came out.

So, if you are using unprintable characters and don't have a font
which recognizes them correctly, then this webapp will help you know
which character the OCR recognized unless you know off the top of
your head what hex value matches what characters you want.
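
The same check can be done offline. A minimal Python sketch (the
function name is hypothetical) that lists every non-ASCII character in
an OCR result with its code point and Unicode name, so the result can
be verified even when the font only draws boxes:

```python
import unicodedata


def describe_non_ascii(text):
    """List (char, 'U+XXXX', Unicode name) for every non-ASCII character in text."""
    report = []
    for ch in text:
        if ord(ch) > 127:
            # Unassigned or unnamed code points get a placeholder name.
            name = unicodedata.name(ch, "<unnamed>")
            report.append((ch, "U+%04X" % ord(ch), name))
    return report
```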



Re: Great tool for working with unicode

2009-05-01 Thread Rob H.

Also, I got this e-mail from someone named Albert
=
Hi Rob,

Reply to your ps

That doesn't make any sense to me.  You are asking for a set of glyphs
that can represent every Unicode character in existence.  Not
only would such a file be *HUGE* in size, but I can't see it as
serving any purpose to anyone (other than you, I guess)...

So you should stop looking for it.


-
Albert
=

Arial Unicode covers ~50K of the ~140K characters defined at
unicode.org. This font file is 22 MB.
Wouldn't a complete Unicode font be around 70 MB?

If you need a general text viewer which can legibly show documents
that contain any number of the valid ~140K characters,
then a complete font would be useful.

Great advice Albert...*roll eyes*... stop looking... how about
something a little more constructive?
maybe you know a strategy of mixing fonts to enable an application to
view all the possible unicode characters?
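
One possible font-mixing strategy is a simple fallback chain: for each
character, use the first font that covers it. A minimal Python sketch,
with hypothetical font names and hand-written coverage sets (a real
application would read coverage from each font file):

```python
def pick_font(char, fonts):
    """Return the name of the first font whose coverage set contains char, else None."""
    for name, covered in fonts:
        if ord(char) in covered:
            return name
    return None


def split_into_runs(text, fonts):
    """Group text into (font_name, substring) runs so each run uses a single font."""
    runs = []
    for ch in text:
        font = pick_font(ch, fonts)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs


# Hypothetical coverage tables, listed in order of preference.
FONTS = [
    ("BasicLatinFont", set(range(0x20, 0x7F))),
    ("SymbolFont", set(range(0x2460, 0x2500))),  # Enclosed Alphanumerics block
]
```

Runs whose font is `None` are the characters no font in the chain can
draw, which is exactly the set that still needs a purchased or
home-brewed font.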









Re: What causes this error? 6 classes in inttemp while unicharset contains 7

2009-04-23 Thread Rob H.

Do you know what the problem is already? Maybe you could point me to
the method which needs to be fixed, and explain the problem?

PS:
Is it just my VS2005 setup, or am I seeing the for/if/function
statements split up over multiple rows (must be some leftover HP
stuff)?






Re: My training methodology does not work :(

2009-04-20 Thread Rob H.

Have you tried building a dictionary of words (word-dawg + freq-dawg)?
At least try putting those 2 words (mother india) into your
dictionary.
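
A minimal Python sketch of preparing the two plain-text word lists from
a sample corpus. The function names are hypothetical, and the sketch
assumes the lists are later converted to word-dawg and freq-dawg with
the wordlist2dawg tool described on the training wiki.

```python
from collections import Counter


def build_word_lists(corpus_text, top_n=100):
    """Return (all unique words sorted, the top_n most frequent words)."""
    counts = Counter(corpus_text.split())  # whitespace-separated tokens
    all_words = sorted(counts)
    frequent_words = [word for word, _ in counts.most_common(top_n)]
    return all_words, frequent_words


def write_word_list(words, path):
    """Write one word per line as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(words) + "\n")
```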

I am starting to train the OCR to recognize special characters and
I've considered this single character approach, but not yet tried it.
I am leaning towards building a page of special characters now.


On Apr 17, 3:18 pm, Debayan Banerjee debaya...@gmail.com wrote:
 As much as I hate to admit it my training methodology
 http://hacking-tesseract.blogspot.com/2009/04/my-old-training-methodo...
 of generating one image per akshar  does not work. I hate to say it
 since I put some effort into writing the Python code that does this .
 Well the reason is probably that Tesseract OCR training code looks for
 characters on a single line during training as it also extracts base
 line metrics http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
 for rare/strange characters like numerals. As such it may not be able
 to extract all the information it needs for its training.
 Or maybe the Tesseract OCR training code accepts only a small number of
 .tr files, and since my code generates thousands of .tr files, it
 becomes useless.
 Let me show you an example of how miserably it failed.
 I decided to test the training on the string  ভারত মাতা  (Bharat
 Mata which means Mother India). I generated the tiff image using Pango
 rendering.
 Then I generated 7 images per sample of ভ র ত ম and used the
 subsequently generated training files for OCR.
 The result was this:  মভতভ  
 Yes, I know. The result is absolutely outrageous.
 However, what if I still auto-generate images of characters but this
 time in single lines adjacently? Will it work?
 You may go through http://hacking-tesseract.blogspot.com/ for all my work.

 --
 Be Intelligent, Use GNU/Linux

 http://debayanin.googlepages.com/ http://debayan.wordpress.com http://lug.nitdgp.ac.in



What causes this error? 6 classes in inttemp while unicharset contains 7

2009-04-20 Thread Rob H.

I've read through all the related threads on the topic, but I don't
understand what causes this problem.
Does this error even matter, since I can modify the unicharset file by
removing the extra characters?

I ask because I'm having this problem with some fonts which I am
training now.
I have trained 2 fonts without this problem and then there are the 2
fonts which have this problem.

In the end, I'm wondering how good the OCR will be, if I remove
special unicode characters from the unicharset which are needed in my
results?

- Some analysis -

I am running with the 2.03 code, which I downloaded and compiled.

Here is a sample error:
APPLY_BOXES: boxfile 1/2/h ((47,1546),(80,1594)): FAILURE! box
overlaps blob in labelled word
APPLY_BOXES: ALSO ignoring corrupted char blk:1 row:1 T

When tess generated the box, it had created a box around 2 letters, so
I modified the box file to have 2 boxes instead of 1. This error
complains about one of my boxes...

I noticed that the *.tr is missing the two letters which were in these
two boxes which I created.

So, based on this quote from the training page, I suppose splitting a box
is not supported?
"If you didn't successfully space out the characters on the training
image, some may have been joined into a single box. In this case, you
can either remake the images with better spacing and start again, or
if the pair is common, put both characters at the start of the line,
leaving the bounding box to represent them both."
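
For reference, the manual edit described above (one box turned into
two) can be sketched in Python. This assumes the usual box-file line
layout of `<char> <left> <bottom> <right> <top>`; the function names
and the example split column are hypothetical, and the sketch does not
make APPLY_BOXES accept the result.

```python
def parse_box(line):
    """Parse one box-file line: '<char> <left> <bottom> <right> <top>'."""
    char, left, bottom, right, top = line.split()[:5]
    return char, int(left), int(bottom), int(right), int(top)


def split_box(line, first_char, second_char, split_x):
    """Split one box covering two glyphs into two boxes at column split_x."""
    _, left, bottom, right, top = parse_box(line)
    if not left < split_x < right:
        raise ValueError("split_x must fall strictly inside the box")
    return [
        "%s %d %d %d %d" % (first_char, left, bottom, split_x, top),
        "%s %d %d %d %d" % (second_char, split_x, bottom, right, top),
    ]
```

For the box in the error message, `split_box("h 47 1546 80 1594", "t",
"h", 62)` would give two adjacent boxes sharing the column x=62.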







Re: How much training data - if characters are always the same

2009-03-31 Thread Rob H.

-Ray S.
I noticed in this thread:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/71a41fa5065855c9
You said:
The training process usually uses a minimum of
5-10 samples of each character in each font.

When my character is drawn in the exact same size/shape/etc. on the
image, but in different locations, does the training still need 5-10
samples of each character?
Is the goal to have the OCR understand a certain character when it is
next to other characters? I'm interested in understanding why (either
way)...




Re: Tesseract 3.0???

2009-03-25 Thread Rob H.

Has Version 3.0 been discussed somewhere else on this google group?
I'm curious about the upcoming features.

On Feb 25, 7:49 pm, Ray Smith theraysm...@gmail.com wrote:
 If everything goes according to plan, it should be available around the end
 of March. I can't promise anything though, other than that it *will* be
 worth the wait! Ray.

 On Wed, Feb 11, 2009 at 8:32 PM, bharath bhooshan abbhoos...@gmail.com wrote:



  We are eagerly waiting for that.

  On Wed, Feb 11, 2009 at 11:37 PM, Swistak swistak...@gmail.com wrote:

  Same question. Any approximate date will be appreciated.



Re: How to decrease Tif file size

2009-03-06 Thread Rob Townley

Convert to Group 4 fax compression using command-line ImageMagick?
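
A minimal sketch of that conversion, wrapped in Python. It assumes
ImageMagick's `convert` is on the PATH; the function names are
hypothetical. Group 4 is a 1-bit format, so the image is reduced to
black and white first.

```python
import subprocess


def group4_command(src, dst):
    """Build an ImageMagick command that rewrites src as a bilevel Group 4 TIFF."""
    # -monochrome reduces the image to black and white before compression.
    return ["convert", src, "-monochrome", "-compress", "Group4", dst]


def compress_tiff(src, dst):
    """Run the conversion; raises CalledProcessError if convert fails."""
    subprocess.run(group4_command(src, dst), check=True)
```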


On 3/6/09, Rags2u raghu7...@gmail.com wrote:

 Hi,

 I'm using Tesseract2.dll for my project. TIF files with sizes in KB
 work fine and convert to text files. But TIF files with sizes
 in MB do not work. They are not converted to text files.

 Can anybody help me decrease the TIF file size? Or is there any other
 suggestion for this issue?

 Thanks in advance.

 Raghu.
 

