Image Dimensions allowed in Tesseract

2011-02-23 Thread Ann
What is the maximum dimension (width and height) of an image allowed
by tesseract for training?

Is there any restriction in the image dimension (width and height)
when creating an image in Java?

-- 
You received this message because you are subscribed to the Google Groups 
tesseract-ocr group.
To post to this group, send email to tesseract-ocr@googlegroups.com.
To unsubscribe from this group, send email to 
tesseract-ocr+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/tesseract-ocr?hl=en.



Detect highlighted fields

2011-02-23 Thread ozan
Hi all,

I am new to OCR programming and I wanted to know if it's possible to
detect whether certain fields are highlighted (e.g. different
background color) or not with tesseract? As far as I know the image is
converted to a binary (two-level) image, so it won't be able to tell the
difference between a highlighted field and a regular field, but I just
wanted to make sure.

Thank you...

Ozan.
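
On the binarization point: once the page is reduced to black and white the highlight color is gone, so one option is to test each field's background on the original color image before running OCR. A minimal sketch, assuming an interleaved 8-bit RGB buffer and field rectangles you already know. RgbImage, Rect, looksHighlighted and both thresholds are hypothetical names and untuned guesses, not part of Tesseract or Leptonica:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper types for illustration only.
struct Rect { int x, y, w, h; };

struct RgbImage {
    int width;
    int height;
    std::vector<std::uint8_t> data;  // interleaved R,G,B, row-major, no padding
};

// Returns true if the region's background looks colored rather than white/grey.
// Heuristic: skip dark pixels (probably text), then average the HSV-style
// saturation (max - min) / max over the remaining pixels.
bool looksHighlighted(const RgbImage& img, const Rect& r,
                      double minSaturation = 0.25, int darkThreshold = 80) {
    assert(img.data.size() ==
           static_cast<std::size_t>(img.width) * img.height * 3);
    double satSum = 0.0;
    std::size_t count = 0;
    for (int y = r.y; y < r.y + r.h; ++y) {
        if (y < 0 || y >= img.height) continue;      // clip to the image
        for (int x = r.x; x < r.x + r.w; ++x) {
            if (x < 0 || x >= img.width) continue;
            const std::uint8_t* p =
                &img.data[(static_cast<std::size_t>(y) * img.width + x) * 3];
            const int mx = std::max({p[0], p[1], p[2]});
            const int mn = std::min({p[0], p[1], p[2]});
            if (mx == 0 || mx < darkThreshold) continue;  // treat as text
            satSum += static_cast<double>(mx - mn) / mx;
            ++count;
        }
    }
    return count > 0 && satSum / count >= minSaturation;
}
```

A plain white field averages a saturation of 0 and is rejected, while a yellow marker background averages close to 1. Real scans will need the thresholds tuned against sample pages.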




Re: Image pre-processing for good OCR results

2011-02-23 Thread TP
On Sun, Feb 20, 2011 at 6:02 PM, Jon Andersen jande...@gmail.com wrote:
 Hi,
 My project at http://RecordAGrave.com is about recording headstones from
 graves and posting the text and images on the Net so that people can
 research their family history.  I would appreciate some advice on how to
 pre-process these headstone images to get the best results from Tesseract
 OCR.  I have thousands of 1-2 MB jpg images of headstones to process.
 Example images:
 http://freepages.genealogy.rootsweb.ancestry.com/~janderse/cemeteries/Star%20of%20David%20Memorial%20Gardens/Garden%20of%20Haifa%20-%20Raw/IMG_28215.jpg
 http://freepages.genealogy.rootsweb.ancestry.com/~janderse/cemeteries/Star%20of%20David%20Memorial%20Gardens/Garden%20of%20Haifa%20-%20Raw/IMG_28216.jpg
 http://freepages.genealogy.rootsweb.ancestry.com/~janderse/cemeteries/Star%20of%20David%20Memorial%20Gardens/Garden%20of%20Haifa%20-%20Raw/IMG_28217.jpg
 I am a software developer so I can script up pre-processing steps to prepare
 the input for Tesseract.
 Any advice on improving OCR accuracy through pre-processing steps?
 Thanks so much,

 -Jon



I guess I'm a bit surprised that no one has yet mentioned the fact
that the Leptonica C Image Processing Library
(http://www.leptonica.com) is now required to build tesseract-ocr --
or soon will be... the current state of tesseract-ocr is a bit hazy.
My understanding is that eventually (not in the near future though)
tesseract-ocr will only use Leptonica PIXs as its in-memory image
representation.

A still unofficial, easier to read, Sphinx generated version of the
Leptonica documentation is at
http://tpgit.github.com/UnOfficialLeptDocs/. Dan is currently
hammering away at v1.68 and it should be out soon (this week?). At
which point I'll also update my unofficial version of the
documentation.

My admittedly quick/biased opinion was that OpenCV focused on Computer
Vision and that Leptonica has more pure Image Processing routines. I
also find Leptonica's source code fairly easy to read because one of
the purposes of the library is to try to teach image processing
concepts.

In any case, if you're planning on using tesseract-ocr 3.x, then you
already must have liblept, so you might as well try it out.

-- TP




Re: Wrappers for tesseract3.01?

2011-02-23 Thread Ray Smith
+1 to sticking with the TessBaseAPI + PageIterator + ResultIterator.
Delving too far into the guts is likely to get you into all kinds of
trouble, especially as we are making rapid improvements, some of which are
quite radical. The idea is that the above 3 APIs give you everything you
need.

If recog_all_words isn't working for you, perhaps you aren't getting your
image into a Pix correctly. Try pixWrite("filename.png", pix, IFF_PNG);
If a common display tool displays the file correctly, then you have created
the pix correctly; otherwise you need to learn how to write your image data
into a pix.

Ray.
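
For anyone stuck on the "write your image data into a pix" step: as far as I understand Leptonica's layout, each raster line is padded to a whole number of 32-bit words and the leftmost pixel sits in the most significant bits of each word. The sketch below mimics that packing for 8-bit grey data without linking Leptonica. packGray8 and wordsPerLine are hypothetical helpers written for illustration. With the real library you would, I believe, allocate with pixCreate, fetch the buffer with pixGetData and pixGetWpl, and set bytes with SET_DATA_BYTE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Words per raster line for a given width and bit depth
// (rows are padded up to a multiple of 32 bits).
inline int wordsPerLine(int width, int depth) {
    return (width * depth + 31) / 32;
}

// Pack tightly stored 8-bit grey rows into 32-bit words, with the leftmost
// pixel of each group of four in the most significant byte of the word.
std::vector<std::uint32_t> packGray8(const std::vector<std::uint8_t>& src,
                                     int width, int height) {
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const int wpl = wordsPerLine(width, 8);
    std::vector<std::uint32_t> dst(static_cast<std::size_t>(wpl) * height, 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int shift = 8 * (3 - (x & 3));  // byte 0 -> bits 31..24
            const std::size_t word =
                static_cast<std::size_t>(y) * wpl + x / 4;
            const std::uint32_t value =
                src[static_cast<std::size_t>(y) * width + x];
            dst[word] |= value << shift;
        }
    }
    return dst;
}
```

A common symptom of skipping the padding or the byte order is an image that comes out sheared or with every group of four pixels reversed, which is exactly what dumping the pix to a file and viewing it makes visible.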

On Tue, Feb 15, 2011 at 9:52 PM, Dmitry Silaev daemons2...@gmail.com wrote:

 devTess,

 I wouldn't ask questions like this, as Tess is undergoing a transition from
 the old code base and is under heavy development of new features. I don't
 have enough time to investigate, but the prev_word_best_choice_ data member
 seems to be related to the best-segmentation search based on the language
 model.

 Instead of rummaging in Tess's guts, I'd rather use the convenient,
 high-level interface provided by ResultIterator (see GetIterator() in
 baseapi.h and then read all comments in resultiterator.h and
 pageiterator.h).

 Warm regards,
 Dmitry Silaev





 On Wed, Feb 16, 2011 at 5:34 AM, devTess jim...@googlemail.com wrote:

 Question:
 where can I find out more about (see below)

 tesseract_->prev_word_best_choice_


 What is the purpose of doing that?
 Why is it that it is not sufficient just to

 page_res_ = new PAGE_RES(block_list_);

 Thank you.
 =

 int TessBaseAPI::RecognizeText(ETEXT_DESC* monitor) {
   if (tesseract_ == NULL)
     return -1;
   if (page_res_ != NULL)
     delete page_res_;

   block_list_ = FindLinesCreateBlockList();

   tesseract_->SetBlackAndWhitelist();
   recognition_done_ = true;

   page_res_ = new PAGE_RES(block_list_,
                            tesseract_->prev_word_best_choice_);

   // Now run the main recognition.
   tesseract_->recog_all_words(page_res_, monitor, NULL, NULL, 1);

   return 0;
 }
