Image Dimensions Allowed in Tesseract
What is the maximum image size (width and height) that Tesseract allows for training? Is there any restriction on image dimensions (width and height) when creating an image in Java? -- You received this message because you are subscribed to the Google Groups tesseract-ocr group. To post to this group, send email to tesseract-ocr@googlegroups.com. To unsubscribe from this group, send email to tesseract-ocr+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.
Detect highlighted fields
Hi all, I am new to OCR programming and wanted to know whether it's possible to use Tesseract to detect whether certain fields are highlighted (e.g. have a different background color). As far as I know the image is converted to a two-level (black-and-white) image, so Tesseract won't be able to tell a highlighted field from a regular one, but I just wanted to make sure. Thank you... Ozan.
Re: Image pre-processing for good OCR results
On Sun, Feb 20, 2011 at 6:02 PM, Jon Andersen jande...@gmail.com wrote:
Hi, My project at http://RecordAGrave.com is about recording headstones from graves and posting the text and images on the Net so that people can research their family history. I would appreciate some advice on how to pre-process these headstone images to get the best results from Tesseract OCR. I have thousands of 1-2 MB JPG images of headstones to process. Example images:
http://freepages.genealogy.rootsweb.ancestry.com/~janderse/cemeteries/Star%20of%20David%20Memorial%20Gardens/Garden%20of%20Haifa%20-%20Raw/IMG_28215.jpg
http://freepages.genealogy.rootsweb.ancestry.com/~janderse/cemeteries/Star%20of%20David%20Memorial%20Gardens/Garden%20of%20Haifa%20-%20Raw/IMG_28216.jpg
http://freepages.genealogy.rootsweb.ancestry.com/~janderse/cemeteries/Star%20of%20David%20Memorial%20Gardens/Garden%20of%20Haifa%20-%20Raw/IMG_28217.jpg
I am a software developer, so I can script pre-processing steps to prepare the input for Tesseract. Any advice on improving OCR accuracy through pre-processing? Thanks so much, -Jon
I'm a bit surprised that no one has yet mentioned that the Leptonica C Image Processing Library (http://www.leptonica.com) is now required to build tesseract-ocr (or soon will be; the current state of tesseract-ocr is a bit hazy). My understanding is that eventually (though not in the near future) tesseract-ocr will use only Leptonica PIXs as its in-memory image representation. A still unofficial, easier-to-read, Sphinx-generated version of the Leptonica documentation is at http://tpgit.github.com/UnOfficialLeptDocs/.
Dan (Dan Bloomberg, Leptonica's author) is currently hammering away at v1.68, and it should be out soon (this week?). At that point I'll also update my unofficial version of the documentation. My admittedly quick and biased impression was that OpenCV focuses on computer vision, while Leptonica has more pure image-processing routines. I also find Leptonica's source code fairly easy to read, because one of the library's purposes is to teach image-processing concepts. In any case, if you're planning to use tesseract-ocr 3.x, you must already have liblept, so you might as well try it out. -- TP
Re: Wrappers for tesseract 3.01?
+1 to sticking with TessBaseAPI + PageIterator + ResultIterator. Delving too far into the guts is likely to get you into all kinds of trouble, especially as we are making rapid improvements, some of which are quite radical. The idea is that these three APIs give you everything you need. If recog_all_words isn't working for you, perhaps you aren't getting your image into a Pix correctly. Try pixWrite("filename.png", pix, IFF_PNG); if a common display tool shows the file correctly, then you have created the Pix correctly; otherwise you need to learn how to write your image data into a Pix. Ray.
On Tue, Feb 15, 2011 at 9:52 PM, Dmitry Silaev daemons2...@gmail.com wrote:
devTess, I wouldn't ask questions like this, as Tess is in transition from the old code base and new features are under heavy development. I don't have enough time to investigate, but the prev_word_best_choice_ data member seems to be related to the best-segmentation search based on the language model. Instead of rummaging in Tess's guts, I'd rather use the convenient high-level interface provided by ResultIterator (see GetIterator() in baseapi.h, then read all the comments in resultiterator.h and pageiterator.h). Warm regards, Dmitry Silaev
On Wed, Feb 16, 2011 at 5:34 AM, devTess jim...@googlemail.com wrote:
Question: where can I find out more about tesseract_->prev_word_best_choice_ (see below)? What is the purpose of passing it? Why is it not sufficient just to write page_res_ = new PAGE_RES(block_list_);? Thank you.
    int TessBaseAPI::RecognizeText(ETEXT_DESC* monitor) {
      if (tesseract_ == NULL) return -1;
      if (page_res_ != NULL) delete page_res_;
      block_list_ = FindLinesCreateBlockList();
      tesseract_->SetBlackAndWhitelist();
      recognition_done_ = true;
      page_res_ = new PAGE_RES(block_list_, tesseract_->prev_word_best_choice_);
      // Now run the main recognition.
      tesseract_->recog_all_words(page_res_, monitor, NULL, NULL, 1);
      return 0;
    }