OCR Sensitivity
Hi, I am really new to Tesseract OCR 3.0, which I am using as a static DLL in a Windows environment, and I have the majority of what I want working, but... Is there a way to increase the sensitivity of the OCR engine? For instance, I am passing JPG images that contain only registration plates (ANPR, essentially), but the OCR engine reads 1 as I, 0 as U, and 8 as S. I have tried altering the parameters of TessBaseAPI::Init, but setting it to anything other than OEM_DEFAULT simply crashes it. I have also set up a character whitelist to limit recognition to certain characters/digits. Any help would be appreciated. -- You received this message because you are subscribed to the Google Groups tesseract-ocr group. To post to this group, send email to tesseract-ocr@googlegroups.com. To unsubscribe from this group, send email to tesseract-ocr+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.
Re: Especial Characteres
Dmitry, I successfully generated Kannada traineddata files from the old 2.xx data files last year. There is a discussion in the forum by spohorsky on how to do it. sriranga(78) ♫ On Thu, Mar 3, 2011 at 5:42 PM, Dmitry Silaev daemons2...@gmail.com wrote: Manuel, It's quite an interesting question, although it may seem to be an ordinary newbie question. I have always wondered whether 2.xx files can be used with version 3.xx. The wiki states that the files in the traineddata file are different from the list used prior to 3.00, and will most likely change, possibly dramatically, in future revisions. I have no time to investigate it in the code, so I decided to act rather than think. After some tinkering with all those files, I slipped the resulting por.traineddata into the Tesseract algorithm I'm currently working on, and - guess what? - it worked! )) I must say it was tested only with a couple of *very simple* images, and it completely lacks any dictionary-related data. Also, my test images don't contain the specific Portuguese letters with diacritics, so in fact this file may perform poorly. Please test it and report your results. The file is in the attachment. Making this training data file was not difficult at all, but also not entirely straightforward, so this process probably deserves a separate article, which I'd like to post on my blog later. Warm regards, Dmitry Silaev On Wed, Mar 2, 2011 at 8:40 PM, manuelfhp manuel...@gmail.com wrote: Hello list, I can't find a solution for special characters. I installed Tesseract 3 on my Mac OS X 10.6. It is running very well, but I'm having problems with the character set. I need Tesseract working with Brazilian Portuguese (ISO 8859-1). I installed the Portuguese dictionary, but it is not working with special characters like Ç Ã É é (ISO 8859-1). Is there any solution? There is an old dictionary specifically for Brazilian Portuguese in version 2.0.4. Is it possible to use it in version 3? How?
Re: Especial Characteres
Sriranga, Thanks for letting me know. You were the first one, then, and I reinvented the wheel )) However, an article might still be useful instead of a verbose forum discussion... Maybe you'd like to write it? Warm regards, Dmitry Silaev On Thu, Mar 3, 2011 at 3:55 PM, Sriranga(78yrsold) withblessi...@gmail.com wrote: [...]
Re: OCR Sensitivity
The answer lies within your own question! Since you expect only digits, simply accept these letters as the equivalent digits by replacing them. Patrick On Mar 3, 5:09 am, Richard rhe...@dial.pipex.com wrote: [...]
Re: OCR Sensitivity
Yes, that is possible, and I am scanning now to do that, but it's not always possible to know the format of the plate, and just changing random characters may (and does) give strange results. On Mar 3, 1:13 pm, patrickq patrick.questemb...@gmail.com wrote: [...]
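Patrick's substitution idea and Richard's objection can be partly reconciled by swapping a character only where a candidate plate layout expects the other class (digit vs. letter), and by preferring the layout that needs the fewest swaps. Below is a minimal, standalone post-processing sketch that assumes you already have the raw OCR string; it does not touch Tesseract. The confusion tables, the layout strings ('L' for a letter slot, 'D' for a digit slot), and the function names are all hypothetical and would need to be filled in from the errors and plate formats you actually see.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical confusion tables. kLetterToDigit follows the misreadings
// reported above (1 read as I, 0 as U, 8 as S); kDigitToLetter is an
// illustrative guess for the opposite direction.
const std::map<char, char> kLetterToDigit = {{'I', '1'}, {'U', '0'}, {'S', '8'}};
const std::map<char, char> kDigitToLetter = {{'1', 'I'}, {'0', 'O'}, {'8', 'B'}};

struct PlateMatch {
  std::string text;       // OCR text after substitutions
  int substitutions = 0;  // number of characters that had to be swapped
};

// Coerce `raw` into `layout` ('L' = letter slot, 'D' = digit slot).
// Returns std::nullopt if the lengths differ or a character cannot be
// mapped into the class its slot expects.
std::optional<PlateMatch> CoerceToLayout(const std::string& raw,
                                         const std::string& layout) {
  if (raw.size() != layout.size()) return std::nullopt;
  PlateMatch match{raw, 0};
  for (std::size_t i = 0; i < raw.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(raw[i]);
    const bool want_digit = layout[i] == 'D';
    if (want_digit ? std::isdigit(c) : std::isalpha(c)) continue;  // already fits
    const std::map<char, char>& table = want_digit ? kLetterToDigit : kDigitToLetter;
    const auto it = table.find(raw[i]);
    if (it == table.end()) return std::nullopt;  // no plausible swap
    match.text[i] = it->second;
    ++match.substitutions;
  }
  return match;
}

// Try every candidate layout and keep the one needing the fewest swaps
// (the first one listed wins a tie).
std::optional<PlateMatch> BestPlate(const std::string& raw,
                                    const std::vector<std::string>& layouts) {
  std::optional<PlateMatch> best;
  for (const std::string& layout : layouts) {
    const auto match = CoerceToLayout(raw, layout);
    if (match && (!best || match->substitutions < best->substitutions)) best = match;
  }
  return best;
}
```

For example, `BestPlate("ABI2CDE", {"LLDDLLL"})` yields `AB12CDE` with one substitution. Note that a layout with a letter slot at the ambiguous position accepts `I` unchanged and therefore beats a layout that needs a swap; that is exactly Richard's point, and only knowing which plate formats are actually possible can narrow it down.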
Re: Especial Characteres
Sriranga, Actually, I don't understand why one needs to refer to the forum discussion you've just mentioned, as I managed to build this traineddata file without writing a single line of code, and even without a compiler such as Visual C++... The value I can add is that any user inexperienced in programming can make this traineddata file himself )) Warm regards, Dmitry Silaev On Thu, Mar 3, 2011 at 5:08 PM, Sriranga(78yrsold) withblessi...@gmail.com wrote: Dmitry, No, I am NOT the first; the credit actually goes to spohor...@sjm.com, who helped me a lot, including creating a vcproj for combined traineddata on Windows. I am very thankful to him for the help and guidance he has given me from time to time. Without his help I would not have succeeded in generating a traineddata file from the old data files. All credit should go to Steve. Steve has already explained in detail how to do it in the forum discussions, which are available. -sriranga(78yrs) On Thu, Mar 3, 2011 at 6:36 PM, Dmitry Silaev daemons2...@gmail.com wrote: [...]
Re: Especial Characteres
Dmitry, I fully agree with your points. Newbies (non-programmers) like me cannot make a traineddata file without valuable guidance from people like you. As an expert programmer/developer, you succeeded in building traineddata very easily. As such, only newbies need to refer to the forum discussions for solutions. With warmest regards, -sriranga(78yrs) On Thu, Mar 3, 2011 at 7:46 PM, Dmitry Silaev daemons2...@gmail.com wrote: [...]
Re: Especial Characteres
Hi Dmitry, I just replaced the file with your por.traineddata, but I'm getting an error:
    manuel$ tesseract input.tiff output -l por
    actual_tessdata_num_entries_ = TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 55
    Segmentation fault
It seems it would be interesting to convert the old files from 2.0x to 3, because there isn't a Brazilian Portuguese file for version 3, just Portuguese. At least the por.traineddata dictionary is working correctly in version 3; the special characters are being recognized by Tesseract 3. Regards, Manuel Pardo On 03/03/2011, at 09:12, Dmitry Silaev wrote: [...]
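When a traineddata file triggers an assert in tessdatamanager.cpp, one quick sanity check is to look at the file's header before looking at the data inside it. The sketch below assumes the Tesseract 3.0x combined-file layout: a little-endian 32-bit count of entries followed by one 64-bit offset per entry, with -1 marking an absent component. That layout is an assumption to verify against tessdatamanager.h/.cpp in the Tesseract build you actually run, and `max_entries` is meant to be the TESSDATA_NUM_ENTRIES value from that same build. The helper functions are hypothetical, not part of Tesseract.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Assumed header of a 3.0x combined .traineddata file.
struct TessdataHeader {
  int32_t num_entries = 0;
  std::vector<int64_t> offsets;  // -1 is assumed to mean "component absent"
};

// Read an unsigned little-endian integer of `n` bytes starting at `pos`.
uint64_t ReadLittleEndian(const std::vector<unsigned char>& buf, std::size_t pos,
                          std::size_t n) {
  uint64_t value = 0;
  for (std::size_t i = 0; i < n; ++i) {
    value |= static_cast<uint64_t>(buf[pos + i]) << (8 * i);
  }
  return value;
}

// Returns std::nullopt if the buffer is too short, the entry count is not in
// [1, max_entries], or the offset table is cut off.
std::optional<TessdataHeader> ParseTessdataHeader(const std::vector<unsigned char>& buf,
                                                  int32_t max_entries) {
  if (buf.size() < 4) return std::nullopt;
  TessdataHeader header;
  header.num_entries =
      static_cast<int32_t>(static_cast<uint32_t>(ReadLittleEndian(buf, 0, 4)));
  if (header.num_entries < 1 || header.num_entries > max_entries) return std::nullopt;
  const std::size_t table_end = 4 + 8 * static_cast<std::size_t>(header.num_entries);
  if (buf.size() < table_end) return std::nullopt;
  for (std::size_t pos = 4; pos < table_end; pos += 8) {
    header.offsets.push_back(static_cast<int64_t>(ReadLittleEndian(buf, pos, 8)));
  }
  return header;
}

// Load a whole file into memory (empty vector if it cannot be opened).
std::vector<unsigned char> ReadWholeFile(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                    std::istreambuf_iterator<char>());
}
```

For example, if `ParseTessdataHeader(ReadWholeFile("por.traineddata"), kMaxEntries)` returns std::nullopt, that suggests the file is not in the layout this binary expects. Here `kMaxEntries` is a placeholder you would set from your build. A plausible header, on the other hand, points the investigation at the individual components instead.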
Dictionary issues
Hi all, I'm working on a project that involves detecting text in street-level images. I have already written code that extracts text areas from my images. I work with Tesseract 3.0, and first of all I tried running Tesseract on full images (1080x1920), just to see what results I could get. Obviously, because of trees, fences, walls, etc., there are a lot of false recognitions from Tesseract, but some text is also recognized well. So, to improve recognition, I give Tesseract only the text areas segmented by my code, hoping that recognition will be good even though the scenes are very difficult. I know that when the image is too complicated (not enough contrast between text and background, many shadows...), detection is really difficult and may not give good results. However, even in some supposedly very simple cases like this one (black text on a white background with a slight blur): http://tesseract-ocr.googlegroups.com/web/Paris_12-080422_0687-34-1_0001312_box_0006.png?gda=4bbSHWQq5Pp34OGAuWVwGRkvOnHabRkL_yLtSqEDTbGzFn1v-X8wOvnLz5Tja6xhmVIF9MLYgjPKDhWo7fDwKv7hdSobhowICgXY9oBdZxkhoGyvOFXq71KIRN2DRDZ98DIdT53NzgFmQudIVZfn2evkHEao Tesseract recognizes: http://tesseract-ocr.googlegroups.com/web/Paris_12-080422_0687-34-1_0001312_box_0006_boxes.png?gda=KSimu2oq5Pp34OGAuWVwGRkvOnHabRkL_yLtSqEDTbGzFn1v-X8wOvnLz5Tja6xhmVIF9MLYgjPKDhWo7fDwKv7hdSobOWEMBOZDXT0mTiVSy6rk8qwfOToRrNOWJtPPKSAn4D797daDQaep90o7AOpSKHW0 I do not understand this result. I use Tesseract with the option -l fra for the French language. Normally the word Cloison exists in the French dictionary, so I do not understand why Tesseract recognizes a 0 instead of an o. Does the dictionary actually play a role in recognition? It is clear that the 0 and the o have the same shape-based confidence value, but the dictionary should still lead Tesseract to choose o rather than 0, am I wrong? In addition, does Tesseract not take into account the relative scale of two adjacent boxes?
It recognizes the segmented quotation mark as ll (see images above), while it correctly recognizes the 'i' just before the ll. I also tried adding lines to the file fra.unicharambigs to correct the false recognition of 'n' as l'I (line in unicharambigs: 3 l'I 1 n 0) and of 'm' as ITI (line in unicharambigs: 3 ITI 1 m 0), and I ran combine_tessdata to make a new fra.traineddata, but nothing changed. So I tried to help Tesseract by giving it our own segmented text image; in this case the blur is removed and recognition gives better results, as you can see in this image: http://tesseract-ocr.googlegroups.com/web/Paris_12-080422_0687-34-1_0001312_box_0006_boxes+(2).png?gda=_sgwA3Iq5Pp34OGAuWVwGRkvOnHabRkL_yLtSqEDTbGzFn1v-X8wOvnLz5Tja6xhmVIF9MLYgjPKDhWo7fDwKv7hdSobOWEMBOZDXT0mTiVSy6rk8juef4gIssVZMUVd4ovTnHRV4u3aa4iAIyYQIqbG9naPgh6o8ccLBvP6Chud5KMzIQ or in this one too: http://tesseract-ocr.googlegroups.com/web/Paris_12-080422_0687-34-1_0001312_box_0005_boxes.png?gda=pgvY8moq5Pp34OGAuWVwGRkvOnHabRkL_yLtSqEDTbGzFn1v-X8wOvnLz5Tja6xhmVIF9MLYgjPKDhWo7fDwKv7hdSobIvfRTlYBT-BD2NUWBDUNMqwfOToRrNOWJtPPKSAn4D797daDQaep90o7AOpSKHW0 I guess that Faux and plafonds (and maybe even Faux-plafonds) are present in the basic Tesseract dictionary, since recognition is good with the original Tesseract data. However, when I use a new dictionary I created from a list of about 350k French words (using wordlist2dawg to create fra.word-dawg and then rebuilding fra.traineddata) and run Tesseract on the same image, the result is Foux-plufonds. Neither Foux nor plufonds is in my list, whereas Faux-plafonds, Faux, and plafonds are. If you have any idea to help me with this too, I will be very grateful. Next, I will try to give Tesseract one character image at a time to simplify recognition further, but if you have any other ideas to improve it, I am definitely interested. Thank you in advance for any help you are able to provide, Jonathan.
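Tesseract's internal use of its word DAWGs and of unicharambigs is not reproduced here. As a separate, external technique, a post-processing pass can apply rewrite rules like the ones tried above (0 to o, l'I to n, ITI to m) to each recognized word and accept a variant only if it appears in a word list. This is a minimal sketch. The rule list, the two-edit limit, and the function name are illustrative assumptions. The word list would come from something like the 350k-word French list mentioned above.

```cpp
#include <cstddef>
#include <queue>
#include <set>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// A rewrite rule: `first` is a sequence the OCR may have produced by mistake,
// `second` is what it probably should have been (similar in spirit to a
// unicharambigs entry, but applied after recognition).
using Rule = std::pair<std::string, std::string>;

// Breadth-first search over rule applications, at most `max_edits` deep.
// Returns the first variant found in `dictionary` (fewest edits first),
// or `word` unchanged if it is already a word or no variant matches.
std::string CorrectWithDictionary(const std::string& word, const std::vector<Rule>& rules,
                                  const std::unordered_set<std::string>& dictionary,
                                  int max_edits = 2) {
  if (dictionary.count(word) > 0) return word;
  std::queue<std::pair<std::string, int>> frontier;
  std::set<std::string> seen{word};
  frontier.push({word, 0});
  while (!frontier.empty()) {
    const auto [current, depth] = frontier.front();  // copy, then pop
    frontier.pop();
    if (depth == max_edits) continue;
    for (const auto& [from, to] : rules) {
      // Apply this rule at each position where `from` occurs, one at a time.
      for (std::size_t pos = current.find(from); pos != std::string::npos;
           pos = current.find(from, pos + 1)) {
        std::string next = current;
        next.replace(pos, from.size(), to);
        if (!seen.insert(next).second) continue;  // already explored
        if (dictionary.count(next) > 0) return next;
        frontier.push({next, depth + 1});
      }
    }
  }
  return word;
}
```

With the rules `{"0", "o"}, {"l'I", "n"}, {"ITI", "m"}` and a list containing "Cloison", the word "Cl0ison" comes back as "Cloison". "Foux" stays "Foux", because no rule maps o to a. Keeping the rules narrow matters: a permissive rule set would start turning legitimate non-dictionary strings into unrelated words.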
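On the question of scale between adjacent boxes, one crude external check is to flag characters whose box height is far below the median height of the word. A quotation mark read as ll would stand out this way next to full-height letters. The sketch assumes you have per-character boxes in box-file-style coordinates (left, bottom, right, top, origin at the bottom-left). The 0.5 ratio is an arbitrary placeholder. This is not a description of what Tesseract does internally.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One recognized character and its box (origin at the bottom-left,
// so top > bottom).
struct CharBox {
  char ch;
  int left, bottom, right, top;
  int height() const { return top - bottom; }
};

// Return the indices of boxes whose height is less than `ratio` times the
// median box height. Such characters (e.g. quote marks read as letters)
// are candidates for a second look rather than automatic rejection.
std::vector<std::size_t> FlagUndersized(const std::vector<CharBox>& boxes,
                                        double ratio = 0.5) {
  std::vector<std::size_t> flagged;
  if (boxes.empty()) return flagged;
  std::vector<int> heights;
  for (const CharBox& b : boxes) heights.push_back(b.height());
  // Upper median for even counts; good enough for a heuristic.
  std::nth_element(heights.begin(), heights.begin() + heights.size() / 2, heights.end());
  const double median = heights[heights.size() / 2];
  for (std::size_t i = 0; i < boxes.size(); ++i) {
    if (boxes[i].height() < ratio * median) flagged.push_back(i);
  }
  return flagged;
}
```

A flagged box could then be re-examined, for example by checking whether it sits near the top of the line, before accepting the letters Tesseract assigned to it.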
Re: Especial Characteres
Manuel, Is the error message generated by version 2.xx? Did you try to run version 3.xx with my por.traineddata file? I don't get it - have you succeeded or not? Please provide us with the image you are trying to recognize. Warm regards, Dmitry Silaev On Thu, Mar 3, 2011 at 5:34 PM, manuel...@gmail.com manuel...@gmail.com wrote: [...]
Tesseract 2.0 not working
Hi, I added the code to my project from this site: http://www.pixel-technology.com/freeware/tessnet2/ It worked well until I installed the Tesseract 3.0 Windows executable. Now when I run my application, it shuts down when it hits this line of code:
    ocr.Init(@"D:\Projects\AMCDF\Source\Frameworks\Device\AMCDF.Device.GUI\Resources\tessdata\", "eng", false);
This used to work. Any help? I uninstalled all references to 3.0 from regedit and still no luck.