Re: Especial Characteres
Manuel,

I'm afraid that just chaining command-line tools won't help in this case. I'm talking about programming. And yes, I have solved many practical problems related to layout analysis and other fields of document image processing, and succeeded in it ))

Warm regards,
Dmitry Silaev
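To illustrate what "programming" rather than chaining tools could look like for the column problem discussed in this thread, here is a minimal sketch of one common technique, a vertical projection profile, for finding column boundaries. It is not Dmitry's actual method (the thread does not show it). It assumes the page has already been binarized and deskewed into rows of 0/1 pixels, and the name `find_column_spans` and the `min_gap` parameter are hypothetical.

```python
def find_column_spans(binary_rows, min_gap=3):
    """Return (start, end) x-ranges (end exclusive) of text columns.

    binary_rows: list of equal-length rows, 1 = ink, 0 = background.
    min_gap: hypothetical tuning parameter; a run of at least this many
             ink-free x-positions is treated as a gap between columns.
    """
    if not binary_rows:
        return []
    width = len(binary_rows[0])
    # Vertical projection: number of ink pixels at each x-position.
    profile = [sum(row[x] for row in binary_rows) for x in range(width)]

    spans = []
    start = None  # x where the current column began, or None
    gap = 0       # length of the current run of ink-free x-positions
    for x, ink in enumerate(profile):
        if ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The column ended where the gap began.
                spans.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Column runs to the right edge, minus any short trailing gap.
        spans.append((start, width - gap))
    return spans


page = [
    [1, 1, 0, 0, 0, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 1, 0],
]
columns = find_column_spans(page, min_gap=3)
```

Even a small skew smears the projection profile and can hide narrow gaps, which is one reason a skewed scan can break simple logic like this.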
Re: Especial Characteres
I doubt there's a GUI that can help with what you want. As for a programmatic way of doing this, please refer to the following thread, where I already tried to answer a similar question: http://groups.google.com/group/tesseract-ocr/browse_thread/thread/6322a29f28ba49dc/f98699a9caf36dbc#f98699a9caf36dbc

If you see no clues in those posts, then you need to send your sample images; there's no other way to help you.

Warm regards,
Dmitry Silaev

On Mon, Mar 14, 2011 at 5:22 PM, manuel...@gmail.com wrote:

Thanks. I need a GUI that tells Tesseract to recognize just a specific column. I'm a Java and C++ developer. Can you point me in a direction?

Regards,
Manuel Pardo
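One simple programmatic way to make a recognizer see only a specific column (not necessarily what the linked thread describes) is to crop the page to that column's pixel range before recognition. The sketch below works on an image held as a list of pixel rows; `crop_column` and its `pad` parameter are hypothetical names, and how the cropped pixels are then handed to Tesseract is left out.

```python
def crop_column(rows, left, right, pad=2):
    """Return the pixels of one column, x-range [left, right), plus padding.

    rows: image as a list of equal-length pixel rows.
    pad:  hypothetical margin in pixels kept on each side so that
          characters touching the column edge are not clipped.
    """
    width = len(rows[0]) if rows else 0
    lo = max(0, left - pad)       # clamp to the left image edge
    hi = min(width, right + pad)  # clamp to the right image edge
    return [row[lo:hi] for row in rows]


page = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
middle = crop_column(page, 3, 6, pad=1)
```

Each cropped column can then be recognized separately and the results merged back into rows afterwards.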
Re: Especial Characteres
Manuel,

The sample you provided definitely has insufficient resolution. You can only expect some part of the heading to be recognized, and that is what happened when I ran the recognition on your image. But I didn't get any error or warning messages with my por.traineddata at all! However, all this was tested under Windows. I can probably try this under Ubuntu, but I don't know when I will have enough time to reboot, set up a C++ compiler, build Tesseract and do some testing, sorry )) Are you sure you downloaded the latest stable version of Tesseract?

Warm regards,
Dmitry Silaev

On Thu, Mar 10, 2011 at 9:32 PM, manuel...@gmail.com wrote:

I just replaced por.traineddata with your file por.traineddata. After that I'm getting this error message:

manuel$ tesseract input.tiff output -l por
actual_tessdata_num_entries_ = TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 55
Segmentation fault

I haven't succeeded. I'm using version 3 on MacOSX 10.6. Attached: Reported.tiff

Regards,
Manuel Pardo

On 04/03/2011, at 03:19, Dmitry Silaev wrote:

Manuel,

Is the error message generated by version 2.xx? Did you try to run version 3.xx with my por.traineddata file? I don't get it: have you succeeded or not? Please provide us with the image you are trying to recognize.

Warm regards,
Dmitry Silaev

On Thu, Mar 3, 2011 at 5:34 PM, manuel...@gmail.com wrote:

Hi Dmitry,

I just replaced the file with your por.traineddata, but I'm getting an error:

manuel$ tesseract input.tiff output -l por
actual_tessdata_num_entries_ = TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 55
Segmentation fault

It seems it would be interesting to convert old files from 2.0x to 3, because there isn't a Brazilian Portuguese for version 3, just Portuguese. At least the dictionary por.traineddata is working correctly in version 3. The special chars are being recognized by Tesseract 3.

Regards,
Manuel Pardo

On 03/03/2011, at 09:12, Dmitry Silaev wrote:

Manuel,

It's quite an interesting question, although it may seem to be an ordinary newbie-like one. I was always wondering whether 2.xx files can be used with version 3.xx. The wiki states that the files in the traineddata file are different from the list used prior to 3.00, and will most likely change, possibly dramatically, in future revisions. I have no time to investigate it in the code, so I decided to act rather than think. After some tinkering with all those files I slipped the resulting por.traineddata into the Tesseract algo I'm currently working on, and - guess what? - it worked! ))

I must say it was tested only with a couple of *very simple* images, and it completely lacks any dictionary-related data. And my test images don't contain the specific Portuguese letters with diacritics. So in fact this file may perform poorly. Please test and report your results. The file is in the attachment (por.traineddata).

Making this training data file was not difficult at all, but also not so straightforward, so this process probably deserves a separate article, and later I'd like to post it on my blog.

Warm regards,
Dmitry Silaev

On Wed, Mar 2, 2011 at 8:40 PM, manuelfhp manuel...@gmail.com wrote:

Hello list,

I can't find a solution for special chars. I installed Tesseract 3 on my MacOSX 10.6. It is running very well, but I'm having problems with the charset. I need Tesseract working with Brazilian Portuguese (ISO8859-1). I installed the Portuguese dictionary, but it is not working with special chars like Ç Ã É é (ISO8859-1). Is there any solution? There is an old dictionary specifically for Brazilian Portuguese in version 2.0.4. Is it possible to use it in version 3? How?

-- You received this message because you are subscribed to the Google Groups tesseract-ocr group. To post to this group, send email to tesseract-ocr@googlegroups.com. To unsubscribe from this group, send email to tesseract-ocr+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.
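On the charset side of the original question: Tesseract writes its text output as UTF-8, so if a downstream system really needs ISO8859-1 (an assumption based on the question above), the conversion can happen after recognition rather than inside the OCR engine. A minimal sketch, with the hypothetical helper name `utf8_to_latin1`:

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes (e.g. the contents of an OCR output file)
    as ISO-8859-1. Characters that do not exist in ISO-8859-1 are
    replaced with '?' instead of raising an error."""
    text = data.decode("utf-8")
    return text.encode("iso-8859-1", errors="replace")


# The characters from the original question, as they would appear in UTF-8.
sample = "Ç Ã É é".encode("utf-8")
converted = utf8_to_latin1(sample)
```

If the special characters look wrong only when the output file is viewed, it may be worth checking that the viewer is reading the file as UTF-8 before suspecting the trained data.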
Re: Especial Characteres
What would you recommend for splitting the columns? I think I will need to scan with Tesseract column by column, and after that merge the results to make correct rows. Can you point me in a direction? What tools (Unix-compatible tools) can I use to tell Tesseract to scan a specific column?

Later I will recompile to test, but first I need to find a way to scan these reports correctly in order to generate CSV files to import later into a database. If it works, I will spend more time tuning Tesseract. Have you ever done this before (scanned reports using Tesseract or other tools to generate CSV files)?

Thanks

On 13/03/2011, at 11:20, Dmitry Silaev wrote:

Running via ports can cause diverse errors. Try to compile Tesseract natively. I use revision 549 and, as I said, it works fine.

Tables such as yours present a challenge for simple layout processing algorithms, because the text is sparsely located. A minimal skew, which is almost inevitable, could break all the logic. In such cases I prefer to devise custom segmentation logic specific to the document type being processed. This way I do not depend on Tesseract's segmentation: Tesseract is used as a raw classifier.

Warm regards,
Dmitry Silaev

On Sun, Mar 13, 2011 at 4:47 PM, manuel...@gmail.com wrote:

I'm using the latest version, tesseract @3.00_2+eng. I installed it using ports on MacOSX.

Another question, Dmitry, about this sample: why doesn't Tesseract recognize a complete row? It's not perfectly aligned, but it is impossible to get an image 100% aligned. Tesseract is breaking columns into new lines like:

1 test productA 2 test2 productB

Do you know how to fix it?

Regards,
Manuel Pardo
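The "recognize column by column, then merge into rows" idea from this message can be sketched as follows. This is only an illustration under stated assumptions: each column has already been recognized separately, and each recognized text line comes with a vertical position `y` in page coordinates (for example, from its bounding box). The function names and the `tolerance` parameter are hypothetical.

```python
import csv
import io


def merge_columns_to_rows(columns, tolerance=5):
    """Merge per-column OCR results into table rows.

    columns:   one list per column, each holding (y, text) tuples.
    tolerance: hypothetical maximum vertical distance in pixels from the
               first cell of a row for another cell to join that row.
    Returns a list of rows, each with one string per column.
    """
    # Collect every cell with its column index, sorted top to bottom.
    cells = sorted(
        (y, col_idx, text)
        for col_idx, items in enumerate(columns)
        for y, text in items
    )
    rows = []
    row_y = None  # y of the first cell in the current row
    for y, col_idx, text in cells:
        if row_y is None or y - row_y > tolerance:
            rows.append([""] * len(columns))  # start a new, empty row
            row_y = y
        rows[-1][col_idx] = text
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text ready to import into a database."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Slightly misaligned cells, as on a scan with a small skew.
columns = [
    [(10, "1"), (40, "2")],
    [(12, "test"), (41, "test2")],
    [(9, "productA"), (43, "productB")],
]
table = merge_columns_to_rows(columns)
csv_text = rows_to_csv(table)
```

Grouping by vertical position with a tolerance is what lets cells from a slightly skewed scan still land in the same row; a larger skew would need deskewing first.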
Re: Especial Characteres
Dmitry,

I successfully generated traineddata (Kannada) files from the old 2.xx data files last year. There is a discussion by spohorsky in the forum on how to do it.

sriranga(78) ♫
Re: Especial Characteres
Sriranga,

Thanks for letting me know. You are the first one then, and I invented the bicycle )) However, an article might still be of use instead of a verbose forum discussion... Maybe you'd like to write it then?

Warm regards,
Dmitry Silaev
Re: Especial Characteres
Sriranga,

Actually, I don't understand why one needs to refer to the forum discussion you've just mentioned, as I managed to build this traineddata file without writing a single line of code and even without a compiler such as Visual C++... The value I can add is that any user inexperienced in programming can make this traineddata file himself ))

Warm regards,
Dmitry Silaev

On Thu, Mar 3, 2011 at 5:08 PM, Sriranga(78yrsold) withblessi...@gmail.com wrote:

Dmitry,

No, I was NOT the first to invent it; the credit actually goes to spohor...@sjm.com, who helped me a great deal, including creating a vcproj for combined traineddata for Windows. I am very thankful to him for the help and guidance he has given from time to time. Without his help I would not have succeeded in generating a traineddata file from the old data files. All credit should go to Steve. Steve has already explained in detail how to do it in the forum discussions that are available.

-sriranga(78yrs)
Re: Especial Characteres
Dmitry,

I fully agree with your points. Newbies (non-programmers) like me cannot make a traineddata file without valuable guidance from people like you. Being an expert programmer/developer, you succeeded in building the traineddata very easily. As such, only newbies need to refer to the forum discussion for a solution on any point.

With warmest regards,
-sriranga(78yrs)

On Thu, Mar 3, 2011 at 7:46 PM, Dmitry Silaev daemons2...@gmail.com wrote:

Sriranga,

Actually I don't understand why one needs to refer to the forum discussion you've just mentioned, as I managed to build this traineddata file without writing a single line of code and even without a compiler such as Visual C++... The value I can add is that any user inexperienced in programming can make this traineddata file himself ))

Warm regards,
Dmitry Silaev

On Thu, Mar 3, 2011 at 5:08 PM, Sriranga(78yrsold) withblessi...@gmail.com wrote:

Dmitry,

No, I was NOT the first to invent this; the credit actually goes to spohor...@sjm.com, who helped me a great deal, including creating a vcproj for the combined traineddata on Windows. I am very thankful to him for the help and guidance he gave from time to time. Without his help I would not have succeeded in generating a traineddata file from the old data files. All credit should go to Steve. Steve has already explained in detail how to do it in the forum discussion.

-sriranga(78yrs)

On Thu, Mar 3, 2011 at 6:36 PM, Dmitry Silaev daemons2...@gmail.com wrote:

Sriranga,

Thanks for letting me know. You are the first one then, and I reinvented the bicycle )) However, an article might still be of use instead of a verbose forum discussion... Maybe you'd like to write it then?

Warm regards,
Dmitry Silaev

On Thu, Mar 3, 2011 at 3:55 PM, Sriranga(78yrsold) withblessi...@gmail.com wrote:

Dmitry,

I generated traineddata (Kannada) files successfully from the old 2.xx data files last year. There is a discussion by spohorsky in the forum on how to do it.

sriranga(78) ♫

On Thu, Mar 3, 2011 at 5:42 PM, Dmitry Silaev daemons2...@gmail.com wrote:

Manuel,

It's quite an interesting question, although it may seem to be an ordinary newbie-like one. I was always wondering whether 2.xx files can be used with version 3.xx. The wiki states that the files in the traineddata file are different from the list used prior to 3.00, and will most likely change, possibly dramatically, in future revisions.

I have no time to investigate it in the code, so I decided to act rather than to think. After some tinkering with all those files I slipped the resulting por.traineddata into the Tesseract-based algorithm I'm currently working on, and - guess what? - it worked! ))

I must say it was tested only with a couple of *very simple* images, and it completely lacks any dictionary-related data. My test images also don't contain the specific Portuguese letters with diacritics. So in fact this file may perform poorly. Please test and report your results. The file is in the attachment.

Making this training data file was not difficult at all, but also not entirely straightforward, so this process probably deserves a separate article, and later I'd like to post it in my blog.

Warm regards,
Dmitry Silaev

On Wed, Mar 2, 2011 at 8:40 PM, manuelfhp manuel...@gmail.com wrote:

Hello list,

I can't find a solution for special chars. I installed tesseract 3 on my MacOSX 10.6 and it is running very well, but I'm having problems with the charset. I need tesseract working with Brazilian Portuguese (ISO8859-1). I installed the Portuguese dictionary, but it is not working with special chars like Ç Ã É é (ISO8859-1).

Is there any solution? There is an old dictionary specifically for Brazilian Portuguese in version 2.0.4. Is it possible to use it in version 3? How?
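A note on the ISO8859-1 part of the question: as far as I know, Tesseract 3 writes its text output as UTF-8, so Ç Ã É é can look broken when the output file is opened or imported as ISO-8859-1, even if recognition itself went fine. Whether that is what happened here is not clear from the thread. The sketch below only shows the re-encoding step; the function names and the file paths passed to them are made up for illustration.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1.

    Characters that do not exist in ISO-8859-1 are replaced with '?'.
    """
    text = data.decode("utf-8")
    return text.encode("iso-8859-1", errors="replace")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a UTF-8 text file and write an ISO-8859-1 copy of it."""
    with open(src_path, "rb") as src:
        converted = utf8_to_latin1(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)


if __name__ == "__main__":
    # The characters from the question, as they would appear in UTF-8 output.
    sample = "Ç Ã É é".encode("utf-8")
    print(utf8_to_latin1(sample))
```

For example, convert_file("output.txt", "output-latin1.txt") would produce a copy that an ISO-8859-1 consumer can read; if the characters are still wrong after that, the problem is in recognition or training data rather than in the encoding.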
Re: Especial Characteres
Hi Dmitry,

I just replaced my por.traineddata with your file, but I'm getting an error:

    manuel$ tesseract input.tiff output -l por
    actual_tessdata_num_entries_ = TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 55
    Segmentation fault

It seems it would be interesting to convert the old files from 2.0x to 3, because there is no Brazilian Portuguese for version 3, just Portuguese. At least the dictionary por.traineddata is working correctly in version 3: the special chars are being recognized by tesseract 3.

Regards,
Manuel Pardo
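The assertion in the error above fires while the traineddata header is being read, and it compares the number of entries stored in the file with the number the binary was built to expect. One plausible reading, not confirmed anywhere in this thread, is that the attached file was combined with tools from a different Tesseract revision than the installed 3.00 binary. To check, one can look at the header directly. The sketch below assumes the 3.0x layout is a 32-bit entry count followed by that many 64-bit offsets, with -1 marking an absent component; that layout is my understanding of the sources and should be treated as an assumption, as should the function name and the sanity limit.

```python
import struct

# Hypothetical sanity limit, used only to guess the byte order of the file.
MAX_PLAUSIBLE_ENTRIES = 1000


def read_traineddata_header(data: bytes):
    """Return (entry_count, offsets) parsed from the start of a traineddata file.

    Assumed layout: int32 entry count, then entry_count int64 offsets.
    """
    if len(data) < 4:
        raise ValueError("file too short to contain an entry count")

    byte_order = "<"
    (count,) = struct.unpack_from("<i", data, 0)
    if not 0 <= count <= MAX_PLAUSIBLE_ENTRIES:
        # Implausible value: retry assuming the file is big-endian.
        byte_order = ">"
        (count,) = struct.unpack_from(">i", data, 0)
    if not 0 <= count <= MAX_PLAUSIBLE_ENTRIES:
        raise ValueError("entry count is implausible in either byte order")

    if len(data) < 4 + 8 * count:
        raise ValueError("file too short to contain the offset table")
    offsets = struct.unpack_from(f"{byte_order}{count}q", data, 4)
    return count, list(offsets)


if __name__ == "__main__":
    # Synthetic header: 3 entries, the second one absent (-1).
    fake = struct.pack("<i3q", 3, 28, -1, 100)
    print(read_traineddata_header(fake))
```

Reading the first few hundred bytes of the attached por.traineddata and of a traineddata file that is known to work with the installed binary, then comparing the two entry counts, would show whether the files come from different format revisions.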
Re: Especial Characteres
Manuel,

Is the error message generated by version 2.xx? Did you try to run version 3.xx with my por.traineddata file? I don't get it - have you succeeded or not? Please provide us with the image you are trying to recognize.

Warm regards,
Dmitry Silaev