Re: Especial Characteres

2011-03-14 Thread Dmitry Silaev
Manuel,

I'm afraid that just chaining command-line tools won't help in this case.
What I have in mind is programming.

And yes, I have solved many practical problems in layout
analysis and other fields of document image processing, and succeeded
at it ))

Warm regards,
Dmitry Silaev






Re: Especial Characteres

2011-03-14 Thread Dmitry Silaev
I doubt there's a GUI that can help with what you want. As for a
programmatic way of doing this, please refer to the following thread,
where I already tried to answer a similar question:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/6322a29f28ba49dc/f98699a9caf36dbc#f98699a9caf36dbc

If you find no clues in these posts, then you need to send your sample
images. There's no other way to help you.

Warm regards,
Dmitry Silaev





On Mon, Mar 14, 2011 at 5:22 PM, manuel...@gmail.com
manuel...@gmail.com wrote:
 Thanks.

 I need a GUI that tells Tesseract to recognize just a specific column.
 I'm a Java and C++ developer. Can you point me in the right direction?


 Regards
 Manuel Pardo


Re: Especial Characteres

2011-03-13 Thread Dmitry Silaev
Manuel,

The sample you provided definitely has insufficient resolution. You
can only expect some part of the heading to be recognized, and that is
what happened when I ran recognition on your image. But I did not
get any error or warning messages with my por.traineddata at all!

However, all this was tested under Windows. I could probably try it
under Ubuntu, but I don't know when I will have enough time to reboot,
set up a C++ compiler, build Tesseract and do some testing, sorry ))

Are you sure you downloaded the latest stable version of Tesseract?

Warm regards,
Dmitry Silaev





On Thu, Mar 10, 2011 at 9:32 PM, manuel...@gmail.com
manuel...@gmail.com wrote:
 I just replaced my por.traineddata with your por.traineddata file.
 After that I'm getting this error message:

 manuel$ tesseract input.tiff output -l por
 actual_tessdata_num_entries_ = TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 55
 Segmentation fault

 I haven't succeeded. I'm using version 3 - MacOSX 10.6
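
The assertion compares a count read from the data file (actual_tessdata_num_entries_) with a constant compiled into the binary (TESSDATA_NUM_ENTRIES), so one way to narrow this down is to look at the count stored in each file. The following is only a rough Python sketch. It assumes, and this is not confirmed anywhere in this thread, that a 3.x traineddata file begins with that count as a 4-byte little-endian integer. read_entry_count is a hypothetical helper name, not part of Tesseract.

```python
import struct


def read_entry_count(path):
    """Return the first 4 bytes of a file as a little-endian int32.

    ASSUMPTION: a Tesseract 3.x traineddata file starts with its entry
    count in this form. Check against the source of your own build.
    """
    with open(path, "rb") as f:
        raw = f.read(4)
    if len(raw) < 4:
        raise ValueError("file too short to contain an entry count")
    return struct.unpack("<i", raw)[0]
```

If the value differs between the por.traineddata that works and the one that triggers the assert, that would suggest the file was packed for a different revision than the installed binary.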



 Attached Reported.tiff






 Regards
 Manuel Pardo

 On 04/03/2011, at 03:19, Dmitry Silaev wrote:

 Manuel,

 Is the error message generated by version 2.xx? Did you try to run
 version 3.xx with my por.traineddata file?
 I don't get it - have you succeeded or not?
 Please provide us with the image you are trying to recognize.

 Warm regards,
 Dmitry Silaev





 On Thu, Mar 3, 2011 at 5:34 PM, manuel...@gmail.com manuel...@gmail.com 
 wrote:
 Hi Dmitry,

 I just replaced my file with your por.traineddata,
 but I'm getting an error:

 manuel$ tesseract input.tiff output -l por
 actual_tessdata_num_entries_ = TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 55
 Segmentation fault

 It seems it would be worthwhile to convert the old files from 2.0x to 3, because
 there is no Brazilian Portuguese for version 3, just Portuguese.
 At least the por.traineddata dictionary is working correctly in version 3.
 The special chars are being recognized by Tesseract 3.

 regards,
 Manuel Pardo




 On 03/03/2011, at 09:12, Dmitry Silaev wrote:

 Manuel,

 It's quite an interesting question although it may seem to be an
 ordinary newbie-like one.

 I have always wondered whether 2.xx files can be used with version 3.xx.
 The wiki states that the files in the traineddata file are different
 from the list used prior to 3.00, and will most likely change,
 possibly dramatically, in future revisions.

 I have no time to investigate this in the code, so I decided to act
 rather than to think. After some tinkering with all those files I
 slipped the resulting por.traineddata into the Tesseract-based algorithm I'm
 currently working on, and - guess what? - it worked! ))

 I must say it was tested only with a couple of *very simple* images,
 and it completely lacks any dictionary-related data. Also, my test
 images don't contain the specific Portuguese letters with
 diacritics. So in fact this file may perform poorly. Please test it and
 report your results. The file is in the attachment.

 Making this training data file was not difficult at all, but it was not
 entirely straightforward either, so the process probably deserves a separate
 article, which I'd like to post on my blog later.

 Warm regards,
 Dmitry Silaev





 On Wed, Mar 2, 2011 at 8:40 PM, manuelfhp manuel...@gmail.com wrote:
 Hello list,
 I can't find a solution for special chars.

 I installed Tesseract 3 on my Mac OS X 10.6.
 It is running very well.

 But I'm having problems with the charset.
 I need Tesseract to work with Brazilian Portuguese (ISO8859-1).

 I installed the Portuguese dictionary, but it is not working with special
 chars like  Ç Ã É é   (ISO8859-1).
 Is there any solution?

 There is an old dictionary specifically for Brazilian Portuguese in version
 2.0.4. Is it possible to use it in version 3? How?
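
One thing worth ruling out here: Tesseract 3 writes its text output as UTF-8, so characters such as Ç Ã É é can be recognized correctly and still look broken if the output file is opened or imported as ISO8859-1. Below is a minimal Python sketch of the conversion. It assumes the output file is valid UTF-8 and that every recognized character exists in ISO8859-1. utf8_to_latin1 is a hypothetical helper name.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes (as written by Tesseract 3) to ISO8859-1.

    Raises UnicodeEncodeError if a character has no ISO8859-1 equivalent,
    which is preferable to silently corrupting the text.
    """
    text = data.decode("utf-8")
    return text.encode("iso-8859-1")


def as_seen_in_latin1(text: str) -> str:
    """Show how UTF-8 text looks when misread as ISO8859-1 (mojibake)."""
    return text.encode("utf-8").decode("iso-8859-1")
```

If the "wrong" characters in your output match what as_seen_in_latin1 produces for the expected text, the recognition itself is probably fine and only the encoding of the viewer or importer needs to change.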


 --
 You received this message because you are subscribed to the Google Groups 
 tesseract-ocr group.
 To post to this group, send email to tesseract-ocr@googlegroups.com.
 To unsubscribe from this group, send email to 
 tesseract-ocr+unsubscr...@googlegroups.com.
 For more options, visit this group at 
 http://groups.google.com/group/tesseract-ocr?hl=en.




 por.traineddata


Re: Especial Characteres

2011-03-13 Thread manuel...@gmail.com
What would you recommend for splitting the columns?

I think I will need to run Tesseract column by column,
and then merge the results to rebuild the correct rows.

Can you point me in the right direction?
What tools (Unix-compatible) can I use to tell Tesseract to scan a
specific column?

I will recompile and test later, but first I need to find a way to scan
these reports correctly and generate CSV files to import into a database.
If it works I will spend more time tuning Tesseract.

Have you ever done this before (scanning reports with Tesseract or other tools to
generate CSV files)?

Thanks
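
For what it's worth, one common way to find column boundaries is a vertical projection profile: mark every pixel column that contains ink, then treat sufficiently wide empty runs as separators. The Python sketch below is only an illustration of that idea, not a method anyone in this thread has confirmed using. find_columns, min_gap, and the list-of-0/1-rows image format are assumptions made for the example.

```python
def find_columns(image, min_gap=3):
    """Return (start, end) x-ranges, end exclusive, of text columns.

    image: list of rows, each a list of 0/1 values where 1 is a dark pixel.
    Runs of fewer than min_gap empty pixel columns are bridged, so the
    spaces between letters do not split a column.
    """
    if not image:
        return []
    width = len(image[0])
    # True for every pixel column that contains at least one dark pixel.
    ink = [any(row[x] for row in image) for x in range(width)]

    columns = []
    start = None  # x where the current text column began
    gap = 0       # length of the current run of empty pixel columns
    for x, has_ink in enumerate(ink):
        if has_ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                columns.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        columns.append((start, width - gap))
    return columns
```

Each range could then be cropped out and recognized on its own. Note that skew narrows or closes the empty gaps, so a scanned page would likely need deskewing first.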



On 13/03/2011, at 11:20, Dmitry Silaev wrote:

 Running via ports can cause all sorts of errors. Try to compile Tesseract
 natively. I use revision 549 and, as I said, it works fine.
 
 Tables like yours are a challenge for simple layout
 processing algorithms because the text is sparsely located. A minimal skew,
 which is almost inevitable, could break all the logic. In such cases I
 prefer to devise custom segmentation logic specific to the
 document type being processed. That way I do not depend on
 Tesseract's segmentation: Tesseract is used only as a raw
 classifier.
 
 Warm regards,
 Dmitry Silaev
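
The "custom segmentation, Tesseract as a raw classifier" idea might be sketched like this: recognize each cell separately, remember where it sat on the page, and rebuild the rows from the vertical positions. A small tolerance absorbs slight skew. This is a Python illustration under stated assumptions, not Dmitry's actual code. merge_rows, to_csv, and the (column, y, text) tuples are hypothetical names and formats chosen for the example.

```python
import csv
import io


def merge_rows(cells, tolerance=5):
    """Group recognized cells into table rows by vertical position.

    cells: iterable of (column_index, y_center, text) tuples.
    A cell joins the current row if its y_center is within `tolerance`
    pixels of the first cell placed in that row. If two cells land in the
    same row and column, the later one overwrites the earlier one.
    """
    cells = list(cells)
    if not cells:
        return []
    n_cols = max(col for col, _, _ in cells) + 1
    rows = []  # each entry is [reference_y, list_of_fields]
    for col, y, text in sorted(cells, key=lambda cell: cell[1]):
        if rows and abs(y - rows[-1][0]) <= tolerance:
            rows[-1][1][col] = text
        else:
            fields = [""] * n_cols
            fields[col] = text
            rows.append([y, fields])
    return [fields for _, fields in rows]


def to_csv(rows):
    """Serialize rows to CSV text with '\\n' line endings."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

With cells taken from a table like the one in the message below, "productB" ends up on the same row as "2" and "test2" as long as its vertical offset stays within the tolerance, instead of falling onto a line of its own.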
 
 
 
 
 
 On Sun, Mar 13, 2011 at 4:47 PM, manuel...@gmail.com
 manuel...@gmail.com wrote:
 I'm using the latest version, tesseract @3.00_2+eng.
 I installed it using ports on Mac OS X.
 
 Another question about this sample, Dmitry:
 why doesn't Tesseract recognize a complete row? The alignment is not
 perfect, but it is impossible to get an image 100% aligned.
 Tesseract is breaking the columns onto new lines like this:
 
 1           test    productA
 2           test2
 productB
 
 Do you know how to fix it?
 
 Regards
 Manuel Pardo
 
 

Re: Especial Characteres

2011-03-03 Thread Sriranga(78yrsold)
Dmitry,
I successfully generated traineddata (Kannada) files from the old
2.xx data files last year. There is a discussion by spohorsky in the forum
on how to do it.
sriranga(78)
♫




Re: Especial Characteres

2011-03-03 Thread Dmitry Silaev
Sriranga,

Thanks for letting me know. You were the first one then, and I reinvented
the wheel ))
However, an article might still be more useful than a verbose forum discussion...
Maybe you'd like to write it then?

Warm regards,
Dmitry Silaev








Re: Especial Characteres

2011-03-03 Thread Dmitry Silaev
Sriranga,

Actually, I don't understand why one needs to refer to the forum
discussion you've just mentioned, as I managed to build this
traineddata file without writing a single line of code and even
without a compiler such as Visual C++...

The value I can add is that any user inexperienced in programming
can make this traineddata file himself ))

Warm regards,
Dmitry Silaev





On Thu, Mar 3, 2011 at 5:08 PM, Sriranga(78yrsold)
withblessi...@gmail.com wrote:
 Dmitry,
 No, I was NOT the first; the credit actually belongs to spohor...@sjm.com,
 who helped me a great deal, including creating a vcproj for the combined traineddata
 for Windows. I am very thankful to him for the help and guidance he gave me from
 time to time. Without his help I would not have succeeded in generating a traineddata
 file out of the old data files. All credit should go to Steve. Steve has already
 explained in detail how to do it in the forum discussions that are available.
 -sriranga(78yrs)


Re: Especial Characteres

2011-03-03 Thread Sriranga(78yrsold)
Dmitry,
I fully agree with your points. Newbies like me, who are not programmers,
cannot make a traineddata file without the valuable guidance of people like
you. As an expert programmer/developer, you managed to build the
traineddata file very easily. That is why only newbies need to refer to the
forum discussion for a solution.
With warmest regards,
-sriranga(78yrs)

On Thu, Mar 3, 2011 at 7:46 PM, Dmitry Silaev daemons2...@gmail.com wrote:

 Sriranga,

 Actually I don't understand why one needs to refer to the forum
 discussion you've just mentioned above, as I managed to build this
 traineddata file without writing a single line of code and even
 without a compiler, say Visual C++...

 The value I can add is that any user inexperienced in programming
 can make this traineddata file himself ))

 Warm regards,
 Dmitry Silaev





 On Thu, Mar 3, 2011 at 5:08 PM, Sriranga(78yrsold)
 withblessi...@gmail.com wrote:
  Dmitry,
  No, I was NOT the first to invent it; the credit actually goes to
  spohor...@sjm.com, who helped me a great deal, including creating a vcproj
  for combined traineddata for Windows. I am very thankful to him for the
  help and guidance he gave me from time to time. Without his help I would
  not have succeeded in generating a traineddata file out of the old
  datafiles. All credit should go to Steve. Steve has already explained in
  detail how to do it in the forum discussion.
  -sriranga(78yrs)
 
  On Thu, Mar 3, 2011 at 6:36 PM, Dmitry Silaev daemons2...@gmail.com
 wrote:
 
  Sriranga,
 
  Thanks for letting me know. You are the first one then, and I reinvented
  the wheel ))
  However, an article might still be of use instead of a verbose forum
  discussion...
  Maybe you'd like to write it then?
 
  Warm regards,
  Dmitry Silaev
 
 
 
 
 
  On Thu, Mar 3, 2011 at 3:55 PM, Sriranga(78yrsold)
  withblessi...@gmail.com wrote:
   Dmitry,
   I successfully generated traineddata (Kannada) files from the old
   datafiles of 2.xx last year. There is a discussion by spohorsky in the
   forum on how to do it.
   sriranga(78)
   On Thu, Mar 3, 2011 at 5:42 PM, Dmitry Silaev daemons2...@gmail.com
   wrote:
  
   Manuel,
  
   It's quite an interesting question, although it may seem to be an
   ordinary newbie-like one.
  
   I was always wondering whether 2.xx files can be used with version 3.xx.
   The wiki states that the files in the traineddata file are different
   from the list used prior to 3.00, and will most likely change,
   possibly dramatically, in future revisions.
  
   I have no time to investigate it in the code, so I decided to act
   rather than to think. After some tinkering with all those files I
   slipped the resulting por.traineddata into the Tesseract algo I'm
   currently working on, and - guess what? - it worked! ))
  
   I must say it was tested only with a couple of *very simple* images,
   and it completely lacks any dictionary-related data. Also, my test
   images don't contain the specific Portuguese letters with
   diacritics. So in fact this file may perform poorly. Please test and
   report your results. The file is in the attachment.
  
   Making this training data file was not difficult at all, but also not
   so straightforward, so this process probably deserves a separate
   article, and later I'd like to post it on my blog.
  
   Warm regards,
   Dmitry Silaev
  
  
  
  
  

Re: Especial Characteres

2011-03-03 Thread manuel...@gmail.com
Hi Dmitry,

I just replaced my por.traineddata with your file,
but I'm getting an error:

manuel$ tesseract input.tiff output -l por
actual_tessdata_num_entries_ = TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 55
Segmentation fault

It seems worthwhile to convert the old files from 2.0x to 3, because there
is no Brazilian Portuguese for version 3, just Portuguese.
At least the dictionary por.traineddata is working correctly in version 3.
The special chars are being recognized by Tesseract 3.

regards,
Manuel Pardo
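
On the ISO8859-1 requirement from the original question, here is a minimal sketch of re-encoding OCR output. It assumes the text file written by Tesseract 3 is UTF-8 and that the downstream tool expects ISO8859-1; neither is confirmed in this thread. The name reencode_text is hypothetical and not part of Tesseract.

```python
def reencode_text(data: bytes, src: str = "utf-8", dst: str = "iso-8859-1") -> bytes:
    """Re-encode OCR output bytes from one charset to another.

    Assumption: the input really is encoded in `src` (UTF-8 by default).
    Characters that do not exist in `dst` are replaced with '?'
    instead of raising an error.
    """
    text = data.decode(src)                    # bytes -> str
    return text.encode(dst, errors="replace")  # str -> bytes in target charset
```

With the command shown above, the recognized text would land in output.txt, so one could read that file in binary mode, pass its contents through reencode_text, and write the result to a new file for import.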








Re: Especial Characteres

2011-03-03 Thread Dmitry Silaev
Manuel,

Is the error message generated by version 2.xx? Did you try to run
version 3.xx with my por.traineddata file?
I don't get it - have you succeeded or not?
Please provide us with the image you are trying to recognize.

Warm regards,
Dmitry Silaev
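
Judging by the names in the assertion Manuel reports, the binary compares an entry count read from the traineddata file against the count it was compiled with. Below is a minimal sketch for inspecting that count in a given file. It assumes (this is not confirmed anywhere in the thread) that a traineddata file starts with a 32-bit signed entry count followed by that many 64-bit signed offsets, in little-endian byte order. read_traineddata_header is a hypothetical helper, not a Tesseract tool.

```python
import struct


def read_traineddata_header(data: bytes):
    """Return (num_entries, offsets) parsed from the start of a traineddata blob.

    Assumed layout (an assumption, not taken from the thread):
      int32  num_entries
      int64  offsets[num_entries]   (-1 is taken to mean "component absent")
    """
    if len(data) < 4:
        raise ValueError("data too short to contain an entry count")
    (num_entries,) = struct.unpack_from("<i", data, 0)
    # Sanity bound so a corrupt or byte-swapped count fails loudly.
    if not 0 <= num_entries <= 1000:
        raise ValueError("implausible entry count: %d" % num_entries)
    needed = 4 + 8 * num_entries
    if len(data) < needed:
        raise ValueError("data too short for %d offsets" % num_entries)
    offsets = list(struct.unpack_from("<%dq" % num_entries, data, 4))
    return num_entries, offsets
```

Reading the first few kilobytes of por.traineddata and of a file known to work with the installed binary, then comparing the two counts, would show whether the files were packed for different Tesseract revisions.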







