I tried higher-resolution images and got the same error, in
particular with the following dataset:
http://yaroslavvb.com/upload/ocropus/dataset/

I ran the command
ocropus trainseg model.simple dataset

and got
dataset/0000/0000.gt.txt: transcript doesn't agree with cseg (transcript 1, cseg 0) FIXME


On May 31, 1:27 pm, Thomas Breuel <[email protected]> wrote:
> > and get errors as below for each training file
> > dataset/0000/0636.gt.txt: transcript doesn't agree with cseg
> > (transcript 1, cseg 0) FIXME
>
> This means that the transcript contains one character and the cseg
> contains 0 characters.
>
> Why does the cseg contain zero characters?  Because your images appear
> to be such low resolution that the noise filter simply removes the few
> pixels that are in your image.
>
> If you really want to train on such low-resolution images, you have two
> options:
>
> * figure out which part of OCRopus is removing the bits and turn it
> off (noise removal happens in several places, and I'm not sure which
> one is responsible for this)
>
> * write your own top-level loop to train the characters directly (by
> copying and then greatly simplifying linerec.cc)
>
> BTW, the "FIXME" comment is there because we changed the
> representation of cseg files a little and that occasionally triggers
> this exception; however, in your case, the exception is really due to
> the bits getting deleted, rather than the changed cseg file.
>
> Tom
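The mismatch Tom describes can be illustrated with a small sketch. It assumes a toy noise filter that erases 8-connected ink components smaller than a hypothetical `min_size` threshold, and it treats each connected component as one cseg segment. Both are simplifying assumptions: this is not OCRopus's actual noise removal (which, as noted above, happens in several places) or its cseg format, and all function names here are made up for illustration. Under these assumptions, a one-character image only a few pixels in size loses all its ink, so the segmentation ends up with 0 segments while the transcript still has 1 character.

```python
from collections import deque

def components(img):
    """Return the 8-connected components of ink pixels (nonzero) in a 2D list."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[y][x] = True
            queue, comp = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(comp)
    return comps

def remove_noise(img, min_size):
    """Hypothetical noise filter: erase components with fewer than min_size pixels."""
    out = [row[:] for row in img]
    for comp in components(img):
        if len(comp) < min_size:
            for y, x in comp:
                out[y][x] = 0
    return out

def check_transcript(transcript, img):
    """Toy consistency check: expect one segment per non-space transcript character."""
    n_chars = len(transcript.replace(" ", ""))
    n_segs = len(components(img))
    if n_chars != n_segs:
        return f"transcript doesn't agree with cseg (transcript {n_chars}, cseg {n_segs})"
    return "ok"

# A tiny, low-resolution "l": a vertical stroke of only 3 ink pixels.
tiny = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
# A larger rendering of the same stroke: 6 rows x 2 ink pixels = 12 pixels.
big = [[0, 1, 1, 0] for _ in range(6)]

print(check_transcript("l", remove_noise(tiny, min_size=5)))
# -> transcript doesn't agree with cseg (transcript 1, cseg 0)
print(check_transcript("l", remove_noise(big, min_size=5)))
# -> ok
```

In this toy model, the two remedies Tom lists correspond to lowering or disabling `min_size`, or bypassing the segmentation check entirely and training on the character images directly.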
You received this message because you are subscribed to the Google Groups 
"ocropus" group.
For more options, visit this group at 
http://groups.google.com/group/ocropus?hl=en