I tried higher-resolution images and get the same error, in particular with the following dataset: http://yaroslavvb.com/upload/ocropus/dataset/
I issue the command

ocropus trainseg model.simple dataset

and get

dataset/0000/0000.gt.txt: transcript doesn't agree with cseg (transcript 1, cseg 0) FIXME

On May 31, 1:27 pm, Thomas Breuel <[email protected]> wrote:
> > and get errors as below for each training file
> > dataset/0000/0636.gt.txt: transcript doesn't agree with cseg
> > (transcript 1, cseg 0) FIXME
>
> This means that the transcript contains one character and the cseg
> contains 0 characters.
>
> Why does the cseg contain zero characters? Because your images appear
> to be so low resolution that the noise filter just removes the few
> bits that are in your image.
>
> If you really want to train on such low resolution images, you have two
> options:
>
> * figure out which part of OCRopus is removing the bits and turn it
> off (noise removal happens in several places, and I'm not sure which
> one is responsible for this)
>
> * write your own top-level loop to train the characters directly (by
> copying and then greatly simplifying linerec.cc)
>
> BTW, the "FIXME" comment is there because we changed the
> representation of cseg files a little and that occasionally triggers
> this exception; however, in your case, the exception is really due to
> the bits getting deleted, rather than the changed cseg file.
>
> Tom
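
For anyone trying to picture the failure mode described in the quoted reply, here is a minimal sketch of how a size-based noise filter can leave a one-character transcript with zero segments. This is not OCRopus code: the function names, the 4-connectivity rule, and the `min_pixels` threshold are all assumptions made up for illustration, and the real noise removal in OCRopus may work quite differently.

```python
from collections import deque


def connected_components(bitmap):
    """Return the 4-connected foreground components of a binary bitmap.

    Each component is a list of (row, col) pixel coordinates.
    """
    rows, cols = len(bitmap), len(bitmap[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] and not seen[r][c]:
                # Breadth-first flood fill starting from this pixel.
                queue = deque([(r, c)])
                seen[r][c] = True
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and bitmap[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(pixels)
    return components


def surviving_segments(bitmap, min_pixels):
    """Hypothetical noise filter: drop components smaller than min_pixels."""
    return [comp for comp in connected_components(bitmap) if len(comp) >= min_pixels]


# A very low resolution "character": only 4 foreground pixels.
tiny_glyph = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
transcript = "a"

# With an (assumed) threshold of 8 pixels, the whole glyph is treated as noise.
segments = surviving_segments(tiny_glyph, min_pixels=8)
print("transcript", len(transcript), "cseg", len(segments))  # transcript 1, cseg 0
```

Lowering `min_pixels` to 1 in this toy example keeps the glyph, which corresponds to the first option in the reply: find the filter that removes the bits and turn it off.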
