[ocropus] Re: OCRopus 0.7 released

Tom Wed, 01 May 2013 18:01:49 -0700

>
>  First, if I remember correctly, previous versions of ocropus were able 
> to use tesseract 3.02 training files. Is it possible to train ocropus 0.7 
> with these files, too?



OCRopus 0.7 doesn't need to be trained with individual characters, so you 
don't really need the Tesseract training files. But you should easily be 
able to use the scans that those files were derived from.
 

> Second, the fraktur example does not support 'long-s', so words like 
> 'Wachstube' vs. 'Wachſtube' could be problematic in historical texts.
>
It should support long-s, but it doesn't encode it separately in the output.
 

> Because I am digitizing a 20th-century book set in Fraktur for a personal 
> project, I could send you some fully corrected wordlists and pages.
>
I would recommend just recognizing it with the default Fraktur model and 
choosing long/short s based on context; there are very few cases where the 
choice can't be made programmatically, and you should be able to find those 
with a simple script.
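
As a rough illustration of what such a script could look like, here is a 
minimal Python sketch. It is not part of OCRopus; it assumes you have a 
list of corrected words that already contain ſ (the wordlist below is made 
up), and it swaps in a plain wordlist lookup for real context analysis. 
The names build_lookup and restore_long_s are hypothetical. A word gets 
its ſ restored only when the wordlist gives exactly one spelling; 
everything else is flagged for manual review.

```python
from collections import defaultdict

LONG_S = "\u017f"  # the long-s character

def build_lookup(corrected_words):
    """Map each word, with long s folded to round s, to the spellings seen."""
    lookup = defaultdict(set)
    for word in corrected_words:
        lookup[word.replace(LONG_S, "s")].add(word)
    return lookup

def restore_long_s(word, lookup):
    """Return (spelling, needs_review) for one word of OCR output."""
    candidates = lookup.get(word, set())
    if len(candidates) == 1:
        return next(iter(candidates)), False
    # Unknown word, or more than one valid spelling: keep it and flag it.
    return word, True

# Hypothetical corrected wordlist; a real one would come from proofread pages.
wordlist = ["Wach\u017ftube", "Wachstube", "Haus", "le\u017fen"]
lookup = build_lookup(wordlist)

ocr_words = ["Wachstube", "Haus", "lesen", "Fenster"]
results = {w: restore_long_s(w, lookup) for w in ocr_words}
for w, (spelling, needs_review) in results.items():
    print(w, spelling, "REVIEW" if needs_review else "ok")
```

With this made-up wordlist, 'Wachstube' is flagged because both spellings 
are valid, and 'Fenster' is flagged because it is not in the list at all.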
 
Tom
