> First, if I remember correctly, previous versions of ocropus were able to use tesseract 3.02 training files. Is it possible to train ocropus0.7 with these files, too?

OCRopus 0.7 doesn't need to be trained with individual characters, so you don't really need the Tesseract training files. But you should easily be able to use the scans that those files were derived from.

> Second, the fraktur example does not support 'long-s', therefore words like 'Wachstube' vs. 'Wachſtube' could be problematic in historical texts.

It should support long-s, but it doesn't encode it separately in the output.

> Because in my personal project I digitize a book from the 20th century with fraktur I could send you some full corrected wordlists and pages.

I would recommend just recognizing it with the default Fraktur model and choosing long/short s based on context; there are very few cases where the choice can't be made programmatically, and you should be able to find those with a simple script.

Tom
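
As a rough illustration of the context-based choice suggested above, here is a minimal Python sketch. It is not part of OCRopus. It assumes a deliberately simplified positional rule (a lowercase s at the end of a word stays round, every other lowercase s becomes ſ), which ignores morpheme boundaries inside compounds. The helper names (restore_long_s, convert_text, needs_review), the overrides mapping and the stems word list are all hypothetical and would have to be supplied from your own corrected word lists.

```python
import re

LONG_S = "\u017f"  # U+017F LATIN SMALL LETTER LONG S

def restore_long_s(word, overrides=None):
    """Apply the simplified positional rule to a single word.

    overrides is a hypothetical mapping of hand-checked words to their
    correct spelling; it takes precedence over the rule.
    """
    if overrides and word in overrides:
        return overrides[word]
    chars = list(word)
    for i, ch in enumerate(chars):
        # A lowercase s that is not the last letter becomes a long s.
        if ch == "s" and i < len(chars) - 1:
            chars[i] = LONG_S
    return "".join(chars)

def convert_text(text, overrides=None):
    """Apply restore_long_s to every run of letters in the text."""
    return re.sub(r"[^\W\d_]+",
                  lambda m: restore_long_s(m.group(0), overrides),
                  text)

def needs_review(word, stems):
    """Flag words where an inner s might close a stem of a compound.

    stems is a hypothetical set of lowercase stems ending in s
    (for example "wachs"); a match means the positional rule may be
    wrong and the word should be checked by hand.
    """
    lower = word.lower()
    return any(lower[:i + 1] in stems
               for i, ch in enumerate(lower[:-1]) if ch == "s")

if __name__ == "__main__":
    words = ["Wachstube", "Haus", "ist"]
    for w in words:
        flag = " (review)" if needs_review(w, {"wachs"}) else ""
        print(w, "->", restore_long_s(w) + flag)
```

Words flagged by needs_review are the ones where the script cannot decide on its own (Wachs-tube versus Wach-ſtube); once checked by hand they can be added to the overrides mapping so that later runs get them right.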
