Yes, you need to add them before you create the starter traineddata. You can edit Latin.unicharset before you run the combine_lang_model command.
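If it helps, here is a rough Python sketch of that order of operations: first check which of your new characters are still missing from the unicharset, then build the combine_lang_model call. The helper names (unicharset_symbols, missing_characters, combine_command) are made up for this example. It also assumes the unicharset layout is "entry count on the first line, then one line per symbol starting with the symbol itself", so please verify that against your own file. The only flags used are the ones from your command.

```python
from pathlib import Path


def unicharset_symbols(unicharset_path):
    """Return the set of symbols listed in a unicharset file.

    Assumption: line 1 holds the entry count and every later line
    starts with the symbol, followed by whitespace and its properties.
    """
    lines = Path(unicharset_path).read_text(encoding="utf-8").splitlines()
    return {line.split(None, 1)[0] for line in lines[1:] if line.strip()}


def missing_characters(unicharset_path, wanted):
    """Return the characters in `wanted` that the unicharset lacks.

    Whitespace is skipped because it is not stored as a literal
    character in the file.
    """
    present = unicharset_symbols(unicharset_path)
    return sorted(c for c in set(wanted) if not c.isspace() and c not in present)


def combine_command(unicharset, script_dir, lang, output_dir):
    """Build the combine_lang_model call as an argument list."""
    return [
        "combine_lang_model",
        "--input_unicharset", str(unicharset),
        "--script_dir", str(script_dir),
        "--lang", lang,
        "--output_dir", str(output_dir),
    ]
```

Once missing_characters returns an empty list for your new characters, you could pass the result of combine_command to subprocess.run. As far as I understand the man page, --script_dir is meant to be the directory that holds the script unicharsets (Latin.unicharset and so on), so I would try pointing it at langdata rather than langdata/eng, but treat that as my reading and not something confirmed in this thread.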

On Fri, Jan 19, 2024, 5:27 PM Simon <smong5...@gmail.com> wrote:

> OK, somehow I had "no entry point found" errors in the DLL files.
> Reinstalling Tesseract solved the problem.
>
> Now I have run into another interesting problem.
>
> combine_lang_model --input_unicharset
> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/Latin.unicharset
> --script_dir
> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng --lang
> --output_dir C:/Users/LCAdmin/Documents/FineTuning/output
>
> When I run this command, Tesseract tries to load many unicharsets. I don't
> understand why. It doesn't make any sense to me.
> What is the reason for loading all these unicharsets
>
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Latin.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Inherited.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Unknown.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Greek.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Armenian.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Arabic.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Devanagari.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Gujarati.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Bopomofo.unicharset
>
> when I only want to train the English model?
>
> Another question came up as well:
> When I want to train some new characters, do I have to add them to
> Latin.unicharset before I create the starter traineddata, or do I just add
> these characters to the generated unicharset after I have created the
> starter traineddata?
>
> Simon wrote on Friday, 19 January 2024 at 10:38:24 UTC+1:
>
>> Here is a link to the website of Uni Mannheim: COMBINE_LANG_MODEL -
>> generate starter traineddata
>> <https://digi.bib.uni-mannheim.de/tesseract/manuals/combine_lang_model.1.html>
>>
>> Unfortunately the command doesn't create any files, and after running it
>> I don't get any feedback on why it didn't work properly.
>> Even when I purposely use nonexistent paths I still get no error message!
>>
>> PS C:\Windows\system32> combine_lang_model --input_unicharset
>> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/Latin.unicharset
>> --script_dir
>> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng --lang eng
>> --wordlist
>> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/eng.wordlist
>> --output_dir C:/Users/LCAdmin/Documents/FineTuning/output
>> PS C:\Users\LCAdmin\Documents\FineTuning>
>>
>> PS C:\Users\LCAdmin\Documents\FineTuning> combine_lang_model
>> --input_unicharset tesstutorial/langdata/Latin.unicharset --script_dir
>> tesstutorial/langdata/eng --lang eng --wordlist
>> asdfasfdef/langdata/eng/eng.wordlist --output_dir output
>> PS C:\Users\LCAdmin\Documents\FineTuning>
>>
>> Does anyone have an idea how I can get at some log messages or
>> anything else that would tell me why it didn't work?
>>
>>
>>
>> Simon wrote on Thursday, 18 January 2024 at 11:11:52 UTC+1:
>>
>>> Hello everybody,
>>>
>>> I have a question regarding "Fine Tuning +- a few characters".
>>>
>>> In general the instructions on
>>> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#fine-tuning-for--a-few-characters
>>> say that you have to make a starter traineddata from the unicharset, but is
>>> this also required if I only want to fine-tune?
>>>
>>> Furthermore, I have no idea how to create a starter traineddata. I read
>>> the "creating starter traineddata" chapter, but I still don't understand
>>> how to do it. This site is supposed to be a tutorial,
>>> so I expected step-by-step instructions.
>>>
>>> Can anyone help me with this?
>>>
>>> I am a newbie at Tesseract training, so I would appreciate any help.
>>>
>> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/31a0381f-f407-43d7-a9a1-8450394c20fcn%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/31a0381f-f407-43d7-a9a1-8450394c20fcn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>