Yes, you need to add them before you create the starter model. You can edit
Latin.unicharset before you run the combine_lang_model command.
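
Since the quoted commands finished without any message even for nonexistent paths, a small pre-flight check before running combine_lang_model may help. This is only a sketch, not part of Tesseract: the helper names missing_inputs and build_command are made up for illustration, and it uses only the flags that appear in the quoted commands. It assumes that script_dir must contain the per-script *.unicharset files, which is what the "Failed to load script unicharset from:" messages suggest.

```python
from pathlib import Path


def missing_inputs(input_unicharset, script_dir):
    """Return a list of problems with the inputs; empty if they look usable.

    Hypothetical helper: it only checks the file system, not file contents.
    """
    problems = []
    unicharset = Path(input_unicharset)
    scripts = Path(script_dir)
    if not unicharset.is_file():
        problems.append(f"input unicharset not found: {unicharset}")
    if not scripts.is_dir():
        problems.append(f"script_dir is not a directory: {scripts}")
    elif not any(scripts.glob("*.unicharset")):
        # Assumption: the tool looks for <Script>.unicharset inside script_dir.
        problems.append(f"no *.unicharset files in script_dir: {scripts}")
    return problems


def build_command(input_unicharset, script_dir, lang, output_dir):
    """Build the argument list, using only flags seen in the quoted commands."""
    return [
        "combine_lang_model",
        "--input_unicharset", str(input_unicharset),
        "--script_dir", str(script_dir),
        "--lang", lang,
        "--output_dir", str(output_dir),
    ]
```

If missing_inputs returns an empty list, the list from build_command can be passed to subprocess.run; otherwise print the problems and fix the paths (and add any new characters to the unicharset) first.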

On Fri, Jan 19, 2024, 5:27 PM Simon <smong5...@gmail.com> wrote:

> OK, somehow I had "no entry point found" errors in the DLL files.
> Reinstalling Tesseract solved the problem.
>
> Now I have run into another interesting problem.
>
> combine_lang_model --input_unicharset
> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/Latin.unicharset
> --script_dir
> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng --lang
> --output_dir C:/Users/LCAdmin/Documents/FineTuning/output
>
> When I run this command, Tesseract tries to load many unicharsets. I don't
> understand why it does that; it doesn't make any sense to me.
> What's the reason for loading all these unicharsets:
>
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Latin.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Inherited.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Unknown.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Greek.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Armenian.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Arabic.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Devanagari.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Gujarati.unicharset
> Failed to load script unicharset
> from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Bopomofo.unicharset
>
> when I only want to train the English model?
>
> Another question also arose:
> When I want to train some new characters, do I have to add them to
> Latin.unicharset before I create the starter traineddata, or do I just add
> them to the generated unicharset after I have created the starter
> traineddata?
>
> Simon wrote on Friday, 19 January 2024 at 10:38:24 UTC+1:
>
>> Here is a link to the website of Uni Mannheim: COMBINE_LANG_MODEL -
>> generate starter traineddata
>> <https://digi.bib.uni-mannheim.de/tesseract/manuals/combine_lang_model.1.html>
>>
>> Unfortunately, the command doesn't create any files, and after running it
>> I don't get any feedback on why it didn't work properly.
>> Even when I purposely use nonexistent paths, I still get no error message!
>>
>> PS C:\Windows\system32> combine_lang_model --input_unicharset
>> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/Latin.unicharset
>> --script_dir
>> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng  --lang eng
>> --wordlist
>> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/eng.wordlist
>> --output_dir C:/Users/LCAdmin/Documents/FineTuning/output
>> PS C:\Users\LCAdmin\Documents\FineTuning>
>>
>> PS C:\Users\LCAdmin\Documents\FineTuning> combine_lang_model
>> --input_unicharset tesstutorial/langdata/Latin.unicharset --script_dir
>> tesstutorial/langdata/eng  --lang eng --wordlist
>> asdfasfdef/langdata/eng/eng.wordlist --output_dir output
>> PS C:\Users\LCAdmin\Documents\FineTuning>
>>
>> Does anyone have an idea how I can get log messages or anything else
>> that would show me why it didn't work?
>>
>>
>>
>> Simon wrote on Thursday, 18 January 2024 at 11:11:52 UTC+1:
>>
>>> Hello everybody,
>>>
>>> I have a question regarding "Fine Tuning +- a few characters".
>>>
>>> In general the instructions on
>>> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#fine-tuning-for--a-few-characters
>>> say that you have to make a starter traineddata from the unicharset, but is
>>> this also required if I want to fine tune?
>>>
>>> Furthermore, I have no idea how to create a starter traineddata. I read
>>> the "creating starter traineddata" chapter, but I still have no clue how
>>> to do it. This site is supposed to be a tutorial, so I expected
>>> step-by-step instructions.
>>>
>>> Can anyone help me with this?
>>>
>>> I am a newbie at Tesseract training, so I would appreciate any help.
>>>
>> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/31a0381f-f407-43d7-a9a1-8450394c20fcn%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/31a0381f-f407-43d7-a9a1-8450394c20fcn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

