OK, could you please be a little more precise? I learned that "[21c6]" is the character's Unicode code point in hex (U+21C6; for this character the UTF-16 code unit has the same value). But where do I get the glyph information from, and what does the "10" stand for?
Thanks for your patience, I really appreciate your help :)

elvi...@gmail.com wrote on Saturday, 20 January 2024 at 14:19:33 UTC+1:

> You need to look it up in the Unicode list.
>
> On Sat, Jan 20, 2024, 3:50 PM Simon <smon...@gmail.com> wrote:
>
>> Hey, thanks for the response!
>>
>> How exactly do I add characters to the unicharset?
>>
>> Typically the unicharset has to follow a specific pattern
>> (Tesseract-unicharset_uni-mannheim
>> <https://digi.bib.uni-mannheim.de/tesseract/manuals/unicharset.5.html>).
>>
>> Here is an example from the Latin unicharset:
>>
>> ⇆ 0 24,76,166,249,122,224,6,30,136,256 Common 1600 10 1600 ⇆ # ⇆ [21c6 ]
>>
>> If I want to add, for example, the character "⌖", how would I know which
>> numbers to put in for the glyph information?
>>
>> And what do the "10" and the "[21c6]" mean?
>>
>> elvi...@gmail.com wrote on Friday, 19 January 2024 at 16:22:24 UTC+1:
>>
>>> Yes, you need to add them before you create the starter model. You can
>>> edit the Latin.unicharset before you run the combine command.
>>>
>>> On Fri, Jan 19, 2024, 5:27 PM Simon <smon...@gmail.com> wrote:
>>>
>>>> OK, somehow I had "no entry point found" errors in the DLL files.
>>>> Reinstalling Tesseract solved the problem.
>>>>
>>>> Now I have run into another interesting problem.
>>>>
>>>> combine_lang_model --input_unicharset
>>>> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/Latin.unicharset
>>>> --script_dir
>>>> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng --lang
>>>> --output_dir C:/Users/LCAdmin/Documents/FineTuning/output
>>>>
>>>> When I run this command, Tesseract tries to load many unicharsets. I
>>>> don't understand why it does that. It doesn't make any sense to me.
>>>> What is the reason for loading all these unicharsets:
>>>>
>>>> Failed to load script unicharset from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Latin.unicharset
>>>> Failed to load script unicharset from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Inherited.unicharset
>>>> Failed to load script unicharset from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Unknown.unicharset
>>>> Failed to load script unicharset from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Greek.unicharset
>>>> Failed to load script unicharset from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Armenian.unicharset
>>>> Failed to load script unicharset from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Arabic.unicharset
>>>> Failed to load script unicharset from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Devanagari.unicharset
>>>> Failed to load script unicharset from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Gujarati.unicharset
>>>> Failed to load script unicharset from:C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/Bopomofo.unicharset
>>>>
>>>> when I only want to train the English model?
>>>>
>>>> Another question has also come up:
>>>> When I want to train some new characters, do I have to add them to the
>>>> Latin.unicharset before I create the starter traineddata, or do I just
>>>> add them to the generated unicharset after I have created the starter
>>>> traineddata?
>>>>
>>>> Simon wrote on Friday, 19 January 2024 at 10:38:24 UTC+1:
>>>>
>>>>> Here is a link to the website of Uni Mannheim: COMBINE_LANG_MODEL -
>>>>> generate starter traineddata
>>>>> <https://digi.bib.uni-mannheim.de/tesseract/manuals/combine_lang_model.1.html>
>>>>>
>>>>> Unfortunately the command doesn't create any files, and after running
>>>>> it I get no feedback on why it didn't work properly.
>>>>> Even when I purposely use nonexistent paths I still get no error
>>>>> message!
>>>>>
>>>>> PS C:\Windows\system32> combine_lang_model --input_unicharset
>>>>> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/Latin.unicharset
>>>>> --script_dir
>>>>> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng --lang
>>>>> eng
>>>>> --wordlist
>>>>> C:/Users/LCAdmin/Documents/FineTuning/tesstutorial/langdata/eng/eng.wordlist
>>>>> --output_dir C:/Users/LCAdmin/Documents/FineTuning/output
>>>>> PS C:\Users\LCAdmin\Documents\FineTuning>
>>>>>
>>>>> PS C:\Users\LCAdmin\Documents\FineTuning> combine_lang_model
>>>>> --input_unicharset tesstutorial/langdata/Latin.unicharset --script_dir
>>>>> tesstutorial/langdata/eng --lang eng --wordlist
>>>>> asdfasfdef/langdata/eng/eng.wordlist --output_dir output
>>>>> PS C:\Users\LCAdmin\Documents\FineTuning>
>>>>>
>>>>> Does anyone know how I can get log messages, or anything else that
>>>>> would tell me why it didn't work?
>>>>>
>>>>> Simon wrote on Thursday, 18 January 2024 at 11:11:52 UTC+1:
>>>>>
>>>>>> Hello everybody,
>>>>>>
>>>>>> I have a question regarding "Fine Tuning +- a few characters".
>>>>>>
>>>>>> In general the instructions on
>>>>>> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#fine-tuning-for--a-few-characters
>>>>>> say that you have to make a starter traineddata from the unicharset,
>>>>>> but is this also required if I want to fine-tune?
>>>>>>
>>>>>> Furthermore, I have absolutely no idea how to create a starter
>>>>>> traineddata. I read the "Creating Starter Traineddata" chapter, but I
>>>>>> still have no clue how to do it. The site is supposed to be a
>>>>>> tutorial, so I expected step-by-step instructions.
>>>>>>
>>>>>> Can anyone help me with this?
>>>>>>
>>>>>> I am a newbie at Tesseract training, so I would appreciate any help.
>>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/31a0381f-f407-43d7-a9a1-8450394c20fcn%40googlegroups.com
>>>> .
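
For reference, the example unicharset line quoted in this thread can be pulled apart with a small Python sketch. It assumes the field order described in the unicharset(5) manual linked above (character, properties, glyph metrics, script, other_case, direction, mirror, normed form, then a trailing comment). `parse_unicharset_line` and `code_points_hex` are hypothetical helper names made up for this illustration; they are not part of Tesseract. Under that assumption, the "10" is the direction field (as far as I can tell, the Unicode BiDi class "Other Neutral"), and the bracketed value in the comment is the character's code point in hex.

```python
import unicodedata

# Example entry quoted earlier in this thread (Latin.unicharset, character U+21C6).
LINE = "⇆ 0 24,76,166,249,122,224,6,30,136,256 Common 1600 10 1600 ⇆ # ⇆ [21c6 ]"

# Names of the ten glyph metrics, in the order assumed from the unicharset(5) manual.
METRIC_NAMES = (
    "min_bottom", "max_bottom", "min_top", "max_top",
    "min_width", "max_width", "min_bearing", "max_bearing",
    "min_advance", "max_advance",
)


def parse_unicharset_line(line):
    """Split one unicharset entry into named fields (hypothetical helper)."""
    # Everything after " # " is a human-readable comment.
    data, _, comment = line.partition(" # ")
    char, props, metrics, script, other_case, direction, mirror, normed = data.split()
    return {
        "char": char,
        "properties": int(props),
        "glyph_metrics": dict(zip(METRIC_NAMES, map(int, metrics.split(",")))),
        "script": script,
        "other_case": int(other_case),
        "direction": int(direction),
        "mirror": int(mirror),
        "normed_form": normed,
        "comment": comment.strip(),
    }


def code_points_hex(text):
    """Return the hex Unicode code point of each character in text."""
    return [format(ord(ch), "x") for ch in text]


entry = parse_unicharset_line(LINE)
print(entry["direction"], code_points_hex(entry["char"]))

# The same lookups for the character that is to be added.
print(code_points_hex("⌖"), unicodedata.bidirectional("⌖"))
```

For "⌖" the code point comes out as 2316. The sketch does not produce glyph metrics for a new character; those ten numbers describe where the glyph sits relative to the baseline and are not something the Unicode database provides.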