Try san_latn.traineddata <https://github.com/Shreeshrii/tessdata4alpha/blob/master/best/san_latn.traineddata>, available from https://github.com/Shreeshrii/tessdata4alpha/tree/master/best
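If you still want to run tesstrain.sh yourself, the log shows two separate problems. First, --fontlist expects the font's family name as Pango reports it, not a lowercase file-style name, and the log itself suggests "Gandhari Unicode". Second, --langdata_dir should point at a checkout of the langdata repository (https://github.com/tesseract-ocr/langdata), not at the tessdata directory. tesstrain.sh looks for <langdata_dir>/<lang>/<lang>.training_text, which is why it failed on /usr/local/share/tessdata//eng/eng.training_text. Below is a minimal sketch under those assumptions. The LANGDATA_DIR location is hypothetical, so adjust both paths to your machine.

```shell
#!/usr/bin/env bash
# Sketch only: both paths are assumptions, adjust them to your setup.
FONTS_DIR=/home/tesseract/Downloads/fonts
# Hypothetical location of a clone of https://github.com/tesseract-ocr/langdata
LANGDATA_DIR="$HOME/langdata"

# tesstrain.sh expects <langdata_dir>/<lang>/<lang>.training_text to exist.
TRAINING_TEXT="$LANGDATA_DIR/eng/eng.training_text"
if [ ! -f "$TRAINING_TEXT" ]; then
  echo "missing $TRAINING_TEXT (is LANGDATA_DIR a langdata checkout?)" >&2
fi

# Pass the family name Pango reported, including the space.
CMD=(./tesstrain.sh
  --fontlist 'Gandhari Unicode'
  --fonts_dir "$FONTS_DIR"
  --lang eng
  --langdata_dir "$LANGDATA_DIR"
  --overwrite)

if [ -x ./tesstrain.sh ]; then
  "${CMD[@]}"
else
  # Not in the training directory: just show the quoted command.
  printf 'would run:'
  printf ' %q' "${CMD[@]}"
  echo
fi
```

If text2image still cannot find the font, running text2image with --fonts_dir pointing at the same directory and --list_available_fonts should print the exact names it accepts.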
On Tuesday, August 29, 2017 at 12:19:10 PM UTC+5:30, Anand Akella wrote:
>
> Hi,
> I'm new to Tesseract and have a PDF file with diacritical marks. I tried to
> run tesseract 4.0.0 with language eng. I see that it is not able to
> recognize the text with diacritical marks. I found a font that can render
> the diacritical marks:
>
> Gandhari Unicode 5.1
> <http://andrewglass.org/download.php?fname=gu5-110_ttf&extn=zip>
>
> I extracted the font files and copied them to
> /home/tesseract/Downloads/fonts
>
> Whenever I try to run tesstrain.sh it gives me the error "could not find
> font named gandhariunicode":
>
> ./tesstrain.sh --fontlist 'gandhariunicode' --fonts_dir
> /home/tesseract/Downloads/fonts/ --lang eng --langdata_dir
> /usr/local/share/tessdata/ --overwrite
>
> === Starting training for language 'eng'
> [Mon Aug 28 23:18:12 PDT 2017] /usr/local/bin/text2image
> --fonts_dir=/home/tesseract/Downloads/fonts/ --font=gandhariunicode
> --outputbase=/tmp/font_tmp.C9vSySTfge/sample_text.txt
> --text=/tmp/font_tmp.C9vSySTfge/sample_text.txt
> --fontconfig_tmpdir=/tmp/font_tmp.C9vSySTfge
> Could not find font named gandhariunicode.
> Pango suggested font Gandhari Unicode.
> Please correct --font arg.
>
> === Phase I: Generating training images ===
> ERROR: Could not find training text file
> /usr/local/share/tessdata//eng/eng.training_text
>
> What could the issue be? Please let me know. Thanks in advance.
>
> Thanks,
> Anand