[tesseract-ocr] LSTM training tesseract OCR high error rate

Mridul Davesar Tue, 12 Mar 2024 01:43:33 -0700

Hey everyone,
I am training my own LSTM model on some specific images that I want 
Tesseract to work efficiently on. I used the command:
*$ lstmtraining --model_output=my_output.lstm
--traineddata="C:\Program Files\Tesseract-OCR\tessdata\eng.traineddata"
--old_traineddata="C:\Program Files\Tesseract-OCR\tessdata\eng.traineddata"
--train_listfile=traindata.txt*


but it is giving a high error rate:
*At iteration 40/40/40, Mean rms=5.874000%, delta=47.785000%, BCER 
train=99.487000%, BWER train=100.000000%, skip ratio=0.000000%,  New worst 
BCER = 99.487000 wrote checkpoint.*

Finished! Selected model with minimal training error rate (BCER) = 99.367

So my question is: what is the reason for this high error rate, given that 
my file contains normal English sentences?
I think maybe my custom model is not leveraging the pretrained "eng.lstm" 
model.
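
If that guess is right, my understanding of the Tesseract training docs is that a fine-tuning run first extracts the LSTM weights from the existing traineddata with combine_tessdata -e, and then passes them to lstmtraining through --continue_from, which my command above does not do. A minimal sketch of what I think that would look like follows. The TESSDATA variable, the eng.lstm output name, the my_output checkpoint prefix, the iteration count, and the run/finetune helper functions are my own assumptions for illustration, not something I have verified on my setup. By default the script only prints the two commands.

```shell
#!/usr/bin/env bash
# Sketch only: fine-tune from the pretrained English LSTM instead of training
# without a starting model. Paths, file names and the iteration count are
# assumptions for illustration.
set -eu

TESSDATA="${TESSDATA:-C:/Program Files/Tesseract-OCR/tessdata}"

# Print each command by default; set DRY_RUN=0 to actually execute it.
# (The printed form is for reading only: arguments with spaces are not re-quoted.)
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

finetune() {
  # Step 1: extract the LSTM component from the existing traineddata file.
  run combine_tessdata -e "$TESSDATA/eng.traineddata" eng.lstm

  # Step 2: continue training from those weights on the custom line images.
  run lstmtraining \
    --continue_from eng.lstm \
    --traineddata "$TESSDATA/eng.traineddata" \
    --model_output my_output \
    --train_listfile traindata.txt \
    --max_iterations 400
}

finetune
```

Running it with DRY_RUN=0 would execute both steps, provided combine_tessdata and lstmtraining are on the PATH.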

Thanks

