Hello everyone,

I've been successfully fine-tuning eng.traineddata on smaller datasets, but when I scale up to a larger dataset covering a more diverse range of documents, I run into an unusual problem. Training starts, but it immediately reports a negative Mean RMS error, which should not be possible since an RMS value cannot be negative.
Environment:
Tesseract version: 4.1.3
Platform: Ubuntu 20.04

I run the following command for fine-tuning:

lstmtraining \
  --debug_interval 0 \
  --traineddata tesstrain/data/experiments/5PX1000D_rs42/model_eng_psm7_mi100000_5PX1000D_rs42/model_eng_psm7_mi100000_5PX1000D_rs42.traineddata \
  --old_traineddata tesstrain/src/tessdata_best/eng.traineddata \
  --continue_from tesstrain/data/experiments/5PX1000D_rs42/model_eng_psm7_mi100000_5PX1000D_rs42/model_eng_psm7_mi100000_5PX1000D_rs42.lstm \
  --model_output tesstrain/data/experiments/5PX1000D_rs42/model_eng_psm7_mi100000_5PX1000D_rs42/checkpoints/model_eng_psm7_mi100000_5PX1000D_rs42 \
  --train_listfile tesstrain/data/experiments/5PX1000D_rs42/list.train \
  --eval_listfile tesstrain/data/experiments/5PX1000D_rs42/list.eval \
  --max_iterations 100000 \
  --target_error_rate 0.01

The output I'm wondering about is:

At iteration 1/600/600, Mean rms=-2147483.6%, delta=0.033%, char train=275.696%, word train=100%, skip ratio=0%, New worst char error = 275.696 wrote checkpoint.

I expected training to proceed normally, with the Mean RMS error showing sensible values as it does on the smaller datasets. With around 100k .lstmf files this doesn't happen, but with 400k it does.

Am I looking in the wrong direction, or missing something? I searched the group and discussions for anything similar but couldn't find it.

Thanks
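P.S. One thing I noticed: -2147483.6 is exactly -2^31 / 1000 rounded to one decimal, i.e. the minimum 32-bit signed integer divided by 1000. My guess, which I have not verified against the 4.1.3 source, is that the trainer stores the rate rounded to 1/1000 of a percent through a double-to-int conversion, and that the underlying mean became NaN or too large to fit in an int (on x86 such an out-of-range conversion typically yields INT32_MIN). The Python sketch below only reproduces that arithmetic under those assumptions; `rounded_rate` is a hypothetical name, not a Tesseract function.

```python
import math

INT32_MIN = -2**31       # -2147483648
INT32_MAX = 2**31 - 1

def rounded_rate(mean: float) -> float:
    """Hypothetical model (assumption, not Tesseract code): express `mean` as a
    percentage trimmed to 1/1000 of 1% by rounding 100000 * mean to a 32-bit int.
    NaN or out-of-range values are mapped to INT32_MIN, mimicking what an x86
    double-to-int conversion typically produces on overflow."""
    scaled = 100000.0 * mean
    if math.isnan(scaled) or not (INT32_MIN <= scaled <= INT32_MAX):
        as_int = INT32_MIN   # overflow / NaN case
    else:
        as_int = int(round(scaled))
    return as_int / 1000.0

print(f"{rounded_rate(0.02):.3f}%")          # a normal mean of 0.02
print(f"{rounded_rate(float('nan')):.1f}%")  # NaN mean: matches the log value
```

If that reading is right, the negative number would be a symptom of an overflowed or NaN mean rather than an actual error rate, which might point towards something in the larger dataset rather than the command itself.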