Hello everybody, I just finished fine-tuning Tesseract according to Ray's tutorial.
I did the following steps:

1. I used tesstrain.sh to create the training data and the starter traineddata. The training data consists of eng.training_text with the ± character added multiple times.
2. I used combine_tessdata to extract eng.lstm from the best eng.traineddata.
3. I used lstmtraining with the extracted eng.lstm and the starter traineddata from step 1 to train the model. This is the end of training:

   *At iteration 1264/3000/3000, mean rms=0.202%, delta=0.003%, BCER train=0.020%, BWER train=0.072%, skip ratio=0.000%, New worst BCER = 0.020 wrote checkpoint.
   Finished! Selected model with minimal training error rate (BCER) = 0.017*

4. Then I took a screenshot of a text line in the same font I created the training data with and ran tesseract with the finished traineddata. (The text is also 1:1 in the training data.)

This is the text in the image:

*New Articles page ± 23 a To Service ~~ a details DC that don't*

This is the result with the freshly trained model:

*Ne Artic(Tes page = 23 aa To Bervice ww a detHiTs Dc that don lt*

When I use the best eng.traineddata model, I get this output:

*New Articles page = 23 a To Service ~~ a details DC that don't*

Can someone explain why I get such a bad result? The training seems fine, and I don't get any error messages. Everything I get back from my "fine tuned" model is absolute crap and way worse than the original one.
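To put a number on how far apart the two results are, here is a minimal Python sketch that computes a character error rate (edit distance divided by the length of the expected text) for both outputs. The helper names `levenshtein` and `cer` are my own, and this simple measure is only an assumption about how to compare the strings; it is not necessarily the same calculation lstmtraining uses for its BCER figure.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of hypothesis relative to reference (hypothetical helper)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# The three strings quoted in this post.
expected = "New Articles page ± 23 a To Service ~~ a details DC that don't"
fine_tuned = "Ne Artic(Tes page = 23 aa To Bervice ww a detHiTs Dc that don lt"
best = "New Articles page = 23 a To Service ~~ a details DC that don't"

print(f"fine-tuned model CER: {cer(expected, fine_tuned):.3f}")
print(f"best eng model CER:   {cer(expected, best):.3f}")
```

The best model differs from the expected text only in the ± character, while the fine-tuned output differs in many places, which is far from what the training log's BCER suggests.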