Hello everybody, 

I just finished fine-tuning according to Ray's tutorial.

I did the following steps: 

   1.  I used tesstrain.sh to create the training data and the starter 
   traineddata. The training data consists of eng.training_text with the 
   ± character added multiple times. 
   
   2. I used combine_tessdata to extract eng.lstm from the best 
   eng.traineddata.
   
   3. I used lstmtraining with the extracted eng.lstm and the starter 
   traineddata from step 1 to train the model.
   This is the end of the training log:

   *At iteration 1264/3000/3000, mean rms=0.202%, delta=0.003%, BCER 
   train=0.020%, BWER train=0.072%, skip ratio=0.000%, New worst BCER = 0.020 
   wrote checkpoint.
   Finished! Selected model with minimal training error rate (BCER) = 0.017*

   4. Then I took a screenshot of a text line in the same font I created 
   the training data with and ran tesseract with the finished traineddata 
   (the text also appears 1:1 in the training data).
   This is the text in the image:

   *New Articles page ± 23 a To Service ~~ a details DC that don't*

   This is the result with the freshly trained model:

   *Ne Artic(Tes page = 23 aa To Bervice ww a detHiTs Dc that don lt*

   When I use the best eng.traineddata model, I get this output:

   *New Articles page = 23 a To Service ~~ a details DC that don't*
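
For reference, steps 1 to 3 correspond roughly to the command lines sketched below. This is only a sketch under assumptions: every path and output name is a hypothetical placeholder, the exact flags used are not stated above, and the --old_traineddata flag is included because the tutorial's example for adding the ± character passes it when the character set changes. The final --stop_training call is the usual way to turn a checkpoint into a traineddata file; the checkpoint name is assumed from the --model_output prefix. The script only prints the commands, it does not run them.

```python
import shlex

# All paths below are hypothetical placeholders, not taken from the post.
BEST = "tessdata_best/eng.traineddata"   # original "best" English model
OUT = "train_plusminus"                  # output directory of tesstrain.sh

commands = [
    # Step 1: render training data and build the starter traineddata.
    ["tesstrain.sh", "--lang", "eng", "--linedata_only",
     "--fonts_dir", "/usr/share/fonts",
     "--langdata_dir", "langdata",
     "--tessdata_dir", "tessdata_best",
     "--output_dir", OUT],
    # Step 2: extract the LSTM component from the best model.
    ["combine_tessdata", "-e", BEST, f"{OUT}/eng.lstm"],
    # Step 3: fine-tune, continuing from the extracted LSTM.
    # --old_traineddata is an assumption here (see the note above).
    ["lstmtraining",
     "--continue_from", f"{OUT}/eng.lstm",
     "--traineddata", f"{OUT}/eng/eng.traineddata",
     "--old_traineddata", BEST,
     "--train_listfile", f"{OUT}/eng.training_files.txt",
     "--model_output", f"{OUT}/plusminus",
     "--max_iterations", "3000"],   # 3000 as in the log line quoted above
    # Convert the final checkpoint into a usable traineddata file.
    ["lstmtraining", "--stop_training",
     "--continue_from", f"{OUT}/plusminus_checkpoint",
     "--traineddata", f"{OUT}/eng/eng.traineddata",
     "--model_output", f"{OUT}/eng_plusminus.traineddata"],
]

# Print each command as a copy-pasteable shell line.
for cmd in commands:
    print(shlex.join(cmd))
```

Comparing the printed lines against the commands actually run makes it easier to spot a missing or different flag.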
      
Can someone explain why I get such a bad result? The training seems fine. I 
don't get any error messages. Everything I get back from my "fine tuned" 
model is absolute crap and way worse than the original one. 
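
To put a number on "way worse", the two outputs can be compared with the text in the image using a character error rate (edit distance divided by the length of the reference). The sketch below uses a plain Levenshtein distance written out by hand; this is an assumption about how to measure the error and is not necessarily how lstmtraining computes BCER. The strings are copied from the post with surrounding whitespace stripped.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


# Reference text shown in the screenshot.
truth = "New Articles page ± 23 a To Service ~~ a details DC that don't"

# OCR results quoted in the post.
outputs = {
    "fine-tuned": "Ne Artic(Tes page = 23 aa To Bervice ww a detHiTs Dc that don lt",
    "best eng": "New Articles page = 23 a To Service ~~ a details DC that don't",
}

for name, text in outputs.items():
    edits = levenshtein(truth, text)
    print(f"{name}: {edits} edits, CER = {edits / len(truth):.1%}")
```

The best model differs from the reference only in the ± character, while the fine-tuned model also misreads characters it handled correctly before training, which is at odds with the BCER of 0.020% reported at the end of training.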
