I have created a dataset with almost 200 million words. At roughly 10 words per image, that gives about 20 million examples to train the model on. Is that enough to get better results? For context, we previously fine-tuned a model on 20 thousand examples, and it performed worse than the pre-trained model.