[tesseract-ocr] Re: Reading Device labels to get model number
Also take a look at the pre-processing method mentioned at https://github.com/tleyden/open-ocr/wiki/Stroke-Width-Transform-In-Action

On Thursday, November 13, 2014 3:30:03 AM UTC+5:30, Bill Garrison wrote:
>
> So if someone sends in labels like the attached ones, I need to grab the
> model number. So far results from straight tesseract usage are dismal. I
> used an ImageMagick library to clean up the image a bit and send it in and
> if its rotated at ALL the results are still dismal. Overall, I am just
> looking to increase accuracy.
>
> Steps I have taken:
>
> 1) Using pre-processing library to clean up image
> 2) Added a new config that turns off dictionary and calls in a words file
> that has all the different samsung model numbers in it
> 3) tried to take my most promising pre-processed image and create a box
> file and then used "tesseract nobatch
> box.train" to train tesseract to not miss the two characters it missed
> this caused a segmentation fault.
>
> Any hints or advice about how I can use tesseract to grab this information
> with at least 50% accuracy would be GREATLY appreciated.
>
> Thanks!!
>

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/6ca66acf-481b-4b39-b18f-e7fbe832c265%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
[tesseract-ocr] Re: Reading Device labels to get model number
Do you have higher-resolution images to work with? That is one issue here: the edges of your text are very fuzzy, and at that resolution it is pretty hard for Tesseract. You can also experiment with thresholding and morphological operations (erosion/dilation, opening) to thicken some of your lines, using e.g. ImageMagick or OpenCV, before running Tesseract.
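
To make the thresholding and erosion/dilation suggestion concrete, here is a minimal pure-Python sketch of what those operations do to pixel values. It is an illustration only, not what ImageMagick or OpenCV do internally: the function names, the 3x3 window, and the cutoff of 128 are all assumptions chosen for the example. One detail worth knowing: with dark text on a light background, it is the minimum filter (erosion in the usual bright-foreground convention) that thickens the strokes.

```python
# Illustrative stand-ins for operations a real pipeline would delegate to
# ImageMagick or OpenCV. Images are lists of rows of grayscale values 0..255.

def threshold(gray, cutoff=128):
    """Binarize: pixels darker than cutoff become 0 (ink), the rest 255 (paper)."""
    return [[0 if px < cutoff else 255 for px in row] for row in gray]

def _window(img, y, x):
    """Yield the values in the 3x3 window around (y, x), clipped at the borders."""
    h, w = len(img), len(img[0])
    for ny in range(max(0, y - 1), min(h, y + 2)):
        for nx in range(max(0, x - 1), min(w, x + 2)):
            yield img[ny][nx]

def erode(img):
    """3x3 minimum filter. For dark ink on light paper this thickens strokes."""
    return [[min(_window(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]

def dilate(img):
    """3x3 maximum filter. For dark ink on light paper this thins strokes."""
    return [[max(_window(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]

# A faint, one-pixel-wide vertical stroke on an unevenly lit background.
gray = [
    [210, 190, 90, 180, 205],
    [205, 185, 95, 175, 200],
    [200, 180, 85, 170, 195],
]

binary = threshold(gray)   # clean black-on-white stroke, one pixel wide
thick = erode(binary)      # the same stroke, now three pixels wide
```

Whether thickening helps depends on the image: strokes that are already close together can merge, so it is worth comparing Tesseract's output before and after.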
[tesseract-ocr] Re: Reading Device labels to get model number
I think the table lines are not helping. I up-sized your image to 1000px wide, then ran it through Tesseract with PSM=6 and got mostly rubbish. Then I removed the table lines manually in Photoshop, up-sized the image to 1000px wide, and ran it through Tesseract with PSM=6, which gave:

RFZBHMEDBSR R 134a/ 160 g(5.64 oz) AC 115 VI 60 Hz 6.0 A 230 PSI I 103 PSI NOV. 2013 35 96 x 36 % x 70

Food for thought.
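
The table lines were removed by hand here, but the same idea can be roughed out in code. The sketch below is one possible approach, not the method used above: on an already thresholded image (0 for ink, 255 for paper), it whitens any straight horizontal or vertical run of dark pixels that is at least `min_len` long, on the assumption that runs that long are table rules rather than character strokes. The function names and the `min_len` parameter are made up for this example.

```python
def long_dark_runs(values, min_len):
    """Return (start, end) index pairs for runs of 0s at least min_len long."""
    runs, start = [], None
    # The trailing 255 is a sentinel that closes a run reaching the edge.
    for i, v in enumerate(list(values) + [255]):
        if v == 0 and start is None:
            start = i
        elif v != 0 and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs

def remove_table_lines(binary, min_len):
    """Return a copy of a 0/255 image with long straight dark runs whitened."""
    out = [row[:] for row in binary]
    # Horizontal rules: scan each row of the original image.
    for y, row in enumerate(binary):
        for a, b in long_dark_runs(row, min_len):
            for x in range(a, b):
                out[y][x] = 255
    # Vertical rules: scan each column of the original image.
    for x in range(len(binary[0])):
        column = [row[x] for row in binary]
        for a, b in long_dark_runs(column, min_len):
            for y in range(a, b):
                out[y][x] = 255
    return out

W, K = 255, 0
page = [
    [K, K, K, K, K, K],   # horizontal rule across the top
    [K, W, W, W, W, W],   # vertical rule down the left edge
    [K, W, K, K, W, W],   # a short stroke standing in for text
    [K, W, W, W, W, W],
]
cleaned = remove_table_lines(page, min_len=4)
```

On a real label, `min_len` would need to be longer than the widest character stroke, and this simple version only handles rules that are perfectly straight and axis-aligned, so a rotated photo would need deskewing first. The cleaned image would then go to Tesseract with page segmentation mode 6 (`-psm 6` in the 3.x releases, `--psm 6` in later ones).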