[tesseract-ocr] Re: Reading Device labels to get model number

2014-11-13 Thread shree
Also take a look at the pre-processing method described 
at https://github.com/tleyden/open-ocr/wiki/Stroke-Width-Transform-In-Action

On Thursday, November 13, 2014 3:30:03 AM UTC+5:30, Bill Garrison wrote:
>
> So if someone sends in labels like the attached ones, I need to grab the 
> model number. So far, results from straight Tesseract usage are dismal. I 
> used an ImageMagick library to clean up the image a bit before sending it 
> in, but if it's rotated at ALL the results are still dismal. Overall, I am 
> just looking to increase accuracy. 
>
> Steps I have taken:
>
> 1) Used a pre-processing library to clean up the image.
> 2) Added a new config that turns off the dictionary and loads a words file 
> containing all the different Samsung model numbers.
> 3) Took my most promising pre-processed image, created a box file, and 
> then ran "tesseract   nobatch box.train" to train Tesseract not to miss 
> the two characters it missed. This caused a segmentation fault. 
>
> Any hints or advice about how I can use tesseract to grab this information 
> with at least 50% accuracy would be GREATLY appreciated. 
>
> Thanks!!
>
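
Since step 2 already relies on a list of known model numbers, one option is 
to apply that list after OCR rather than inside Tesseract. The sketch below 
is not Tesseract's dictionary mechanism; it is plain fuzzy matching with 
Python's difflib. The model list, the function name and the 0.7 cutoff are 
all assumptions made for illustration.

```python
import difflib

# Illustrative list only; a real one would be read from the words file
# of model numbers mentioned in step 2.
KNOWN_MODELS = ["RF28HMEDBSR", "RF263BEAESR", "RS25H5111SR"]


def snap_to_known_model(ocr_text, known=KNOWN_MODELS, cutoff=0.7):
    """Return the known model most similar to any OCR line, or None."""
    best, best_score = None, cutoff
    for line in ocr_text.splitlines():
        # Normalise each line: drop spaces, compare in upper case.
        token = line.strip().upper().replace(" ", "")
        if not token:
            continue
        for model in known:
            score = difflib.SequenceMatcher(None, token, model).ratio()
            if score >= best_score:
                best, best_score = model, score
    return best


print(snap_to_known_model("RFZBHMEDBSR\n6.0 A"))
```

A lower cutoff tolerates more misread characters but makes a wrong match 
more likely, so it would need tuning against real labels.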

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/6ca66acf-481b-4b39-b18f-e7fbe832c265%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Re: Reading Device labels to get model number

2014-11-13 Thread Allistair C
Do you have higher-resolution images to work with? That's one issue here: 
the edges of your text are very fuzzy, and at that resolution it's pretty 
hard for Tesseract. You can also experiment with thresholding and opening 
(erosion/dilation) to thicken some of your lines (using e.g. ImageMagick 
or OpenCV) before running Tesseract.
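
To make the operations above concrete, here is a minimal pure-Python sketch 
of thresholding, erosion and dilation on a grayscale image stored as rows 
of 0-255 integers. It is only meant to show what the operations do; in 
practice ImageMagick or OpenCV would do this work. The function names and 
the convention that ink is 1 are assumptions of this sketch. Which 
operation thickens strokes depends on whether ink or background is treated 
as foreground; with ink as 1, dilation thickens and erosion thins.

```python
def threshold(gray, cutoff=128):
    """Binarize rows of 0-255 ints: 1 = ink (dark), 0 = background."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def _morph(binary, pick):
    """Apply pick (max or min) over each pixel's 3x3 neighbourhood.

    Windows are clipped at the image borders.
    """
    h, w = len(binary), len(binary[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [binary[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(pick(window))
        out.append(row)
    return out


def dilate(binary):
    """Grow ink regions by one pixel in every direction."""
    return _morph(binary, max)


def erode(binary):
    """Shrink ink regions by one pixel in every direction."""
    return _morph(binary, min)


def opening(binary):
    """Erosion followed by dilation: removes specks thinner than 3 px."""
    return dilate(erode(binary))


# A one-pixel-wide dark vertical stroke on a white background.
gray = [[255, 255, 0, 255, 255] for _ in range(5)]
ink = threshold(gray)
thick = dilate(ink)  # the stroke is now three pixels wide
```

Note that with this convention, opening a one-pixel stroke erases it, so 
the order and choice of operations matter for thin text.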




[tesseract-ocr] Re: Reading Device labels to get model number

2014-11-13 Thread Allistair C
I think the table lines are not helping. 

I up-sized your image to 1000px wide, then ran it through Tesseract with 
PSM=6 and got mostly rubbish.
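
For anyone wanting to repeat that step from a script, the sketch below 
builds the two command lines: an ImageMagick resize to a fixed width and a 
Tesseract run with page segmentation mode 6 (assume a single uniform block 
of text). The file names and the function are hypothetical. Older 
Tesseract 3.x releases spell the flag "-psm" and newer ones "--psm", and 
ImageMagick 7 prefers "magick" over "convert", so adjust for the installed 
versions.

```python
def build_commands(src, width=1000, psm=6, legacy_flag=False,
                   resized="resized.png"):
    """Return (resize_cmd, ocr_cmd) as argument lists for subprocess.run."""
    # "1000x" sets the width and lets ImageMagick keep the aspect ratio.
    resize_cmd = ["convert", src, "-resize", f"{width}x", resized]
    # Tesseract 3.x uses "-psm"; 4.x and later use "--psm".
    psm_flag = "-psm" if legacy_flag else "--psm"
    # "stdout" as the output base prints the recognised text.
    ocr_cmd = ["tesseract", resized, "stdout", psm_flag, str(psm)]
    return resize_cmd, ocr_cmd


resize_cmd, ocr_cmd = build_commands("label.jpg")
# To actually run them (requires both tools on PATH):
#   import subprocess
#   subprocess.run(resize_cmd, check=True)
#   text = subprocess.run(ocr_cmd, check=True, capture_output=True,
#                         text=True).stdout
```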

Then I removed the table lines manually in Photoshop, up-sized your image 
to 1000px wide, and ran it through Tesseract with PSM=6:

RFZBHMEDBSR

R 134a/ 160 g(5.64 oz)

AC 115 VI 60 Hz
6.0 A

230 PSI I 103 PSI
NOV. 2013
35 96 x 36 % x 70

Food for thought.
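
The table lines were removed by hand here, but the same idea can be 
sketched in code. The example below is an assumption-laden illustration, 
not what was done in Photoshop: it works on a binarized image (1 = ink) 
and clears any horizontal or vertical run of ink at least min_len pixels 
long, on the theory that table rules are much longer than any character 
stroke. The function name and min_len are hypothetical, and strokes that 
touch a rule may lose pixels.

```python
def remove_long_runs(binary, min_len):
    """Clear horizontal and vertical ink runs of at least min_len pixels."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]

    def flush(run):
        # Erase the run from the output if it is long enough to be a rule.
        if len(run) >= min_len:
            for y, x in run:
                out[y][x] = 0

    def clear_runs(coords):
        # Runs are detected on the untouched input, so removing a
        # horizontal rule does not hide a vertical one that crosses it.
        run = []
        for y, x in coords:
            if binary[y][x]:
                run.append((y, x))
            else:
                flush(run)
                run = []
        flush(run)

    for y in range(h):
        clear_runs([(y, x) for x in range(w)])
    for x in range(w):
        clear_runs([(y, x) for y in range(h)])
    return out


# One full-width rule on the top row and a small 2x2 "glyph" below it.
page = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = remove_long_runs(page, min_len=5)
```

On a real label, min_len would have to exceed the tallest and widest 
character stroke in the up-sized image.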

