FuzzyOCR a little too fuzzy

2006-12-11 Thread Nigel Kendrick
FuzzyOcr is proving to be useful but it does seem to be a bit too 'Fuzzy' at
times...

[2006-12-08 13:27:47] Debug mode: Found word best in line
   shotermprcetargetoo
   with fuzz of 0.25 scanned with scanset /usr/bin/gocr -i -

[2006-12-08 13:27:47] Debug mode: Found word best in line
   quantumenergyincspleasedtoannouncethatthasappiedtohavetssharesistedfor
   with fuzz of 0.25 scanned with scanset /usr/bin/gocr -i -

[2006-12-08 13:27:47] Debug mode: Found word best in line
   tradngonthefrankfustockexchangethecompanyhasretanedthesewcesofbaltc
   with fuzz of 0.25 scanned with scanset /usr/bin/gocr -i -

[2006-12-08 13:27:47] Debug mode: Found word best in line
   investmentgroupofhamburggermanytoassstwththeappicaton
   with fuzz of 0.25 scanned with scanset /usr/bin/gocr -i -

[2006-12-08 13:27:47] Debug mode: Found word cheap in line
   investmentgroupofhamburggermanytoassistwiththeapplication
   with fuzz of 0.2 scanned with scanset /usr/bin/gocr -l 180 -d 2 -i -

[2006-12-08 13:27:47] Debug mode: Found word revista in line
   quantumenergyincspleasedtoannouncethatthasappiedtohavetssharesistedfor

[2006-12-08 13:27:47] Debug mode: Found word alert in line
   quantumenergyincspleasedtoannouncethatthasappiedtohavetssharesistedfor
   with fuzz of 0.2 scanned with scanset /usr/bin/gocr -i -

[2006-12-08 13:27:47] Debug mode: Found word alert in line
   tradngonthefrankfustockexchangethecompanyhasretanedthesewcesofbaltc

[2006-12-08 13:27:47] Debug mode: Found word investor in line
   investmentgroupofhamburggermanytoassistwiththeapplication
   with fuzz of 0.25 scanned with scanset /usr/bin/gocr -l 180 -d 2 -i -

[2006-12-08 13:27:47] Debug mode: Found word meridia in line
   redytoriiibigmmmeriii
   with fuzz of 0.285714285714286 scanned with scanset /usr/bin/gocr -i -


Any suggestions?

Thanks




Re: FuzzyOCR a little too fuzzy

2006-12-11 Thread Matthias Keller

Nigel Kendrick wrote:

FuzzyOcr is proving to be useful but it does seem to be a bit too 'Fuzzy' at
times...

First of all, try lowering focr_threshold to 0.25 or even lower.
Secondly, add custom thresholds for the rules that misfire.
For example, change the line with 'best' to
best::0.2
so that it only matches if 'best' is found with a fuzz *below* 0.2.
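
To see why a per-word threshold helps, here is a rough Python sketch of the matching logic. It is not FuzzyOcr's actual code. It assumes the fuzz value is the smallest edit distance between the word and any substring of the line, divided by the word's length, which is consistent with logged values such as 0.2 (1/5) and 0.285714285714286 (2/7). It also assumes a strict "below" comparison. The function names are made up for illustration.

```python
def min_substring_distance(word, line):
    """Smallest edit distance between `word` and any substring of `line`."""
    # Row 0 is all zeros, so a match may start at any position in the line.
    prev = [0] * (len(line) + 1)
    for i, wc in enumerate(word, start=1):
        cur = [i]
        for j, lc in enumerate(line, start=1):
            cost = 0 if wc == lc else 1
            cur.append(min(prev[j] + 1,          # drop a character of the word
                           cur[j - 1] + 1,       # skip a character of the line
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    # A match may also end at any position in the line.
    return min(prev)


def fuzz(word, line):
    # Assumption: fuzz = edit distance / word length.
    return min_substring_distance(word, line) / len(word)


def parse_entry(entry, default_threshold):
    """Split 'word::threshold'; fall back to the global threshold."""
    word, sep, custom = entry.partition("::")
    return word, (float(custom) if sep else default_threshold)


def matches(entry, line, default_threshold=0.25):
    # Assumption: a hit requires the fuzz to be strictly below the threshold.
    word, threshold = parse_entry(entry, default_threshold)
    return fuzz(word, line) < threshold


line = "investmentgroupofhamburggermanytoassistwiththeapplication"
print(fuzz("investor", line))                  # 2 edits over 8 letters
print(matches("investor", line, 0.3))          # global threshold only
print(matches("investor::0.2", line, 0.3))     # stricter per-word threshold
```

Short words are the ones that misfire most under this model, because a single edit on a four-letter word already costs 0.25, so they are the best candidates for a tighter custom threshold.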

Matt