Re: picture spams
> I was quite sure that the FuzzyOcr project was dead, because a few
> months ago I tried to contact its author, Decoder, without success.
> Probably he was very busy :) Fortunately, it seems that the FuzzyOcr
> project is still alive.

He is very busy getting an advanced degree. He still manages to put out the occasional patch, and several others have done quite a lot of work on it.

> I've found a thread about rotated spam images at the FuzzyOcr page [1].
> Currently Decoder doesn't have time to implement image rotation
> checking, but he will try to do it in the future. For now we can only
> work around it, for example using the preprocessor/scanset settings.
>
> Do any of you rotate images in your FuzzyOcr? Do you use fixed degrees
> or detect the skew angle and rotate the image accordingly? Could you
> share this?

I don't personally check for rotated images, since all my image spams get quite enough points from other things, so I don't need to make the extra effort.

There was someone else a couple of months ago who had a fairly long thread on the mailing list about his experiments with rotated images. Unfortunately I didn't save any of those in my local archive. But as best I recall, he was suggesting multiple scan sets at (I think) about 8 and 18 degrees each way.

I remember there was some talk about rotations over a certain angle being difficult to detect, but I don't recall the exact details now. I think it was problems with font distortion when doing the rotation, and someone else had some suggestions to get around that problem. All this is a little hazy in my memory.

This was not too long after image spams started and the spammers started trying to avoid OCR detection. They started doing rotated images, but it didn't seem to last too long. I'm guessing that they probably didn't get good hit rates from the spammers' "customers".

        Loren
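To make the "8 and 18 degrees each way" idea concrete, here is a minimal Python sketch that only builds the rotation commands one might try before OCR. It is not FuzzyOcr's scanset syntax and not the setup from the earlier thread. It assumes netpbm's `pnmrotate` is the rotation tool (called as `pnmrotate angle file`), and the helper names and the file name `spam.gif.pnm` are made up for illustration.

```python
import shlex


def candidate_angles(magnitudes=(8, 18)):
    """Return every magnitude in both directions, sorted: [-18, -8, 8, 18]."""
    return sorted({sign * m for m in magnitudes for sign in (-1, 1)})


def rotation_commands(pnm_path, magnitudes=(8, 18)):
    """Build one argv list per candidate angle. Nothing is executed here.

    Assumption: pnmrotate takes the angle in degrees followed by the input
    file and writes the rotated image to stdout.
    """
    return [["pnmrotate", str(angle), pnm_path]
            for angle in candidate_angles(magnitudes)]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before wiring them
    # into a preprocessing step.
    for argv in rotation_commands("spam.gif.pnm"):
        print(shlex.join(argv))
```

Each rotated copy would then go through the usual OCR step. Fixed angles keep the extra work bounded at four additional scans per image, at the cost of missing rotations that fall far from 8 or 18 degrees.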
Re: picture spams
[EMAIL PROTECTED] writes:

> On Fri, 17 Aug 2007, Paweł Tęcza wrote:
>
>> I did the test and unfortunately my FuzzyOcr (3.5.1) was bitten by
>> that spam image.
>
> You can manually mark this picture as bad:
>
> # fuzzy-find --delete
> # fuzzy-find --learn-spam

Hi,

Thanks for the hint! I believe it's an effective method, but I have no time to train my FuzzyOcr manually ;)

Have a nice day,

Pawel
Re: picture spams
"Loren Wilton" <[EMAIL PROTECTED]> writes: >> Hi Loren, >> >> I did the test and unfortunately my FuzzyOcr (3.5.1) was bitten >> by that spam image. > > The normal scan setups for FuzzyOCR don't rotate the images, so will > in all probability miss a rotated image like this. These were quite > popular for a while and a couple of people developed scansets that > contained rotation as one of the preprocessing steps. I don't seem to > have saved any of the messages relating to that thread. As best I > recall they found that rotating 8 degrees or so worked well. Or maybe > it was 18. > > You can probably find info on the FuzzyOcr mailing list: Hi Loren, I was quite sure that FuzzyOcr project is dead, because a few months ago I was trying to contact his author, Decoder, but no success. Probably he was very busy :) Fortunately, it seems that FuzzyOcr project still is alive. It's a very good message for me, because it's really a very useful utility :) I've found a threat about rotated spam images at FuzzyOcr page [1]. Currently Decoder hasn't time to implement checking image rotation, but he will try to do it in the future. Now we can only work-around it, for example using the preprocessor/scanset settings. Who of you do rotate images in your FuzzyOcr? Do you use fixed degrees or detect the skew angle and rotate the image accordingly? Could you share this? Kind regards, Pawel [1] http://fuzzyocr.own-hero.net/ticket/408
Re: picture spams
> Hi Loren,
>
> I did the test and unfortunately my FuzzyOcr (3.5.1) was bitten by
> that spam image.

The normal scan setups for FuzzyOCR don't rotate the images, so will in all probability miss a rotated image like this. These were quite popular for a while and a couple of people developed scansets that contained rotation as one of the preprocessing steps. I don't seem to have saved any of the messages relating to that thread. As best I recall they found that rotating 8 degrees or so worked well. Or maybe it was 18.

You can probably find info on the FuzzyOcr mailing list:

Devel-spam mailing list
[EMAIL PROTECTED]
http://lists.own-hero.net/mailman/listinfo/devel-spam
Re: picture spams
On Fri, 17 Aug 2007, Paweł Tęcza wrote:

> I did the test and unfortunately my FuzzyOcr (3.5.1) was bitten by
> that spam image.

You can manually mark this picture as bad:

# fuzzy-find --delete
# fuzzy-find --learn-spam
Re: picture spams
"Loren Wilton" <[EMAIL PROTECTED]> writes: > FuzzyOcr should do a good job on something like that. > >Loren > >> http://dreams.741.com/spam.gif Hi Loren, I did the test and unfortunately my FuzzyOcr (3.5.1) was bitten by that spam image. Here are the message headers: X-Spam-Checker-Version: SpamAssassin 3.2.1 (2007-05-02) on anubis2.poczta.uw.edu.pl X-Spam-Level: x X-Spam-Status: No, score=1.3 required=5.0 tests=SB_GIF_AND_NO_URIS autolearn=disabled version=3.2.1 And here is a piece of output of `spamassassin -D`: [17547] dbg: FuzzyOcr: Starting FuzzyOcr... [17547] info: FuzzyOcr: Processing Message with ID "<[EMAIL PROTECTED]>" (Pawel Tecza <[EMAIL PROTECTED]> -> [EMAIL PROTECTED]) [17547] dbg: FuzzyOcr: fname: "spam.gif" => "spam.gif" [17547] dbg: message: decoding base64 [17547] info: FuzzyOcr: GIF: [342x434] spam.gif (9377) [17547] dbg: FuzzyOcr: Saved: /tmp/.spamassassin17547AFJ63Ztmp/spam.gif [17547] dbg: FuzzyOcr: Saved: /tmp/.spamassassin17547AFJ63Ztmp/raw.eml [17547] info: FuzzyOcr: Found: 1 images [17547] dbg: FuzzyOcr: Connecting to: dbi:mysql:database=FuzzyOcr;host=mysqlhost [17547] dbg: dbiplugin: Creating uncached database handle to 'database=FuzzyOcr;host=mysqlhost_fuzzyocr_fuzzyocr_AutoCommit=1_PrintError=1_Username=fuzzyocr' [17547] dbg: config: using "/var/lib/courier/.spamassassin" for user state dir [17547] dbg: FuzzyOcr: pfile => /tmp/.spamassassin17547AFJ63Ztmp/spam.gif.pnm [17547] dbg: FuzzyOcr: efile => /tmp/.spamassassin17547AFJ63Ztmp/spam.gif.err [17547] dbg: FuzzyOcr: Errors to: /tmp/.spamassassin17547AFJ63Ztmp/raw.err [17547] dbg: FuzzyOcr: File has Content-Type "image/gif" and File Extension "gif" [17547] info: FuzzyOcr: Found GIF header name="spam.gif" [17547] dbg: FuzzyOcr: Saved pid: 17671 [17671] dbg: FuzzyOcr: Exec : /usr/bin/giftext /tmp/.spamassassin17547AFJ63Ztmp/spam.gif [17671] dbg: FuzzyOcr: Stdout: >/tmp/.spamassassin17547AFJ63Ztmp/giftext.info [17671] dbg: FuzzyOcr: Stderr: >>/tmp/.spamassassin17547AFJ63Ztmp/giftext.err 
[17547] dbg: FuzzyOcr: Elapsed [17671]: 0.016500 sec. (/usr/bin/giftext: exit 0) [17547] info: FuzzyOcr: Image is single non-interlaced... [17673] dbg: FuzzyOcr: Exec : /usr/bin/giffix /tmp/.spamassassin17547AFJ63Ztmp/spam.gif [17673] dbg: FuzzyOcr: Stdout: >/tmp/.spamassassin17547AFJ63Ztmp/spam.gif-fixed.gif [17673] dbg: FuzzyOcr: Stderr: >>/tmp/.spamassassin17547AFJ63Ztmp/spam.gif.err [17547] dbg: FuzzyOcr: Saved pid: 17673 [17547] dbg: FuzzyOcr: Elapsed [17673]: 0.019540 sec. (/usr/bin/giffix: exit 0) [17674] dbg: FuzzyOcr: Exec : /usr/bin/giftopnm /tmp/.spamassassin17547AFJ63Ztmp/spam.gif-fixed.gif [17674] dbg: FuzzyOcr: Stdout: >/tmp/.spamassassin17547AFJ63Ztmp/spam.gif.pnm [17674] dbg: FuzzyOcr: Stderr: >>/tmp/.spamassassin17547AFJ63Ztmp/spam.gif.err [17547] dbg: FuzzyOcr: Saved pid: 17674 [17547] dbg: FuzzyOcr: Elapsed [17674]: 0.173627 sec. (/usr/bin/giftopnm: exit 0) [17547] info: FuzzyOcr: Calculating image hash for: /tmp/.spamassassin17547AFJ63Ztmp/spam.gif.pnm [17681] dbg: FuzzyOcr: Exec : /usr/bin/ppmhist -noheader /tmp/.spamassassin17547AFJ63Ztmp/spam.gif.pnm [17681] dbg: FuzzyOcr: Stdout: >/tmp/.spamassassin17547AFJ63Ztmp/ppmhist.info [17681] dbg: FuzzyOcr: Stderr: >/dev/null [17547] dbg: FuzzyOcr: Saved pid: 17681 [17547] dbg: FuzzyOcr: Elapsed [17681]: 0.022073 sec. (/usr/bin/ppmhist: exit 0) [17547] dbg: FuzzyOcr: Got: <445299:342:434:7::252:254:252:253:90487::44:106:172:95:21369::84:150:204:136:16164::108:182:220:164:8414::20:74:140:65:7329::252:206:4:197:2789> [17547] dbg: FuzzyOcr: delete from FuzzyOcr.Hash where Hash.check < 1186058256 [17547] info: FuzzyOcr: Found[Safe]: Score='0.000' Info: '' [17547] info: FuzzyOcr: Matched [2] time(s). Prev match: 9 min. 52 sec. ago [17547] dbg: FuzzyOcr: update FuzzyOcr.Safe set Safe.match='2',Safe.check='1187354257' where Safe.key='252:254:252:253:90487::44:106:172:95:21369::84:150:204:136:16164::108:182:220:164:8414::20:74:140:65:7329::252:206:4:197:2789' [17547] info: FuzzyOcr: Image in KNOWN_GOOD. 
Skipping OCR checks... [17547] dbg: FuzzyOcr: Remove DIR: /tmp/.spamassassin17547AFJ63Ztmp [17547] dbg: FuzzyOcr: FuzzyOcr ending successfully... [17547] dbg: FuzzyOcr: Processed in 0.345128 sec. My best regards, Pawel
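For anyone reading the hash lines in the debug output above, here is a small Python sketch that splits the `Got: <...>` signature into its parts. It is based only on what this log shows, not on FuzzyOcr's source. The log confirms that everything after the first `::` is the string used as `Safe.key`, and that the second and third header numbers match the 342x434 image size. What the first and fourth header numbers mean is not stated in the log, so the sketch leaves them unnamed. The function name `parse_signature` is made up for illustration.

```python
# Signature copied verbatim from the "Got:" debug line.
SIGNATURE = ("<445299:342:434:7::252:254:252:253:90487::44:106:172:95:21369"
             "::84:150:204:136:16164::108:182:220:164:8414"
             "::20:74:140:65:7329::252:206:4:197:2789>")


def parse_signature(sig):
    """Split a signature into (header, entries, key).

    header  : tuple of the integers before the first '::'
    entries : one tuple of integers per '::'-separated group after it
    key     : the text after the first '::', which is what the log shows
              being used as Safe.key
    """
    header_text, *groups = sig.strip("<>").split("::")
    header = tuple(int(x) for x in header_text.split(":"))
    entries = [tuple(int(x) for x in g.split(":")) for g in groups]
    key = "::".join(groups)
    return header, entries, key


if __name__ == "__main__":
    header, entries, key = parse_signature(SIGNATURE)
    print("header:", header)
    for entry in entries:
        print("entry:", entry)
    print("key:", key)
```

Comparing the printed key with the `Safe.key` value in the `update FuzzyOcr.Safe` line shows why this run ended with "Image in KNOWN_GOOD": the lookup matched an existing entry in the Safe table, so the OCR checks never ran on this image.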
Re: picture spams
FuzzyOcr should do a good job on something like that.

        Loren

> http://dreams.741.com/spam.gif
picture spams
Hi,

Will ImageInfo be able to detect and catch this picture spam soon?

http://dreams.741.com/spam.gif

Thanks