Re: picture spams

2007-08-20 Thread Paweł Tęcza
Loren Wilton [EMAIL PROTECTED] writes:

>> Hi Loren,
>>
>> I did the test and unfortunately my FuzzyOcr (3.5.1) was bitten
>> by that spam image.
>
> The normal scan setups for FuzzyOCR don't rotate the images, so will
> in all probability miss a rotated image like this.  These were quite
> popular for a while and a couple of people developed scansets that
> contained rotation as one of the preprocessing steps.  I don't seem to
> have saved any of the messages relating to that thread.  As best I
> recall they found that rotating 8 degrees or so worked well.  Or maybe
> it was 18.
>
> You can probably find info on the FuzzyOcr mailing list:

Hi Loren,

I was fairly sure that the FuzzyOcr project was dead, because a few
months ago I tried to contact its author, Decoder, without success.
He was probably very busy :) Fortunately, it seems that the FuzzyOcr
project is still alive. That's very good news for me, because it's
a really useful utility :)

I've found a thread about rotated spam images on the FuzzyOcr site [1].
Decoder currently has no time to implement a check for image rotation,
but he will try to do it in the future. For now we can only work around
it, for example with the preprocessor/scanset settings.

Do any of you rotate images in your FuzzyOcr setup? Do you use fixed
angles, or do you detect the skew angle and rotate the image accordingly?
Could you share your configuration?

Kind regards,

Pawel

[1] http://fuzzyocr.own-hero.net/ticket/408


Re: picture spams

2007-08-20 Thread Paweł Tęcza
[EMAIL PROTECTED] writes:

> On Fri, 17 Aug 2007, Paweł Tęcza wrote:
>
>> I did the test and unfortunately my FuzzyOcr (3.5.1) was bitten by
>> that spam image.
>
> You can manually mark this picture as bad:
>
> # fuzzy-find --delete image
> # fuzzy-find --learn-spam image

Hi,

Thanks for the hint! I'm sure it's an effective method,
but I have no time to train my FuzzyOcr manually ;)

Have a nice day,

Pawel


Re: picture spams

2007-08-20 Thread Loren Wilton

> I was quite sure that FuzzyOcr project is dead, because a few
> months ago I was trying to contact his author, Decoder,
> but no success. Probably he was very busy :) Fortunately, it seems


He is very busy getting an advanced degree.  He still manages to put out the 
occasional patch, and several others have done quite a lot of work on it.




> I've found a thread about rotated spam images on the FuzzyOcr site [1].
> Decoder currently has no time to implement a check for image rotation,
> but he will try to do it in the future. For now we can only work around
> it, for example with the preprocessor/scanset settings.
>
> Do any of you rotate images in your FuzzyOcr setup? Do you use fixed
> angles, or do you detect the skew angle and rotate the image accordingly?
> Could you share your configuration?


I don't personally check for rotated images, since all my image spams get 
quite enough points from other things, so I don't need to make the extra 
effort.


Someone else had a fairly long thread on the mailing list a couple of months
ago about his experiments with rotated images.  Unfortunately I didn't save
any of those messages in my local archive.  But as best I recall, he was
suggesting multiple scansets at (I think) about 8 and 18 degrees each way.
I remember there was some talk about rotations over a certain angle being
difficult to detect, but I don't recall the exact details now.  I think the
problem was font distortion introduced by the rotation, and someone else had
some suggestions for getting around it.
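
To make the "detect the skew angle" option from the question above a bit more concrete, here is a minimal Python sketch of a projection-profile skew estimate. It is not FuzzyOcr code, and nothing in this thread says FuzzyOcr works this way. The function names are hypothetical, the image is reduced to a list of dark-pixel coordinates, and the candidate angles (8 and 18 degrees each way) are only the values half-remembered above.

```python
import math

def rotate(points, angle_deg):
    """Rotate (x, y) points about the origin by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def profile_score(points):
    """Sum of squared per-row pixel counts.

    The score is high when the dark pixels are concentrated in a few
    rows, which is what level lines of text look like.
    """
    rows = {}
    for _, y in points:
        r = round(y)
        rows[r] = rows.get(r, 0) + 1
    return sum(n * n for n in rows.values())

def estimate_skew(points, candidates):
    """Return the candidate correction angle with the best row profile."""
    return max(candidates, key=lambda ang: profile_score(rotate(points, ang)))

# Synthetic "text": three level lines of ink, then skewed by 8 degrees.
lines = [(x, y) for y in (0, 10, 20) for x in range(100)]
skewed = rotate(lines, 8)
print(estimate_skew(skewed, [-18, -8, 0, 8, 18]))  # prints -8
```

With fixed-angle scansets, the OCR would instead be run once per candidate angle. The sketch only shows how a single correction angle could be picked beforehand, under the assumption that the spam text is laid out in straight lines.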


All this is a little hazy in my memory.  It was not long after image spam
started and the spammers began trying to avoid OCR detection.  They started
sending rotated images, but it didn't seem to last long.  I'm guessing
that they probably didn't get good hit rates from the spammers' customers.


   Loren




picture spams

2007-08-17 Thread Spamassassin List

Hi,

Will ImageInfo be able to detect and catch this picture spam soon?

http://dreams.741.com/spam.gif

Thanks


Re: picture spams

2007-08-17 Thread Loren Wilton

FuzzyOcr should do a good job on something like that.

   Loren


http://dreams.741.com/spam.gif






Re: picture spams

2007-08-17 Thread Paweł Tęcza
Loren Wilton [EMAIL PROTECTED] writes:

> FuzzyOcr should do a good job on something like that.
>
>    Loren
>
>> http://dreams.741.com/spam.gif

Hi Loren,

I did the test and unfortunately my FuzzyOcr (3.5.1) was bitten
by that spam image.

Here are the message headers:

X-Spam-Checker-Version: SpamAssassin 3.2.1 (2007-05-02) on
anubis2.poczta.uw.edu.pl
X-Spam-Level: x
X-Spam-Status: No, score=1.3 required=5.0 tests=SB_GIF_AND_NO_URIS
autolearn=disabled version=3.2.1

And here is a piece of output of `spamassassin -D`:

[17547] dbg: FuzzyOcr: Starting FuzzyOcr...
[17547] info: FuzzyOcr: Processing Message with ID [EMAIL PROTECTED] (Pawel Tecza [EMAIL PROTECTED] - [EMAIL PROTECTED])
[17547] dbg: FuzzyOcr: fname: spam.gif = spam.gif
[17547] dbg: message: decoding base64
[17547] info: FuzzyOcr: GIF: [342x434] spam.gif (9377)
[17547] dbg: FuzzyOcr: Saved: /tmp/.spamassassin17547AFJ63Ztmp/spam.gif
[17547] dbg: FuzzyOcr: Saved: /tmp/.spamassassin17547AFJ63Ztmp/raw.eml
[17547] info: FuzzyOcr: Found: 1 images
[17547] dbg: FuzzyOcr: Connecting to: dbi:mysql:database=FuzzyOcr;host=mysqlhost
[17547] dbg: dbiplugin: Creating uncached database handle to 'database=FuzzyOcr;host=mysqlhost_fuzzyocr_fuzzyocr_AutoCommit=1_PrintError=1_Username=fuzzyocr'
[17547] dbg: config: using /var/lib/courier/.spamassassin for user state dir
[17547] dbg: FuzzyOcr: pfile = /tmp/.spamassassin17547AFJ63Ztmp/spam.gif.pnm
[17547] dbg: FuzzyOcr: efile = /tmp/.spamassassin17547AFJ63Ztmp/spam.gif.err
[17547] dbg: FuzzyOcr: Errors to: /tmp/.spamassassin17547AFJ63Ztmp/raw.err
[17547] dbg: FuzzyOcr: File has Content-Type image/gif and File Extension gif
[17547] info: FuzzyOcr: Found GIF header name=spam.gif
[17547] dbg: FuzzyOcr: Saved pid: 17671
[17671] dbg: FuzzyOcr: Exec : /usr/bin/giftext /tmp/.spamassassin17547AFJ63Ztmp/spam.gif
[17671] dbg: FuzzyOcr: Stdout: /tmp/.spamassassin17547AFJ63Ztmp/giftext.info
[17671] dbg: FuzzyOcr: Stderr: /tmp/.spamassassin17547AFJ63Ztmp/giftext.err
[17547] dbg: FuzzyOcr: Elapsed [17671]: 0.016500 sec. (/usr/bin/giftext: exit 0)
[17547] info: FuzzyOcr: Image is single non-interlaced...
[17673] dbg: FuzzyOcr: Exec : /usr/bin/giffix /tmp/.spamassassin17547AFJ63Ztmp/spam.gif
[17673] dbg: FuzzyOcr: Stdout: /tmp/.spamassassin17547AFJ63Ztmp/spam.gif-fixed.gif
[17673] dbg: FuzzyOcr: Stderr: /tmp/.spamassassin17547AFJ63Ztmp/spam.gif.err
[17547] dbg: FuzzyOcr: Saved pid: 17673
[17547] dbg: FuzzyOcr: Elapsed [17673]: 0.019540 sec. (/usr/bin/giffix: exit 0)
[17674] dbg: FuzzyOcr: Exec : /usr/bin/giftopnm /tmp/.spamassassin17547AFJ63Ztmp/spam.gif-fixed.gif
[17674] dbg: FuzzyOcr: Stdout: /tmp/.spamassassin17547AFJ63Ztmp/spam.gif.pnm
[17674] dbg: FuzzyOcr: Stderr: /tmp/.spamassassin17547AFJ63Ztmp/spam.gif.err
[17547] dbg: FuzzyOcr: Saved pid: 17674
[17547] dbg: FuzzyOcr: Elapsed [17674]: 0.173627 sec. (/usr/bin/giftopnm: exit 0)
[17547] info: FuzzyOcr: Calculating image hash for: /tmp/.spamassassin17547AFJ63Ztmp/spam.gif.pnm
[17681] dbg: FuzzyOcr: Exec : /usr/bin/ppmhist -noheader /tmp/.spamassassin17547AFJ63Ztmp/spam.gif.pnm
[17681] dbg: FuzzyOcr: Stdout: /tmp/.spamassassin17547AFJ63Ztmp/ppmhist.info
[17681] dbg: FuzzyOcr: Stderr: /dev/null
[17547] dbg: FuzzyOcr: Saved pid: 17681
[17547] dbg: FuzzyOcr: Elapsed [17681]: 0.022073 sec. (/usr/bin/ppmhist: exit 0)
[17547] dbg: FuzzyOcr: Got: 445299:342:434:7::252:254:252:253:90487::44:106:172:95:21369::84:150:204:136:16164::108:182:220:164:8414::20:74:140:65:7329::252:206:4:197:2789
[17547] dbg: FuzzyOcr: delete from FuzzyOcr.Hash where Hash.check  1186058256
[17547] info: FuzzyOcr: Found[Safe]: Score='0.000' Info: ''
[17547] info: FuzzyOcr: Matched [2] time(s). Prev match: 9 min. 52 sec. ago
[17547] dbg: FuzzyOcr: update FuzzyOcr.Safe set Safe.match='2',Safe.check='1187354257' where Safe.key='252:254:252:253:90487::44:106:172:95:21369::84:150:204:136:16164::108:182:220:164:8414::20:74:140:65:7329::252:206:4:197:2789'
[17547] info: FuzzyOcr: Image in KNOWN_GOOD. Skipping OCR checks...
[17547] dbg: FuzzyOcr: Remove DIR: /tmp/.spamassassin17547AFJ63Ztmp
[17547] dbg: FuzzyOcr: FuzzyOcr ending successfully...
[17547] dbg: FuzzyOcr: Processed in 0.345128 sec.
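
The "Got:" value in the debug output above looks like a colon-separated image fingerprint, and it is what gets matched against the Safe table before OCR is skipped. A rough Python sketch of how it could be split apart follows. The field meanings (file size, width, height, colour count, then red:green:blue:luminosity:count groups as listed by ppmhist) are an assumption based on this one log, not something stated in the thread, and parse_fuzzy_key is a hypothetical helper, not part of FuzzyOcr.

```python
from typing import NamedTuple

class Color(NamedTuple):
    r: int
    g: int
    b: int
    lum: int    # assumed: luminosity column from ppmhist
    count: int  # assumed: number of pixels with this colour

def parse_fuzzy_key(key: str):
    """Split a 'Got:' fingerprint into header fields and colour groups."""
    header, *groups = key.split("::")
    size, width, height, ncolors = (int(f) for f in header.split(":"))
    colors = [Color(*(int(f) for f in g.split(":"))) for g in groups]
    return size, width, height, ncolors, colors

# The fingerprint exactly as it appears in the log above.
key = ("445299:342:434:7::252:254:252:253:90487::44:106:172:95:21369::"
       "84:150:204:136:16164::108:182:220:164:8414::20:74:140:65:7329::"
       "252:206:4:197:2789")
size, width, height, ncolors, colors = parse_fuzzy_key(key)
print(width, height, len(colors), colors[0].count)  # 342 434 6 90487
```

The parsed width and height agree with the [342x434] reported earlier in the log, and the Safe.key in the update statement is the same string without the first four fields.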

My best regards,

Pawel


Re: picture spams

2007-08-17 Thread Loren Wilton

> Hi Loren,
>
> I did the test and unfortunately my FuzzyOcr (3.5.1) was bitten
> by that spam image.


The normal scan setups for FuzzyOCR don't rotate the images, so will in all 
probability miss a rotated image like this.  These were quite popular for a 
while and a couple of people developed scansets that contained rotation as 
one of the preprocessing steps.  I don't seem to have saved any of the 
messages relating to that thread.  As best I recall they found that rotating 
8 degrees or so worked well.  Or maybe it was 18.


You can probably find info on the FuzzyOcr mailing list:

___
Devel-spam mailing list
[EMAIL PROTECTED]
http://lists.own-hero.net/mailman/listinfo/devel-spam