Re: Gif Spam

2006-12-21 Thread Matt Kettler
san wrote:
> Hi,
>
> Is there any rule to stop mails that have a .gif attachment in SA 2.64?
>   
Generally speaking, not much.

If anything, you can try the SARE stocks ruleset, but I'm not sure if
that ruleset supports such an old version of spamassassin.

http://www.rulesemporium.com/rules/70_sare_stocks.cf

Also, you can make sure that your SA is using RBL tests, and has a rule
in it for XBL. This particular RBL is at least reasonably helpful
against these spams.

Your most powerful options are the ImageInfo and FuzzyOCR plugins;
however, both require SA 3.1.0 or higher.

Is there a particular reason you're sticking with 2.64? It's over 2
years and 4 months old now... Would you go 2 years without updating the
scanning engine of your virus scanner?




Re: Gif Spam

2006-12-21 Thread Mathias Homann
On Thursday, 21 December 2006 19:28, san wrote:
> Hi,
>
> Is there any rule to stop mails that have a .gif attachment in SA 2.64?

Yes, upgrade to 3.1.7.

bye,
MH
-- 
Unsolicited sending of advertising mail to private individuals violates §1
UWG and §823 I BGB (decision of the Berlin Regional Court of 2 Aug 1998,
ref. 16 O 201/98). Any commercial use of the transmitted personal data, as
well as passing it on to third parties, is expressly prohibited!

gpg key fingerprint: 5F64 4C92 9B77 DE37 D184  C5F9 B013 44E7 27BD 763C


Gif Spam

2006-12-21 Thread san

Hi,

Is there any rule to stop mails that have a .gif attachment in SA 2.64?



interlaced GIF spam (sample)

2006-10-09 Thread Chip M.
Finally got my first interlaced GIF spam!

Here's the raw message:
http://Puffin.net/software/spam/samples/0003_interlaced.eml
and a page containing each frame extracted into its own separate GIF, 
followed by the "whole" raw GIF:
http://Puffin.net/software/spam/samples/0003_interlaced.htm

Each frame was aligned at (0,0), filled the entire logical viewing area, and 
had no unusual GIF extensions.

As I mentioned a month ago, I'm using a similar technique to ImageInfo, with 
the addition of pixel "density" and number of frames tests, so this spam was
easily caught (by a little qmail filter running after SA).

In a week or two, I should have some Area and other stats from a ham GIF
corpus.  Just sent the crunching software off to the volunteers a couple of
days ago.
- "Chip"




Re: Infuriating gif spam...

2006-09-27 Thread Steve [Spamassasin]
Bill Landry wrote:
>> Version 2.3j works much better...  I'd previously been using version
>> 2.3b for which I had an ebuild for gentoo.
>>
>> One thing I have noticed, however, is a number of errors/warnings which
>> spamd sticks into /var/log/messages when it is started:
>>
>> -- 
>> Sep 26 17:20:48 server spamd[25563]: Subroutine new redefined at
>> /etc/mail/spamassassin/FuzzyOcr.pm line 122.
>> -- 
>>
>> Have I somehow loaded this module twice? I didn't get these messages
>> until I upgraded to version 2.3j from 2.3b
>
> No problem here, these are just informational messages that only
> recently showed up for me with the more recent versions of the
> FuzzyOcr plugin, as well.  However, with the two latest versions, it
> only gets written to the log once during start-up, not with each image
> file that gets scanned like I was seeing a few versions back.

Jorge Valdes replied to me (though I can't find his email on the list) - he 
said to look at v310.pre - I had an unnecessary line:

> loadplugin FuzzyOcr /etc/mail/spamassassin/FuzzyOcr.pm

After having commented that out 2.3j works just as well as it did before and I 
don't get the warnings any more.
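
A duplicate load like this can be spotted mechanically. Below is a minimal Python sketch, assuming the plugin is loaded once from v310.pre and once from some other config file; the file name FuzzyOcr.cf and the helper find_duplicate_plugins are hypothetical, since the thread does not say where the second load lives.

```python
import re
from collections import defaultdict

# Matches active (non-commented) "loadplugin <Name> [path]" lines.
LOADPLUGIN = re.compile(r"^\s*loadplugin\s+(\S+)")

def find_duplicate_plugins(configs):
    """Return {plugin: [(filename, lineno), ...]} for plugins loaded more than once.

    `configs` maps a file name to that file's text (hypothetical helper,
    not part of SpamAssassin).
    """
    seen = defaultdict(list)
    for name, text in configs.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = LOADPLUGIN.match(line)
            if match:
                seen[match.group(1)].append((name, lineno))
    return {plugin: locs for plugin, locs in seen.items() if len(locs) > 1}

if __name__ == "__main__":
    example = {
        "v310.pre": "loadplugin FuzzyOcr /etc/mail/spamassassin/FuzzyOcr.pm\n",
        "FuzzyOcr.cf": "# plugin config\nloadplugin FuzzyOcr FuzzyOcr.pm\n",
    }
    print(find_duplicate_plugins(example))
```

Lines commented out with # are ignored, so commenting out the extra loadplugin line makes the report come back empty.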






Re: Infuriating gif spam...

2006-09-26 Thread Bill Landry
----- Original Message -----
From: "Steve [Spamassasin]" <[EMAIL PROTECTED]>



> Jorge Valdes wrote:
>> There are multiple images in these gifs, and because the first image
>> is 'junk', sending this image through gocr will yield no results. The
>> problem is that you have to scan all images to find the text.  Try
>> this with each image:
>>
>> convert -append News.gif pnm:- | gocr -
>
> That works a treat...
>
>> I have an updated version of the FuzzyOcr plugin that has this and
>> other improvements available here:
>>
>> http://www.joval.info/proj/FuzzyOcr.html
>
> Version 2.3j works much better...  I'd previously been using version
> 2.3b for which I had an ebuild for gentoo.
>
> One thing I have noticed, however, is a number of errors/warnings which
> spamd sticks into /var/log/messages when it is started:
>
> --
> Sep 26 17:20:48 server spamd[25563]: Subroutine new redefined at
> /etc/mail/spamassassin/FuzzyOcr.pm line 122.
> Sep 26 17:20:48 server spamd[25563]: Subroutine parse_config redefined
> at /etc/mail/spamassassin/FuzzyOcr.pm line 132.
> Sep 26 17:20:49 server spamd[25563]: Subroutine finish_parsing_end
> redefined at /etc/mail/spamassassin/FuzzyOcr.pm line 184.
> Sep 26 17:20:49 server spamd[25563]: Subroutine dummy_check redefined at
> /etc/mail/spamassassin/FuzzyOcr.pm line 288.
> Sep 26 17:20:49 server spamd[25563]: Subroutine load_global_words
> redefined at /etc/mail/spamassassin/FuzzyOcr.pm line 292.
> Sep 26 17:20:49 server spamd[25563]: Subroutine load_personal_words
> redefined at /etc/mail/spamassassin/FuzzyOcr.pm line 315.
> Sep 26 17:20:49 server spamd[25563]: Subroutine max redefined at
> /etc/mail/spamassassin/FuzzyOcr.pm line 343.
> Sep 26 17:20:49 server spamd[25563]: Subroutine within_threshold
> redefined at /etc/mail/spamassassin/FuzzyOcr.pm line 351.
> Sep 26 17:20:49 server spamd[25563]: Subroutine fmt_time redefined at
> /etc/mail/spamassassin/FuzzyOcr.pm line 388.
> Sep 26 17:20:49 server spamd[25563]: Subroutine check_image_hash_db
> redefined at /etc/mail/spamassassin/FuzzyOcr.pm line 414.
> Sep 26 17:20:49 server spamd[25563]: Subroutine add_image_hash_db
> redefined at /etc/mail/spamassassin/FuzzyOcr.pm line 492.
> Sep 26 17:20:49 server spamd[25563]: Subroutine calc_image_hash
> redefined at /etc/mail/spamassassin/FuzzyOcr.pm line 539.
> Sep 26 17:20:49 server spamd[25563]: Subroutine debuglog redefined at
> /etc/mail/spamassassin/FuzzyOcr.pm line 580.
> Sep 26 17:20:49 server spamd[25563]: Subroutine wrong_ctype redefined at
> /etc/mail/spamassassin/FuzzyOcr.pm line 590.
> Sep 26 17:20:49 server spamd[25563]: Subroutine corrupt_img redefined at
> /etc/mail/spamassassin/FuzzyOcr.pm line 608.
> Sep 26 17:20:49 server spamd[25563]: Subroutine known_img_hash redefined
> at /etc/mail/spamassassin/FuzzyOcr.pm line 626.
> Sep 26 17:20:49 server spamd[25563]: Subroutine removedir redefined at
> /etc/mail/spamassassin/FuzzyOcr.pm line 637.
> Sep 26 17:20:49 server spamd[25563]: Subroutine fuzzyocr_check redefined
> at /etc/mail/spamassassin/FuzzyOcr.pm line 657.
> --
>
> Have I somehow loaded this module twice? I didn't get these messages
> until I upgraded to version 2.3j from 2.3b


No problem here, these are just informational messages that only recently 
showed up for me with the more recent versions of the FuzzyOcr plugin, as 
well.  However, with the two latest versions, it only gets written to the 
log once during start-up, not with each image file that gets scanned like I 
was seeing a few versions back.


Bill 



Re: Infuriating gif spam...

2006-09-26 Thread Steve [Spamassasin]
Jorge Valdes wrote: 
> There are multiple images in these gifs, and because the first image
> is 'junk', sending this image through gocr will yield no results. The
> problem is that you have to scan all images to find the text.  Try
> this with each image:
>
> convert -append News.gif pnm:- | gocr -
That works a treat...
>
> I have an updated version of the FuzzyOcr plugin that has this and
> other improvements available here:
>
> http://www.joval.info/proj/FuzzyOcr.html
>
Version 2.3j works much better...  I'd previously been using version
2.3b for which I had an ebuild for gentoo.

One thing I have noticed, however, is a number of errors/warnings which
spamd sticks into /var/log/messages when it is started:

--
Sep 26 17:20:48 server spamd[25563]: Subroutine new redefined at
/etc/mail/spamassassin/FuzzyOcr.pm line 122.
Sep 26 17:20:48 server spamd[25563]: Subroutine parse_config redefined
at /etc/mail/spamassassin/FuzzyOcr.pm line 132.
Sep 26 17:20:49 server spamd[25563]: Subroutine finish_parsing_end
redefined at /etc/mail/spamassassin/FuzzyOcr.pm line 184.
Sep 26 17:20:49 server spamd[25563]: Subroutine dummy_check redefined at
/etc/mail/spamassassin/FuzzyOcr.pm line 288.
Sep 26 17:20:49 server spamd[25563]: Subroutine load_global_words
redefined at /etc/mail/spamassassin/FuzzyOcr.pm line 292.
Sep 26 17:20:49 server spamd[25563]: Subroutine load_personal_words
redefined at /etc/mail/spamassassin/FuzzyOcr.pm line 315.
Sep 26 17:20:49 server spamd[25563]: Subroutine max redefined at
/etc/mail/spamassassin/FuzzyOcr.pm line 343.
Sep 26 17:20:49 server spamd[25563]: Subroutine within_threshold
redefined at /etc/mail/spamassassin/FuzzyOcr.pm line 351.
Sep 26 17:20:49 server spamd[25563]: Subroutine fmt_time redefined at
/etc/mail/spamassassin/FuzzyOcr.pm line 388.
Sep 26 17:20:49 server spamd[25563]: Subroutine check_image_hash_db
redefined at /etc/mail/spamassassin/FuzzyOcr.pm line 414.
Sep 26 17:20:49 server spamd[25563]: Subroutine add_image_hash_db
redefined at /etc/mail/spamassassin/FuzzyOcr.pm line 492.
Sep 26 17:20:49 server spamd[25563]: Subroutine calc_image_hash
redefined at /etc/mail/spamassassin/FuzzyOcr.pm line 539.
Sep 26 17:20:49 server spamd[25563]: Subroutine debuglog redefined at
/etc/mail/spamassassin/FuzzyOcr.pm line 580.
Sep 26 17:20:49 server spamd[25563]: Subroutine wrong_ctype redefined at
/etc/mail/spamassassin/FuzzyOcr.pm line 590.
Sep 26 17:20:49 server spamd[25563]: Subroutine corrupt_img redefined at
/etc/mail/spamassassin/FuzzyOcr.pm line 608.
Sep 26 17:20:49 server spamd[25563]: Subroutine known_img_hash redefined
at /etc/mail/spamassassin/FuzzyOcr.pm line 626.
Sep 26 17:20:49 server spamd[25563]: Subroutine removedir redefined at
/etc/mail/spamassassin/FuzzyOcr.pm line 637.
Sep 26 17:20:49 server spamd[25563]: Subroutine fuzzyocr_check redefined
at /etc/mail/spamassassin/FuzzyOcr.pm line 657.
--

Have I somehow loaded this module twice? I didn't get these messages
until I upgraded to version 2.3j from 2.3b







Re: Infuriating gif spam...

2006-09-26 Thread Jorge Valdes

Steve [Spamassasin] wrote:
> I've been getting a _lot_ of spam recently which has been defeating my
> spamassassin configuration - all of it has the same general form... A
> message with auto-generated prose and an image.  I installed FuzzyOCR
> and this helped, but one particular variant still slips through.
>
> The problematic spams all embed a GIF image which confuses gocr (in
> spite of being easily human-readable) - though I'm not sure why.  Three
> images which defeat FuzzyOCR for me are:
>
> http://temporary.shic.dynalias.net/Evil_Spam_Samples.zip
>
> I would like to know if there is a straightforward way either (a) to
> configure FuzzyOCR to decode the text, or (b), assuming that is hard, a
> way to identify this kind of 'strange' GIF and apply a static score to
> them (at least as a temporary measure?)
>
> Thanks in advance for any pointers...
  
There are multiple images in these gifs, and because the first image is 
'junk', sending this image through gocr will yield no results. The 
problem is that you have to scan all images to find the text.  Try this 
with each image:


convert -append News.gif pnm:- | gocr -
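
For scripted use, the same pipeline can be driven from Python. This is only a sketch of the one-liner above, assuming ImageMagick's convert and gocr are both on the PATH; the function names ocr_commands and ocr_all_frames and the 30-second timeout are my own choices, not anything taken from the FuzzyOcr plugin.

```python
import subprocess

def ocr_commands(gif_path):
    """Build the two argv lists for: convert -append FILE pnm:- | gocr -"""
    convert_cmd = ["convert", "-append", gif_path, "pnm:-"]
    gocr_cmd = ["gocr", "-"]
    return convert_cmd, gocr_cmd

def ocr_all_frames(gif_path, timeout=30.0):
    """Stack every frame into one PNM image and return gocr's text output.

    Raises subprocess.CalledProcessError if either tool fails and
    subprocess.TimeoutExpired if a step exceeds `timeout` seconds.
    """
    convert_cmd, gocr_cmd = ocr_commands(gif_path)
    # Step 1: append all frames vertically and emit PNM on stdout.
    pnm = subprocess.run(
        convert_cmd, check=True, capture_output=True, timeout=timeout
    ).stdout
    # Step 2: feed the PNM data to gocr on stdin.
    result = subprocess.run(
        gocr_cmd, input=pnm, check=True, capture_output=True, timeout=timeout
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling ocr_all_frames("News.gif") should return whatever text gocr recognises across all frames, junk first frame included.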

I have an updated version of the FuzzyOcr plugin that has this and other 
improvements available here:


http://www.joval.info/proj/FuzzyOcr.html

--
Jorge Valdes
Intercom El Salvador
[EMAIL PROTECTED]




Infuriating gif spam...

2006-09-26 Thread Steve [Spamassasin]
I've been getting a _lot_ of spam recently which has been defeating my
spamassassin configuration - all of it has the same general form... A
message with auto-generated prose and an image.  I installed FuzzyOCR
and this helped, but one particular variant still slips through.

The problematic spams all embed a GIF image which confuses gocr (in
spite of being easily human-readable) - though I'm not sure why.  Three
images which defeat FuzzyOCR for me are:

http://temporary.shic.dynalias.net/Evil_Spam_Samples.zip

I would like to know if there is a straightforward way either (a) to
configure FuzzyOCR to decode the text, or (b), assuming that is hard, a
way to identify this kind of 'strange' GIF and apply a static score to
them (at least as a temporary measure?)

Thanks in advance for any pointers...




Re: animated GIF spam

2006-08-22 Thread decoder
Kenneth Porter wrote:
> --On Tuesday, August 22, 2006 1:07 AM -0500 "Chip M."
> <[EMAIL PROTECTED]> wrote:
>
>>> For interlaced ... I have no idea.  Depends a lot on how the
>>> interlaced images are stored, I guess.
>>
>> Yes, exactly.  Until there's samples, I'm not going to worry
>> about it.
>
> There's also progressive JPEG.
>
These do not pose a problem currently, FuzzyOcr can handle them as far
as I am aware.


Chris



Re: animated GIF spam

2006-08-22 Thread Kenneth Porter
--On Tuesday, August 22, 2006 1:07 AM -0500 "Chip M." 
<[EMAIL PROTECTED]> wrote:



>> For interlaced ... I have no idea.  Depends a lot on how the interlaced
>> images are stored, I guess.
>
> Yes, exactly.  Until there's samples, I'm not going to worry about it.


There's also progressive JPEG.








Re: animated GIF spam

2006-08-22 Thread Logan Shaw

On Mon, 21 Aug 2006, John Rudd wrote:

> On Aug 21, 2006, at 10:13 PM, Chip M. wrote:
>
>> While skimming thru my daily rejected spam pile, did a double take when a
>> GIF spam seemed to "blink" at me.  Thought it was a sw glitch at first...
>> then realized the sneaky Borg had adapted again.
>>
>> Took a look at the frames in PaintShopPro's AnimationShop, and the first
>> three are all but blank (wee bit of noise), followed by the payload.


Given the way the GIF format works, that is actually a
reasonable way to inject "salt" into a given image to throw
off checksumming.  (If only the programmer who is doing the
technical end of this would get a real job instead of working
for a spammer...)

> For animated, is there a clean break between "frames" of animation, something
> that netpbm or whatever can easily identify and break out into individual
> images?


Yes, briefly, the GIF format is a sequence of chunks.  Before
any image data comes along, a chunk defines the overall size of
the GIF (sort of the size of the canvas), and then you can have
a series of other chunks.  One type of chunk says "draw this
image on the virtual canvas at these coordinates using this
palette" and another says "delay this long".  Putting these
two types of chunks together in the right sequence gives the
ability to do animations.  (It also, incidentally, gives you
the ability to do full 24-bit color.  Few people know GIF
is actually capable of this.  But even though it is capable,
it is a hack, and very wasteful of space, so maybe that's for
the better.)
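
To make the chunk layout concrete, here is a rough Python sketch that walks a raw GIF and collects the canvas size, each frame's position and size, and any delay values. The block markers (0x2C image descriptor, 0x21 extension, 0xF9 graphic control, 0x3B trailer) follow the GIF89a layout; the function names are mine, and the code does no LZW decoding and will raise IndexError or struct.error on truncated input.

```python
import struct

def _skip_sub_blocks(data, pos):
    """Skip a chain of length-prefixed sub-blocks; return the offset after the 0 terminator."""
    while True:
        size = data[pos]
        pos += 1
        if size == 0:
            return pos
        pos += size

def walk_gif(data):
    """Return ((canvas_w, canvas_h), [(left, top, w, h), ...], [delay, ...])."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    canvas_w, canvas_h, packed = struct.unpack_from("<HHB", data, 6)
    pos = 13  # 6-byte signature + 7-byte logical screen descriptor
    if packed & 0x80:  # global color table present
        pos += 3 * (2 ** ((packed & 0x07) + 1))
    frames, delays = [], []
    while pos < len(data):
        marker = data[pos]
        pos += 1
        if marker == 0x3B:  # trailer: end of the GIF stream
            break
        if marker == 0x21:  # extension block
            label = data[pos]
            pos += 1
            if label == 0xF9 and data[pos] == 4:
                # graphic control extension: delay in 1/100 s follows the flags byte
                delays.append(struct.unpack_from("<H", data, pos + 2)[0])
            pos = _skip_sub_blocks(data, pos)
        elif marker == 0x2C:  # image descriptor: "draw this image at these coordinates"
            left, top, width, height, ipacked = struct.unpack_from("<HHHHB", data, pos)
            pos += 9
            if ipacked & 0x80:  # local color table present
                pos += 3 * (2 ** ((ipacked & 0x07) + 1))
            pos += 1  # LZW minimum code size byte
            pos = _skip_sub_blocks(data, pos)  # compressed pixel data
            frames.append((left, top, width, height))
        else:
            raise ValueError("unexpected block marker 0x%02X" % marker)
    return (canvas_w, canvas_h), frames, delays
```

len(frames) gives a frame count without running OCR at all, which is the sort of structural check a plugin could use before doing anything expensive.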

> It would be CPU intensive, but the right way to fight it might be to
> run the FuzzyOCR on each frame.  And/or have a setting for "maximum frames to
> process", and if the GIF goes over that number of frames, give it a huge spam
> score.


Yeah, that is a bit tricky.  I can think of a way to do a
denial-of-service attack against the "run it on each frame"
approach, but I won't share what that is.  In theory, if that
happens, one could write a plugin to examine the internal
structure of the GIF and detect that.

The one thing that would be important to guard against is
suddenly flagging all animated GIFs as spam.  Although I think
they're really tacky and annoying, that doesn't mean that they
are actually spam.

> For interlaced ... I have no idea.  Depends a lot on how the interlaced
> images are stored, I guess.  And whether or not netpbm can generate the final
> image for processing, instead of having to work on the interlaced data.


I'm pretty sure it should be able to.  If I recall correctly,
interlaced GIFs just have the rows in a different order.
It should be no problem to get the full image.
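
For reference, the row reordering is small enough to sketch in a few lines of Python. GIF interlacing stores the rows in four passes (every 8th row from row 0, every 8th from row 4, every 4th from row 2, every 2nd from row 1); the helper names below are illustrative only, and `rows` is assumed to be a list of already-decoded scanlines in stored order.

```python
# (first row, step) for each of the four GIF interlace passes
INTERLACE_PASSES = ((0, 8), (4, 8), (2, 4), (1, 2))

def interlaced_row_order(height):
    """Return the display row index of each stored row, in stored order."""
    return [
        row
        for start, step in INTERLACE_PASSES
        for row in range(start, height, step)
    ]

def deinterlace(rows):
    """Rearrange scanlines from stored (interlaced) order into top-to-bottom order."""
    order = interlaced_row_order(len(rows))
    out = [None] * len(rows)
    for stored_index, display_row in enumerate(order):
        out[display_row] = rows[stored_index]
    return out
```

So a decoder only needs this index mapping after decompression; the pixel data itself is unchanged.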

  - Logan


Re: animated GIF spam

2006-08-22 Thread decoder
John Rudd wrote:
>
> On Aug 21, 2006, at 10:13 PM, Chip M. wrote:
>
>> While skimming thru my daily rejected spam pile, did a double take
>> when a
>> GIF spam seemed to "blink" at me.  Thought it was a sw glitch at
>> first...
>> then realized the sneaky Borg had adapted again.
>>
>> Took a look at the frames in PaintShopPro's AnimationShop, and the
>> first
>> three are all but blank (wee bit of noise), followed by the payload.
>>
>> Below are links to the raw message, and the extracted GIF:
>> http://Puffin.net/software/spam/samples/0001a_animated_gif.eml
>> http://Puffin.net/software/spam/samples/0001b_been.gif
>>
>> Decoder/Chris, I'd view this as a compliment to your FuzzyOCR.  ;)
I'll implement that in the next release :) thx :D
>>
>> The good news is that ImageInfo should have no problem with this
>> particular
>> instance, as the initial width x height are "correct".
>>
>> Time to recalibrate those phaser frequencies!  :)
>> - "Chip"
>>
>
> I also heard that interlaced gif spam is appearing now.
This will be supported then, too. Not a big deal:)
>
> It'd be interesting to see how to counter them.
>
> For animated, is there a clean break between "frames" of animation,
> something that netpbm or whatever can easily identify and break out
> into individual images?  It would be CPU intensive, but the right
> way to fight it might be to run the FuzzyOCR on each frame.  And/or
> have a setting for "maximum frames to process", and if the GIF goes
> over that number of frames, give it a huge spam score.  Or "add this
> score per frame", so that the number of frames increases the spam
> score directly, and automatically bail out if they cross a certain
> threshold (score from number of animation frames alone >= 20, then
> just return 20 ... or something; which saves you on processing the
> frames themselves).
Sounds good :) But there might be a better way... but I'm not sure
atm, got to read up on it in the netpbm manual first:)
>
> For interlaced ... I have no idea.  Depends a lot on how the
> interlaced images are stored, I guess.  And whether or not netpbm
> can generate the final image for processing, instead of having to
> work on the interlaced data.
>
>
>

Chris



Re: animated GIF spam

2006-08-21 Thread Chip M.
At 10:26 PM 8/21/2006 -0700, John Rudd wrote:
>I also heard that interlaced gif spam is appearing now.

Yes, I saw that post, however there wasn't a publicly available sample.
Any such would be much appreciated.

>It'd be interesting to see how to counter them.

Should be easy.  One approach is "pixel density".  What I've been doing is
reading JUST enough of the header to calculate the area (just like Dallas'
excellent ImageInfo plugin), then dividing by the total raw file size of
just the image (i.e. what one gets after base64 decoding just the GIF part),
less the size of the obvious parts of the header.  Works well, and is
blindingly fast.

Ham generally have a much LOWER density, because it's typically clipart,
whereas spam is generally text, which compresses extremely well, resulting
in a much HIGHER density.  It's not fool proof, so I use a sliding scale,
and have had only one FP this month (from an idiot (redundant) recruiter to
one of my testers - the PNG misfiring was only half the points required to
reject, and the able idiot managed to do several other things rare in Ham).

The beauty is that the spammer can "easily" foil this by lowering the
density by adding more complexity, which increases the file size, so more
bandwidth is consumed. :)
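
As a rough illustration of the density calculation described above (not Chip's actual filter code, which is not shown in this thread), the Python sketch below reads the logical screen size from the GIF header and divides the area by the file size minus the fixed header and global color table. Treating only those two parts as "the obvious parts of the header" is my assumption, and no spam threshold is implied.

```python
import struct

def gif_pixel_density(data):
    """Return (width * height) / payload size for a raw, base64-decoded GIF."""
    if len(data) < 13 or data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    # Logical screen descriptor: width, height (little-endian), then a flags byte.
    width, height, packed = struct.unpack_from("<HHB", data, 6)
    header_size = 13
    if packed & 0x80:  # global color table present
        header_size += 3 * (2 ** ((packed & 0x07) + 1))
    payload = len(data) - header_size
    if payload <= 0:
        raise ValueError("no image data after header")
    return (width * height) / payload
```

Only the first 11 bytes are actually parsed, which is why this kind of test is so cheap compared with OCR.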

Some stock spams do use a fancier font which scores lower, so I'm still 
considering other types of analysis as a backup.


Specifically to address animated GIFs, it would be very easy to "walk" the 
raw image, calculating each frame's pixel density, simply ignoring the 
obvious chaff frames.

Tomorrow, I'll write some code to decompose the frames and see what sort of 
numbers I get.

>For interlaced ... I have no idea.  Depends a lot on how the interlaced 
>images are stored, I guess.

Yes, exactly.  Until there's samples, I'm not going to worry about it.

What we also need is a diverse Ham GIF corpus.  Does anyone know of one?
- "Chip"

P.S.  Dallas:  it never occurred to me to _JUST_ score the area.  My pixel 
density approach fails on multi-GIFs, so you saved my bacon there. ;)




Re: animated GIF spam

2006-08-21 Thread Spamassassin List

> While skimming thru my daily rejected spam pile, did a double take when a
> GIF spam seemed to "blink" at me.  Thought it was a sw glitch at first...
> then realized the sneaky Borg had adapted again.
>
> Took a look at the frames in PaintShopPro's AnimationShop, and the first
> three are all but blank (wee bit of noise), followed by the payload.
>
> Below are links to the raw message, and the extracted GIF:
> http://Puffin.net/software/spam/samples/0001a_animated_gif.eml
> http://Puffin.net/software/spam/samples/0001b_been.gif
>
> Decoder/Chris, I'd view this as a compliment to your FuzzyOCR.  ;)
>
> The good news is that ImageInfo should have no problem with this particular
> instance, as the initial width x height are "correct".


Yes ImageInfo got them well.



Re: animated GIF spam

2006-08-21 Thread John Rudd


On Aug 21, 2006, at 10:13 PM, Chip M. wrote:

> While skimming thru my daily rejected spam pile, did a double take when a
> GIF spam seemed to "blink" at me.  Thought it was a sw glitch at first...
> then realized the sneaky Borg had adapted again.
>
> Took a look at the frames in PaintShopPro's AnimationShop, and the first
> three are all but blank (wee bit of noise), followed by the payload.
>
> Below are links to the raw message, and the extracted GIF:
> http://Puffin.net/software/spam/samples/0001a_animated_gif.eml
> http://Puffin.net/software/spam/samples/0001b_been.gif
>
> Decoder/Chris, I'd view this as a compliment to your FuzzyOCR.  ;)
>
> The good news is that ImageInfo should have no problem with this particular
> instance, as the initial width x height are "correct".
>
> Time to recalibrate those phaser frequencies!  :)
> - "Chip"



I also heard that interlaced gif spam is appearing now.

It'd be interesting to see how to counter them.

For animated, is there a clean break between "frames" of animation, 
something that netpbm or whatever can easily identify and break out 
into individual images?  It would be CPU intensive, but the right way 
to fight it might be to run the FuzzyOCR on each frame.  And/or have a 
setting for "maximum frames to process", and if the GIF goes over that 
number of frames, give it a huge spam score.  Or "add this score per 
frame", so that the number of frames increases the spam score directly, 
and automatically bail out if they cross a certain threshold (score 
from number of animation frames alone >= 20, then just return 20 ... or 
something; which saves you on processing the frames themselves).
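
The per-frame scoring idea could look something like the Python sketch below. The 2.0 points per frame and the choice to count only frames beyond the first are placeholder assumptions of mine; only the cap of 20 comes from the example above.

```python
def frame_count_score(num_frames, per_frame=2.0, cap=20.0):
    """Score an image by its frame count alone, capped at `cap`.

    Only frames beyond the first add to the score, so an ordinary
    single-frame GIF gets 0 (an assumption, not something from the thread).
    """
    extra_frames = max(num_frames - 1, 0)
    return min(extra_frames * per_frame, cap)

def should_skip_ocr(num_frames, per_frame=2.0, cap=20.0):
    """True when the frame-count score already hits the cap, so OCR can be skipped."""
    return frame_count_score(num_frames, per_frame, cap) >= cap
```

With these defaults an 11-frame GIF reaches the cap on frame count alone, and the frames themselves never need to be processed.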


For interlaced ... I have no idea.  Depends a lot on how the interlaced 
images are stored, I guess.  And whether or not netpbm can generate the 
final image for processing, instead of having to work on the interlaced 
data.






animated GIF spam

2006-08-21 Thread Chip M.
While skimming thru my daily rejected spam pile, did a double take when a
GIF spam seemed to "blink" at me.  Thought it was a sw glitch at first...
then realized the sneaky Borg had adapted again.

Took a look at the frames in PaintShopPro's AnimationShop, and the first 
three are all but blank (wee bit of noise), followed by the payload.

Below are links to the raw message, and the extracted GIF:
http://Puffin.net/software/spam/samples/0001a_animated_gif.eml
http://Puffin.net/software/spam/samples/0001b_been.gif

Decoder/Chris, I'd view this as a compliment to your FuzzyOCR.  ;)

The good news is that ImageInfo should have no problem with this particular 
instance, as the initial width x height are "correct".

Time to recalibrate those phaser frequencies!  :)
- "Chip"




RE: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Jeff Moss
Patching GIF.pm seems to have fixed the problem.  I patched gocr because
that was in the instructions that got posted, but patching GIF.pm wasn't
so I missed it.

  Jeff Moss

-----Original Message-----
From: Davin Flatten [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 03, 2006 3:54 PM
To: Jeff Moss
Cc: users@spamassassin.apache.org
Subject: Re: GIF Spam -- Setting up the 'OCR scanner and image validator
SA-plugin'

Jeff-

Make sure you apply the patches to both the gocr source and 
Image::ExifTool.   The gocr patch deals specifically with the segfault 
issues.

 From the docs:

# - Perl module Image::ExifTool and a patch for GIF pics:
#   http://antispam.imp.ch/patches/patch-GIF-Colortable
#
# - Gocr from http://jocr.sourceforge.net and a patch to
#   avoid segfaults with gocr:
#   http://antispam.imp.ch/patches/patch-gocr-segfault


Hope this helps.
-Davin


Re: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Davin Flatten

Jeff-

You might also want to copy the message out of a client application like
Thunderbird, save the image to your server, and run giftopnm on it.  It
might be that uudeview is the problem and not giftopnm.  The errors sound
like a corrupt GIF image.  This should not affect the plugin, however.


I would suggest turning on debugging output in SpamAssassin to see where
in the plugin the problem is occurring.  Use the facility 'ocrtext' and
grep your logs for 'ocrtext'.  That should give you some info.


If you are running spamd, try:  --debug=ocrtext

-D, --debug[=areas]    Print debugging messages (for areas)

Hope this helps.
-Davin


Re: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Davin Flatten

Jeff-

Make sure you apply the patches to both the gocr source and 
Image::ExifTool.   The gocr patch deals specifically with the segfault 
issues.


From the docs:

# - Perl module Image::ExifTool and a patch for GIF pics:
#   http://antispam.imp.ch/patches/patch-GIF-Colortable
#
# - Gocr from http://jocr.sourceforge.net and a patch to
#   avoid segfaults with gocr:
#   http://antispam.imp.ch/patches/patch-gocr-segfault


Hope this helps.
-Davin


RE: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Jeff Moss
Still trying to debug SA crashing with the OCR plugin.  I extracted the
base64 encoding from one of the offending messages.  Then I converted it
to image001.gif with uudeview.  But when I try to convert it to a pnm
file from the command line I get errors.

[filter]# giftopnm image001.gif > image001.pnm
giftopnm: too much input data, ignoring extra...
giftopnm: bogus character 0x00, ignoring
[filter]#

I have no idea what's causing this, how to fix it, or if it's even
related to the crashing problem.

  Jeff Moss


-----Original Message-----
From: Stuart Johnston [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 03, 2006 10:41 AM
To: users@spamassassin.apache.org
Subject: Re: GIF Spam -- Setting up the 'OCR scanner and image validator
SA-plugin'

Davin Flatten wrote:
> Just thought this might help someone out.  Thanks to M. Blapp for an
> excellent SA Plugin.  Optical Character Recognition (OCR) can be used to
> nab those pesky spam messages that are hidden in gif, jpeg, or png images...

This OCR stuff looks promising.  Any comments on performance?  How much
extra load does it put on a server?



RE: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread davea
I will be testing this later this evening using the instructions provided.
I will keep you posted.

Dave Augustus

> We're getting some image-spam stuck in the queue because they crash SA
> with this plugin turned on. We are using a custom setup built from
> amavisd-lite.
> I'm still trying to figure out what's causing it.
>
>   Jeff Moss
>
> -----Original Message-----
> From: Stuart Johnston [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 03, 2006 10:41 AM
> To: users@spamassassin.apache.org
> Subject: Re: GIF Spam -- Setting up the 'OCR scanner and image validator
> SA-plugin'
>
> Davin Flatten wrote:
>> Just thought this might help someone out.  Thanks to M. Blapp for an
>> excellent SA Plugin.  Optical Character Recognition (OCR) can be used to
>> nab those pesky spam messages that are hidden in gif, jpeg, or png images...
>
> This OCR stuff looks promising.  Any comments on performance?  How much
> extra load does it put on a server?
>
>



RE: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Jeff Moss
We're getting some image-spam stuck in the queue because they crash SA
with this plugin turned on. We are using a custom setup built from
amavisd-lite.
I'm still trying to figure out what's causing it.

  Jeff Moss 

-----Original Message-----
From: Stuart Johnston [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 03, 2006 10:41 AM
To: users@spamassassin.apache.org
Subject: Re: GIF Spam -- Setting up the 'OCR scanner and image validator
SA-plugin'

Davin Flatten wrote:
> Just thought this might help someone out.  Thanks to M. Blapp for an
> excellent SA Plugin.  Optical Character Recognition (OCR) can be used to
> nab those pesky spam messages that are hidden in gif, jpeg, or png images...

This OCR stuff looks promising.  Any comments on performance?  How much
extra load does it put on a server?



Re: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Davin Flatten

Stuart-

Not significant that I have noticed.  We are running a dedicated
spamassassin gateway, however.  Its only job is to process spam.  It is
running dual Xeon
2.80GHz/2MB cache with 4GB of RAM over RAID5 with some scratch partitions
loaded in RAM.  We also run clamav, mimedefang, bayes out of mysql, and
milter-greylist on the same machine.

We process 15,000-30,000 emails a day on this machine.

One thing that could be improved would be to add an option for which
directory the plugin uses as scratch.  I would put this on one of my
memory-based mounts, which would at least lower the I/O overhead.


-Davin



Re: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Stephan Bosch

Davin Flatten wrote:
> Just thought this might help someone out.  Thanks to M. Blapp for an
> excellent SA Plugin.  Optical Character Recognition (OCR) can be used to
> nab those pesky spam messages that are hidden in gif, jpeg, or png images...


I ran a search on the patch and I didn't see any references to the bayes 
learner. Wouldn't it be a logical choice to feed (and test) the OCR text 
to the bayes learner just like any other plaintext mail content? The OCR 
results will of course contain some gibberish, but that shouldn't be 
very different from the usual bayes poison. I think this could further 
improve the OCR feature (haven't tested the patch yet btw).


Regards,

Stephan



Re: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Matthias Keller
Theo Van Dinter wrote:
> On Thu, Aug 03, 2006 at 02:14:38PM +0200, Matthias Keller wrote:
>   
>> I downloaded the archive for 3.1.0 and there's no Timeout.pm at all - so
>> i guess this has been introduced in 3.1.1 or so..?
>> 
>
> Correct, it was added into 3.1.1 (bug 4696).
>
>   
>> Does anyone know if it's safe to leave it out?
>> 
>
> I haven't looked at the plugin -- if the Timeout code is not actively being
> used by the plugin, then you should be able to just comment out the line.
>   
Hmm, it seems to be used; at least I find one occurrence of
Mail::SpamAssassin::Timeout in the .pm file

#
# Limit the scantime
#
$permsgstatus->enter_helper_run_mode();
my $timer = Mail::SpamAssassin::Timeout->new({ secs =>
$self->{main}->{conf}->{ocrtext_timeout} });
my $err = $timer->run_and_catch(sub {
..

So I guess this plugin really only runs from 3.1.1 onwards??
> The flip side is, why are you still running 3.1.0? ;)
>   
I know, but this is a production system and I'll have to test an upgrade
first on the test server as I can't take any risks on that server...
But an upgrade is at the top of my to-do list

Matt



Re: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Stuart Johnston

Davin Flatten wrote:
Just thought this might help someone out.  Thanks to M. Blapp for an 
excellent SA Plugin.  Optical Character Recognition (OCR) can be used to 
nab those pesky spam messages that are hidden in gif,jpeg, or png images...


This OCR stuff looks promising.  Any comments on performance?  How much extra load does it put on a 
server?




Re: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Theo Van Dinter
On Thu, Aug 03, 2006 at 07:19:51AM -0400, Davin Flatten wrote:
> You could try commenting out the line that loads this module.  On your 
> machine it might be already loaded, but on my installation it does not 
> get loaded by default.

If you have 3.1.1 installed it should be there already.

> Notice on the bottom of the page that they have an archive.  I would try 
> to match the version numbers so you don't introduce any weird bugs.

Alternately, think about upgrading.  3.1.4 fixes a lot of bugs from,
say, 3.1.0.  :)
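
Since Timeout.pm first shipped in 3.1.1 according to this thread, it may help to compare the installed version against that minimum before enabling the plugin. A minimal sketch follows; version_ge is a hypothetical helper name, and how you obtain the installed version string (for example by reading the output of 'spamassassin -V') is left as an assumption.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is >= dotted version B.
# Hypothetical helper for illustration; compares numeric components
# left to right and treats missing components as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, ".")
        m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}
```

For example, 'version_ge 3.1.0 3.1.1' fails, which matches the situation Matthias describes, while 'version_ge 3.1.4 3.1.1' succeeds.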

-- 
Randomly Generated Tagline:
Leela: Bender, why are you spending so much time in the bathroom? Are 
  you jacking on in there?




Re: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Theo Van Dinter
On Thu, Aug 03, 2006 at 02:14:38PM +0200, Matthias Keller wrote:
> I downloaded the archive for 3.1.0 and there's no Timeout.pm at all - so
> i guess this has been introduced in 3.1.1 or so..?

Correct, it was added into 3.1.1 (bug 4696).

> Does anyone know if it's safe to leave it out?

I haven't looked at the plugin -- if the Timeout code is not actively being
used by the plugin, then you should be able to just comment out the line.


The flip side is, why are you still running 3.1.0? ;)

-- 
Randomly Generated Tagline:
"It's always darkest before dawn. So if you're going to steal your
 neighbour's newspaper, that's the time to do it." - Zen Musings




Re: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Loren Wilton

I downloaded the archive for 3.1.0 and there's no Timeout.pm at all - so
i guess this has been introduced in 3.1.1 or so..?

Does anyone know if it's safe to leave it out?


JM would be the one with the definitive answer.  But my recollection is that 
it is a new/clean implementation of a manager for timeout signals, and can 
probably be used alongside the stuff in older versions of SA just fine.  I 
assume that something in the OCR plugin must be using it?  If not, I can't 
see much reason to load the Timeout module if it isn't already there.


   Loren



Re: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Matthias Keller
Davin Flatten wrote:
> Matthias-
>
> Yes I had the same issue on my setup which I forgot to mention.  I had
> to copy the Timeout.pm module from the SpamAssassin source tree into
> the installation path.  On my machine it was
Hmm
I downloaded the archive for 3.1.0 and there's no Timeout.pm at all - so
I guess this was introduced in 3.1.1 or so..?

Does anyone know if it's safe to leave it out?

Thanks

Matt


Re: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Davin Flatten

Matthias-

Yes I had the same issue on my setup which I forgot to mention.  I had 
to copy the Timeout.pm module from the SpamAssassin source tree into the 
installation path.  On my machine it was


cp /usr/local/src/Mail-SpamAssassin-3.1.1/lib/Mail/SpamAssassin/Timeout.pm \
   /usr/local/share/perl/5.8.8/Mail/SpamAssassin/Timeout.pm


You could try commenting out the line that loads this module.  On your 
machine it might be already loaded, but on my installation it does not 
get loaded by default.


Maybe someone else knows why this is not installed by default.  If you 
don't have the source you can download it from: 
http://spamassassin.apache.org/downloads.cgi?update=200607261000


Notice at the bottom of the page that they have an archive.  I would try 
to match the version numbers so you don't introduce any weird bugs.


Hope this helps.

Sincerely,
Davin Flatten



Re: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-03 Thread Matthias Keller
Davin Flatten wrote:
> Just thought this might help someone out.  Thanks to M. Blapp for an
> excellent SA Plugin.  Optical Character Recognition (OCR) can be used
> to nab those pesky spam messages that are hidden in gif,jpeg, or png
> images...
>
> Here is what I did to get the plugin running.
> (...)
> # OCR - performs Optical Character Recognition on spam images
> #
> loadplugin ocrtext /etc/mail/spamassassin/ocrtext.pm
> loadplugin Mail::SpamAssassin::Timeout

Hi

First of all, thanks for the detailed instructions.

I'm running SA 3.1.0 with perl 5.8.7

After following your instructions I get this error though:

[31630] warn: plugin: failed to parse plugin (from @INC): Can't locate
Mail/SpamAssassin/Timeout.pm in @INC (@INC contains: lib
/usr/lib/perl5/site_perl/5.8.7/i586-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.7
/usr/lib/perl5/5.8.7/i586-linux-thread-multi /usr/lib/perl5/5.8.7
/usr/lib/perl5/site_perl
/usr/lib/perl5/vendor_perl/5.8.7/i586-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl) at (eval
48) line 1.
[31630] warn: plugin: failed to create instance of plugin
Mail::SpamAssassin::Timeout: Can't locate object method "new" via
package "Mail::SpamAssassin::Timeout" at (eval 49) line 1.
[31630] warn: plugin: eval failed: Can't locate object method "new" via
package "Mail::SpamAssassin::Timeout" at
/etc/mail/spamassassin/ocrtext.pm line 396.

It seems the Timeout thingie doesn't exist here -- can I just leave out
the line in the v310.pre, or is it needed??

Thanks

Matt


Re: GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-02 Thread Tim Litwiller

Results are very good in my preliminary tests.
These two spams look exactly the same in my email program except for the 
subject line.


here is a spam before

---snip---

From - Wed Aug 02 22:29:15 2006

X-Mozilla-Status: 0001
X-Mozilla-Status2: 
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on 
	---.---.com

X-Spam-Level: ***
X-Spam-Status: No, score=3.0 required=5.0 tests=BAYES_00,DATE_IN_FUTURE_12_24,
FROM_LOCAL_NOVOWEL,HTML_MESSAGE autolearn=no version=3.1.1
Received: (qmail 9735 invoked from network); 2 Aug 2006 21:32:47 -0500
Received: from ip33-5.asiaonline.net (202.85.33.5)
 by ---.---.com with SMTP; 2 Aug 2006 21:32:47 -0500
From:   "Is giant" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: All Apparel
Date:   Thu, 3 Aug 2006 10:39:03 -0800
MIME-Version: 1.0
Content-Type: multipart/related;
boundary="=_NextPart_000_0004_01C6B6E9.09F4EF40"
X-Mailer: Microsoft Office Outlook, Build 11.0.5510
Thread-Index: Aca26Qn08PnHMV7tSnWAeaDtkgcv8g==
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869

Message-Id: <[EMAIL PROTECTED]>
---snip---



here is the spam after

---snip ---

From - Wed Aug 02 23:36:26 2006

X-Mozilla-Status: 0001
X-Mozilla-Status2: 
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on 
	---.---.com

X-Spam-Level: 
X-Spam-Status: Yes, score=12.6 required=5.0 tests=BAYES_00,
DATE_IN_FUTURE_12_24,HTML_MESSAGE,INLINE_IMAGE,RCVD_IN_BL_SPAMCOP_NET,
SPAMPIC_ALPHA_3,SPAMPIC_WORDS_3,SUSPECT_GIF autolearn=no version=3.1.1
X-Spam-Report: 
	*  0.9 SUSPECT_GIF Suspect gif image found

*  1.5 SPAMPIC_ALPHA_3 Image contains many alphanumeric chars
*  2.8 DATE_IN_FUTURE_12_24 Date: is 12 to 24 hours after Received: date
* -2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
*  [score: 0.]
*  0.0 HTML_MESSAGE BODY: HTML included in message
*  1.5 INLINE_IMAGE RAW: Inline Images
*  1.6 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in 
bl.spamcop.net
*  [Blocked - see ]
*  7.0 SPAMPIC_WORDS_3 Contains inline spam picture (3)
Received: (qmail 10707 invoked from network); 2 Aug 2006 23:20:49 -0500
Received: from unknown (HELO ?221.2.37.198?) (221.2.37.198)
 by ---.---.com with SMTP; 2 Aug 2006 23:20:49 -0500
From:   "Value" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: [SPAM-Score-12.6] Kodak CamerasIn
Date:   Thu, 3 Aug 2006 12:19:46 -0800
MIME-Version: 1.0
Content-Type: multipart/related;
boundary="=_NextPart_000_0004_01C6B6F7.1B91DFC0"
X-Mailer: Microsoft Office Outlook, Build 11.0.5510
Thread-Index: Aca29xuT14ns0TbSSbinW6TfiY1R5w==
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869
Message-Id: <[EMAIL PROTECTED]>

X-Spam-Prev-Subject: Kodak CamerasIn

---snip---
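
To compare scores like the two above across many saved messages, the score can be pulled out of the X-Spam-Status header. This is a minimal sketch that assumes the header has the 'score=N' form shown in these samples; spam_score is a hypothetical helper name.

```shell
#!/bin/sh
# spam_score: read a message on stdin and print the score found in the
# first X-Spam-Status header line (format assumed from the samples above).
spam_score() {
    sed -n 's/^X-Spam-Status:.*score=\(-\{0,1\}[0-9.]*\).*/\1/p' | head -n 1
}
```

Usage would look like 'spam_score < saved-message.eml', where the file name is only an example.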


GIF Spam -- Setting up the 'OCR scanner and image validator SA-plugin'

2006-08-02 Thread Davin Flatten
Just thought this might help someone out.  Thanks to M. Blapp for an 
excellent SA Plugin.  Optical Character Recognition (OCR) can be used to 
nab those pesky spam messages that are hidden in gif,jpeg, or png images...


Here is what I did to get the plugin running.

Test the components that the plugin uses first. 
( Check out the documentation at 
http://antispam.imp.ch/patches/patch-ocrtext for requirements. )


1. Copy a sample spam image to your SA machine.
2. Use giftopnm, jpegtopnm, or pngtopnm to convert whatever type of 
image you have to a pnm image, like so:

 giftopnm Xj105jQX.gif > Xj105jQX.pnm

3. Run gocr on the pnm file like so:

 gocr Xj105jQX.pnm

This should output some text with lots of garbage.  If you got this far 
you should be ready to get the plugin going.
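
The manual test above can be wrapped up so the right converter is picked from the file extension. This is only a sketch: converter_for and ocr_image are hypothetical helper names, and it assumes giftopnm, jpegtopnm, pngtopnm, gocr and mktemp are all on the PATH.

```shell
#!/bin/sh
# converter_for FILE: print the netpbm converter matching the extension,
# or return non-zero for an unsupported type.
converter_for() {
    # Lower-case the name so .GIF and .gif are treated alike.
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *.gif)        echo giftopnm ;;
        *.jpg|*.jpeg) echo jpegtopnm ;;
        *.png)        echo pngtopnm ;;
        *)            return 1 ;;
    esac
}

# ocr_image FILE: convert the image to a temporary pnm file and run
# gocr on it, mirroring the two manual steps above.
ocr_image() {
    conv=$(converter_for "$1") || { echo "unsupported image: $1" >&2; return 1; }
    tmp=$(mktemp) || return 1
    "$conv" "$1" > "$tmp" && gocr "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

With these defined, 'ocr_image Xj105jQX.gif' should give the same output as running the two commands by hand.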


1. cd to /etc/mail/spamassassin
2. download the patch file from: 
http://antispam.imp.ch/patches/patch-ocrtext

3. type 'patch < patch-ocrtext'
  This will create two files in your current directory called 
ocrtext.cf and ocrtext.pm

4. Edit v310.pre and add the following lines:

# OCR - performs Optical Character Recognition on spam images
#
loadplugin ocrtext /etc/mail/spamassassin/ocrtext.pm
loadplugin Mail::SpamAssassin::Timeout

5. Edit the ocrtext.cf file and change the following settings:

## This points to your gocr binary, not just the path.  Try 'which gocr'.
gocr_path   /usr/local/bin/gocr
## This is JUST the path to your pnm binaries (i.e. pngtopnm, giftopnm, jpegtopnm)
pnmtools_path   /usr/bin

6. Run spamassassin -D --lint  and check for errors.

If all went well, restart spamassassin or force it to reread its config 
however you would on your system.
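
If the lint run complains, one thing worth ruling out is a wrong gocr_path or pnmtools_path. The sketch below checks the two values by hand; check_ocr_tools is a hypothetical helper name, and the arguments are whatever you put in ocrtext.cf.

```shell
#!/bin/sh
# check_ocr_tools GOCR_BINARY PNMTOOLS_DIR
# Succeeds only if GOCR_BINARY is an executable file and PNMTOOLS_DIR
# contains executable giftopnm, jpegtopnm and pngtopnm.
check_ocr_tools() {
    gocr_bin=$1
    pnm_dir=$2
    missing=0
    # gocr_path must name the binary itself, not its directory.
    if [ ! -x "$gocr_bin" ] || [ -d "$gocr_bin" ]; then
        echo "gocr_path is not an executable file: $gocr_bin" >&2
        missing=1
    fi
    # pnmtools_path is only the directory holding the converters.
    for tool in giftopnm jpegtopnm pngtopnm; do
        if [ ! -x "$pnm_dir/$tool" ]; then
            echo "missing in pnmtools_path: $pnm_dir/$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

With the settings from step 5 this would be called as 'check_ocr_tools /usr/local/bin/gocr /usr/bin'.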


Then try typing something like 'tail -f /var/log/mail.log | grep 
SPAMPIC_ALPHA'.  On a high-volume server you should see some rules 
matching after a few minutes.  If so, then you are OCR'ing the images!
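
For a summary rather than a live tail, the rule hits can be counted. This sketch assumes your log lines include the matched rule names, as the grep above implies, and that the plugin's rules follow the naming seen elsewhere in this thread (SPAMPIC_ALPHA_3, SPAMPIC_WORDS_3, SUSPECT_GIF); count_ocr_hits is a hypothetical helper name.

```shell
#!/bin/sh
# count_ocr_hits: read log lines on stdin and print how often each
# OCR-related rule name appears, most frequent first.
# The rule-name pattern is an assumption based on names seen in this thread.
count_ocr_hits() {
    grep -oE 'SPAMPIC_[A-Z]+_[0-9]+|SUSPECT_GIF' | sort | uniq -c | sort -rn
}
```

Usage would be 'count_ocr_hits < /var/log/mail.log', with the log path adjusted for your system.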


Hope this helps!
Sincerely,
Davin Flatten

--
Davin Flatten
Unix Systems Administrator
University of Massachusetts
Amherst, MA 01003

Phone: 413-545-1580
Email: [EMAIL PROTECTED]