RE: How do you fight image-spam?

2020-01-17 Thread Emanuel Gonzalez
I haven't touched the plugin at all. I downloaded the RPM file from 
http://repo.iotti.biz/CentOS/7/noarch/spamassassin-FuzzyOcr-3.6.0-12.el7.lux.1.noarch.rpm
and followed the installation steps.

Any ideas?

Regards,

From: Matus UHLAR - fantomas 
Sent: Friday, January 17, 2020 10:55
To: users@spamassassin.apache.org 
Subject: Re: How do you fight image-spam?

On 17.01.20 13:46, Emanuel Gonzalez wrote:
>I'm trying to fight image spam that is a Microsoft phishing attempt.
>I installed FuzzyOCR; I know this plugin is very old.
>
>The installation went fine, but I don't see the plugin loading correctly:
>I trained a spam message, and the logs show no information
>related to the import into the plugin database:
>
>spamassassin --debug FuzzyOcr < Vista\ Previa\ -\ Confirme\ inicio\ de\ 
>sesion.eml > /dev/null
>ene 17 09:02:52.231 [31789] dbg: FuzzyOcr: focr_bin_helper: 
>'pnmnorm,pnminvert,pamthreshold,ppmtopgm,pamtopnm'
[...]
>ene 17 09:02:52.568 [31789] info: FuzzyOcr: Loaded preprocessor normalize: 
>/usr/bin/pnmnorm
>ene 17 09:02:52.568 [31789] info: FuzzyOcr: Loaded preprocessor invert: 
>/usr/bin/pnminvert
>ene 17 09:02:52.569 [31789] info: FuzzyOcr: Loaded preprocessor ppmtopgm: 
>/usr/bin/ppmtopgm
>ene 17 09:02:52.569 [31789] info: FuzzyOcr: Loaded preprocessor pamtopnm: 
>/usr/bin/pamtopnm
>ene 17 09:02:52.569 [31789] info: FuzzyOcr: Loaded preprocessor pamthreshold: 
>/usr/bin/pamthreshold -simple -threshold 0.5
>ene 17 09:02:52.569 [31789] info: FuzzyOcr: Loaded preprocessor maketiff: 
>pnmtotiff -color -truecolor
>ene 17 09:02:52.569 [31789] info: FuzzyOcr: Using scan ocrad: /usr/bin/ocrad 
>-s5 $input
>ene 17 09:02:52.569 [31789] info: FuzzyOcr: Using scan ocrad-invert: 
>/usr/bin/ocrad -s5 -i $input
>ene 17 09:02:52.569 [31789] info: FuzzyOcr: Using scan 
>ocrad-decolorize-invert: /usr/bin/ocrad -s5 -i $input
>ene 17 09:02:52.569 [31789] info: FuzzyOcr: Using scan ocrad-decolorize: 
>/usr/bin/ocrad -s5 $input
>ene 17 09:02:52.569 [31789] info: FuzzyOcr: Using scan gocr: /usr/bin/gocr -i 
>$input
>ene 17 09:02:52.569 [31789] info: FuzzyOcr: Using scan gocr-180: /usr/bin/gocr 
>-l 180 -d 2 -i $input
>ene 17 09:02:52.569 [31789] info: FuzzyOcr: Added <45> words from 
>"/etc/mail/spamassassin/FuzzyOcr.words"

I would expect some more lines here; did you break it?
Note that the FuzzyOcr plugin can run for a long time.

>Does this plugin work fine on CentOS 7?

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Save the whales. Collect the whole set.


Re: How do you fight image-spam?

2020-01-17 Thread Matus UHLAR - fantomas

On 17.01.20 13:46, Emanuel Gonzalez wrote:

I'm trying to fight image spam that is a Microsoft phishing attempt.
I installed FuzzyOCR; I know this plugin is very old.

The installation went fine, but I don't see the plugin loading correctly:
I trained a spam message, and the logs show no information
related to the import into the plugin database:

spamassassin --debug FuzzyOcr < Vista\ Previa\ -\ Confirme\ inicio\ de\ sesion.eml 
> /dev/null
ene 17 09:02:52.231 [31789] dbg: FuzzyOcr: focr_bin_helper: 
'pnmnorm,pnminvert,pamthreshold,ppmtopgm,pamtopnm'

[...]

ene 17 09:02:52.568 [31789] info: FuzzyOcr: Loaded preprocessor normalize: 
/usr/bin/pnmnorm
ene 17 09:02:52.568 [31789] info: FuzzyOcr: Loaded preprocessor invert: 
/usr/bin/pnminvert
ene 17 09:02:52.569 [31789] info: FuzzyOcr: Loaded preprocessor ppmtopgm: 
/usr/bin/ppmtopgm
ene 17 09:02:52.569 [31789] info: FuzzyOcr: Loaded preprocessor pamtopnm: 
/usr/bin/pamtopnm
ene 17 09:02:52.569 [31789] info: FuzzyOcr: Loaded preprocessor pamthreshold: 
/usr/bin/pamthreshold -simple -threshold 0.5
ene 17 09:02:52.569 [31789] info: FuzzyOcr: Loaded preprocessor maketiff: 
pnmtotiff -color -truecolor
ene 17 09:02:52.569 [31789] info: FuzzyOcr: Using scan ocrad: /usr/bin/ocrad 
-s5 $input
ene 17 09:02:52.569 [31789] info: FuzzyOcr: Using scan ocrad-invert: 
/usr/bin/ocrad -s5 -i $input
ene 17 09:02:52.569 [31789] info: FuzzyOcr: Using scan ocrad-decolorize-invert: 
/usr/bin/ocrad -s5 -i $input
ene 17 09:02:52.569 [31789] info: FuzzyOcr: Using scan ocrad-decolorize: 
/usr/bin/ocrad -s5 $input
ene 17 09:02:52.569 [31789] info: FuzzyOcr: Using scan gocr: /usr/bin/gocr -i 
$input
ene 17 09:02:52.569 [31789] info: FuzzyOcr: Using scan gocr-180: /usr/bin/gocr 
-l 180 -d 2 -i $input
ene 17 09:02:52.569 [31789] info: FuzzyOcr: Added <45> words from 
"/etc/mail/spamassassin/FuzzyOcr.words"


I would expect some more lines here; did you break it?
Note that the FuzzyOcr plugin can run for a long time.


Does this plugin work fine on CentOS 7?


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Save the whales. Collect the whole set.
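
Whether a debug run stops after the configuration phase, as suspected above, can be checked by summarising the FuzzyOcr lines in the output. The following is a minimal sketch only: it assumes the log has been saved with one entry per line (the archive wraps long lines), the regular expression is inferred from the lines quoted in this thread, and the function name summarize is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not a documented format): "[pid] level: FuzzyOcr: message".
LINE_RE = re.compile(r"\[(\d+)\] (dbg|info|warn|error): FuzzyOcr: (.*)")


def summarize(log_text):
    """Count FuzzyOcr lines per log level and collect the scanset names."""
    levels = Counter()
    scans = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        _pid, level, message = match.groups()
        levels[level] += 1
        if message.startswith("Using scan "):
            # "Using scan ocrad: /usr/bin/ocrad ..." -> "ocrad"
            scans.append(message[len("Using scan "):].split(":", 1)[0])
    return levels, scans


# Sample lines copied (unwrapped) from the debug output in this thread.
SAMPLE = """\
ene 17 09:02:52.231 [31789] warn: FuzzyOcr: pnmnorm is already defined, skipping...
ene 17 09:02:52.569 [31789] info: FuzzyOcr: Using scan ocrad: /usr/bin/ocrad -s5 $input
ene 17 09:02:52.569 [31789] info: FuzzyOcr: Using scan gocr-180: /usr/bin/gocr -l 180 -d 2 -i $input
ene 17 09:02:52.569 [31789] info: FuzzyOcr: Added <45> words from "/etc/mail/spamassassin/FuzzyOcr.words"
"""

levels, scans = summarize(SAMPLE)
print(dict(levels))
print(scans)
```

If the summary shows only configuration messages (helpers, preprocessors, scansets, word list) and nothing after them, the run ended before any image was scanned, which is what the missing lines would suggest.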


How do you fight image-spam?

2020-01-17 Thread Emanuel Gonzalez
Hi, everyone!

I'm trying to fight image spam that is a Microsoft phishing attempt.
I installed FuzzyOCR; I know this plugin is very old.

The installation went fine, but I don't see the plugin loading correctly:
I trained a spam message, and the logs show no information
related to the import into the plugin database:

spamassassin --debug FuzzyOcr < Vista\ Previa\ -\ Confirme\ inicio\ de\ 
sesion.eml > /dev/null
ene 17 09:02:52.231 [31789] dbg: FuzzyOcr: focr_bin_helper: 
'pnmnorm,pnminvert,pamthreshold,ppmtopgm,pamtopnm'
ene 17 09:02:52.231 [31789] info: FuzzyOcr: Adding <5> new helper apps
ene 17 09:02:52.231 [31789] dbg: FuzzyOcr: focr_bin_helper: 'tesseract'
ene 17 09:02:52.231 [31789] info: FuzzyOcr: Adding <1> new helper apps
ene 17 09:02:52.231 [31789] dbg: FuzzyOcr: focr_bin_helper: 
'pnmnorm,pnminvert,convert,ppmtopgm,tesseract'
ene 17 09:02:52.231 [31789] warn: FuzzyOcr: pnmnorm is already defined, 
skipping...
ene 17 09:02:52.231 [31789] warn: FuzzyOcr: pnminvert is already defined, 
skipping...
ene 17 09:02:52.231 [31789] warn: FuzzyOcr: ppmtopgm is already defined, 
skipping...
ene 17 09:02:52.231 [31789] warn: FuzzyOcr: tesseract is already defined, 
skipping...
ene 17 09:02:52.231 [31789] info: FuzzyOcr: Adding <1> new helper apps
ene 17 09:02:52.232 [31789] info: FuzzyOcr: Starting preprocessor parser for 
file "/etc/mail/spamassassin/FuzzyOcr.preps"...
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: preprocessor normalize {
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: command = pnmnorm
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: }
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: preprocessor invert {
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: command = pnminvert
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: }
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: preprocessor ppmtopgm {
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: command = ppmtopgm
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: }
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: preprocessor pamtopnm {
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: command = pamtopnm
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: }
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: preprocessor pamthreshold {
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: command = pamthreshold
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: args = -simple -threshold 0.5
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: }
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: preprocessor maketiff {
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: command = pnmtotiff
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: args = -color -truecolor
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line: }
ene 17 09:02:52.232 [31789] info: FuzzyOcr: Starting scanset parser for file 
"/etc/mail/spamassassin/FuzzyOcr.scansets"...
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line scanset ocrad {
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line command = $ocrad
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line args = -s5 $input
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line }
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line scanset ocrad-invert {
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line command = $ocrad
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line args = -s5 -i $input
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line }
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line scanset ocrad-decolorize-invert 
{
ene 17 09:02:52.232 [31789] dbg: FuzzyOcr: line preprocessors = ppmtopgm, 
pamthreshold, pamtopnm
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line command = $ocrad
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line args = -s5 -i $input
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line }
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line scanset ocrad-decolorize {
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line preprocessors = ppmtopgm, 
pamthreshold, pamtopnm
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line command = $ocrad
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line args = -s5 $input
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line }
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line scanset gocr {
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line command = $gocr
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line args = -i $input
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line }
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line scanset gocr-180 {
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line command = $gocr
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line args = -l 180 -d 2 -i $input
ene 17 09:02:52.233 [31789] dbg: FuzzyOcr: line }
ene 17 09:02:52.567 [31789] info: FuzzyOcr: Searching in: /usr/local/netpbm/bin
ene 17 09:02:52.567 [31789] info: FuzzyOcr: Searching in: /usr/local/bin
ene 17 09:02:52.567 [31789] info: FuzzyOcr: Searching in: /usr/bin
ene 17 09:02:52.567 [31789] info: FuzzyOcr: Using gifsicle => /usr/bin/gifsicle
ene 17 09:02:52.567 [31789] info: FuzzyOcr: Using giffix => /usr/bin/giffix
ene 17 09:02:52.567 [31789] info: FuzzyOcr: 

Re: Image spam - FuzzyOCR?

2016-09-02 Thread RW
On Fri, 02 Sep 2016 10:19:22 +0700
Olivier wrote:

> > Not really, he just said it matches against a word list. My point is
> > that out of the several SA OCR plugins that have been written,
> > FuzzyOCR is the one that's specifically designed for doing fuzzy
> > matching on a finite word list. If you just pass the OCR output to
> > Bayes or add it to the body, it's not "fuzzy OCR" anymore.  
> 
> To my understanding, the fuzzy part referred to the way it does OCR
> (several passes, with different angles, colours, etc.), not
> to the word matching.


From:




The methods mainly are:

-  Optical Character Recognition using different engines and settings
-  Fuzzy word matching algorithm applied to OCR results
...


Re: Image spam - FuzzyOCR?

2016-09-01 Thread Olivier

> Not really, he just said it matches against a word list. My point is
> that out of the several SA OCR plugins that have been written, FuzzyOCR
> is the one that's specifically designed for doing fuzzy matching on a
> finite word list. If you just pass the OCR output to Bayes or add it to
> the body, it's not "fuzzy OCR" anymore.

To my understanding, the fuzzy part referred to the way it does OCR
(several passes, with different angles, colours, etc.), not
to the word matching.

Olivier


Re: Image spam - FuzzyOCR?

2016-09-01 Thread RW
On Thu, 1 Sep 2016 15:16:37 +0200
Matus UHLAR - fantomas wrote:

> >> On Thu, Sep 1, 2016 at 12:27 AM, Olivier
> >> <olivier.nic...@cs.ait.ac.th> wrote:  
> >> > I am running it, it does not do a very good job at extracting the
> >> > text from the images. Then it uses its own list of keywords to
> >> > detect spam: to me it's the biggest problem, it should push back
> >> > the text to SpamAssassin and let SA rules decide what to do with
> >> > it. 
> >>   I do agree that the OCR program should be doing the OCR'ing
> >> and the text filtering should be left to a program that does that
> >> for a living.  
> 
> On 01.09.16 13:59, RW wrote:
> >It's a long time since I've used it, but IIRC the point of FuzzyOCR
> >is that it does fuzzy matching on a dictionary of "bad" words -
> >similar to the way that spelling checkers find the most likely
> >suggestions. This gives it a very limited ability to deal with
> >imperfectly read words.  
> 
> it's the same as Olivier wrote above :-)

Not really, he just said it matches against a word list. My point is
that out of the several SA OCR plugins that have been written, FuzzyOCR
is the one that's specifically designed for doing fuzzy matching on a
finite word list. If you just pass the OCR output to Bayes or add it to
the body, it's not "fuzzy OCR" anymore.


> >Putting garbled OCR text through SA body rules may be more trouble
> >than it's worth.  
> 
> garbled, yes. I had this discussion some years back, and tesseract
> currently gives much, much better results than it did back then.


Unless it can cope with current CAPTCHAs the spammer has a reserve. 

The first OCR plugin came towards the end of a period where people were
being hammered by image spam. There's been nothing like that since,
probably because it doesn't work well as spam.  As I've said I find it
can be caught by other means. I must have put about 50k spams through
SA since I last had an FN that was an image spam. 


RE: Image spam - FuzzyOCR?

2016-09-01 Thread Richard Mealing
>-----Original Message-----
>From: Matus UHLAR - fantomas [mailto:uh...@fantomas.sk] 
>Sent: Thursday, September 1, 2016 14:30
>To: users@spamassassin.apache.org
>Subject: Re: Image spam - FuzzyOCR? 

>>On Wed, 31 Aug 2016 12:55:15 + Richard Mealing wrote:
>>> 2)  I'm getting some horny date spam coming through with just
>>> images and text inside an image at the bottom. My bayes seems to be 
>>> scoring this with -1.90 Bayes_00. I keep sending this to my database 
>>> as spam but I'm not sure how many I need to feed it and I don't get 
>>> much.

>On 01.09.16 14:25, RW wrote:
>>It's not a good sign when spam resists being trained away from BAYES_00.
>>
>>IIWY I'd reset the database, and if possible turn-off autotraining and 
>>train manually.
>>
>>Also you might want to set:
>>
>>  bayes_token_sources  all
>>
>>This adds in mimepart hashes, which may help Bayes identify repeated 
>>images.

>I think what happens more often is that the training data are sent to the 
>wrong user.
>When using amavis, training must be done as the 'amavis' user, or whichever 
>other user amavis runs as.

I'm scanning for quite a few different domains (100+) and I'm not that familiar 
with how bayes works - I can't really find much documentation. TBH it seems to 
be working fine and scoring quite well, but there are instances where it fails.
Also I am using it through sql - 

use_bayes 1
bayes_auto_learn 1
bayes_auto_expire 1
bayes_store_module  Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn   DBI:mysql:sa_bayes:x.x.x.x:3306
bayes_sql_username  sa_user
bayes_sql_password   


I need to do more reading on how to make it better, but I have a few dormant 
domains delivering emails to a POP box and I rsync that to my filtering server 
and run sa-learn just using some bash script. I read this isn't recommended 
though, but I would have thought using a domain that no one should know about, 
like a honeypot, this should be ok? Maybe I should just rethink the whole 
thing. 
I remember someone telling me about that flesh plugin. I'm sure it was my boss! 
Was it not called pornsweeper? Looks like the DNS was removed for the website, 
but I looked at Google's cached copy.

Thanks for all your advice, it is much appreciated. 

>--
>Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
>Warning: I wish NOT to receive e-mail advertising to this address.
>Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
>"Where do you want to go to die?" [Microsoft]


Re: Image spam - FuzzyOCR?

2016-09-01 Thread Matus UHLAR - fantomas

On Wed, 31 Aug 2016 12:55:15 + Richard Mealing wrote:

2)  I'm getting some horny date spam coming through with just
images and text inside an image at the bottom. My bayes seems to be
scoring this with -1.90 Bayes_00. I keep sending this to my database
as spam but I'm not sure how many I need to feed it and I don't get
much.


On 01.09.16 14:25, RW wrote:

It's not a good sign when spam resists being trained away from BAYES_00.

IIWY I'd reset the database, and if possible turn-off autotraining and
train manually.

Also you might want to set:

 bayes_token_sources  all

This adds in mimepart hashes, which may help Bayes identify repeated
images.


I think what happens more often is that the training data are sent to the
wrong user.
When using amavis, training must be done as the 'amavis' user, or whichever
other user amavis runs as.


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"Where do you want to go to die?" [Microsoft]


Re: Image spam - FuzzyOCR?

2016-09-01 Thread RW
On Wed, 31 Aug 2016 12:55:15 +
Richard Mealing wrote:

> 2)  I'm getting some horny date spam coming through with just
> images and text inside an image at the bottom. My bayes seems to be
> scoring this with -1.90 Bayes_00. I keep sending this to my database
> as spam but I'm not sure how many I need to feed it and I don't get
> much. 

It's not a good sign when spam resists being trained away from BAYES_00.

IIWY I'd reset the database, and if possible turn-off autotraining and
train manually.

Also you might want to set:

  bayes_token_sources  all

This adds in mimepart hashes, which may help Bayes identify repeated
images.


Re: Image spam - FuzzyOCR?

2016-09-01 Thread Matus UHLAR - fantomas

On Thu, Sep 1, 2016 at 12:27 AM, Olivier  wrote:
> I am running it, it does not do a very good job at extracting the
> text from the images. Then it uses its own list of keywords to
> detect spam: to me it's the biggest problem, it should push back
> the text to SpamAssassin and let SA rules decide what to do with it.
>
  I do agree that the OCR program should be doing the OCR'ing and
the text filtering should be left to a program that does that for a
living.


On 01.09.16 13:59, RW wrote:

It's a long time since I've used it, but IIRC the point of FuzzyOCR is
that it does fuzzy matching on a dictionary of "bad" words - similar to
the way that spelling checkers find the most likely suggestions. This
gives it a very limited ability to deal with imperfectly read words.


it's the same as Olivier wrote above :-)


Putting garbled OCR text through SA body rules may be more trouble than
it's worth.


garbled, yes. I had this discussion some years back, and tesseract
currently gives much, much better results than it did back then.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Boost your system's speed by 500% - DEL C:\WINDOWS\*.*


Re: Image spam - FuzzyOCR?

2016-09-01 Thread RW
On Thu, 1 Sep 2016 06:23:37 -0400
Mauricio Tavares wrote:

> On Thu, Sep 1, 2016 at 12:27 AM, Olivier
>  wrote:

> > I am running it, it does not do a very good job at extracting the
> > text from the images. Then it uses its own list of keywords to
> > detect spam: to me it's the biggest problem, it should push back
> > the text to SpamAssassin and let SA rules decide what to do with it.
> >  
>   I do agree that the OCR program should be doing the OCR'ing and
> the text filtering should be left to a program that does that for a
> living.

It's a long time since I've used it, but IIRC the point of FuzzyOCR is
that it does fuzzy matching on a dictionary of "bad" words - similar to
the way that spelling checkers find the most likely suggestions. This
gives it a very limited ability to deal with imperfectly read words.

Putting garbled OCR text through SA body rules may be more trouble than
it's worth.
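
The idea of fuzzy matching OCR output against a finite word list can be illustrated with a short sketch. This is not FuzzyOcr's actual algorithm, which is not described in this thread: the sketch swaps in Python's difflib similarity ratio, and the word list, the 0.8 threshold and the function name fuzzy_hits are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical word list; the plugin itself reads its words from its own
# file (FuzzyOcr.words in the logs quoted elsewhere in this archive).
BAD_WORDS = ["viagra", "mortgage", "pharmacy"]


def fuzzy_hits(ocr_text, words=BAD_WORDS, threshold=0.8):
    """Return (token, word, ratio) for OCR tokens that resemble a listed word."""
    hits = []
    for token in ocr_text.lower().split():
        for word in words:
            # ratio() is 1.0 for identical strings and falls towards 0.0
            # as the strings share fewer characters in order.
            ratio = SequenceMatcher(None, token, word).ratio()
            if ratio >= threshold:
                hits.append((token, word, round(ratio, 2)))
    return hits


# Simulated OCR output with typical misreads ("1" for "i", "rn" for "m").
for hit in fuzzy_hits("Cheap v1agra from our online pharrnacy"):
    print(hit)
```

An exact-match rule would miss both misread words here, which is the limited tolerance for imperfectly read words mentioned above. Lowering the threshold catches more garbled words at the cost of more false matches.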





Re: Image spam - FuzzyOCR?

2016-09-01 Thread li...@rhsoft.net



On 01.09.2016 at 12:23, Mauricio Tavares wrote:

I do agree that the OCR program should be doing the OCR'ing and
the text filtering should be left to a program that does that for a
living. In the modern, systemd world this is of course an ancient and
outdated design philosophy


This is simply *not* true: systemd ships a lot of different binaries 
doing different things, and so *clearly* follows the Unix philosophy.


The only difference is that, instead of all these tools living in 
different upstream repos, maintained by independent teams that hopefully 
adapt properly when a change affects more than one tool, the needed 
changes are made in the same repo.


Some people have the illusion that Lennart Poettering is the one and 
only programmer of all those tools. He is not. The different tools are 
maintained by different people and are tightly integrated because they 
all talk to each other and work in the same team, instead of being 
different projects that fight each other when problems arise and point 
at the other tool as the broken one.





Re: Image spam - FuzzyOCR?

2016-09-01 Thread Mauricio Tavares
On Thu, Sep 1, 2016 at 12:27 AM, Olivier <olivier.nic...@cs.ait.ac.th> wrote:
> Richard,
>
>> I am looking at Fuzzy ocr to detect more image spam and I had a couple
>> of questions;
>
> FuzzyOCR does not detect image spam per se, it detects spam text in an
> image. To classify image spam, you could consider image Cerberus that
> does a classification on images metadata (size, presence of text, etc.)
>
>> 1)  Is this being used? Does it detect image spam, or should I be
>> looking at something else?
>
> Yes. No, maybe.
>
> I am running it, it does not do a very good job at extracting the text
> from the images. Then it uses its own list of keywords to detect spam:
> to me it's the biggest problem, it should push back the text to
> SpamAssassin and let SA rules decide what to do with it.
>
  I do agree that the OCR program should be doing the OCR'ing and
the text filtering should be left to a program that does that for a
living. In the modern, systemd world this is of course an ancient and
outdated design philosophy.

>> 2)  I'm getting some horny date spam coming through with just
>> images and text inside an image at the bottom. My bayes seems to be
>> scoring this with -1.90 Bayes_00. I keep sending this to my database
>> as spam but I'm not sure how many I need to feed it and I don't get
>> much. Are there any other means of feeding bayes with image spam (or
>> any spam really) from a source on the internet? Or is that a bad idea
>> since that's not my spam?
>
> The ideal plugin would be able to look at a picture and decide that it's
> a horny date :) I remember we once had a student who wanted to work on
> classifying pictures by the amount of flesh to decide whether it was a
> naked picture or not. But I don't think he ever succeeded.
>
  I need to find where I saw this - might even have been in
wikipedia of all places -- but China or some other country has a
program that blocks images on the internet based on the amount of
flesh. As a result, it would block a picture of a bunch of pigs
feeding. Maybe it is the same guy?

>> 3)  If I use Fuzzy OCR on FreeBSD, how does it get updated?
>
> I doubt FuzzyOCR ever gets updated, on FreeBSD or elsewhere.
>
>> 4)  I installed it from the ports and I had to install tesseract
>> or I got a dependency warning message. Now I still get a warning -
>> warn: FuzzyOcr: Cannot find executable for gifinter - Is this normal?
>> How should I omit this error since I can't find gifinter in the ports
>> tree?
>
> gifinter used to be part of /usr/ports/graphics/giflib
> but the NEWS file mentions that:
> Version 5.0.1
> =
> Retirements
> ---
> * gifinter is gone.  Use convert -interlace from the ImageMagick suite.
>
> In my case, I still have an old executable of gifinter lying around,
> but I think you would configure FuzzyOcr.cf with an appropriate line of
> the form:
>
> focr_bin_gifinter /usr/local/bin/convert -interlace and the needed
> parameters.
>
> Best regards,
>
> Olivier


Re: Image spam - FuzzyOCR?

2016-08-31 Thread Olivier
Richard,

> I am looking at Fuzzy ocr to detect more image spam and I had a couple
> of questions;

FuzzyOCR does not detect image spam per se, it detects spam text in an
image. To classify image spam, you could consider image Cerberus that
does a classification on images metadata (size, presence of text, etc.)

> 1)  Is this being used? Does it detect image spam, or should I be
> looking at something else?

Yes. No, maybe.

I am running it, it does not do a very good job at extracting the text
from the images. Then it uses its own list of keywords to detect spam:
to me it's the biggest problem, it should push back the text to
SpamAssassin and let SA rules decide what to do with it.

> 2)  I'm getting some horny date spam coming through with just
> images and text inside an image at the bottom. My bayes seems to be
> scoring this with -1.90 Bayes_00. I keep sending this to my database
> as spam but I'm not sure how many I need to feed it and I don't get
> much. Are there any other means of feeding bayes with image spam (or
> any spam really) from a source on the internet? Or is that a bad idea
> since that's not my spam?

The ideal plugin would be able to look at a picture and decide that it's
a horny date :) I remember we once had a student who wanted to work on
classifying pictures by the amount of flesh to decide whether it was a
naked picture or not. But I don't think he ever succeeded.

> 3)  If I use Fuzzy OCR on FreeBSD, how does it get updated?

I doubt FuzzyOCR ever gets updated, on FreeBSD or elsewhere.

> 4)  I installed it from the ports and I had to install tesseract
> or I got a dependency warning message. Now I still get a warning -
> warn: FuzzyOcr: Cannot find executable for gifinter - Is this normal?
> How should I omit this error since I can't find gifinter in the ports
> tree?

gifinter used to be part of /usr/ports/graphics/giflib
but the NEWS file mentions that:
Version 5.0.1
=
Retirements
---
* gifinter is gone.  Use convert -interlace from the ImageMagick suite.

In my case, I still have an old executable of gifinter lying around,
but I think you would configure FuzzyOcr.cf with an appropriate line of
the form:

focr_bin_gifinter /usr/local/bin/convert -interlace and the needed
parameters.

Best regards,

Olivier


Image spam - FuzzyOCR?

2016-08-31 Thread Richard Mealing
Hi everyone,

I am looking at Fuzzy ocr to detect more image spam and I had a couple of 
questions;


1)  Is this being used? Does it detect image spam, or should I be looking 
at something else?

2)  I'm getting some horny date spam coming through with just images and 
text inside an image at the bottom. My bayes seems to be scoring this with 
-1.90 Bayes_00. I keep sending this to my database as spam but I'm not sure how 
many I need to feed it and I don't get much. Are there any other means of 
feeding bayes with image spam (or any spam really) from a source on the 
internet? Or is that a bad idea since that's not my spam?

3)  If I use Fuzzy OCR on FreeBSD, how does it get updated?

4)  I installed it from the ports and I had to install tesseract or I got a 
dependency warning message. Now I still get a warning - warn: FuzzyOcr: Cannot 
find executable for gifinter - Is this normal? How should I omit this error 
since I can't find gifinter in the ports tree?

Thanks,
Rich



Re: Increase in Image Spam

2014-02-21 Thread Kevin A. McGrail

On 2/20/2014 10:35 PM, Amir Caspi wrote:

On Feb 20, 2014, at 8:07 PM, Kevin A. McGrail kmcgr...@pccc.com wrote:


No need to run through 3.3.2.  The emails are well over the 256KB limit hard 
coded in sa-learn with 3.3.2.

Understood, and thanks for checking on this.  Now that I know this is the 
problem, I've manually edited Mail::SpamAssassin::ArchiveIterator.pm to change 
the BIG_BYTES limit from 256K to 1500K (which I've found is a reasonable size 
for my small system).  I've verified that this change allows sa-learn to work 
properly for these messages.
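
Before patching anything, it may help to see which messages actually exceed the limit. A minimal sketch, assuming one message per file (for example a maildir-style folder) and assuming that 256K means 256 * 1024 bytes; the function name oversized is made up, and whether sa-learn treats the boundary as inclusive is not confirmed here.

```python
import os
import tempfile

# Assumption: the 256K limit quoted in this thread equals 256 * 1024 bytes.
LIMIT_BYTES = 256 * 1024


def oversized(paths, limit=LIMIT_BYTES):
    """Return (path, size) pairs for files whose size exceeds the limit."""
    result = []
    for path in paths:
        size = os.path.getsize(path)
        if size > limit:
            result.append((path, size))
    return result


# Demo with two throwaway files: one small, one just over the limit.
with tempfile.TemporaryDirectory() as folder:
    small = os.path.join(folder, "small.eml")
    big = os.path.join(folder, "big.eml")
    with open(small, "wb") as handle:
        handle.write(b"x" * 1000)
    with open(big, "wb") as handle:
        handle.write(b"x" * (LIMIT_BYTES + 1))
    for path, size in oversized([small, big]):
        print(os.path.basename(path), size)
```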

Is there any reason that such a manual edit could cause problems elsewhere, or 
am I safe to have made this change?  (Neglect the fact that large messages 
could cause high loads, my system can handle that.)

Or, would you recommend that instead of making this change, I just set opt_all 
= 1 in sa-learn's instantiation of ArchiveIterator?  (That is, modify sa-learn 
instead of ArchiveIterator.)

I don't know, sorry.  Let us know if you find any issues for sure.

Now, that brings up the other question: I have other mails that are well below the 256K 
limit (and certainly below the 1500K limit I just set), but they are still not being 
examined by sa-learn.  These messages are pretty old (from July 2013) ... are they being 
ignored because they are too old?  I don't see that sa-learn is using opt_before or 
opt_after for ArchiveIterator, and I don't see anywhere else where it's excluding old 
messages... and there are no errors in the debug output, but I'm still getting 0 
messages examined.

This sample mbox of old mails is here:

https://www.dropbox.com/s/zvbmvk8pb06v0m8/SA_testspam_old.mbox

If it's being ignored based on date, how would I know that?

Sorry for being dense. =)


The file isn't in mbox format.  No From separators.

Regards,
KAM
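
The missing separators can be checked for before feeding a file to sa-learn. A minimal sketch using only the Python standard library: it counts lines that begin with "From " (with a trailing space), which is the mbox message separator. The helper names and the sample messages are made up for illustration, and the count is only approximate for files whose bodies contain unescaped "From " lines.

```python
import os
import tempfile


def count_mbox_separators(path):
    """Count lines that begin with 'From ', the mbox message separator."""
    count = 0
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b"From "):
                count += 1
    return count


def write_temp(data):
    """Write bytes to a temporary file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp(suffix=".mbox")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


# Two hand-made samples: a raw message, and the same message with a separator.
raw = b"Return-Path: <a@example.com>\nSubject: test\n\nbody\n"
good = b"From a@example.com Thu Feb 20 10:00:00 2014\n" + raw

raw_path, good_path = write_temp(raw), write_temp(good)
print(count_mbox_separators(raw_path))   # file without separators
print(count_mbox_separators(good_path))  # file with one separator
os.remove(raw_path)
os.remove(good_path)
```

A count of zero on a file that is supposed to hold several messages points to the problem described above: the messages were concatenated without "From " lines, so the file cannot be split as an mbox.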


Re: Increase in Image Spam

2014-02-20 Thread Axb

On 02/20/2014 06:06 PM, Amir Caspi wrote:

Hi all,

Following some off-list discussions with Kevin, John, et al., I had a 
question that was suggested I bring up on-list, so here it is:

For whatever reason, many of the FNs I've been getting lately are 
passing because they hit BAYES_00, even though they are matching 
AC_SPAMMY_URI_PATTERNS.  I need to enable bayes tokens in the headers so I can 
see why these are considered so hammy when I know for sure they're not...

But, I would love if there were a way to ignore the bayes score if 
AC_SPAMMY_URI_PATTERNS matches.  I know this is rather silly -- the whole point 
of Bayes is to help determine if an email is spam or ham regardless of the 
other rules -- but I'm just flummoxed by having these obviously-spammy emails 
being treated as ham.

Should I create a rule that adds extra points if AC_SPAMMY_URI_PATTERNS 
hits AND a low Bayes score is found?  Or should I just make 
AC_SPAMMY_URI_PATTERNS a poison pill, since I've never gotten an FP out of it?  
Not sure what else to do about these Bayes-killing spams (besides wiping my 
entire Bayes DB and starting over).

Thoughts?


Amir,

What kind of traffic are you dealing with? personal, corporate? ISPish?
How many domains/users/msgs/day?

There's a number of options depending on the amount of traffic you handle.




Re: Increase in Image Spam

2014-02-20 Thread Amir Caspi
On Feb 20, 2014, at 10:15 AM, Axb axb.li...@gmail.com wrote:

 What kind of traffic are you dealing with? personal, corporate? ISPish?
 How many domains/users/msgs/day?

This is mostly personal email with a little bit of corporate.  In this 
instance, it is for a single domain with 3 users and approximately 50-100 total 
legitimate messages per day (but HUNDREDS of spams per day, most of which are 
properly classified; I am seeing only a few [10] FNs per day, although those 
FNs are, as I described, getting Bayes_00... they are almost always image spam 
with not much text.)

I do have a number of other domains but I don't monitor the spam quality on 
those actively (and I haven't received complaints).

Thanks.

--- Amir

Re: Increase in Image Spam

2014-02-20 Thread Axb

On 02/20/2014 06:22 PM, Amir Caspi wrote:

On Feb 20, 2014, at 10:15 AM, Axb axb.li...@gmail.com wrote:


What kind of traffic are you dealing with? personal, corporate?
ISPish? How many domains/users/msgs/day?


This is mostly personal email with a little bit of corporate.  In
this instance, it is for a single domain with 3 users and
approximately 50-100 total legitimate messages per day (but HUNDREDS
of spams per day, most of which are properly classified; I am seeing
only a few [10] FNs per day, although those FNs are, as I described,
getting Bayes_00... they are almost always image spam with not much
text.)

I do have a number of other domains but I don't monitor the spam
quality on those actively (and I haven't received complaints).



In your case this is what I'd do.

I hope you're running SA 3.4 so:

Assuming you can check maillogs and can either detect some spammed 
unknown user patterns or have  a dedicated trap domain to spare, I'd 
accept that mail and write some header rules to score the trap 
rcpt/domain REAL high and use a rule like


tflags RULENAME autolearn_force

obviously you'll need
bayes_auto_learn  1


That would help feed your small Bayes DB pretty fast and help detect all 
kinds of crap.


h2h





Re: Increase in Image Spam

2014-02-20 Thread Amir Caspi
On Feb 20, 2014, at 10:34 AM, Axb axb.li...@gmail.com wrote:

 I hope you're running SA 3.4 so:

I am still on 3.3.2 because nobody has yet packaged 3.4 for CentOS 5.x, from 
what I can tell.  I have the package from the rpmforge-extras repo, and 3.3.2 
is still the most current version there (and on Atomic and AtRPMs).

I'm not sure who is responsible for updating the packages, but I'll probably 
have to wait a while until they get 3.4 uploaded there.

 Assuming you can check maillogs and can either detect some spammed unknown 
 user patterns or have  a dedicated trap domain to spare, I'd accept that mail 
 and write some header rules to score the trap rcpt/domain REAL high and use a 
 rule like
 
 tflags RULENAME autolearn_force

I'm not entirely sure what you mean here.  Are you saying to use a 
honeypot/spamtrap to feed the Bayes DB?  My problem is not that my Bayes DB 
doesn't have enough spam in it, it's that these particular FNs are scoring 00.  
Let me note that the Bayes DBs are per-user, not per-domain.  Here's the magic 
output from my Bayes DB:

0.000          0          3          0  non-token data: bayes db version
0.000          0     239650          0  non-token data: nspam
0.000          0      85695          0  non-token data: nham
0.000          0     145773          0  non-token data: ntokens
0.000          0 1387110367          0  non-token data: oldest atime
0.000          0 1392917375          0  non-token data: newest atime
0.000          0 1392886526          0  non-token data: last journal sync atime
0.000          0 1392637273          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
0.000          0       9005          0  non-token data: last expire reduction count

I don't think this counts as a small DB, does it?
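
The counters in the dump above are easier to judge once converted into dates and ratios. A minimal Python sketch, assuming only the numbers quoted in this message; the dictionary layout and the helper name as_utc are illustrative and not part of SpamAssassin:

```python
from datetime import datetime, timezone

# Values copied from the "non-token data" lines quoted above.
magic = {
    "nspam": 239650,
    "nham": 85695,
    "ntokens": 145773,
    "oldest atime": 1387110367,
    "newest atime": 1392917375,
    "last expire atime delta": 5529600,
}


def as_utc(epoch_seconds):
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Ratio of learned spam messages to learned ham messages.
spam_to_ham = magic["nspam"] / magic["nham"]

# The expiry delta is stored in seconds; 86400 seconds make one day.
expire_delta_days = magic["last expire atime delta"] / 86400

print(f"spam:ham ratio   {spam_to_ham:.2f}")
print(f"oldest token     {as_utc(magic['oldest atime']):%Y-%m-%d}")
print(f"newest token     {as_utc(magic['newest atime']):%Y-%m-%d}")
print(f"expiry delta     {expire_delta_days:.0f} days")
```

The figures only describe the size and age of the database; they say nothing about why a particular message scores BAYES_00.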

Bayes is set to autolearn, and I manually run sa-learn about once a week on my 
spam folder (to learn the FNs, plus lower-scoring spam that was not 
autolearned).  MANY such image spams are caught properly, including by Bayes; 
the problem is that some of them, somehow, manage to slip through and score 
very low (00 or 20).  I just have no idea how that is happening (which is why I 
should start enabling token output in the headers and look), but that's why I 
was thinking of scoring AC_SPAMMY_URI_PATTERNS very high if Bayes is scoring 
very low, although I guess that kind of defeats the purpose of Bayes and 
introduces the risk of FPs.

-- Amir



Re: Increase in Image Spam

2014-02-20 Thread Benny Pedersen

On 2014-02-20 18:06, Amir Caspi wrote:


for whatever reason, many of the FNs I've been getting lately are
passing because they hit BAYES_00, even though they are matching
AC_SPAMMY_URI_PATTERNS.  I need to enable bayes tokens in the headers
so I can see why these are considered so hammy when I know for sure
they're not...


meta AC_URI_BAYES_HAM (AC_SPAMMY_URI_PATTERNS && BAYES_00)

score with 5 ?


But, I would love if there were a way to ignore the bayes score if
AC_SPAMMY_URI_PATTERNS matches.


see above; don't count on scores alone, make rules to add score for the spam 
that is really spam



I know this is rather silly -- the
whole point of Bayes is to help determine if an email is spam or ham
regardless of the other rules -- but I'm just flummoxed by having
these obviously-spammy emails being treated as ham.


you should really just train bayes more then; spammers will always lose 
if bayes is well trained



Should I create a rule that adds extra points if
AC_SPAMMY_URI_PATTERNS hits AND a low Bayes score is found?


yep, as I showed above


Or should
I just make AC_SPAMMY_URI_PATTERNS a poison pill, since I've never
gotten an FP out of it?


this will work as well, but if bayes is trained to bayes_60 or higher it 
does not really need more help on bayes scoring



Not sure what else to do about these
Bayes-killing spams (besides wiping my entire Bayes DB and starting
over).


this will be counter productive :=)


Thoughts?


samples somewhere?


Re: Increase in Image Spam

2014-02-20 Thread Amir Caspi
On Feb 20, 2014, at 11:21 AM, Kris Deugau kdeu...@vianet.ca wrote:

 Have you tried learning one specific FN, then reprocessing that message
 to see what Bayes score it gets?  IME it will usually shift from
 BAYES_00 to at least BAYES_40 in most cases, even with a large sitewide
 DB with far more tokens than the usual per-user DB.

Well, I just tried this, and sa-learn seems to be refusing to learn the 
messages.  I've placed an example MBOX here, temporarily (I will delete this 
within the next 24-48 hours for security):

https://www.dropbox.com/s/m4fuv670wnvwa16/SA_testspam.mbox

When I run sa-learn on this mailbox, it says:

Learned tokens from 0 message(s) (0 message(s) examined)

(This is using SA 3.3.2 on a CentOS 5.10 box.)

I tried placing other spam in here and it learned those fine, so clearly 
something about these two messages is confusing sa-learn.

Anyone have an idea why sa-learn is refusing to even examine these messages?

(Note that the messages are out of order; the first one is newer than the 
second.  The older one scored Bayes_50, the newer one scored Bayes_00.)

Any thoughts are greatly appreciated, I don't know why sa-learn won't even 
touch these... and that may explain why they continue to have low scores!

--- Amir

Re: Increase in Image Spam

2014-02-20 Thread Benny Pedersen

On 2014-02-20 21:43, Axb wrote:


Redis DB in RAM - do the math :)


got results as 781250

now it's time to see how much power so many pi's are using :=)

has anyone thought about running mysql in memory, if it's slow?

engine=memory in the spamd init script, and engine=myisam on shutdown

yes i know its risky, but would be nice to see comparisons


Re: Increase in Image Spam

2014-02-20 Thread Amir 'CG' Caspi
On Thu, February 20, 2014 12:57 pm, John Hardin wrote:
 0 messages examined generally means either the format isn't what
 sa-learn expected, or the message is larger than the size limit.

The file format is most certainly MBOX... it was created by my MUA, and
running file on it tells me that it is ASCII mail text.  As I
mentioned, adding other spams to it results in those other spams being
properly learned, so it can't be a format issue unless the specific
messages themselves are not formatted in a way that sa-learn likes (though
the MTA and MUA like it just fine).

If it's a size issue, how can I increase the size limit for sa-learn? 
But, I don't think it's a size issue since these messages are under 512k
each.

Note that I have some other spams for which this is now an issue but which
I think worked fine in the past (with SA 3.3.1 for sure); is it possible
something got borked in sa-learn between 3.3.1 and 3.3.2 and nobody
noticed?  (I can't install 3.4 since it hasn't been RPM'd for CentOS 5.x
yet.)

I tried running sa-learn -D but the debug output didn't tell me anything
(that I could see) about why it was skipping the messages.  Running
spamassassin on the messages works just fine (I see SA output, so it's
matching rules), as does running spamc/spamd.  It is only sa-learn that
seems to be choking, and I have no idea why.

Any additional suggestions on how I can diagnose this?  Is it looking like
something I can fix, or a bug in sa-learn?

Thanks.

--- Amir




Re: Increase in Image Spam

2014-02-20 Thread Axb

On 02/20/2014 10:35 PM, Amir 'CG' Caspi wrote:

Note that I have some other spams for which this is now an issue but which
I think worked fine in the past (with SA 3.3.1 for sure); is it possible
something got borked in sa-learn between 3.3.1 and 3.3.2 and nobody
noticed?  (I can't install 3.4 since it hasn't been RPM'd for CentOS 5.x
yet.)


what's wrong with installing from source?
(NOT Cpan install)




Re: Increase in Image Spam

2014-02-20 Thread Kevin A. McGrail

On 2/20/2014 4:35 PM, Amir 'CG' Caspi wrote:

If it's a size issue, how can I increase the size limit for sa-learn?
But, I don't think it's a size issue since these messages are under 512k
each.

--max-size= I believe.  Default is 256K.


Re: Increase in Image Spam

2014-02-20 Thread Kevin A. McGrail

On 2/20/2014 4:39 PM, Axb wrote:

On 02/20/2014 10:35 PM, Amir 'CG' Caspi wrote:
Note that I have some other spams for which this is now an issue but which
I think worked fine in the past (with SA 3.3.1 for sure); is it possible
something got borked in sa-learn between 3.3.1 and 3.3.2 and nobody
noticed?  (I can't install 3.4 since it hasn't been RPM'd for CentOS 5.x
yet.)


what's wrong with installing from source?
(NOT Cpan install)

Theoretically CPAN install should work now as well though FreeBSD users 
will need to wait for the 3.4.1 release to install cleanly due to a 
variable collision (script).


Regards,
KAM


Re: Increase in Image Spam

2014-02-20 Thread Benny Pedersen

On 2014-02-20 22:39, Axb wrote:
noticed?  (I can't install 3.4 since it hasn't been RPM'd for CentOS 5.x
yet.)


what's wrong with installing from source?
(NOT Cpan install)


http://searchcode.com/codesearch/view/21483839

the hardest part is knowing how to :=)


Re: Increase in Image Spam

2014-02-20 Thread Benny Pedersen

On 2014-02-20 22:39, Kevin A. McGrail wrote:

On 2/20/2014 4:35 PM, Amir 'CG' Caspi wrote:

If it's a size issue, how can I increase the size limit for sa-learn?
But, I don't think it's a size issue since these messages are under 512k
each.

--max-size= I believe.  Default is 256K.


and small mbox files exist; it could just be missing --mbox on the 
command line, else it would use maildir as default


Re: Increase in Image Spam

2014-02-20 Thread Amir 'CG' Caspi
On Thu, February 20, 2014 2:39 pm, Axb wrote:
 what's wrong with installing from source?

I run a virtual-hosting server where the individual site RPMs are copied
from server-level RPMs. Basically all software has to be installed as RPMs
in order to propagate to the individual virtual hosts.

--- Amir



Re: Increase in Image Spam

2014-02-20 Thread Benny Pedersen

On 2014-02-20 22:56, Amir 'CG' Caspi wrote:

I run a virtual-hosting server where the individual site RPMs are copied
from server-level RPMs. Basically all software has to be installed as RPMs
in order to propagate to the individual virtual hosts.


google dist2rpm; you basically just use the source from cpan to make rpms, 
and when the rpms are built, update like you always do in centos


I still don't understand why centos people don't just create the spec file 
natively and rebuild from a src rpm if cpan is not an option


Re: Increase in Image Spam

2014-02-20 Thread Amir 'CG' Caspi
On Thu, February 20, 2014 2:49 pm, Benny Pedersen wrote:
 On 2014-02-20 22:39, Kevin A. McGrail wrote:
 --max-size= I believe.  Default is 256K.

sa-learn barfs, that flag is not accepted.  That flag works for spamc, but
not for sa-learn.  sa-learn man page and CLI help don't have any mention
of a max message size.

 and small mbox files exists, it could just be missing --mbox on
 commandline else it would use maildir as default

Here is the exact command I am running, and the exact output:

-bash-3.2$ file SA_testspam.mbox
testspam: ASCII mail text

-bash-3.2$ sa-learn --mbox --progress --spam SA_testspam.mbox
Learned tokens from 0 message(s) (0 message(s) examined)


As you can see, it is an MBOX file, and I'm passing the --mbox flag, it
just doesn't like these two messages.  (To reiterate, adding a few other
spams results in THOSE spams getting considered, but these two messages
still being ignored.)

Very strange.

--- Amir



Re: Increase in Image Spam

2014-02-20 Thread Kevin A. McGrail

On 2/20/2014 5:07 PM, Amir 'CG' Caspi wrote:

On Thu, February 20, 2014 2:49 pm, Benny Pedersen wrote:

On 2014-02-20 22:39, Kevin A. McGrail wrote:

--max-size= I believe.  Default is 256K.

sa-learn barfs, that flag is not accepted.  That flag works for spamc, but
not for sa-learn.  sa-learn man page and CLI help don't have any mention
of a max message size.

Are you using 3.4.0?  I believe the size was hard-coded until then when 
the max-size option was added to sa-learn.




Re: Increase in Image Spam

2014-02-20 Thread Martin Gregorie
On Thu, 2014-02-20 at 16:39 -0500, Kevin A. McGrail wrote:
 On 2/20/2014 4:35 PM, Amir 'CG' Caspi wrote:
  If it's a size issue, how can I increase the size limit for sa-learn?
  But, I don't think it's a size issue since these messages are under 512k
  each.
 --max-size= I believe.  Default is 256K.
 
Sorry, no. According to my manpage (SA 3.3.2) there is no --max-size
option and (second try) sa-learn --max-size is rejected as an unknown
option.

On the same subject, is there any chance that a max-size configuration
parameter could be supplied via local.cf? 

Reasons:

1) IMO a single central setting is better than remembering to specify
   and change it in several scripts. Currently it needs to be set to 
   the same value in every script or MTA configuration that can run
   spamc and/or sa-learn and it's quite easy to miss one.

2) There currently seems to be no way of overriding the default max
   message size for the commands spamassassin, spamd or sa-learn.

3) It improves system documentation to have all parameter settings in
   one place.

I accept that setting the message size in local.cf may slow spamc down
slightly if spamd doesn't already send a reply to spamc, which could
pass the setting back, before accepting the message but the overhead of
adding the reply message should be quite small.


Martin

 





Re: Increase in Image Spam

2014-02-20 Thread Kevin A. McGrail

On 2/20/2014 5:16 PM, Martin Gregorie wrote:

On Thu, 2014-02-20 at 16:39 -0500, Kevin A. McGrail wrote:

On 2/20/2014 4:35 PM, Amir 'CG' Caspi wrote:

If it's a size issue, how can I increase the size limit for sa-learn?
But, I don't think it's a size issue since these messages are under 512k
each.

--max-size= I believe.  Default is 256K.


Sorry, no. According to my manpage (SA 3.3.2) there is no --max-size
option and (second try) sa-learn --max-size is rejected as an unknown
option.

Try 3.4.0

 --max-size <b>  Skip messages larger than b bytes;
                 defaults to 256 KiB, 0 implies no limit

I'll fix KiB to read KB.


On the same subject, is there any chance that a max-size configuration
parameter could be supplied via local.cf?

Don't believe so.

1) IMO a single central setting is better than remembering to specify
and change it in several scripts. Currently it needs to be set to
the same value in every script or MTA configuration that can run
spamc and/or sa-learn and it's quite easy to miss one.

My systems run with different limits in different places and in fact on 
different servers with spamc connecting to spamd boxes on other 
systems.  Unifying wouldn't be something I would want to see.


2) There currently seems to be no way of overriding the default max
message size for the commands spamassassin, spamd or sa-learn.

I believe this is false.

Typically if you were using spamassassin, a size limit would be 
implemented in your .procmailrc, for example.


Spamd would be limited by spamc -s parameter.

sa-learn has the --max-size option added with 3.4.0

3) It improves system documentation to have all parameter settings in
one place.

SA is an API as well as a collection of programs implementing the API.  
It's a Swiss army tool with a whole bunch of configurable settings.  
And, as in my case, many of the tools can run on different servers by 
different users, etc.  One place for parameters is very hard.


But if you want to discuss further and can provide patches that don't 
break existing functionality, I'm always looking to get more people 
involved and submitting patches.

I accept that setting the message size in local.cf may slow spamc down
slightly if spamd doesn't already send a reply to spamc, which could
pass the setting back, before accepting the message but the overhead of
adding the reply message should be quite small.

More to the point, spamc would have to process all config files first 
which would slow it down.  The point of spamc is to be a VERY 
lightweight connection to spamd.


regards,
KAM


Re: Increase in Image Spam

2014-02-20 Thread Amir 'CG' Caspi
On Thu, February 20, 2014 3:16 pm, Kevin A. McGrail wrote:
 Are you using 3.4.0?  I believe the size was hard-coded until then when
 the max-size option was added to sa-learn.

No, as mentioned previously in this flurry of emails, I'm using 3.3.2. 
However, note that using spamassassin directly (not learning, just
classifying) works just fine, there is no complaint of max message size. 
Using spamc with --max-size, no complaints either.  And, finally, sa-learn
with -D (debug) does not show me any error messages or warnings related to
message size, or ANYTHING in fact that would lead me to understand why
it's skipping these messages.  If they exceed the maximum size, sa-learn
is being very quiet about it and not throwing an explicit error in the
debug output.

I echo Martin's question of whether it's possible to override the max size
in local.cf, because on my system (with virtual hosts that call spamc)
that would be much more preferable than having to specify max-size in
every virtual host's /etc/procmailrc (which is how I have to do it now).

Thanks.

--- Amir




Re: Increase in Image Spam

2014-02-20 Thread Kevin A. McGrail

  
  
I think you were just on the email chain on list so my reply to another
person went to you.

On 2/20/2014 5:21 PM, Benny Pedersen wrote:

On 2014-02-20 23:16, Kevin A. McGrail wrote:

Are you using 3.4.0?  I believe the size was hard-coded until then
when the max-size option was added to sa-learn.

SpamAssassin 3.4.0 (2014-02-07)

yes i do ebuilds for gentoo self

3.4 is not in gentoo yet

Kevin: do i need to be reply private here ?




-- 
Kevin A. McGrail
President

Peregrine Computer Consultants Corporation
3927 Old Lee Highway, Suite 102-C
Fairfax, VA 22030-2422

http://www.pccc.com/

703-359-9700 x50 / 800-823-8402 (Toll-Free)
703-359-8451 (fax)
kmcgr...@pccc.com



Re: Increase in Image Spam

2014-02-20 Thread Benny Pedersen

On 2014-02-20 23:16, Kevin A. McGrail wrote:


Are you using 3.4.0?  I believe the size was hard-coded until then
when the max-size option was added to sa-learn.


SpamAssassin 3.4.0 (2014-02-07)

yes i do ebuilds for gentoo self

3.4 is not in gentoo yet

Kevin: do i need to be reply private here ?


Re: Increase in Image Spam

2014-02-20 Thread Martin Gregorie
On Thu, 2014-02-20 at 17:29 -0500, Kevin A. McGrail wrote:
 More to the point, spamc would have to process all config files first 
 which would slow it down.  The point of spamc is to be a VERY 
 lightweight connection to spamd.
 
That's why I suggested that spamc could be handed that value by spamd
before it ships the message over. 

This is or should be lightweight: in the past I was able to get 25,000
request/responses per second from a process that was answering queries
against a large (500k entry) in-memory red/black btree. This was on a
single core 625 MHz AlphaServer with both processes on the same box. IOW
the cost per message pair was comfortably under 40 microseconds once the time
needed to search the btree is subtracted. Most present-day servers
should do considerably better.
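
The per-pair figure follows directly from the quoted throughput: 25,000 request/response pairs per second leaves an upper bound of 40 microseconds (0.04 ms) for each pair. A short Python sketch of that arithmetic, using only the number given above:

```python
# Throughput quoted above: request/response pairs handled per second.
requests_per_second = 25_000

# Time budget for a single pair, in seconds and in microseconds.
seconds_per_pair = 1 / requests_per_second
microseconds_per_pair = seconds_per_pair * 1_000_000

print(f"{microseconds_per_pair:.0f} microseconds per request/response pair")
```

This bound still includes the btree search time, so the messaging overhead alone would be lower.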


Martin





Re: Increase in Image Spam

2014-02-20 Thread Kevin A. McGrail

On 2/20/2014 5:48 PM, Martin Gregorie wrote:

On Thu, 2014-02-20 at 17:29 -0500, Kevin A. McGrail wrote:

More to the point, spamc would have to process all config files first
which would slow it down.  The point of spamc is to be a VERY
lightweight connection to spamd.


That's why I suggested that spamc could be handed that value by spamd
before it ships the message over.

I had the same suggestion.  If you really want this, I'd say off the 
cuff you should implement a new version of the spamc protocol and have 
the spamc/spamd negotiate whether the connection was going to be 
accepted by sending the message size ahead of time coupled with a 
local.cf option for the spamd max message size.


You can open a feature request for this at bugzilla and I'd be happy to 
help testing any patches you might come up with.


So in short, if you like the idea, take a whack at the code and make a 
patch.


regards,
KAM


Re: Increase in Image Spam

2014-02-20 Thread Amir 'CG' Caspi
On Thu, February 20, 2014 3:52 pm, Kevin A. McGrail wrote:
 Questions that will be answered by "that is solved in 3.4.0" aren't
 really going to get much support from me...

Understood, though it'll be a while before I can upgrade to 3.4 due to the
RPM issue that I've mentioned previously.  However, I Googled this issue
before mailing and this iterator error you posted SHOULD appear in
sa-learn even in 3.3.x, but it does not seem to.  More to the point, when
trying to run on a spam that had previously worked fine with v3.3.1,
sa-learn STILL says 0 messages examined and that spam is only 4K, so
there's no chance it's running up against the max-size limit.  (On the
other hand, that spam is many months old -- does sa-learn have a date
limit as well?  If so, is that customizable?)

--- Amir



Re: Increase in Image Spam

2014-02-20 Thread Kevin A. McGrail

On 2/20/2014 6:01 PM, Amir 'CG' Caspi wrote:

On Thu, February 20, 2014 3:52 pm, Kevin A. McGrail wrote:

Questions that will be answered by "that is solved in 3.4.0" aren't
really going to get much support from me...

Understood, though it'll be a while before I can upgrade to 3.4 due to the
RPM issue that I've mentioned previously.  However, I Googled this issue
before mailing and this iterator error you posted SHOULD appear in
sa-learn even in 3.3.x, but it does not seem to.  More to the point, when
trying to run on a spam that had previously worked fine with v3.3.1,
sa-learn STILL says 0 messages examined and that spam is only 4K, so
there's no chance it's running up against the max-size limit.  (On the
other hand, that spam is many months old -- does sa-learn have a date
limit as well?  If so, is that customizable?)

Probably best if you install 3.4.0 (or even trunk) on a test system and 
throw the offending email onto that server and run sa-learn on that box 
with -D.


Then we can start discussing apples to apples and add more debugging if 
needed.


regards,
KAM


Re: Increase in Image Spam

2014-02-20 Thread Amir 'CG' Caspi
On Thu, February 20, 2014 4:08 pm, Kevin A. McGrail wrote:
 Probably best if you install 3.4.0 (or even trunk) on a test system and
 throw the offending email onto that server and run sa-learn on that box
 with -D.

In the meantime, anyone want to do it on my behalf? =)  I provided the
mbox link earlier; I unfortunately do not have a test system available. 
(I'm not quite a professional sysadmin...)

--- Amir




Re: Increase in Image Spam

2014-02-20 Thread Kevin A. McGrail
Resend the mbox.link and I will likely have a cycle to throw it through.
Regards,
KAM

Amir 'CG' Caspi ceph...@3phase.com wrote:

On Thu, February 20, 2014 4:08 pm, Kevin A. McGrail wrote:
 Probably best if you install 3.4.0 (or even trunk) on a test system and
 throw the offending email onto that server and run sa-learn on that box
 with -D.

In the meantime, anyone want to do it on my behalf? =)  I provided the
mbox link earlier; I unfortunately do not have a test system available.

(I'm not quite a professional sysadmin...)

   --- Amir


Re: Increase in Image Spam

2014-02-20 Thread Amir 'CG' Caspi
On Thu, February 20, 2014 5:13 pm, Kevin A. McGrail wrote:
 Resend the mbox.link and I will likely have a cycle to throw it through.

https://www.dropbox.com/s/m4fuv670wnvwa16/SA_testspam.mbox

To be deleted in 24-48 hours (don't want spammers harvesting it).

If you have a chance, please run it through both 3.3.2 and 3.4.0, to see
if there's a difference... clearly, it's not working on _MY_ 3.3.2 for
some reason!  I sent the exact commands that I used in a prior email a
couple of hours ago.

Thanks. =)

--- Amir




Re: Increase in Image Spam

2014-02-20 Thread John Hardin

On Thu, 20 Feb 2014, Ian Zimmerman wrote:


On Thu, 20 Feb 2014 11:57:17 -0800 (PST)
John Hardin jhar...@impsec.org wrote:

Amir When I run sa-learn on this mailbox, it says:

Amir Learned tokens from 0 message(s) (0 message(s) examined)

John 0 messages examined generally means either the format isn't what
John sa-learn expected, or the message is larger than the size limit.

In my case it usually means the message has been learned already and SA
just refuses to do so for the 2nd time :-)


That would be "learned tokens from 0 messages (n > 0 messages examined)".
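
The two cases described in this exchange can be told apart mechanically from the sa-learn summary line. A rough Python sketch, assuming the summary wording shown earlier in the thread; the function name diagnose and the returned hint strings are made up for illustration and only restate what John and Ian say above:

```python
import re

# Matches the summary line printed by sa-learn, as quoted in this thread.
SUMMARY_RE = re.compile(
    r"Learned tokens from (\d+) message\(s\) \((\d+) message\(s\) examined\)"
)


def diagnose(summary_line):
    """Return a short hint based on the learned/examined counters."""
    match = SUMMARY_RE.search(summary_line)
    if match is None:
        return "unrecognized output"
    learned = int(match.group(1))
    examined = int(match.group(2))
    if examined == 0:
        # Nothing was even read: wrong input format or over the size limit.
        return "nothing examined: check the input format or the size limit"
    if learned == 0:
        # Messages were read but skipped, typically because already learned.
        return "examined but not learned: probably already learned"
    return f"learned {learned} of {examined} examined"


print(diagnose("Learned tokens from 0 message(s) (0 message(s) examined)"))
```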

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  You do not examine legislation in the light of the benefits it
  will convey if properly administered, but in the light of the
  wrongs it would do and the harms it would cause if improperly
  administered.  -- Lyndon B. Johnson
---
 2 days until George Washington's 282nd Birthday


Re: Increase in Image Spam

2014-02-20 Thread Kevin A. McGrail

On 2/20/2014 7:18 PM, Amir 'CG' Caspi wrote:
If you have a chance, please run it through both 3.3.2 and 3.4.0, to 
see if there's a difference... clearly, it's not working on _MY_ 3.3.2 
for some reason! I sent the exact commands that I used in a prior 
email a couple of hours ago. Thanks. =) --- Amir


No need to run through 3.3.2.  The emails are well over the 256KB limit 
hard coded in sa-learn with 3.3.2.


3.4.0:

sa-learn -D --mbox --progress --spam  /tmp/temp.mbox 2>&1 | tee /tmp/output

Feb 20 21:51:33.484 [21525] dbg: archive-iterator: _run_mailbox 
/tmp/.spamassassin2152599LqEKtmp, ofs 0, limit 262144
Feb 20 21:51:33.500 [21525] info: archive-iterator: skipping large 
message: 4089 lines, 262160 bytes, limit 262144 bytes
Feb 20 21:51:33.501 [21525] dbg: archive-iterator: _run_mailbox 
/tmp/.spamassassin2152599LqEKtmp, ofs 429849, limit 262144
Feb 20 21:51:33.517 [21525] info: archive-iterator: skipping large 
message: 4088 lines, 262169 bytes, limit 262144 bytes
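
The debug lines above show both messages sitting just over the 262144-byte (256 KiB) limit. A mailbox can be checked for such messages before running sa-learn. A minimal Python sketch using the standard-library mailbox module; the function name oversized_messages is hypothetical, and the byte count it reports may differ slightly from the one sa-learn computes:

```python
import mailbox

# 256 KiB, matching the "limit 262144 bytes" shown in the debug output above.
SIZE_LIMIT = 256 * 1024


def oversized_messages(mbox_path, limit=SIZE_LIMIT):
    """Return (index, subject, size_in_bytes) for each message above limit."""
    results = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for index, key in enumerate(box.iterkeys()):
            # Raw size of the message, excluding the mbox "From " separator.
            size = len(box.get_bytes(key))
            if size > limit:
                subject = box.get_message(key).get("Subject", "")
                results.append((index, subject, size))
    finally:
        box.close()
    return results
```

Calling oversized_messages("SA_testspam.mbox") would list the messages that sa-learn is likely to skip under the default limit.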



Re-running with a limit high enough to
sa-learn -D --mbox --progress --spam  /tmp/temp.mbox --max-size=60 
2>&1 | tee /tmp/output


Learned tokens from 2 message(s) (2 message(s) examined)


Output from debug and everything ;-)

regards,
KAM


Increase in Image Spam

2014-02-11 Thread Andy Jezierski
I've been seeing a pretty big increase in image spam over the last month 
or so. I remember using FuzzyOCR years ago when image spam was a much 
bigger problem.

Since FuzzyOCR hasn't been maintained in several years, is there an 
alternative that would work?  Or is there another way to try and catch 
them?

They don't really hit on any rules

X-Spam-Status: No, score=3.5 required=5.0 tests=BAYES_99,HTML_MESSAGE,
SPF_HELO_PASS,SPF_PASS autolearn=no autolearn_force=no 
version=3.4.0-rc5 
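
A header like the one above is easier to compare across messages once split into its parts. A small Python sketch, assuming the field layout shown in this header (verdict, score=, required=, tests=, then other key=value pairs); the function name parse_spam_status is illustrative and not part of SpamAssassin:

```python
import re


def parse_spam_status(header_value):
    """Split an X-Spam-Status value into verdict, score, threshold, rules."""
    # Unfold continuation lines into a single space-separated string.
    flat = " ".join(header_value.split())
    verdict = flat.split(",", 1)[0]
    score = float(re.search(r"score=(-?[\d.]+)", flat).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", flat).group(1))
    # The rule list runs until the next " key=" pair or the end of the value.
    tests_match = re.search(r"tests=(.*?)(?= \w+=|$)", flat)
    tests = [t.strip() for t in tests_match.group(1).split(",") if t.strip()]
    return {
        "is_spam": verdict == "Yes",
        "score": score,
        "required": required,
        "tests": tests,
    }


status = parse_spam_status(
    "No, score=3.5 required=5.0 tests=BAYES_99,HTML_MESSAGE,\n"
    "SPF_HELO_PASS,SPF_PASS autolearn=no autolearn_force=no \n"
    "version=3.4.0-rc5 "
)
print(status["score"], status["required"], status["tests"])
```

For this message the parsed rule list makes the gap visible: BAYES_99 fires, yet the total stays below the required 5.0.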

Thanks
Andy

Re: Increase in Image Spam

2014-02-11 Thread Amir Caspi
On Feb 11, 2014, at 10:25 AM, Andy Jezierski ajezier...@stepan.com wrote:
 They don't really hit on any rules 

A number of image spams have certain template formats and I've written custom 
rules to catch many... however, I've been hesitant to release those rules 
publicly since spammers could just change their templates easily to circumvent 
this.  (Most image spams for me hit moderate or very low Bayes scores, 
sometimes Bayes_00, presumably due to the low amount of spammy tokens and large 
amount of innocuous/hammy tokens...)

I could release the rules publicly but that may end up backfiring, per above.  
John, Kevin, what do you guys think?

--- Amir



Re: Increase in Image Spam

2014-02-11 Thread John Hardin

On Tue, 11 Feb 2014, Amir Caspi wrote:

I could release the rules publicly but that may end up backfiring, per 
above.  John, Kevin, what do you guys think?


Spammers can install SpamAssassin as easily as anyone else, that's a known 
risk. Any rules we provide they can potentially test against their spams 
to minimize score.


How much they actually *do* this I can't say.

We could try it with one of your rules, and if it suddenly stops hitting 
then the spammers are reacting.


I think it has value, even if they do react.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Windows Genuine Advantage (WGA) means that now you use your
  computer at the sufferance of Microsoft Corporation. They can
  kill it remotely without your consent at any time for any reason;
  it also shuts down in sympathy when the servers at Microsoft crash.
---
 Tomorrow: Abraham Lincoln's and Charles Darwin's 205th Birthdays


Re: Increase in Image Spam

2014-02-11 Thread Benny Pedersen

On 2014-02-11 18:25, Andy Jezierski wrote:


They don't really hit on any rules

X-Spam-Status: No, score=3.5 required=5.0 tests=BAYES_99,HTML_MESSAGE,

 SPF_HELO_PASS,SPF_PASS autolearn=no autolearn_force=no
version=3.4.0-rc5


bayes is seeing it as spam, so it might be in vain :)

well, if bayes is well trained you can add more meta score to that hit, 
but also maybe meta it with "not user in spf whitelist" or something?


e.g. if an spf-pass domain is spamming, remove it from local.cf as whitelisted 
for that envelope sender (not the From: header)


meta UNTRUSTED_SPF_PASS (SPF_PASS && !USER_IN_SPF_WHITELIST)

score based on that meta

to make this distinction useful, add whitelist_from_spf 
*@foo.example.com to local.cf for sender domains that are not spamming


same meta can be made with dkim


Re: Increase in Image Spam

2014-02-11 Thread RW
On Tue, 11 Feb 2014 20:22:00 +0100
Benny Pedersen wrote:

 On 2014-02-11 18:25, Andy Jezierski wrote:
 
  They don't really hit on any rules
  
  X-Spam-Status: No, score=3.5 required=5.0
  tests=BAYES_99,HTML_MESSAGE,
  
   SPF_HELO_PASS,SPF_PASS autolearn=no autolearn_force=no
  version=3.4.0-rc5
 
 bayes is seeing it as spam, so it might be in vain :)
 
 well if bayes is well trained you can add more meta score to that
 hit, but also maybe meta it with  not user in spf whitelist or
 something ?

Actually I find BAYES_99 to be so reliable that I'd be happy to score
it above 5.0. Others have made similar comments too.


Re: Increase in Image Spam

2014-02-11 Thread Benny Pedersen

On 2014-02-11 20:59, RW wrote:


Actually I find BAYES_99 to be so reliable that I'd be happy to score
it above 5.0. Others have made similar comments too.


there are a number of ways to punish spf pass domains for spamming :)

blacklist_from *@foo.example.org

and for bayes one could make another meta like:

meta NOT_BAYES_HAM_SPF_PASS (!BAYES_00 && SPF_PASS)

or simply reject the sender domain in the mta


Re: Image spam help

2013-09-17 Thread Alex
Hi,

>>http://pastebin.com/0xWK4mws
>>
>>This is hitting bayes00 because I assume very little of the body is
>>spammy. I've added body and subject rules to catch these, but perhaps
>>this relates to the recent fuzzyOCR conversation and may help there?
>
>I was expecting to see the image and am too lazy to download it from
>pastebin. But if you intend to catch spam text in an image, FuzzyOCR is
>the tool you need.

I didn't attach the image because I didn't think there was anything
someone could do with just an image anyway without some type of OCR.

I realize fuzzyOCR is very limited and resource-intensive, so will
probably just continue to use body and header rules to catch them
until they become more of a problem, unless someone has other
ideas

btw, gmail thinks your domain is spam

Thanks,
Alex


Re: Image spam help

2013-09-17 Thread Olivier Nicole
Alex,

>I realize fuzzyOCR is very limited and resource-intensive, so will
>probably just continue to use body and header rules to catch them
>until they become more of a problem, unless someone has other
>ideas

From past experience, there was very little spam where fuzzyOCR would have
made a difference; header rules used to catch 99% of the image-only spam.

>btw, gmail thinks your domain is spam

What was the error message from gmail? Recently I have seen a lot of ham being
misclassified by gmail, all from mailing lists that I have been reading
regularly.

Bests,

Olivier


Image spam help

2013-09-16 Thread Alex
Hi guys,

I'm hoping someone can help me with an image spam. I haven't seen one
of these in a while, and I can't figure out how to catch them
effectively.

This one is probably now being caught by the RBLs, but I'm hoping
there's some other characteristic within the email that can be used to
block them before they are listed.

http://pastebin.com/0xWK4mws

This is hitting bayes00 because I assume very little of the body is
spammy. I've added body and subject rules to catch these, but perhaps
this relates to the recent fuzzyOCR conversation and may help there?

Thanks,
Alex


Re: Image spam help

2013-09-16 Thread Olivier Nicole
Hi Alex,

>I'm hoping someone can help me with an image spam. I haven't seen one
>of these in a while, and I can't figure out how to catch them
>effectively.
>
>This one is probably now being caught by the RBLs, but I'm hoping
>there's some other characteristic within the email that can be used to
>block them before they are listed.
>
>http://pastebin.com/0xWK4mws
>
>This is hitting bayes00 because I assume very little of the body is
>spammy. I've added body and subject rules to catch these, but perhaps
>this relates to the recent fuzzyOCR conversation and may help there?

I was expecting to see the image and am too lazy to download it from
pastebin. But if you intend to catch spam text in an image, FuzzyOCR is
the tool you need.

My only restriction is that FuzzyOCR uses its own list of spam words
instead of pushing the decoded text back to SA for SA to analyze.

Best regards,

Olivier


Re: Image spam help

2013-09-16 Thread Ian Turner
On Tuesday, September 17, 2013 09:44:21 AM Olivier Nicole wrote:
>My only restriction is that FuzzyOCR uses its own list of spam words
>instead of pushing back the decoded text to SA for SA to analyze.

This is necessary because of the poor quality of the OCR. It's only going to 
be useful if the number of words you try to match against is very small.


Re: Image spam help

2013-09-16 Thread Olivier Nicole
>>My only restriction is that FuzzyOCR uses its own list of spam words
>>instead of pushing back the decoded text to SA for SA to analyze.
>This is necessary because of the poor quality of the OCR. It's only going to
>be useful if the number of words you try to match against is very small.

Since it happens inside a single run of SA, it would not take that much
time to run all the tests on the text extracted by fuzzyOCR.

Either the text is garbage and SA should not trigger, or the text is
pretty readable and OCR gives good output, and it would be a waste not to
fully test that extracted text.

The problem I see is elsewhere: running OCR is time consuming. fuzzyOCR
performs several extractions, with different parameters, in order to
catch some obfuscation artifacts, and it stops as soon as one
extraction has produced spammy words, which saves computation. If you
want to push the extracted text back to normal SA instead, you have to
run all the different extractions (which takes time/CPU) and you may end
up with several copies of the same text for SA to parse (I am not sure
whether having several instances of the same bad word in a message would
increase the spamminess).
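
The trade-off described above can be sketched in a few lines of Python. This is only an illustration, not FuzzyOCR's actual code: the `scans` callables, the `SPAM_WORDS` set and the `min_hits` threshold are all assumptions made up for the example. It shows why stopping at the first spammy extraction saves OCR runs.

```python
from typing import Callable, Iterable, Optional

# Hypothetical word list; FuzzyOCR keeps its own list of spam words.
SPAM_WORDS = {"viagra", "pills", "pharmacy"}


def first_spammy_scan(image: bytes,
                      scans: Iterable[Callable[[bytes], str]],
                      min_hits: int = 2) -> Optional[str]:
    """Run the scans in order and return the text of the first scan whose
    output contains at least min_hits spam words, or None if none does."""
    for scan in scans:
        text = scan(image)
        hits = sum(1 for word in text.lower().split() if word in SPAM_WORDS)
        if hits >= min_hits:
            return text  # stop here: the remaining scans are never run
    return None
```

Feeding every extraction back to SA would mean dropping the early `return` and collecting all the texts, which is where the extra CPU time and the duplicated text would come from.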

Olivier


Re: Image spam help

2013-09-16 Thread John Hardin

On Tue, 17 Sep 2013, Olivier Nicole wrote:

>(I am not sure if it would increase the spamines to have several
>instances of the same bad word in a message).


It might. There are rules that consider more than N instances of a given 
phrase (like you'd see on a pharma spam where there are per-pill prices of 
a few dozen drugs) to be a hit.


There isn't really anything that scores a point *per instance*, though, so 
it's not open-ended.
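
A small Python sketch of that behaviour, under stated assumptions: the price pattern, the threshold of 10 and the score of 1.5 are invented for the example and are not taken from any stock rule. In SpamAssassin itself, as far as I know, this kind of rule is normally built from a sub-rule flagged with `tflags ... multiple` (optionally with `maxhits=N`) and a meta rule that compares the hit count with N.

```python
import re

# Hypothetical pattern for a per-pill price such as "$1.25".
PRICE = re.compile(r"\$\d+\.\d{2}")


def many_prices_hit(body: str, threshold: int = 10) -> bool:
    """True when more than `threshold` prices appear in the body."""
    count = 0
    for _ in PRICE.finditer(body):
        count += 1
        if count > threshold:
            return True  # further instances change nothing
    return False


def fixed_score(body: str, score: float = 1.5) -> float:
    """The rule contributes one fixed score, not a score per instance."""
    return score if many_prices_hit(body) else 0.0
```

Eleven prices and fifty prices give the same result, which is the sense in which duplicated OCR text would not make the score open-ended.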


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  After ten years (1998-2008) of draconian gun control in the State
  of Massachusetts, the results are in: firearms-related assaults up
  78%, firearms-related homicides up 67%, assault-related emergency
  room visits up 331%. Gun Control does not reduce violent crime.
---
 Tomorrow: the 226th anniversary of the signing of the U.S. Constitution


Image spam

2013-09-02 Thread emailitis.com
We are getting a lot of Spam getting through which is a remote image and for
some reason is not being picked up by SA.  I have put them below with all
details, including the SA rules found and message details.  For ease, the
rules are all pasted below here also.

Are there others who have seen these and are preventing them getting
through?  Can you share how?

http://pastebin.com/SC9JSZSW
http://pastebin.com/qSxV47z2
http://pastebin.com/Ds0adR49
http://pastebin.com/HkNjdm5R

I have already tried to make some common rules score more but that does not
seem to be working.  In /etc/mail/spamassassin/local.cf we have put in the
following but I am not sure that these scores are in fact replacing the
default ones:

score URIBL_BLACK 3.5
score URIBL_DBL_SPAM 3
score T_REMOTE_IMAGE 3.5
score RCVD_IN_BRBL_LASTEXT 3.5

Many thanks, in advance, for any assistance that the gurus can offer.

Kind Regards,
Christoph

/root/weeklymail/Sunmaillog:Aug 31 11:11:05 plesk3 spamd[11160]: spamd:
result: . 0 -
BAYES_00,HTML_EXTRA_CLOSE,HTML_MESSAGE,LOCALPART_IN_SUBJECT,LOTS_OF_MONEY,RAZOR2_CHECK,RDNS_NONE
scantime=1.2,size=10587,user=qscand,uid=10124,required_score=5.0,
rhost=localhost.localdomain,raddr=127.0.0.1,rport=35181,mid=URL,bayes=0.02,autolearn=no

/root/weeklymail/Sunmaillog:Aug 31 14:21:34 plesk3 spamd[27015]: spamd:
result: Y 5 -
BAYES_50,DIET_1,HTML_EXTRA_CLOSE,HTML_IMAGE_RATIO_08,HTML_MESSAGE,LOTS_OF_MONEY,RDNS_NONE,T_REMOTE_IMAGE
scantime=2.1,size=9289,user=qscand,uid=10124,required_score=5.0,
rhost=localhost.localdomain,raddr=127.0.0.1,rport=54771,mid=URL
mailto:2095436507427320957529157...@hmv4drc.sheikargemamai.com
,bayes=0.480496,autolearn=no

/root/weeklymail/Sunmaillog:Aug 31 16:07:21 plesk3 spamd[12813]: spamd:
result: . 4 -
BAYES_20,HTML_EXTRA_CLOSE,HTML_IMAGE_RATIO_08,HTML_MESSAGE,RDNS_NONE,T_REMOTE_IMAGE
scantime=1.1,size=8535,user=qscand,uid=10124,required_score=5.0,
rhost=localhost.localdomain,raddr=127.0.0.1,rport=44411,mid=URL
mailto:2097436507427320974651...@3nmrx8.spakerhmoner.com
,bayes=0.087335,autolearn=no

/root/weeklymail/Sunmaillog:Aug 31 18:07:59 plesk3 spamd[12813]: spamd:
result: . 1 -
BAYES_50,HTML_EXTRA_CLOSE,HTML_MESSAGE,LOTS_OF_MONEY,RDNS_NONE
scantime=1.4,size=7946,user=qscand,uid=10124,required_score=5.0,
rhost=localhost.localdomain,raddr=127.0.0.1,rport=45934,mid=URL
mailto:2099436507427320995837246...@mdi9hj1.tyttacekory.com
,bayes=0.50,autolearn=no



Re: Image spam

2013-09-02 Thread John Hardin

On Mon, 2 Sep 2013, emailitis.com wrote:

Here's something else to look into:


/root/weeklymail/Sunmaillog:Aug 31 11:11:05 plesk3 spamd[11160]: spamd:
result: . 0 - BAYES_00

/root/weeklymail/Sunmaillog:Aug 31 14:21:34 plesk3 spamd[27015]: spamd:
result: Y 5 - BAYES_50

/root/weeklymail/Sunmaillog:Aug 31 16:07:21 plesk3 spamd[12813]: spamd:
result: . 4 - BAYES_20

/root/weeklymail/Sunmaillog:Aug 31 18:07:59 plesk3 spamd[12813]: spamd:
result: . 1 - BAYES_50


I could see the BAYES_50s if there was little else other than an image 
link in the message, and the spam campaign was something new, but BAYES_20 
and especially BAYES_00?


Standard Bayes questions:

How do you train? Manually, automatically, or both?

If you train manually, who contributes? Are the contributions reviewed 
prior to training?


Do you retain your manual training corpus to review, and for initial 
retraining if Bayes goes completely off the rails?


Non-Bayes questions: are you using greylisting? It really cuts down on the 
garbage. Are you doing MTA SMTP-time DNSBL filtering using ZEN? It's very 
reliable and appears to have ~30% spam-only overlap with __REMOTE_IMAGE.


Suggestion: a meta of __REMOTE_IMAGE and LOTS_OF_MONEY might help, 
assuming you don't have a lot of ham that hits both rules.
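
As a rough sanity check of this suggestion against the four spamd results posted in this thread, here is a small Python sketch. It assumes that a `T_REMOTE_IMAGE` hit in the log implies `__REMOTE_IMAGE` also matched (sub-rules whose names start with `__` are not listed in the log line), and the function name is made up for the example. In local.cf the same condition would be a `meta` rule joining the two rule names with `&&`.

```python
# Rule hits copied from the four spamd log lines posted in this thread,
# keyed by the time of day shown in each log line.
RESULTS = {
    "11:11:05": {"BAYES_00", "HTML_EXTRA_CLOSE", "HTML_MESSAGE",
                 "LOCALPART_IN_SUBJECT", "LOTS_OF_MONEY", "RAZOR2_CHECK",
                 "RDNS_NONE"},
    "14:21:34": {"BAYES_50", "DIET_1", "HTML_EXTRA_CLOSE",
                 "HTML_IMAGE_RATIO_08", "HTML_MESSAGE", "LOTS_OF_MONEY",
                 "RDNS_NONE", "T_REMOTE_IMAGE"},
    "16:07:21": {"BAYES_20", "HTML_EXTRA_CLOSE", "HTML_IMAGE_RATIO_08",
                 "HTML_MESSAGE", "RDNS_NONE", "T_REMOTE_IMAGE"},
    "18:07:59": {"BAYES_50", "HTML_EXTRA_CLOSE", "HTML_MESSAGE",
                 "LOTS_OF_MONEY", "RDNS_NONE"},
}


def remote_image_and_money(hits: set) -> bool:
    # Assumption: T_REMOTE_IMAGE stands in for the __REMOTE_IMAGE sub-rule.
    return "T_REMOTE_IMAGE" in hits and "LOTS_OF_MONEY" in hits


matching = [when for when, hits in RESULTS.items()
            if remote_image_and_money(hits)]
```

Only one of the four logged messages hits both rules, so the meta would need to be combined with other measures (Bayes training, DNSBLs) to catch the rest.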



--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Yet another example of a Mexican doing a job Americans are
  unwilling to do.   -- Reno Sepulveda, on UniVision reporters asking
President Obama some pointed questions about
the BATFE Fast and Furious scandal.
---
 458 days since the first successful private support mission to ISS (SpaceX)


RE: Image spam

2013-09-02 Thread emailitis.com
Thanks John,

>Standard Bayes questions:
>
>How do you train? Manually, automatically, or both?
Automatically.  Recently I am manually training on Spam that I receive to
about 10 email addresses of our own like the ones shown but not sure how
much difference that is making.  I THINK I used to get even more BAYES_00 so
maybe it is working.  But some Spam-heavy mailboxes are not ours and we
would not be able to train the owner how to do the training.  And I have
been doing only Spam, not Ham, training.
I expect that in the dim and distant past, we did not do as much

>If you train manually, who contributes? Are the contributions reviewed
>prior to training?
>
>Do you retain your manual training corpus to review, and for initial
>retraining if Bayes goes completely off the rails?
Not sure how easily we could make it for our clients to assist with manual
training - I suspect they would not have the time or knowledge or
inclination so to do.

>Do you retain your manual training corpus to review, and for initial
>retraining if Bayes goes completely off the rails?
No, we do not have this sadly.  In the past we only ever let SA do the
automatic training so I guess it was not perfect.  But even with a re-train
I am not sure how we could capture emails being sent to clients which are
Spam.

>Non-Bayes questions: are you using greylisting? It really cuts down on the
>garbage. Are you doing MTA SMTP-time DNSBL filtering using ZEN? It's very
>reliable and appears to have ~30% spam-only overlap with
>__REMOTE_IMAGE.
No, we cancelled it because the delay was causing some issues but we will
look to re-activating that.

>Suggestion: a meta of __REMOTE_IMAGE and LOTS_OF_MONEY might help,
>assuming you don't have a lot of ham that hits both rules.
Thank you for that suggestion which I will put in place.  Only one today
that met both criteria and that was Spam!  And it got through with a score
of 4.2!

Kind regards,
Christoph





New type of image spam

2012-06-12 Thread Joseph Brennan


Seen: Spam using an INPUT tag, type=image, instead of an IMG tag. There is
no form tag, so clicking does nothing, but the image loads to screen. Below
is the complete body of a sample (included here since it is very short).

The string after id= varies per sample. I munged it here to ''.

The image is a picture of text written in Chinese.

Joseph Brennan
Columbia University Information Technology



<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=utf-8" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.6001.23181"></HEAD>
<BODY><INPUT id= border=0
src="http://img04.taobaocdn.com/imgextra/i4/167488816/T2tRdHXgXM_!!167488816.gif"
type=image>

</BODY></HTML>
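
One way to flag this pattern outside SpamAssassin is sketched below in Python, using only the standard library's `html.parser`. The class and function names are made up for the example, and this is a sketch under the assumption that "an image-type INPUT element with no enclosing FORM" is the trait worth detecting; it says nothing about how often legitimate mail does the same.

```python
from html.parser import HTMLParser


class ImageInputFinder(HTMLParser):
    """Collects the src of every <input type=image> found outside a <form>."""

    def __init__(self):
        super().__init__()
        self.form_depth = 0
        self.orphan_image_inputs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "form":
            self.form_depth += 1
        elif tag == "input" and (attrs.get("type") or "").lower() == "image":
            if self.form_depth == 0:
                self.orphan_image_inputs.append(attrs.get("src"))

    def handle_endtag(self, tag):
        if tag == "form" and self.form_depth > 0:
            self.form_depth -= 1


def find_orphan_image_inputs(html: str) -> list:
    finder = ImageInputFinder()
    finder.feed(html)
    finder.close()
    return finder.orphan_image_inputs
```

A non-empty result for a message body means the image is displayed through an INPUT element that cannot submit anything, as in the sample above.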





Re: Large image spam

2012-06-07 Thread JP Kelly
Hmm...
can you explain further?

>sha256 checksum and add to local clamav (.hb?) file?


On May 29, 2012, at 12:47 PM, Michael Scheidell wrote:

>On 5/29/12 2:44 PM, JP Kelly wrote:
>>I've been getting a fair amount of spam which contains a large image which
>>causes SA to bypass scanning due to the large file size.
>>Has anyone found a way to combat these types of spam?
>>JP Kelly
>sha256 checksum and add to local clamav (.hb?) file?
 



Large image spam

2012-05-29 Thread JP Kelly
I've been getting a fair amount of spam which contains a large image which 
causes SA to bypass scanning due to the large file size.
Has anyone found a way to combat these types of spam?
JP Kelly

Re: Large image spam

2012-05-29 Thread Michael Scheidell

On 5/29/12 2:44 PM, JP Kelly wrote:

I've been getting a fair amount of spam which contains a large image which 
causes SA to bypass scanning due to the large file size.
Has anyone found a way to combat these types of spam?
JP Kelly

sha256 checksum and add to local clamav (.hb?) file?
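
A minimal sketch of that idea in Python, assuming the usual ClamAV hash-signature line layout of `hash:filesize:name` (to my knowledge such lines live in a `.hdb` file for MD5 or a `.hsb` file for SHA1/SHA256 in the ClamAV database directory, and `sigtool --sha256` produces the same kind of line). The function name and the signature name are made up for the example.

```python
import hashlib


def clamav_hash_signature(data: bytes, name: str) -> str:
    """Build one hash-signature line for the exact bytes in `data`."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{digest}:{len(data)}:{name}"


# Example: the decoded image attachment would be read from disk instead.
example_line = clamav_hash_signature(b"abc", "Local.ImageSpam.Example")
```

This only matches byte-identical copies of the image, so it helps with a campaign that reuses one file and not with images that are randomised per message.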


--
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
*| *SECNAP Network Security Corporation

 * Best Mobile Solutions Product of 2011
 * Best Intrusion Prevention Product
 * Hot Company Finalist 2011
 * Best Email Security Product
 * Certified SNORT Integrator

__
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.spammertrap.com/
__  
 


RE: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-21 Thread Sharma, Ashish
David, 

[We don't use OCR, as it happens.  We usually catch image spams anyway
using other techniques.]

Can you please outline the other techniques that you use to catch image spams?

Thanks
Ashish Sharma




RE: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-21 Thread Sharma, Ashish
All,

The current functionality requires me to receive mails that contain images and
process them.

So I want a good tool to deal with image spam.

Please suggest some.

Thanks
Ashish Sharma



Re: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-21 Thread Axb

http://wiki.apache.org/spamassassin/UnmaintainedCustomPlugins

OCR scanner and image validator SA-plugin

OCR Plugin

may be worth a try.. no idea how well they work

<sarcasm>
The Spamassassin wiki is so cool
</sarcasm>






Re: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-21 Thread David F. Skoll
On Thu, 21 Jul 2011 07:47:00 +0100
Sharma, Ashish <ashish.shar...@hp.com> wrote:

>Can you please outline the other techniques that you use to catch
>image spams?

We find Bayes (we have our own implementation) and RBLs (again, we have
our own) work pretty well.

Regards,

David.


Re: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-21 Thread Kris Deugau

dar...@chaosreigns.com wrote:

>On 07/20, Sharma, Ashish wrote:
>>Can someone suggest some better OCR plugin for Spamassassin 3.3.1 for image
>>spam?
>
>It still seems strange to me that anybody has ever bothered with using OCR
>to deal with image spam, when it's so easy, and for me not problematic, to
>just block all emails that might be image spam - those with an attached
>image that is embedded in the body of an html mail.


I have to ask - have you ever tried this in the context of an ISP mail 
system?


A great many users consider sending pictures and videos by email to be 
the ultimate purpose of email...  and many of the same set of users take 
great delight in (ab)using Outlook's stationery or using Incredimail, 
as well as overdosing on funny fonts and colours in the text.


-kgd


Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-20 Thread Sharma, Ashish
Hi,

I am currently using FuzzyOCR(3.6.0) for image spam control on my 
Spamassassin(3.3.1) stack.

The FuzzyOCR parent location (http://fuzzyocr.own-hero.net/wiki/Downloads) 
suggests the above FuzzyOCR is available only for testing on Spamassassin 3.2.x 

Somehow I am running this version of FuzzyOCR for my Spamassassin stack.

Lately I have not been happy with FuzzyOCR's performance or with the errors
that I keep getting from it.

Moreover, community support and active development for FuzzyOCR also seem to
be missing.

Can someone suggest some better OCR plugin for Spamassassin 3.3.1 for image 
spam?

Thanks
Ashish Sharma


Re: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-20 Thread darxus
On 07/20, Sharma, Ashish wrote:
>Can someone suggest some better OCR plugin for Spamassassin 3.3.1 for image
>spam?

It still seems strange to me that anybody has ever bothered with using OCR
to deal with image spam, when it's so easy, and for me not problematic, to
just block all emails that might be image spam - those with an attached
image that is embedded in the body of an html mail.

In my postfix main.cf I have:
body_checks = pcre:/etc/postfix/body_checks
And that file just contains:
/\bsrc\s*=(?:3D)?\s*["']?cid:/ REJECT Your email was rejected because you
    embedded an attached image in the body.

So if somebody ever sends me a legit email with an inlined attached image,
they'll still get an error, without me causing any backscatter.

My mom was annoyed that she couldn't use some tool to decorate her emails
to me with garbage, but... that doesn't qualify as a negative for me.

I've been very happily using this since 2006, and it completely made image
spam go away.

People can still send me images attached to emails, and they can still send
me emails with images embedded in the body of html emails as long as they're
hosted on a web server and not attached.  It only gets rejected if the
image is attached *and* embedded in the body of the email.
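
The behaviour described in this paragraph can be checked with a short Python sketch. It is only an approximation of what Postfix does: Python's `re` stands in for PCRE, case-insensitive matching is switched on because that is, as far as I know, the default for Postfix pcre tables, and the sample lines are invented. The pattern below uses the character class `["']` so that both quoting styles are accepted.

```python
import re

# Same expression as the body_checks entry, written for Python's re module.
CID_SRC = re.compile(r"""\bsrc\s*=(?:3D)?\s*["']?cid:""", re.IGNORECASE)


def would_reject(body_line: str) -> bool:
    """True if the body line references an attached image via a cid: URL."""
    return CID_SRC.search(body_line) is not None
```

The optional `3D` covers quoted-printable bodies, where `=` is encoded as `=3D`. An image hosted on a web server has an `http:` source and is not matched, and a plain attachment that is never referenced from the HTML has no `src=cid:` at all.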

Inlined attached images are not a feature that I find anywhere near worth
having enough to justify needing to OCR image spam.

-- 
I finally figured out the only reason to be alive is to enjoy it.
- Rita Mae Brown
http://www.ChaosReigns.com


Re: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-20 Thread David F. Skoll
On Wed, 20 Jul 2011 21:18:48 -0400
dar...@chaosreigns.com wrote:

>It still seems strange to me that anybody has ever bothered with
>using OCR to deal with image spam, when it's so easy, and for me not
>problematic, to just block all emails that might be image spam -
>those with an attached image that is embedded in the body of an html
>mail.

We receive many legitimate [sic] emails that use an embedded image
in that way.  Lots of companies think it's really cool to include their
logo in a .sig :(

>I've been very happily using this since 2006, and it completely made
>image spam go away.

Is this on a business account where it's critical for you to accept
email from... ahem... somewhat less-than-knowledgeable people?

>Inlined attached images are not a feature that I find anywhere near
>worth having enough to justify needing to OCR image spam.

Unfortunately, we can't block those.  The FP rate for us would be
horrendous.

[We don't use OCR, as it happens.  We usually catch image spams anyway
using other techniques.]

Regards,

David.



Re: Suggest OCR plugin on Spamassassin 3.3.1 for image spam

2011-07-20 Thread Jason Bertoch

On 7/20/2011 9:18 PM, dar...@chaosreigns.com wrote:

>On 07/20, Sharma, Ashish wrote:
>>Can someone suggest some better OCR plugin for Spamassassin 3.3.1 for image
>>spam?
>
>It still seems strange to me that anybody has ever bothered with using OCR
>to deal with image spam, when it's so easy, and for me not problematic, to
>just block all emails that might be image spam - those with an attached
>image that is embedded in the body of an html mail.
>
>Inlined attached images are not a feature that I find anywhere near worth
>having enough to justify needing to OCR image spam.



Image spam was a huge deal when it first came out, and there were 
several sources scrambling to offer a solution, including resources to 
involve Bayes on the decoded text.  Those worked well enough to deter, 
for the time-being anyway, that method of spamming.


That said, while I agree with your sentiment toward inline images and 
HTML mail in general, they are a common business practice and many folks 
simply can't use the outright block method.


At my last job, I eventually found that image-spam dropped to such a 
significant low that I didn't need OCR anymore but was still required to 
allow inline images through.


/Jason


Re: pill image spam learns to walk

2010-01-12 Thread Matus UHLAR - fantomas
>>Ted Mittelstaedt wrote on Mon, 11 Jan 2010 15:27:07 -0800:
>>It simply means that sites WITHOUT a PTR are still fully compliant mailers.

>>Kai Schaetzl wrote:
>>This has nothing to do with RFC-compliance, but with policy, well
>>accepted policy.

On 11.01.10 20:42, Ted Mittelstaedt wrote:
>Policy that should be handled in SA and not the MTA, which I've said
>twice now.

It would not be a policy then. There are sites/admins who enforce this
policy at SMTP level. And it's their decision.

If you don't have a PTR, complain to your ISP rather than to those policy
makers.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Windows 2000: 640 MB ought to be enough for anybody


Re: pill image spam learns to walk

2010-01-12 Thread Mike Cardwell

On 12/01/2010 06:28, Chip M. wrote:


>>Presently it renders them as plain text. I'm fully aware of the
>>potential problems with it. Ideally I'd like to be able to render
>>those parts as HTML, but I need to be 100% sure that I've stripped
>>out anything dangerous (including embedded remote content by
>>default) first. It's on the ToDo List page.
>
>Nice job Mike! :)
>
>I wrestled with that same issue when I added direct viewing of HTML
>content to my offline analysis/FP-pipeline/MassChecks tool.
>
>Originally, I was using an ActiveX wrapper around IE, which (of
>course) made me nervous.  I added some VERY simple, crude tag
>stripping (script, iframe, style), but was never happy with it.
>I ended up switching to an open source HTML rendering component
>which :) lacked support for all the scary stuff.
>
>Whatever you decide to do, please do post more about it, and q'pla!


I shall. There are a multitude of modules on cpan for fixing up html and 
stripping out tags. I just need to find time to test them. I've got to 
figure out how to cleanse the CSS as well. Eg, you can execute 
javascript from CSS with stuff like: 
background:url(javascript:someFunction();)
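
A rough Python sketch of one conservative check for that case: rather than trying to recognise bad schemes, it flags every `url(...)` target that does not start with an allowed scheme. The function name and the allow-list are assumptions made for the example, and a regular expression like this is not a complete CSS sanitiser (comments, escapes and other CSS features are not handled).

```python
import re

# Captures the target of url(...), with or without surrounding quotes.
URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE | re.DOTALL)

# Anything else, including relative URLs, is treated as unsafe.
ALLOWED_SCHEMES = ("http://", "https://")


def unsafe_css_urls(css: str) -> list:
    """Return the url() targets that do not use an allowed scheme."""
    targets = (match.group(2).strip() for match in URL_RE.finditer(css))
    return [t for t in targets if not t.lower().startswith(ALLOWED_SCHEMES)]
```

A style attribute or stylesheet for which this returns a non-empty list could then be dropped as a whole instead of being repaired.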



>>I'm also aware of the issues surrounding people potentially
>>uploading images and then linking to them from spam websites or
>>spam. That's why I've put http referer restrictions in place.
>
>Perhaps redirecting to an image saying something like
>"this is spam"? :)

Then people couldn't share direct links to email parts such as images.
For example, if I went to http://spamalyser.com/v/6xnb26gp/ and clicked
on the image, it would give me a direct link to the image. I might then
IM that link to somebody. When they click on the URL, the referer won't
be valid and I don't want it to display a "This is spam" image. So what
it does is redirect you back to http://spamalyser.com/v/6xnb26gp/ and
jump to the point on the page where the image is displayed. It's a
little difficult to explain.

>What about requiring registration?  Yes, it's not enough to
>stop the most determined, but will whittle it down to the least
>stupid.

Requiring registration in order to paste emails won't get rid of the
problem. Requiring registration in order to read the pasted emails would
completely solve the problem, however I think that would also stop most
people from using the service. I'm trying to keep it simple.


Anywho, this is probably getting off topic now.

--
Mike Cardwell: UK based IT Consultant, LAMP developer, Linux admin
Cardwell IT Ltd. : UK Company - http://cardwellit.com/   #06920226
Technical Blog   : Tech Blog  - https://secure.grepular.com/blog/
Spamalyser   : Spam Tool  - http://spamalyser.com/


Re: pill image spam learns to walk

2010-01-12 Thread Henrik K
On Tue, Jan 12, 2010 at 10:15:32AM +, Mike Cardwell wrote:
 On 12/01/2010 06:28, Chip M. wrote:

 Presently it renders them as plain text. I'm fully aware of the
 potential problems with it. Ideally I'd like to be able to render
 those parts as HTML, but I need to be 100% sure that I've stripped
 out anything dangerous (including embedded remote content by
 default) first. It's on the ToDo List page.

 Nice job Mike! :)

 I wrestled with that same issue when I added direct viewing of HTML
 content to my offline analysis/FP-pipeline/MassChecks tool.

 Originally, I was using an ActiveX wrapper around IE, which (of
 course) made me nervous.  I added some VERY simple, crude tag
 stripping (script, iframe, style), but was never happy with it.
 I ended up switching to an open source HTML rendering component
 which :) lacked support for all the scary stuff.

 Whatever you decide to do, please do post more about it, and q'pla!

 I shall. There are a multitude of modules on cpan for fixing up html and  
 stripping out tags. I just need to find time to test them. I've got to  
 figure out how to cleanse the CSS as well. Eg, you can execute  
 javascript from CSS with stuff like:  
 background:url(javascript:someFunction();)
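
 To illustrate the CSS problem quoted above, here is a rough sketch that
 flags url(...) values in a stylesheet whose scheme is not on an allow-list.
 The allow-list and the function name are assumptions. This is detection
 only, not a sanitiser: it does not handle CSS escapes, comments, or other
 script vectors.

```python
import re

# Matches the scheme at the start of a url(...) value, quoted or not.
SCHEME_IN_URL = re.compile(
    r"url\(\s*['\"]?\s*([a-z][a-z0-9+.\-]*):", re.IGNORECASE
)

# Assumption: schemes considered harmless for this illustration.
SAFE_SCHEMES = {"http", "https", "cid"}


def unsafe_css_urls(css):
    """Return the schemes of url(...) values that are not allow-listed."""
    found = []
    for match in SCHEME_IN_URL.finditer(css):
        scheme = match.group(1).lower()
        if scheme not in SAFE_SCHEMES:
            found.append(scheme)
    return found
```

 Relative URLs carry no scheme and are not flagged.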

IMO whatever you do, there will always be some hole to be found. Your only
safe option is to render the HTML into image and display that. It will also
be always consistent and not depend on browser version.



Re: pill image spam learns to walk

2010-01-12 Thread Kai Schaetzl
Ted, sorry, but your case is lost (since long, look around) and I won't 
bite in such an off-topic discussion here. Please stop telling others that 
refusing to accept mail from non-rDNS machines is incorrect. If you 
*prefer* to handle this at SA level, that's your choice and you can tell 
that. But stop saying in this authoritative way that it is the only 
reputable (=correct) way. It is definitely not.

My last bits on this topic.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





[OT] spamalyser, was Re: pill image spam learns to walk

2010-01-12 Thread Mike Cardwell
On 12/01/2010 10:24, Henrik K wrote:

 Presently it renders them as plain text. I'm fully aware of the
 potential problems with it. Ideally I'd like to be able to render
 those parts as HTML, but I need to be 100% sure that I've stripped
 out anything dangerous (including embedded remote content by
 default) first. It's on the ToDo List page.

 Nice job Mike! :)

 I wrestled with that same issue when I added direct viewing of HTML
 content to my offline analysis/FP-pipeline/MassChecks tool.

 Originally, I was using an ActiveX wrapper around IE, which (of
 course) made me nervous.  I added some VERY simple, crude tag
 stripping (script, iframe, style), but was never happy with it.
 I ended up switching to an open source HTML rendering component
 which :) lacked support for all the scary stuff.

 Whatever you decide to do, please do post more about it, and q'pla!

 I shall. There are a multitude of modules on cpan for fixing up html and  
 stripping out tags. I just need to find time to test them. I've got to  
 figure out how to cleanse the CSS as well. Eg, you can execute  
 javascript from CSS with stuff like:  
 background:url(javascript:someFunction();)
 
 IMO whatever you do, there will always be some hole to be found. Your only
 safe option is to render the HTML into image and display that. It will also
 be always consistent and not depend on browser version.

That was a good suggestion and something I hadn't considered. I've
updated Spamalyser to generate PDFs from HTML parts using the WebKit
rendering engine and QT. So the HTML should look the same as on any
Webkit based user agent. From my tests so far, it's an accurate
representation of what you see in your email client. It handles remote
content like images and CSS fine, and also content attached to the email
with Content-ID headers referenced by cid URIs. Here's a prime example:
http://spamalyser.com/v/jfv3iz0l/mime#part_1.2
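
For readers unfamiliar with cid URIs: an HTML part can refer to another
MIME part of the same message by its Content-ID header. A minimal sketch
of that lookup with Python's standard email package follows; the function
names are made up for illustration and say nothing about how Spamalyser
does it.

```python
from email import message_from_string
from urllib.parse import unquote


def cid_map(raw_message):
    """Map each Content-ID (without angle brackets) to its MIME part."""
    msg = message_from_string(raw_message)
    parts = {}
    for part in msg.walk():
        cid = part.get("Content-ID")
        if cid:
            parts[cid.strip().strip("<>")] = part
    return parts


def resolve_cid(uri, parts):
    """Return the MIME part a cid: URI refers to, or None."""
    if not uri.lower().startswith("cid:"):
        return None
    # The part after "cid:" is the percent-encoded Content-ID.
    return parts.get(unquote(uri[4:]))
```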

PDF is better than an image because it allows you to maintain the links
in the document. A PNG thumbnail generated from the PDF is displayed
alongside text/html parts. Clicking that preview image takes you to the
PDF.

I've also tweaked some of the styling so the headers are easier to read.

I've also set up a mailman based mailing list which is linked to from
http://spamalyser.com/ so if anyone wants to discuss anything further to
do with Spamalyser the discussion should probably move there. Any
further announcements will happen there, not here.

-- 
Mike Cardwell: UK based IT Consultant, LAMP developer, Linux admin
Cardwell IT Ltd. : UK Company - http://cardwellit.com/   #06920226
Technical Blog   : Tech Blog  - https://secure.grepular.com/blog/
Spamalyser   : Spam Tool  - http://spamalyser.com/


Re: [OT] spamalyser, was pill image spam learns to walk

2010-01-12 Thread Kai Schaetzl
Mike Cardwell wrote on Tue, 12 Jan 2010 20:22:44 +:

 It handles remote
 content like images and CSS fine

Tip: I would not fetch remote content at all, as this may lead to account 
verification: the request tells the sender that the address is live.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





pill image spam learns to walk

2010-01-11 Thread Jason Haar
Hi there

We've been getting a few of these leaking through in the past couple of
weeks.

http://pastebin.com/m574da717

They aren't triggering (enough) network rule matches, contain a
bayes-killer, and even FuzzyOCR can't manage the swirly image trick they
pull. Has anyone come up with a way to fight these? (I've actually added
all the phrases that occur in this image to FuzzyOCR - didn't help)


Thanks

-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1



Re: pill image spam learns to walk

2010-01-11 Thread --[ UxBoD ]--
- Mike Cardwell spamassassin-us...@lists.grepular.com wrote:

| On 11/01/2010 10:22, Jason Haar wrote:
|  Hi there
| 
|  We've been getting a few of these leaking through in the past couple of
|  weeks.
| 
|  http://pastebin.com/m574da717
| 
|  They aren't triggering (enough) network rule matches, contain a
|  bayes-killer, and even FuzzyOCR can't manage the swirly image trick they
|  pull. Has anyone come up with a way to fight these? (I've actually added
|  all the phrases that occur in this image to FuzzyOCR - didn't help)
| 
| I just copied and pasted that out of pastebin into a little project I've
| been working on. Here's the result:
| 
| http://spamalyser.com/v/6xnb26gp/mime
| 
| Unlike with pastebin, it mime decodes emails and you can see the decoded
| image at the bottom of that page.
| 

That is awesome, Mike! really helps to visualise.

--
Thanks - Phil


Re: pill image spam learns to walk

2010-01-11 Thread Kai Schaetzl
scores these new tests on 3.3.0

*  1.0 FORGED_TBIRD_IMG_SIZE Likely forged Thunderbird image spam
*  1.0 FORGED_TBIRD_IMG_ARROW Likely forged Thunderbird image spam

and you could add, say 4.0, for each mail coming thru your SF.net alias 
and not coming from SF.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: pill image spam learns to walk

2010-01-11 Thread Charles Gregory
On Mon, 11 Jan 2010, Mike Cardwell wrote:
: I just copied and pasted that out of pastebin into a little project I've 
: been working on. Here's the result:
: http://spamalyser.com/v/6xnb26gp/mime

Question: What does Spamalyser do with an HTML message part?
It is of concern (naturally) that implanted malicious scripts not be 
rendered whole and complete 

- C


Re: pill image spam learns to walk

2010-01-11 Thread Mike Cardwell

On 11/01/2010 14:55, Charles Gregory wrote:

On Mon, 11 Jan 2010, Mike Cardwell wrote:
: I just copied and pasted that out of pastebin into a little project I've
: been working on. Here's the result:
: http://spamalyser.com/v/6xnb26gp/mime

Question: What does Spamalyser do with an HTML message part?
It is of concern (naturally) that implanted malicious scripts not be
rendered whole and complete


Presently it renders them as plain text. I'm fully aware of the 
potential problems with it. Ideally I'd like to be able to render those 
parts as HTML, but I need to be 100% sure that I've stripped out 
anything dangerous (including embedded remote content by default) first. 
It's on the ToDo List page.


I'm also aware of the issues surrounding people potentially uploading 
images and then linking to them from spam websites or spam. That's why 
I've put http referer restrictions in place.


--
Mike Cardwell: UK based IT Consultant, LAMP developer, Linux admin
Cardwell IT Ltd. : UK Company - http://cardwellit.com/   #06920226
Technical Blog   : Tech Blog  - https://secure.grepular.com/blog/
Spamalyser   : Spam Tool  - http://spamalyser.com/


Re: pill image spam learns to walk

2010-01-11 Thread Terry Carmen

On 01/11/2010 05:22 AM, Jason Haar wrote:

Hi there

We've been getting a few of these leaking through in the past couple of
weeks.

http://pastebin.com/m574da717

They aren't triggering (enough) network rule matches, contain a
bayes-killer, and even FuzzyOCR can't manage the swirly image trick they
pull. Has anyone come up with a way to fight these? (I've actually added
all the phrases that occur in this image to FuzzyOCR - didn't help)
Unless you changed the headers, it looks like it came from an IP with no 
reverse DNS entry.


This is easy enough to stop dead in its tracks at your MTA. If there 
isn't any reverse DNS, the chances of it being a legitimate mail server 
are pretty slim.


Terry



Re: pill image spam learns to walk

2010-01-11 Thread Alex
Hi,

 Unless you changed the headers, it looks like it came from an IP with no
 reverse DNS entry.

 This is easy enough to stop dead in its tracks at your MTA. If there isn't
 any reverse DNS, the chances of it being a legitimate mail server are pretty
 slim.

Yes, but not enough to categorically block all incoming mail based on
that, though. At least in my environment, all it would take is one
customer to call and complain, and force me to have to do even more
work to make them an exception and exclude them from this filter.

Thanks,
Alex


Re: pill image spam learns to walk

2010-01-11 Thread Alex
Hi,

        *  1.0 FORGED_TBIRD_IMG_SIZE Likely forged Thunderbird image spam
        *  1.0 FORGED_TBIRD_IMG_ARROW Likely forged Thunderbird image spam

 and you could add, say 4.0, for each mail coming thru your SF.net alias
 and not coming from SF.

Just to clarify, you're referring to this, right:

Received: from mx.sourceforge.net by mailsrv1.trimble.co.nz
(envelope-from f...@ef-

How would I add the rule you are suggesting? It would be specific to
sourceforge.net, and have a table where its authoritative IP and MX
are stored, right?

Thanks,
Alex


Re: pill image spam learns to walk

2010-01-11 Thread Ted Mittelstaedt

Terry Carmen wrote:

On 01/11/2010 05:22 AM, Jason Haar wrote:

Hi there

We've been getting a few of these leaking through in the past couple of
weeks.

http://pastebin.com/m574da717

They aren't triggering (enough) network rule matches, contain a
bayes-killer, and even FuzzyOCR can't manage the swirly image trick they
pull. Has anyone come up with a way to fight these? (I've actually added
all the phrases that occur in this image to FuzzyOCR - didn't help)
Unless you changed the headers, it looks like it came from an IP with no 
reverse DNS entry.


This is easy enough to stop dead in its tracks at your MTA. If there 
isn't any reverse DNS, the chances of it being a legitimate mail server 
are pretty slim.




This is the WRONG way to do this - it amazes me that in 2010 on an
anti-spam mailing list that we have people making such statements.

The SMTP RFC 2821 does NOT mandate the existence of a PTR record for an 
SMTP sender.  The DNS RFC 1912 also does not mandate a corresponding PTR
for a mailserver hostname.  Implies, yes, but there's no requirement. 
There is a very good reason for this.*  Blocking at the MTA based on the 
lack of a PTR record is incorrect.  The correct way is to assign a spam 
score in SA to hosts lacking a PTR, the same way you do to mail that 
contains HTML, etc.
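
The two approaches argued over in this thread can be put side by side. The
sketch below is illustrative only: the function names, the 2.0 penalty and
the injectable lookup are assumptions; the 5.0 threshold mirrors
SpamAssassin's default required_score.

```python
import socket


def has_ptr(ip, lookup=socket.gethostbyaddr):
    """True if a reverse DNS lookup for ip succeeds."""
    try:
        lookup(ip)
        return True
    except (socket.herror, socket.gaierror):
        return False


def mta_policy(ip, lookup=socket.gethostbyaddr):
    """Hard policy: refuse the connection outright when there is no PTR."""
    return "accept" if has_ptr(ip, lookup) else "reject"


def scoring_policy(ip, base_score, lookup=socket.gethostbyaddr,
                   penalty=2.0, threshold=5.0):
    """Soft policy: a missing PTR only adds points to the overall score."""
    score = base_score + (0.0 if has_ptr(ip, lookup) else penalty)
    return "spam" if score >= threshold else "ham"
```

With the hard policy a missing PTR is decisive on its own; with the soft
policy it only tips mail that already looks suspicious.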


Ted

* The reason this is NOT mandated anywhere is because if it was then
sites running multiple mailing domains on a single server could easily
overflow the DNS UDP packet space with a list of PTR's for the server - 
causing the resolver to exceed 512 bytes on the DNS UDP response, or
causing a switch to TCP - either of which can break some firewalls.
For example the Cisco PIX came standard out-of-the-box with a DNS
filter that blocked DNS UDP packets larger than 512.




Re: pill image spam learns to walk

2010-01-11 Thread Terry Carmen

On 01/11/2010 12:42 PM, Ted Mittelstaedt wrote:

Terry Carmen wrote:

On 01/11/2010 05:22 AM, Jason Haar wrote:

Hi there

We've been getting a few of these leaking through in the past couple of
weeks.

http://pastebin.com/m574da717

They aren't triggering (enough) network rule matches, contain a
bayes-killer, and even FuzzyOCR can't manage the swirly image trick they
pull. Has anyone come up with a way to fight these? (I've actually added
all the phrases that occur in this image to FuzzyOCR - didn't help)
Unless you changed the headers, it looks like it came from an IP with 
no reverse DNS entry.


This is easy enough to stop dead in its tracks at your MTA. If there 
isn't any reverse DNS, the chances of it being a legitimate mail 
server are pretty slim.




This is the WRONG way to do this - it amazes me that in 2010 on an
anti-spam mailing list that we have people making such statements.

The SMTP RFC 2821 does NOT mandate the existence of a PTR record for 
an SMTP sender.  The DNS RFC 1912 also does not mandate a 
corresponding PTR
for a mailserver hostname.  Implies, yes, but there's no requirement. 
There is a very good reason for this.*  Blocking at the MTA based on 
the lack of a PTR record is incorrect.  The correct way is to assign a 
spam score in SA to hosts lacking a PTR, the same way you do to mail 
that contains HTML, etc.


SA is great software, but scanning is not a lightweight process. If I 
can ditch millions of spams before they ever hit SA, and need to 
manually whitelist a couple of IPs, that's a great deal as far as I'm 
concerned.


Every reasonable ISP I've seen has managed to assign a PTR record for 
their mail server. I don't care if it exactly every (or any) domain they 
transport mail for, as long as it exists. Sure, it's possible to break 
things if you work at it hard enough, but generally speaking, I don't care.


Terry













Re: pill image spam learns to walk

2010-01-11 Thread Terry Carmen

On 01/11/2010 12:57 PM, Terry Carmen wrote:
exactly every (or any) domain 


Should be exactly *matches* every (or any) domain

--
Terry Carmen
CNY Support, LLC

315.382.3939
http://cnysupport.com



Re: pill image spam learns to walk

2010-01-11 Thread Kai Schaetzl
Terry Carmen wrote on Mon, 11 Jan 2010 12:08:16 -0500:

 Unless you changed the headers, it looks like it came from an IP with no 
 reverse DNS entry.

Yeah, his own delivery chain. Not really a candidate for blocking ;-)

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: pill image spam learns to walk

2010-01-11 Thread Kai Schaetzl
Ted Mittelstaedt wrote on Mon, 11 Jan 2010 09:42:25 -0800:

 This is the WRONG way to do this

It's the right way. The FP rate is almost zero and it encourages the few 
offending ones to quickly add rDNS, really quick.

 * The reason this is NOT mandated anywhere is because if it was then
 sites running multiple mailing domains on a single server could easily
 overflow the DNS UDP packet space with a list of PTR's for the server -

We are not talking about adding PTR for all domains, just for exactly 
*one*. And that doesn't even need to resolve back and forth.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: pill image spam learns to walk

2010-01-11 Thread Kai Schaetzl
Alex wrote on Mon, 11 Jan 2010 12:38:29 -0500:

 Just to clarify, you're referring to this, right:
 
 Received: from mx.sourceforge.net by mailsrv1.trimble.co.nz
 (envelope-from f...@ef-
 
 How would I add the rule you are suggesting? It would be specific to
 sourceforge.net, and have a table where its authoritative IP and MX
 are stored, right?

I would rather look for To: *...@users.sourceforge.net and score if the From 
is not from sourceforge.net (meta-rule). This indicates that it is an 
external mail that was sent to an SF users alias. I've personally not ever 
gotten a legitimate mail to this alias from outside of SF (and I think SF 
admin/dev/news mail uses the target address directly, anyway, and not the 
alias). So, depending on what you get over this route you may either score 
or drop completely.
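
The logic of that meta-rule, written out as a small sketch. This is not
SpamAssassin rule syntax, just the condition itself; the function names are
made up, and treating subdomains of sourceforge.net as "from SF" is an
assumption.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def domain(addr):
    """Lower-cased domain part of an address, or '' if there is none."""
    return addr.rpartition("@")[2].lower()


def external_to_sf_alias(raw_message):
    """True if mail is addressed to an SF users alias but not sent from SF."""
    msg = message_from_string(raw_message)
    recipients = getaddresses(msg.get_all("To", []))
    to_alias = any(domain(a) == "users.sourceforge.net" for _, a in recipients)
    from_dom = domain(parseaddr(msg.get("From", ""))[1])
    from_sf = from_dom == "sourceforge.net" or from_dom.endswith(".sourceforge.net")
    return to_alias and not from_sf
```

A message for which this returns True would then get the extra score (or
be dropped), as described above.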

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: pill image spam learns to walk

2010-01-11 Thread Ted Mittelstaedt

Kai Schaetzl wrote:

Ted Mittelstaedt wrote on Mon, 11 Jan 2010 09:42:25 -0800:


This is the WRONG way to do this


It's the right way. The FP rate is almost zero and it encourages the few 
offending ones to quickly add rDNS, really quick.



* The reason this is NOT mandated anywhere is because if it was then
sites running multiple mailing domains on a single server could easily
overflow the DNS UDP packet space with a list of PTR's for the server -


We are not talking about adding PTR for all domains, just for exactly 
*one*. And that doesn't even need to resolve back and forth.




Clearly you fail to understand anything, here.

PTR's are not mandated because the standard has to apply to all sites,
both sites with multiple domains and sites without.  It does not mean
that because it's not mandated that it's a bad idea to add a PTR record.
It simply means that sites WITHOUT a PTR are still fully compliant mailers.

The entire point of SA is to filter based on fuzzy logic, meaning
that the sender's mail is only wrong based on an arbitrary standard that
the person running SA pulls out of their ass.  A no PTR rule is
EXACTLY the kind of fuzzy decision that SA is designed to make decisions
on.  That is where that kind of rule belongs.

Your advice is kind of like the guy who puts a spoiler on a sports
car that is never driven faster than 100mph.  The spoiler, Spamassassin
in this case, is an expensive, gas-mileage sucking dunsel that is only 
there because of the bragging rights the guy gets by having it there,
it does absolutely nothing to help the car.  In fact, anyone who knows
anything about fast cars, looks at the thing and thinks how gay is
that? and what a moron the idiot driving it is.

If you want to build a mailserver WITHOUT SA, then sure, go ahead and
add in rules like no PTR to the MTA - because you cannot do it any
other way.

But don't spend the money and CPU cycles putting SA on a mailserver
and then have it sit there doing nothing, like that spoiler on the
ass-end of a trans-am.

In other words, be a professional not a bozo!

Ted


Re: pill image spam learns to walk - best way to block it - hostkarma

2010-01-11 Thread Marc Perkel
For what it's worth my Junk Email Filter service blocks 100% of virus 
generated spam such as this pill image spam. But anyone can tap into 
this for free by doing 2 things.


First - add tarbaby.junkemailfilter.com as your highest numbered MX record.

Second - use the hostkarma.junkemailfilter.com black list.

To be really effective you need to do both. Bot spam tends to spam all 
MX records and focuses on the highest MX. So using us as the highest MX 
lets us harvest your spam bot info. Then when you use our black list - 
it's tuned to the spambots that are spamming you. So it becomes even 
more effective.
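
For anyone who has not queried a DNS black list by hand: the usual
convention is to reverse the octets of the connecting IP and look the
result up under the list's zone. A minimal sketch that builds the query
name follows; the function name is made up, and the meaning of the
returned codes is list-specific and not covered here.

```python
import ipaddress


def dnsbl_query_name(ip, zone="hostkarma.junkemailfilter.com"):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"
```

An A-record lookup on the resulting name that returns an answer means the
address is listed; no answer means it is not.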


And each spam attempt to our tarbaby server is a spam that you're not 
getting and don't need to use your resources to block. So a significant amount 
of your spam will just go away.


We catch spam bots on the first attempt and within 2 minutes they are 
listed in our black list.


Here's the info on these lists:

http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists

Feel free to use it.



Re: pill image spam learns to walk

2010-01-11 Thread Kai Schaetzl
Ted Mittelstaedt wrote on Mon, 11 Jan 2010 15:27:07 -0800:

 It simply means that sites WITHOUT a PTR are still fully compliant mailers.

This has nothing to do with RFC-compliance, but with policy, well accepted 
policy. If you can't understand that I can't help. No need to shoot this out.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: pill image spam learns to walk

2010-01-11 Thread Ted Mittelstaedt

Kai Schaetzl wrote:

Ted Mittelstaedt wrote on Mon, 11 Jan 2010 15:27:07 -0800:


It simply means that sites WITHOUT a PTR are still fully compliant mailers.


This has nothing to do with RFC-compliance, but with policy, well accepted 
policy. 


Policy that should be handled in SA and not the MTA, which I've said 
twice now.



If you can't understand that I can't help.


You cannot help someone when you have no real grasp of the topic
under discussion.


No need to shoot this out.



Well, let's see.  I say it is wrong to tell people to make PTR
checks in the MTA when they have SA running, and to make them in
SA.  Then I explain why you shouldn't do them in the MTA and cite
facts to back up my statements.

You know you can't argue against facts, and you know you're wrong,
and rather than just man up and admit it, you try to cover it
up by making the false claim that I am advising to not make PTR
checks at all.  You repeat this false claim multiple times to make 
yourself believe it, and maybe to attempt to get me to forget what I 
said, and adopt your false claim and start arguing for it.


No wonder you don't want to get into a shooting match.  You know
you were caught, and you're outgunned.

Ted


Kai




