Re: DNSing MX to 127.0.0.1: Ruleset (or something) for this?

2006-08-21 Thread Dave Pooser
> Exim has a feature ignore_target_hosts which causes it to strip certain IP
> addresses from the list of MX hosts for a domain. I use it to block all
> abusive or unreachable MXs (listed below). This kicks in when Exim is
> doing address verification at SMTP time, for example "sender verify fail
> for <[EMAIL PROTECTED]>: all relevant MX records point to non-existent hosts"
> 
> 0.0.0.0/8        # this net
> 10.0.0.0/8       # RFC 1918
> 127.0.0.0/8      # this host
> 169.254.0.0/16   # link-local
> 172.16.0.0/12    # RFC 1918
> 192.0.2.0/24     # example net
> 192.168.0.0/16   # RFC 1918
> 198.18.0.0/15    # benchmark net
> 224.0.0.0/3      # multicast & reserved
> 
> It would probably be good to augment this list with bogon or hijacked
> address space, but then it would be more work to keep up-to-date.

I do something similar for some host-based firewalls; I just grab
 via shell script on a
weekly basis and plug that bogon list into ipfw. At the moment that list
includes:
0.0.0.0/7
2.0.0.0/8
5.0.0.0/8
7.0.0.0/8
10.0.0.0/8
23.0.0.0/8
27.0.0.0/8
31.0.0.0/8
36.0.0.0/7
39.0.0.0/8
42.0.0.0/8
49.0.0.0/8
50.0.0.0/8
77.0.0.0/8
78.0.0.0/7
92.0.0.0/6
96.0.0.0/4
112.0.0.0/5
120.0.0.0/8
127.0.0.0/8
169.254.0.0/16
172.16.0.0/12
173.0.0.0/8
174.0.0.0/7
176.0.0.0/5
184.0.0.0/6
192.0.2.0/24
192.168.0.0/16
197.0.0.0/8
198.18.0.0/15
223.0.0.0/8
224.0.0.0/3

When I finally steal the time to set up my exim/SA filtering gateway I'll
check out using that list for ignore_target_hosts as well.
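
As a rough illustration of what filtering MX targets against such a list amounts to, here is a minimal Python sketch. It is not Exim's implementation; the function names are made up for the example, and the networks are simply the ones from the quoted ignore_target_hosts list.

```python
import ipaddress

# Networks copied from the quoted ignore_target_hosts list above.
IGNORED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.0.2.0/24", "192.168.0.0/16",
    "198.18.0.0/15", "224.0.0.0/3",
)]


def is_ignored(address):
    """Return True if the address falls inside any ignored network."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in IGNORED_NETWORKS)


def usable_mx_addresses(addresses):
    """Drop MX addresses that point into ignored address space."""
    return [a for a in addresses if not is_ignored(a)]


if __name__ == "__main__":
    # A domain whose only MX resolves to 127.0.0.1 ends up with no usable hosts.
    print(usable_mx_addresses(["127.0.0.1"]))
```

If the resulting list is empty, the domain has no deliverable MX, which is the situation the quoted "all relevant MX records point to non-existent hosts" message describes.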
-- 
Dave Pooser
Cat-Herder-in-Chief, Pooserville.com
"And the beer I had for breakfast
Wasn't bad, so I had one more for dessert."




Re: Can't locate object method "type"

2006-08-21 Thread Loren Wilton

Old version of Net::DNS?

   Loren



Re: Can't locate object method "type"

2006-08-21 Thread John Andersen
On Sunday 20 August 2006 23:50, Loren Wilton wrote:
> Old version of Net::DNS?
>
> Loren

Not according to cpan.  That was the first thing I checked.
I wonder what low-level stuff Net::DNS is trying to use?
It's running on an older SuSE distro, and the newer boxes
don't seem to show this message.


-- 
_
John Andersen




Enumerating the robots?

2006-08-21 Thread Loren Wilton
It was mentioned that several people are getting hammered by world-wide 
robot attacks.  I see from the little spam I get that there is a new spam 
sending tool for robots that is running a stock spam.  I suspect the traffic 
is a combination of distributing the new spam tool and sending out the new 
spam.


With all this traffic from robots, lots of people here must be getting quite 
a lot of information in their logs about connections from robots.  I wonder 
if there would be value in a central database that attempts to enumerate 
the robots?


Most of them are probably on dynamic ip.  But if the sending IP and 
attempted connect time could be logged at many sites and combined, there 
would be fairly conclusive evidence that a given IP had been sending spam at 
a particular time.  Perhaps that could be submitted to at least some of the 
more responsible service providers, and they could do something to track it 
back to a customer and send them an email that their machine is infected. 
(Or possibly be even more proactive, I suppose.)


The database might also be usable in front door spam blocking.  Most people 
probably shouldn't be accepting direct connections from dynamic ips on 
someone else's network, especially if that ip has a recent history of 
sending spam (say in the last 6 hours or so).  It might be possible to make 
a server that could provide yes/no answers on whether the IP has sent spam 
in the last minute/hour/6 hours/day or so.


I'd think that such a database could be built almost automatically.  For 
instance, if you log the IPs of connection attempts that you reject for 
various problems, you could just harvest those IPs once an hour or so to 
some central site, no human judgement calls required.  If the mail is 
accepted and gets a high SA score, and you can still determine the sending 
IP, then that might be automatically harvested also.
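
To make the idea concrete, here is a minimal in-memory Python sketch of such a yes/no service. Everything in it (the class name, the method names, keeping data in a dictionary) is an assumption for illustration only; a real central database would need persistence, submitter authentication, and expiry of old entries.

```python
import time
from collections import defaultdict


class SightingStore:
    """Hypothetical store of 'this IP tried to send spam at time T' reports."""

    def __init__(self):
        self._seen = defaultdict(list)  # ip -> list of report timestamps

    def report(self, ip, when=None):
        """Record one sighting, e.g. harvested from a rejection log."""
        self._seen[ip].append(time.time() if when is None else when)

    def seen_within(self, ip, seconds, now=None):
        """Yes/no: was this IP reported during the last `seconds` seconds?"""
        now = time.time() if now is None else now
        return any(0 <= now - t <= seconds for t in self._seen.get(ip, ()))


if __name__ == "__main__":
    store = SightingStore()
    store.report("192.0.2.1")
    print(store.seen_within("192.0.2.1", 6 * 3600))  # asked about the last 6 hours
```

Keeping the timestamps rather than a single flag is what allows the same data to answer the minute, hour, 6-hour and day questions.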


Thoughts?  Does something like this have any value?

   Loren



Re: Enumerating the robots?

2006-08-21 Thread John Andersen
On Monday 21 August 2006 00:09, Loren Wilton wrote:
> The database might also be usable in front door spam blocking.  Most people
> probably shouldn't be accepting direct connections from dynamic ips on
> someone else's network, especially if that ip has a recent history of
> sending spam (say in the last 6 hours or so).  It might be possible to make
> a server that could provide yes/no answers on whether the IP has sent spam
> in the last minute/hour/6 hours/day or so.
>
> I'd think that such a database could be built almost automatically.  For
> instance, if you log the IPs of connection attempts that you reject for
> various problems, you could just harvest those IPs once an hour or so to
> some central site, no human judgement calls required.  If the mail is
> accepted and gets a high SA score, and you can still determine the sending
> IP, then that might be automatically harvested also.

It sounds a lot more reasonable than arbitrarily blocking all dynamic IPs.
(Group punishment seems to be politically incorrect everywhere except among
mail nazis.)

It would be cool if it could be set up like Razor, where it is easy to 
report.  (Come to think of it, you could do it through Razor by just contriving 
a mail containing ONLY the DNS, and reporting that to Razor.)




Side Note:
Dynamic IPs are not all that dynamic any more with the increasing penetration
of broadband.  I've had the same IP for over a year now on some of my cable 
modems.  Even if I shell out for a static IP, I can't buy the reverse DNS, as
my ISP does not offer that, so I get tagged as a dynamic IP and can't send to
some people.  I have to forward a lot of it through my ISP's unreliable,
overworked mail server.

-- 
_
John Andersen




Rule to trap unqualified image names

2006-08-21 Thread Ramprasad
I need to trap images that are not given full names
Something like this 


-=_NextPart_000_00EB_01C5061E.42C54EA0
Content-Type: image/gif; name="zpalaver"
Content-Transfer-Encoding: base64
Content-ID: <[EMAIL PROTECTED]>




The name should have been zpalaver.gif but the extension is deliberately
omitted. Can someone help me with a regex for images without
\.(?:gif|png|jpg) extensions


Thanks
Ram







Re: Rule to trap unqualified image names

2006-08-21 Thread Loren Wilton

> -=_NextPart_000_00EB_01C5061E.42C54EA0
> Content-Type: image/gif; name="zpalaver"
>
> The name should have been zpalaver.gif but the extension is deliberately
> omitted. Can someone help me with a regex for images without
> \.(?:gif|png|jpg) extensions


There's a better way than a full rule if you have the right plugin 
installed, but I don't recall the magic keyword off the top of my head.  You 
can do it the slow way with


full NO_SUFFIX m'\nContent-Type: image/gif; name=\"\w+\"'
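
The rule above only covers image/gif. As a way to experiment with a broader pattern before turning it into a rule, here is a small Python sketch. The pattern and the function name are assumptions, not an existing SpamAssassin rule, and matching image/jpeg as well as the three extensions from the question is a choice made for the example.

```python
import re

# Matches an image Content-Type whose name="..." parameter does not end
# in one of the expected extensions. The negative lookahead rejects names
# that finish with .gif, .png, .jpg or .jpeg just before the closing quote.
IMAGE_NAME_WITHOUT_EXT = re.compile(
    r'^Content-Type:\s*image/(?:gif|png|jpe?g);\s*'
    r'name="(?![^"]*\.(?:gif|png|jpe?g)")[^"]+"',
    re.IGNORECASE | re.MULTILINE,
)


def has_unqualified_image_name(raw_message):
    """Return True if any image part is named without a known extension."""
    return IMAGE_NAME_WITHOUT_EXT.search(raw_message) is not None


if __name__ == "__main__":
    sample = 'Content-Type: image/gif; name="zpalaver"\n'
    print(has_unqualified_image_name(sample))
```

The same regular expression syntax is valid in Perl, so a pattern tested this way could be carried over into a rule, though it would still need checking against real mail for false positives.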

   Loren



Re: MySQL, DBI, transactions problem

2006-08-21 Thread Dimitar G. Katerinski
On Monday 21 August 2006 03:44, Benny Pedersen wrote:
> On Sun, August 20, 2006 20:21, Dimitar G. Katerinski wrote:
> > I'm trying to setup Spamassassin to use mysql for bayes storage. However
> > I'm experiencing problems with DBI complaining about "Transactions not
> > supported by database at /usr/lib/perl5/DBI.pm line 670."
>
> yep see bug
>
> http://bugs.gentoo.org/show_bug.cgi?id=143107
>
> > I know that this is not strictly a spamassassin issue, but maybe someone
> > from this list came upon such problem.
> >
> > Here's my setup:
> > OS: debian unstable
> > libdbi-perl - 1.51-2
> > mysql: 5.0.24
> >
> > ~# spamassassin -V
> > SpamAssassin version 3.1.4
> >   running on Perl version 5.8.8
> >
> > /etc/spamassassin/local.cf:
> > bayes_store_module   Mail::SpamAssassin::BayesStore::MySQL
>
> change to sql there, mysql does not work, sql does

Thanks a lot, that fixed the problem ;-)


Re: MySQL, DBI, transactions problem

2006-08-21 Thread Mark Martinec
> > > /etc/spamassassin/local.cf:
> > > bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
> >
> > change to sql there, mysql does not work, sql does
>
> Thanks a lot, that fixed the problem ;-)

That just avoided the problem by using a less efficient plugin
which does not need transactions.

  Mark


Re: MySQL, DBI, transactions problem

2006-08-21 Thread Dimitar G. Katerinski
On Monday 21 August 2006 13:45, Mark Martinec wrote:
> > > > /etc/spamassassin/local.cf:
> > > > bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
> > >
> > > change to sql there, mysql does not work, sql does
> >
> > Thanks a lot, that fixed the problem ;-)
>
> That just avoided the problem by using a less efficient plugin
> which does not need transactions.
>
>   Mark
A little bit off topic on this list, but since you are involved with 
amavisd-new, do you know how to avoid the same problem in amavisd-new?

Regards,
Dimitar


Re: Enumerating the robots?

2006-08-21 Thread DAve

Loren Wilton wrote:
> It was mentioned that several people are getting hammered by world-wide 
> robot attacks.  I see from the little spam I get that there is a new 
> spam sending tool for robots that is running a stock spam.  I suspect 
> the traffic is a combination of distributing the new spam tool and 
> sending out the new spam.
>
> With all this traffic from robots, lots of people here must be getting 
> quite a lot of information in their logs about connections from robots.  
> I wonder if there would be value in a central database that attempts to 
> enumerate the robots?
>
> Most of them are probably on dynamic ip.  But if the sending IP and 
> attempted connect time could be logged at many sites and combined, there 
> would be fairly conclusive evidence that a given IP had been sending 
> spam at a particular time.  Perhaps that could be submitted to at least 
> some of the more responsible service providers, and they could do 
> something to track it back to a customer and send them an email that 
> their machine is infected. (Or possibly be even more proactive, I suppose.)
>
> The database might also be usable in front door spam blocking.  Most 
> people probably shouldn't be accepting direct connections from dynamic 
> ips on someone else's network, especially if that ip has a recent 
> history of sending spam (say in the last 6 hours or so).  It might be 
> possible to make a server that could provide yes/no answers on whether 
> the IP has sent spam in the last minute/hour/6 hours/day or so.
>
> I'd think that such a database could be built almost automatically.  For 
> instance, if you log the IPs of connection attempts that you reject for 
> various problems, you could just harvest those IPs once an hour or so to 
> some central site, no human judgement calls required.  If the mail is 
> accepted and gets a high SA score, and you can still determine the 
> sending IP, then that might be automatically harvested also.
>
> Thoughts?  Does something like this have any value?
>
>    Loren


Something like http://dshield.org, but limited to email instead of all 
ports?


DAve


--
Three years now I've asked Google why they don't have a
logo change for Memorial Day. Why do they choose to do logos
for other non-international holidays, but nothing for
Veterans?

Maybe they forgot who made that choice possible.


Re: MySQL, DBI, transactions problem

2006-08-21 Thread Mark Martinec
> A little bit off topic on this list, but since you are involved with
> amavisd-new, do you know how to avoid the same problem in amavisd-new?

With Bayes db? Fix it within SpamAssassin, and amavisd-new will be happy too.
Like I mentioned previously, I've seen it too and it helped to reinstall 
DBD::mysql and DBI modules, but I'm not sure what is behind.

  Mark


Re: MySQL, DBI, transactions problem

2006-08-21 Thread Dimitar G. Katerinski
On Monday 21 August 2006 15:27, Mark Martinec wrote:
> > A little bit off topic on this list, but since you are involved with
> > amavisd-new, do you know how to avoid the same problem in amavisd-new?
>
> With Bayes db? Fix it within SpamAssassin, and amavisd-new will be happy
> too. Like I mentioned previously, I've seen it too and it helped to
> reinstall DBD::mysql and DBI modules, but I'm not sure what is behind.
Nope. I didn't explain it clearly, sorry about that. I want to make amavisd-new 
use sql for all *_quarantine_method settings, but DBI is complaining about the 
same thing.

Regards,
Dimitar


RE: Blocking based on ALL IPs in the header

2006-08-21 Thread Rob McEwen
Magnus Holmgren said:
>It depends on the blacklist. Some, like Spamhaus SBL, 
>only list IP addresses known to be operated
>by spammers (and not unsuspecting home users with 
>hijacked computers). SA scores mail with such IP 
>addresses in ANY Received line. For other lists, the 
>first hop is ignored unless it's the *only* hop.

The software this company uses for spam filtering is **not** SA. And it
treats all RBLs the same... ANY RBL that is set up in a user's system will
check against ALL IPs in the header.

Therefore, in this situation, if the solution is to disable RBLs which
target zombies, and ONLY keep RBLs like SBL, then that is like getting a
lobotomy to fix a headache.

Sure, the FP problem would go away, but the spam caught by RBL lookups would
decrease dramatically.

In contrast, if ONLY the sending server's IP were checked... and RBLs like
XBL were ALSO used, then the FP problem would ALSO go away, but without any
noticeable decrease in the percent of spam caught by RBL lookups.
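
For readers less familiar with how these lookups work, here is a minimal Python sketch of the difference between the two policies. A DNS blocklist is queried by reversing the IPv4 octets and appending the list's zone; the zone name `dnsbl.example`, the function names, and the assumption that the hop list is ordered newest first are all placeholders for illustration, not the behaviour of any particular product.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def names_to_check(received_ips, zone, all_hops):
    """received_ips is assumed ordered newest hop first, so the first
    entry is the server that actually connected to us."""
    ips = received_ips if all_hops else received_ips[:1]
    return [dnsbl_query_name(ip, zone) for ip in ips]


if __name__ == "__main__":
    hops = ["198.51.100.7", "192.0.2.99"]  # sending server, then the user's own IP
    print(names_to_check(hops, "dnsbl.example", all_hops=True))
    print(names_to_check(hops, "dnsbl.example", all_hops=False))
```

With `all_hops=True` the original sender's own (often dynamic) address is looked up as well, which is where the false positives described above come from when the list targets zombies.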

You might ask, why did I post this in the first place... forgive me for
being so off-topic... but I have these guys at this big software company and
this big bank who seem to think I'm the one who has lost his mind... So I
was hoping for feedback to make sure that I'm not the one who is crazy
here!

Rob McEwen



Feeding bayes outbounds

2006-08-21 Thread Joe Zitnik
Our scanning program has the ability to archive all e-mail, both inbound
and outbound, which we have been doing for months now.  Given that our
outbound mail is almost certainly ham, and the majority of its content is
specific to our business sector, wouldn't feeding outbounds
through bayes manually be a win-win situation?  Am I oversimplifying
things, or am I missing something with that logic?


a new kind of spam (with images)

2006-08-21 Thread Stephane Bentebba

hi all,

i am more or less happy with my spamassassin configuration
it has worked well for one year
but i have a problem with a new kind of spam which easily gets through it :
spam which has poor text, poor tokens, or none, and a subject that always changes
the only thing which remains the same is the image incorporated in it
it always gets a very low score (below 3)
the subject of the image in the body is either "breaking news concerning..." 
or "we have a runner !"

would it be possible to find a solution ?
add / modify a test to look at the first bytes of an attachment and 
recognize the image ?
i can send you samples of this spam if you like... (prefer not to attach 
them)


header extract :
<<
X-Spam-FPS_Level: **
X-Spam-FPS_Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on 
empereur.rungis

X-Spam-FPS_Status: No, score=2.5 required=6.0 tests=DATE_IN_PAST_06_12,
HTML_60_70,HTML_MESSAGE,LONGWORDS autolearn=no version=3.0.4

>


fanx in advance


--
*Stéphane Bentebba
*Technicien de Maintenance

Tél.:
 +33 (0)1.41.73.20.16
Fax.:
 +33 (0)1.41.73.20.08

[EMAIL PROTECTED] 
www.fps.fr

   *FPS France*
Parc d'affaire Silic
43, rue de la Grosse Pierre
BP 40160
94.533 RUNGIS Cedex








MISSING_SUBJECT always matching

2006-08-21 Thread Jessica Perry Hekman
Hi all. I just started using spamassassin for the first time. It's 
marking everything as spam, because MISSING_SUBJECT is always matching, 
although the mail does have Subject: lines.

I searched the archives, and found mail from someone else in 2005 with 
this problem. The suggestion there was that perhaps the headers were 
garbled so that SA wasn't parsing them properly. I looked at the 
incoming headers and see nothing wrong -- here are the headers for the 
mail confirming my subscription to this list (which was marked as spam 
due to MISSING_SUBJECT and MISSING_TO and other things):

>From [EMAIL PROTECTED] Mon Aug 21 13:54:02 2006
Return-Path: <[EMAIL PROTECTED]>
Delivered-To: jphekman-arborius:[EMAIL PROTECTED]
X-Envelope-To: [EMAIL PROTECTED]
>From [EMAIL PROTECTED] Mon Aug 21 13:54:02 2006
Return-Path: <[EMAIL PROTECTED]>
Delivered-To: jphekman-arborius:[EMAIL PROTECTED]
X-Envelope-To: [EMAIL PROTECTED]
Received: (qmail 88561 invoked from network); 21 Aug 2006 13:54:02 -
Received: from localhost.pair.com (HELO misilay.pair.com) (127.0.0.1)
  by localhost.pair.com with SMTP; 21 Aug 2006 13:54:02 -
Received: from mail.apache.org (hermes.apache.org [209.237.227.199])
by misilay.pair.com (Postfix) with SMTP id 2F108C9397
for <[EMAIL PROTECTED]>; Mon, 21 Aug 2006 09:54:02 -0400 
(EDT)
Received: (qmail 59925 invoked by uid 500); 21 Aug 2006 13:54:01 -
Mailing-List: contact [EMAIL PROTECTED]; run by ezmlm
List-Help: 
List-Post: 
List-Subscribe: 
Date: 21 Aug 2006 13:54:01 -
Message-ID: <[EMAIL PROTECTED]>
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Delivered-To: responder for users@spamassassin.apache.org
Received: (qmail 59916 invoked by uid 99); 21 Aug 2006 13:54:01 -
Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49)
by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 21 Aug 2006 06:54:01 
-0700
Received: from [66.92.76.133] (HELO pendaran.arborius.net) 
(66.92.76.133)
by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 21 Aug 2006 06:53:59 
-0700
Received: by pendaran.arborius.net (Postfix, from userid 1001)
id 0B547F2D21; Mon, 21 Aug 2006 09:53:39 -0400 (EDT)
Received: by pendaran.arborius.net (tmda-sendmail, from uid 1001);
Mon, 21 Aug 2006 09:53:38 -0400 (EDT)
MIME-Version: 1.0
Content-type: text/plain; charset=us-ascii
List-Unsubscribe: 

Subject: WELCOME to users@spamassassin.apache.org


SA's headers after processing were:

X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on 
misilay.pair.com
X-Spam-Level: 
X-Spam-Status: Yes, score=4.4 required=3.5 tests=MISSING_HB_SEP,
MISSING_SUBJECT,NO_RECEIVED,NO_RELAYS,TO_CC_NONE autolearn=no
version=3.1.3
X-Spam-Report:
* -0.0 NO_RELAYS Informational: message was not relayed via SMTP
*  2.5 MISSING_HB_SEP Missing blank line between message header and body
*  1.7 MISSING_SUBJECT Missing Subject: header
* -0.0 NO_RECEIVED Informational: message has no Received headers
*  0.1 TO_CC_NONE No To: or Cc: header

Thanks for any help you can give!

Jessica



RE: a new kind of spam (with images)

2006-08-21 Thread Randal, Phil
Upgrade to SA 3.1.4 and use a daily sa-update cron job to get the latest
rules.

Make sure you're using DCC, Razor, Pyzor, URIBLs, etc.

Use the rules from www.rulesemporium.com, and add the ImageInfo plugin
from there (http://www.rulesemporium.com/plugins.htm).

And train your bayes well.

Works for me.

Cheers,

Phil

--
Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK  



Re: a new kind of spam (with images)

2006-08-21 Thread Stephane Bentebba




fanku,

i do use DCC, razor and pyzor
i also use uribl with qmail-scanner (i think)
there is a cron job which runs sa-learn on all ham and spam (more than 2000
spams inside)

but i could gain from using the rest of what you advised
so i will have a serious look at it - as soon as i can
especially the sa-update function 
the imageinfo plugin unfortunately doesn't try to recognise the image so
it won't help me with this problem

fanx great phil


Re: a new kind of spam (with images)

2006-08-21 Thread Matthias Keller

Stephane Bentebba wrote:
> hi all,
>
> i am more or less happy with my spamassassin configuration
> it has worked well for one year
> but i have a problem with a new kind of spam which easily gets through it :
> spam which has poor text, poor tokens, or none, and a subject that always
> changes
> the only thing which remains the same is the image incorporated in it
> it always gets a very low score (below 3)
> the subject of the image in the body is either "breaking news
> concerning..." or "we have a runner !"
>
> would it be possible to find a solution ?
> add / modify a test to look at the first bytes of an attachment and
> recognize the image ?
> i can send you samples of this spam if you like... (prefer not to
> attach them)

Have a look at FuzzyOCR
http://wiki.apache.org/spamassassin/FuzzyOcrPlugin

Works very well for me - I'm using it in conjunction with ImageInfo, and 
since I started using them those image spams get through VERY rarely


Matt


Re: MySQL, DBI, transactions problem

2006-08-21 Thread Mark Martinec
> Nope. I didn't explain it clearly, sorry about that. I want to make
> amavisd-new use sql for all *_quarantine_method settings, but DBI is
> complaining about the same thing.

I understood. It is the same problem, once you resolve the problem with 
Mail::SpamAssassin::BayesStore::MySQL, amavisd logging and quarantining
to SQL will work too.

  Mark


Re: a new kind of spam (with images)

2006-08-21 Thread Spamassassin List

> Stephane Bentebba wrote:
>> hi all,
>>
>> i am more or less happy with my spamassassin configuration
>> it has worked well for one year
>> but i have a problem with a new kind of spam which easily gets through it :
>> spam which has poor text, poor tokens, or none, and a subject that always
>> changes
>> the only thing which remains the same is the image incorporated in it
>> it always gets a very low score (below 3)
>> the subject of the image in the body is either "breaking news
>> concerning..." or "we have a runner !"
>>
>> would it be possible to find a solution ?
>> add / modify a test to look at the first bytes of an attachment and
>> recognize the image ?
>> i can send you samples of this spam if you like... (prefer not to attach
>> them)
>
> Have a look at FuzzyOCR
> http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
>
> Works very well for me - I'm using it in conjunction with ImageInfo, and
> since I started using them those image spams get through VERY rarely

They will block legit emails too 



Re: a new kind of spam (with images)

2006-08-21 Thread Stephane Bentebba

looks like exactly what i needed,
besides, sa-update seems very powerful though

fanx alot

Chris wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hey Stephane... you might want to try FuzzyOcr.


see http://wiki.apache.org/spamassassin/FuzzyOcrPlugin

for more info...


Chris


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE6cQXJQIKXnJyDxURAi1lAKCiWPPAX5NE4g0IMVCDqiksbxpmrwCeKPTm
rvcmvqZFGxrmedr4y/l6mkg=
=k+21
-END PGP SIGNATURE-


 



--
*Stéphane Bentebba
*Technicien de Maintenance

Tél.:
 +33 (0)1.41.73.20.16
Fax.:
 +33 (0)1.41.73.20.08

[EMAIL PROTECTED] 
www.fps.fr

   *FPS France*
Parc d'affaire Silic
43, rue de la Grosse Pierre
BP 40160
94.533 RUNGIS Cedex








SA autolearn=no

2006-08-21 Thread Rory Vieira
Hi,

Just a small problem with sa...

Two different machines:
Home=AMD Athlon XP 2600+, 512M, 40G hdd, SuSE 9.3, SA 3.1.4 (from cpan)
Work=AMD 4200 Dualcore, 2G, 3 * 300G sata, SuSE 10.1, SA 3.1.4 (from cpan)

The configuration on both machines is the same. I.e., I installed SA from cpan, 
and didn't do anything more.
The only thing really different is that I tried to follow 
http://gtmp.org/pub/sa-postfix.en.html on the work machine (which doesn't seem 
to work either LOL, but that's beside the point here ;) )

At work, I get almost the same messages as I do at home. So most of the spam 
messages are the same too.
But the funny thing is that my home machine is properly flagging messages as 
spam, but the work machine is not.
In the headers (work) I found: autolearn=no
But that doesn't mean anything to me :(

I'm not sure what I'm doing wrong.
Hopefully someone on this list has an answer to this...

Cheers,
Rory





Re: How can I (we) get rid of this?

2006-08-21 Thread Stuart Johnston

Anders Norrbring wrote:
> Hiya all!
> I'm getting really sick of receiving 10-100 of the attached mails every 
> day. Any suggestions on how to get rid of them?  Apparently my 
> amavisd-new and SpamAssassin only tag them from 0 to 1.6 points.


FuzzyOCR, ImageInfo, SARE, sa-update.


Re: SA autolearn=no

2006-08-21 Thread Matt Kettler
Rory Vieira wrote:
> Hi,
>
> Just a small problem with sa...
>
> Two different machines:
> Home=AMD Athlon XP 2600+, 512M, 40G hdd, SuSE 9.3, SA 3.1.4 (from cpan)
> Work=AMD 4200 Dualcore, 2G, 3 * 300G sata, SuSE 10.1, SA 3.1.4 (from cpan)
>
> The configuration on both machines is the same. I.e., I installed SA from cpan, 
> and didn't do anything more.
> The only thing really different is that I tried to follow 
> http://gtmp.org/pub/sa-postfix.en.html on the work machine (which doesn't 
> seem to work either LOL, but that's beside the point here ;) )
>
> At work, I get almost the same messages as I do at home. So most of the 
> spam messages are the same too.
> But the funny thing is that my home machine is properly flagging messages as 
> spam, but the work machine is not.
> In the headers (work) I found: autolearn=no
> But that doesn't mean anything to me :(
>   
The autolearn=no means the message did not score high enough to trigger
automatic bayes training. Don't worry about it for now..

Realistically, in order to determine why your work machine isn't tagging
the messages as spam we'd at least need to see a list of rule hits, such
as from an X-Spam-Status header. Preferably a pair of X-Spam-Status
headers, one from your home, and one from your work, for the same (or
almost same) spam message. That would allow us to compare what rules are
hitting at each site and compare the differences.
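
A comparison like that can be done by eye, but as a minimal sketch it could also be scripted along these lines in Python. The function names and the two sample header values are made up for illustration; the only thing assumed about the header is the usual `tests=RULE,RULE,...` field.

```python
import re


def rule_hits(status_header):
    """Extract the set of rule names from an X-Spam-Status header value."""
    # Folded headers break the list after commas; join those pieces first.
    joined = re.sub(r",\s+", ",", status_header)
    match = re.search(r"tests=(\S+)", joined)
    if not match or match.group(1) == "none":
        return set()
    return set(match.group(1).split(","))


def compare_hits(home_header, work_header):
    """Show which rules fired at only one site and which fired at both."""
    home, work = rule_hits(home_header), rule_hits(work_header)
    return {
        "home_only": sorted(home - work),
        "work_only": sorted(work - home),
        "both": sorted(home & work),
    }


if __name__ == "__main__":
    # Invented example values, not taken from the poster's mail.
    home = ("Yes, score=7.1 required=5.0 tests=BAYES_99,\n"
            "\tHTML_MESSAGE,RAZOR2_CHECK autolearn=no version=3.1.4")
    work = "No, score=0.5 required=5.0 tests=HTML_MESSAGE autolearn=no version=3.1.4"
    print(compare_hits(home, work))
```

Rules that show up only on the home side point at what is missing or disabled on the work machine (for example Bayes or network tests).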







Re: a new kind of spam (with images)

2006-08-21 Thread decoder
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Spamassassin List wrote:
>> Stephane Bentebba wrote:
>>> hi all,
>>>
>>> i am more or less happy with my spamassassin configuration
>>> it has worked well for one year
>>> but i have a problem with a new kind of spam which easily goes
>>> through it :
>>> spam which has poor text, poor tokens, or none, and a subject
>>> always changing
>>> the only thing which remains the same is the image incorporated in it
>>> it always gets a very low score (below 3)
>>> subject on the image in the body is either "breaking news
>>> concerning..." or "we have a runner !"
>>> would it be possible to find a solution ?
>>> add / modify a test to look at first bytes of an attachment and
>>> recognize the image ?
>>> i can send you samples of this spam if you like... (prefer not to
>>> attach them)
>> Have a look at FuzzyOCR
>> http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
>>
>> Works very well for me - I'm using it in conjunction with ImageInfo
>> and since I'm using them those image spams get through VERY rarely
>
> They will also block off legit emails too
How so?



Re: How can I (we) get rid of this?

2006-08-21 Thread Anders Norrbring

Stuart Johnston wrote:
> Anders Norrbring wrote:
>> Hiya all!
>> I'm getting really sick of receiving 10-100 of the attached mails
>> every day. Any suggestions on how to get rid of them?  Apparently my
>> Amavis-new and SpamAssassin only tag them with 0 to 1.6 points.
>
> FuzzyOCR, ImageInfo, SARE, sa-update.

I haven't looked at FuzzyOCR or ImageInfo at all. Are they compatible
with SA 2.64?


I run RulesDuJour with these rulesets TRUSTED_RULESETS="TRIPWIRE 
SARE_EVILNUMBERS0 SARE_EVILNUMBERS1 SARE_EVILNUMBERS2 SARE_HTML0 
SARE_UNSUB SARE_URI0 SARE_OBFU0 SARE_WHITELIST SARE_RANDOM SARE_REDIRECT 
SARE_BAYES_POISON_NXM SARE_CODING SARE_HEADER SARE_SPECIFIC SARE_ADULT 
SARE_BML SARE_FRAUD SARE_SPOOF SARE_RANDOM SARE_OEM SARE_HIGHRISK ";


Seems like none of them grabs it.

--

Anders Norrbring
Norrbring Consulting


Re: a new kind of spam (with images)

2006-08-21 Thread Stuart Johnston

decoder wrote:
> Spamassassin List wrote:
>>> Stephane Bentebba wrote:
>>>> hi all,
>>>>
>>>> i am more or less happy with my spamassassin configuration
>>>> it has worked well for one year
>>>> but i have a problem with a new kind of spam which easily goes
>>>> through it :
>>>> spam which has poor text, poor tokens, or none, and a subject
>>>> always changing
>>>> the only thing which remains the same is the image incorporated in it
>>>> it always gets a very low score (below 3)
>>>> subject on the image in the body is either "breaking news
>>>> concerning..." or "we have a runner !"
>>>> would it be possible to find a solution ?
>>>> add / modify a test to look at first bytes of an attachment and
>>>> recognize the image ?
>>>> i can send you samples of this spam if you like... (prefer not to
>>>> attach them)
>>> Have a look at FuzzyOCR
>>> http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
>>>
>>> Works very well for me - I'm using it in conjunction with ImageInfo
>>> and since I'm using them those image spams get through VERY rarely
>>
>> They will also block off legit emails too
> How so?

I wouldn't expect any from FuzzyOCR but ImageInfo certainly has the chance to
block legit mail.


Re: How can I (we) get rid of this?

2006-08-21 Thread Theo Van Dinter
On Mon, Aug 21, 2006 at 05:36:40PM +0200, Anders Norrbring wrote:
> I haven't looked at FuzzyOCR or ImageInfo at all, are they compatible 
> with SA 2.64?

No.  They're plugins, which would require SA 3.0 or later.

-- 
Randomly Generated Tagline:
.senilgat gnidaer emit hcum oot dneps uoY




sa-update and VirusScannerTypeUpdates

2006-08-21 Thread Andreas Pettersson

Hi.

I keep seeing suggestions to use sa-update quite often on this list, but 
I thought it was no use doing so between releases according to this page:

http://wiki.apache.org/spamassassin/VirusScannerTypeUpdates
with these exact words in the end:

"Daily and/or weekly updates aren't practical, because it takes weeks to 
evolve a scoreset for a release."


So, how often are there new rules available via sa-update?


Regards,
Andreas



RE: How can I (we) get rid of this?

2006-08-21 Thread Jean-Paul Natola


-----Original Message-----
From: Stuart Johnston [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 21, 2006 11:05 AM
To: users@spamassassin.apache.org
Subject: Re: How can I (we) get rid of this?

Anders Norrbring wrote:
> Hiya all!
> I'm getting really sick of receiving 10-100 of the attached mails every
> day. Any suggestions on how to get rid of them?  Apparently my
> Amavis-new and SpamAssassin only tag them with 0 to 1.6 points.

FuzzyOCR, ImageInfo, SARE, sa-update.


I'm getting an error when attempting to run sa-update 

Can't locate Archive/Tar.pm in @INC (@INC contains:
/usr/local/lib/perl5/site_perl/5.8.8 /usr/local/lib/perl5/5.8.8/BSDPAN
/usr/local/lib/perl5/site_perl/5.8.8/mach
/usr/local/lib/perl5/site_perl/5.8.7 /usr/local/lib/perl5/site_perl/5.8.6
/usr/local/lib/perl5/site_perl /usr/local/lib/perl5/5.8.8/mach
/usr/local/lib/perl5/5.8.8) at /usr/local/bin/sa-update line 78.
BEGIN failed--compilation aborted at /usr/local/bin/sa-update line 78.

Here is line 78

eval { use Archive::Tar; };


Re: MySQL, DBI, transactions problem

2006-08-21 Thread Benny Pedersen
On Mon, August 21, 2006 03:02, Mark Martinec wrote:
>> On Sun, August 20, 2006 20:21, Dimitar G. Katerinski wrote:
>> > I'm trying to setup Spamassassin to use mysql for bayes storage. However
>> > I'm experiencing problems with DBI complaining about "Transactions not
>> > supported by database at /usr/lib/perl5/DBI.pm line 670."
>>
>> yep see bug
>> http://bugs.gentoo.org/show_bug.cgi?id=143107
>
> In my case it helped reinstalling DBD::mysql, after upgrading DBI.
> (Mail::SpamAssassin::BayesStore::MySQL, DBD::mysql 3.0006, DBI 1.52)
> Didn't investigate further.

thanks, it solved it here as well

mysql 4.1.21
DBD-mysql 3.0004
DBI 1.50

the magic must be to recompile

i will close the bug on gentoo

-- 
Benny



Re: a new kind of spam (with images)

2006-08-21 Thread John D. Hardin
On Mon, 21 Aug 2006, Spamassassin List wrote:

> > Have a look at FuzzyOCR
> > http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
> >
> > Works very well for me - I'm using it in conjunction with ImageInfo and
> > since I'm using them those image spams get through VERY rarely
> 
> They will also block off legit emails too 

Care to back up that contention with a reasoned argument in a
non-anonymous message?

A naked snipe like that, with no supporting data, no signature, and
from a mailbox with no real name, will be taken with a *big* grain of
salt... 

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  America is an amazing country. We spend billions of dollars sending
  troops halfway around the world to evict a sadistic dictator and
  bring freedom to the people of a foreign country, while at the same
  time working to build a police state at home.
---
 29 days until Talk Like a Pirate day



sa-update and Proxy firewall

2006-08-21 Thread leonard . gray

Sorry to be "late to the dance",
but is there a relatively simple way to have sa-update traverse a proxy
based firewall?  We'd need to provide username / password to get out.

Thanks!
Leonard Gray
Groupware and Email Administration
Information Technology Services, Washington Savannah River Company
Internet: [EMAIL PROTECTED]
Phone: (803) 725-6022


Re: a new kind of spam (with images)

2006-08-21 Thread Spamassassin List

>> Spamassassin List wrote:
>>>> Stephane Bentebba wrote:
>>>>> hi all,
>>>>>
>>>>> i am more or less happy with my spamassassin configuration
>>>>> it has worked well for one year
>>>>> but i have a problem with a new kind of spam which easily goes
>>>>> through it :
>>>>> spam which has poor text, poor tokens, or none, and a subject
>>>>> always changing
>>>>> the only thing which remains the same is the image incorporated in it
>>>>> it always gets a very low score (below 3)
>>>>> subject on the image in the body is either "breaking news
>>>>> concerning..." or "we have a runner !"
>>>>> would it be possible to find a solution ?
>>>>> add / modify a test to look at first bytes of an attachment and
>>>>> recognize the image ?
>>>>> i can send you samples of this spam if you like... (prefer not to
>>>>> attach them)
>>>> Have a look at FuzzyOCR
>>>> http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
>>>>
>>>> Works very well for me - I'm using it in conjunction with ImageInfo
>>>> and since I'm using them those image spams get through VERY rarely
>>>
>>> They will also block off legit emails too
>> How so?
>
> I wouldn't expect any from FuzzyOCR but ImageInfo certainly has the chance
> to block legit mail.

Sorry, I meant the ImageInfo plugin. I have many legit emails blocked by this
plugin.



RE: sa-update and VirusScannerTypeUpdates

2006-08-21 Thread Randal, Phil
sa-update last updated the rules here on August 14th.

Theo on this list a short while back recommended daily checking.

Cheers,

Phil

--
Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK  

> -----Original Message-----
> From: Andreas Pettersson [mailto:[EMAIL PROTECTED] 
> Sent: 21 August 2006 16:46
> To: SpamAssassin
> Subject: sa-update and VirusScannerTypeUpdates
> 
> Hi.
> 
> I keep seeing suggestions to use sa-update quite often on 
> this list, but 
> I thought it was no use doing so between releases according 
> to this page:
> http://wiki.apache.org/spamassassin/VirusScannerTypeUpdates
> with these exact words in the end:
> 
> "Daily and/or weekly updates aren't practical, because it 
> takes weeks to 
> evolve a scoreset for a release."
> 
> So, how often are there new rules available via sa-update?
> 
> 
> Regards,
> Andreas
> 


Re: sa-update and VirusScannerTypeUpdates

2006-08-21 Thread Theo Van Dinter
On Mon, Aug 21, 2006 at 05:46:19PM +0200, Andreas Pettersson wrote:
> I keep seeing suggestions to use sa-update quite often on this list, but 
> I thought it was no use doing so between releases according to this page:
> http://wiki.apache.org/spamassassin/VirusScannerTypeUpdates
> with these exact words in the end:
> 
> "Daily and/or weekly updates aren't practical, because it takes weeks to 
> evolve a scoreset for a release."
> 
> So, how often are there new rules available via sa-update?

This is unfortunately not a completely easy answer.  Here's my short version:

- I've never seen that wiki page before, and it's out of date wrt what we're
  doing now.  The line you quoted also says "evolve a scoreset" which is GA
  terminology, so I think the page is probably further outdated than the date
  on the page (we haven't used the GA in a long time).

- sa-update is a generic tool used to download configs/rules/scores/etc, so
  "how often are ..." is really dependent on what channel you're talking
  about.

- If you're talking about the default updates.spamassassin.org channel, and
  you're talking about 3.1, new updates are made available ... well, generally
  anytime we (usually I) get around to making a new update available.  There
  is no specific schedule, and new updates can be created for various reasons,
  not just to make new rules available.

  Generally speaking, updates could occur every 15m w/ the current config, but
  realistically it may be once a week or so.  If setting up a cronjob to do
  updates, I'd probably go daily for now.
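
A daily job along those lines could look roughly like the sketch below. It assumes `sa-update` is on the PATH and relies on the exit codes as I understand them from the sa-update man page (0: an update was downloaded and installed, 1: no fresh updates, anything else treated here as a failure); check the man page of your installed version before relying on this. The function names are made up for illustration.

```python
import subprocess

def describe_exit_code(code):
    """Map an sa-update exit code to a short status string (assumed meanings, see above)."""
    if code == 0:
        return "updated"      # new rules were downloaded and installed
    if code == 1:
        return "no-update"    # nothing new on the channel
    return "error"            # any other code is treated as a failure here

def run_daily_update(command=("sa-update",)):
    """Run sa-update once and report what happened; intended to be called from cron."""
    result = subprocess.run(list(command))
    return describe_exit_code(result.returncode)
```

Only the "updated" case needs follow-up work, since spamd has to be restarted to load the new rules.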

-- 
Randomly Generated Tagline:
"Before marriage, a man yearns for the woman he loves.  After marriage,
 the 'Y' becomes silent."- Unknown




Re: How can I (we) get rid of this?

2006-08-21 Thread Theo Van Dinter
On Mon, Aug 21, 2006 at 11:52:04AM -0400, Jean-Paul Natola wrote:
> I'm getting an error when attempting to run sa-update 
> 
> Can't locate Archive/Tar.pm in @INC (@INC contains:
[...]

Did you install the sa-update required modules, as listed in the INSTALL doc?
(Archive::Tar, LWP, IO::Zlib)

-- 
Randomly Generated Tagline:
"First they ignore you, then they laugh at you, then they fight you,
 then you win."  - Gandhi




RE: howto let SA find more spam

2006-08-21 Thread Bowie Bailey
Pitmaster wrote:
> Hi there,
> I am a total newbie. I have SA installed and have done nothing since.
> SA is working and catches about 5% of the spam. How can I simply help SA
> to catch more and keep the speed up?

1) Get the latest version of SA
2) Run sa-update to get the latest rules
3) Enable the network tests
   - Make sure the Perl module Net::DNS is up to date.
   - Make sure spamd is not being run with the -L option.
4) Fix your trusted_networks configuration in local.cf.
   - See the wiki entry at:
 http://wiki.apache.org/spamassassin/TrustPath
5) Enable Razor2 and DCC
   - Download and install the programs:
 http://razor.sourceforge.net
 http://www.rhyolite.com/anti-spam/dcc/
   - Uncomment the plugin lines in v310.pre 
6) Install some add-on rules from SARE
   http://www.rulesemporium.com/rules.htm

The network tests will catch more spam than any other single thing you
can do with SA.  They take advantage of online blacklists to increase
the score of mail which has passed through a known spam relay, or
which includes a known spam URL in the text.

The trusted_networks setting allows SA to determine which headers in
the message came from your network (or at least networks that you
trust not to falsify their received headers).

Razor2 tries to determine if this message has been classified as spam
by others on the net.  DCC simply looks for messages that have been
distributed to lots of people.

The SARE rules are extra rules that have been developed to help with
certain types of spam.  Just read through the descriptions and see
which ones you think would be helpful.  SARE_STOCKS is probably the
most useful for me at the moment.

Speed is generally a balancing act between the size of the spamd
processes and the available RAM.  The more rules and add-ons you use,
the bigger each spamd process will be and the fewer of them you can
run.  If you notice your server getting VERY slow, chances are you
have too many spamd children and you should reduce the -m setting for
spamd.
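
As a rough worked example of that balance, the sketch below estimates how many spamd children fit in memory. The figures (512 MB of RAM, 128 MB kept back for the rest of the system, 40 MB per child) are invented for illustration; measure the real size of your spamd processes before picking a -m value.

```python
def max_spamd_children(ram_mb, child_size_mb, reserve_mb=0):
    """Estimate how many spamd children fit in RAM without swapping."""
    if child_size_mb <= 0:
        raise ValueError("child_size_mb must be positive")
    usable = ram_mb - reserve_mb
    # Always allow at least one child, even on a very small machine.
    return max(1, usable // child_size_mb)

# Hypothetical machine: 512 MB RAM, 128 MB reserved, 40 MB per spamd child.
children = max_spamd_children(ram_mb=512, child_size_mb=40, reserve_mb=128)
print(children)
```

Adding rulesets makes each child bigger, so the same machine supports fewer children and the -m value has to come down accordingly.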

-- 
Bowie


Re: sa-update and Proxy firewall

2006-08-21 Thread Theo Van Dinter
On Mon, Aug 21, 2006 at 11:53:40AM -0400, [EMAIL PROTECTED] wrote:
> Sorry to be "late to the dance", but is there a relatively simple way to 
> have sa-update traverse a proxy based firewall?  We'd need to provide 
> username / password to get out.

Hrm.  sa-update uses LWP for doing http requests, and setting things
like http_proxy is allowed.  However, looking at the docs quickly,
there doesn't appear to be a way to authenticate to the proxy using LWP, so
there won't be a way via sa-update either.

-- 
Randomly Generated Tagline:
I'd love to, but there's a disturbance in the Force.




Re: sa-update and VirusScannerTypeUpdates

2006-08-21 Thread Andreas Pettersson

Theo Van Dinter wrote:
> On Mon, Aug 21, 2006 at 05:46:19PM +0200, Andreas Pettersson wrote:
>> I keep seeing suggestions to use sa-update quite often on this list, but
>> I thought it was no use doing so between releases according to this page:
>> http://wiki.apache.org/spamassassin/VirusScannerTypeUpdates
>> with these exact words in the end:
>>
>> "Daily and/or weekly updates aren't practical, because it takes weeks to
>> evolve a scoreset for a release."
>>
>> So, how often are there new rules available via sa-update?
>
> ...
>
> Generally speaking, updates could occur every 15m w/ the current config, but
> realistically it may be once a week or so.  If setting up a cronjob to do
> updates, I'd probably go daily for now.

Thank you very much. Now I have a better idea of what to expect from
sa-update.


Regards,
Andreas



Re: sa-update and VirusScannerTypeUpdates

2006-08-21 Thread Justin Mason

Andreas Pettersson writes:
>I keep seeing suggestions to use sa-update quite often on this list, but 
>I thought it was no use doing so between releases according to this page:
>http://wiki.apache.org/spamassassin/VirusScannerTypeUpdates
>with these exact words in the end:
>
>"Daily and/or weekly updates aren't practical, because it takes weeks to 
>evolve a scoreset for a release."
>
>So, how often are there new rules available via sa-update?

well, the scoreset isn't yet being evolved for each sa-update release;
instead the scores are statically assigned by hand.  

Other than that, though, we should probably update that page to note that
we are doing part of that anyway ;)

--j.


OCR plugin doesn't seem to work

2006-08-21 Thread Mike Pepe

Hey guys,

Running SA 3.1.1, on Fedora Core 3, with Perl 5.8.5

I installed gocr and imagemagick packages, copied the Ocr.pm and cf 
files into /etc/mail/spamassassin


The tests don't seem to run, the pump 'n dump GIFs are still arriving 
and I don't see that the test is being run in the headers. Other SARE 
and custom rules in that directory are running though. The permissions 
are the same, etc. Anyone have any ideas?


# ls
70_sare_adult.cf     70_sare_uri1.cf           spamassassin-default.rc
70_sare_obfu0.cf     99_sare_fraud_post25x.cf  spamassassin-helper.sh
70_sare_obfu1.cf     99_sare_fraud_pre25x.cf   spamassassin-spamc.rc
70_sare_oem.cf       cathy_caparula.cf         tripwire.cf
70_sare_random.cf    init.pre                  v310.pre
70_sare_specific.cf  local.cf                  WebRedirect.cf
70_sare_spoof.cf     Ocr.cf                    WebRedirect.pm
70_sare_stocks.cf    Ocr.pm
70_sare_uri0.cf      RulesDuJour

-Mike


Re: OCR plugin doesn't seem to work

2006-08-21 Thread decoder
Mike Pepe wrote:
> Hey guys,
>
> Running SA 3.1.1, on Fedora Core 3, with Perl 5.8.5
>
> I installed gocr and imagemagick packages, copied the Ocr.pm and cf
>  files into /etc/mail/spamassassin
>
> The tests don't seem to run, the pump 'n dump GIFs are still
> arriving and I don't see that the test is being run in the headers.
>  Other SARE and custom rules in that directory are running though.
> The permissions are the same, etc. Anyone have any ideas?
>
> # ls
> 70_sare_adult.cf     70_sare_uri1.cf           spamassassin-default.rc
> 70_sare_obfu0.cf     99_sare_fraud_post25x.cf  spamassassin-helper.sh
> 70_sare_obfu1.cf     99_sare_fraud_pre25x.cf   spamassassin-spamc.rc
> 70_sare_oem.cf       cathy_caparula.cf         tripwire.cf
> 70_sare_random.cf    init.pre                  v310.pre
> 70_sare_specific.cf  local.cf                  WebRedirect.cf
> 70_sare_spoof.cf     Ocr.cf                    WebRedirect.pm
> 70_sare_stocks.cf    Ocr.pm
> 70_sare_uri0.cf      RulesDuJour
>
Which OCR plugin are you using there? If it is the original OcrPlugin,
then you might try FuzzyOcr instead. The original OcrPlugin was more
proof-of-concept, and will cause you lots of headaches with the
current image spam...


Chris



Re: MISSING_SUBJECT always matching

2006-08-21 Thread Justin Mason

hi Jessica --

I would suggest checking line endings -- that's a classic symptom
of \r\n being used where other parts of the mail delivery pipeline
are expecting \n.

--j.

Jessica Perry Hekman writes:
>Hi all. I just started using spamassassin for the first time. It's 
>marking everything as spam, because MISSING_SUBJECT is always matching, 
>although the mail does have Subject: lines.
>
>I searched the archives, and found mail from someone else in 2005 with 
>this problem. The suggestion there was that perhaps the headers were 
>garbled so that SA wasn't parsing them properly. I looked at the 
>incoming headers and see nothing wrong -- here are the headers for the 
>mail confirming my subscription to this list (which was marked as spam 
>due to MISSING_SUBJECT and MISSING_TO and other things):
>
>>>From [EMAIL PROTECTED] Mon Aug 21 13:54:02 2006
>Return-Path: <[EMAIL PROTECTED]>
>Delivered-To: jphekman-arborius:[EMAIL PROTECTED]
>X-Envelope-To: [EMAIL PROTECTED]
>>>From [EMAIL PROTECTED] Mon Aug 21 13:54:02 2006
>Return-Path: <[EMAIL PROTECTED]>
>Delivered-To: jphekman-arborius:[EMAIL PROTECTED]
>X-Envelope-To: [EMAIL PROTECTED]
>Received: (qmail 88561 invoked from network); 21 Aug 2006 13:54:02 -
>Received: from localhost.pair.com (HELO misilay.pair.com) (127.0.0.1)
>  by localhost.pair.com with SMTP; 21 Aug 2006 13:54:02 -
>Received: from mail.apache.org (hermes.apache.org [209.237.227.199])
>by misilay.pair.com (Postfix) with SMTP id 2F108C9397
>for <[EMAIL PROTECTED]>; Mon, 21 Aug 2006 09:54:02 -0400 
>(EDT)
>Received: (qmail 59925 invoked by uid 500); 21 Aug 2006 13:54:01 -
>Mailing-List: contact [EMAIL PROTECTED]; run by ezmlm
>List-Help: 
>List-Post: 
>List-Subscribe: 
>Date: 21 Aug 2006 13:54:01 -
>Message-ID: <[EMAIL PROTECTED]>
>From: [EMAIL PROTECTED]
>To: [EMAIL PROTECTED]
>Delivered-To: responder for users@spamassassin.apache.org
>Received: (qmail 59916 invoked by uid 99); 21 Aug 2006 13:54:01 -
>Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49)
>by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 21 Aug 2006 06:54:01 
>-0700
>Received: from [66.92.76.133] (HELO pendaran.arborius.net) 
>(66.92.76.133)
>by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 21 Aug 2006 06:53:59 
>-0700
>Received: by pendaran.arborius.net (Postfix, from userid 1001)
>id 0B547F2D21; Mon, 21 Aug 2006 09:53:39 -0400 (EDT)
>Received: by pendaran.arborius.net (tmda-sendmail, from uid 1001);
>Mon, 21 Aug 2006 09:53:38 -0400 (EDT)
>MIME-Version: 1.0
>Content-type: text/plain; charset=us-ascii
>List-Unsubscribe: 
>
>Subject: WELCOME to users@spamassassin.apache.org
>
>
>SA's headers after processing were:
>
>X-Spam-Flag: YES
>X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on 
>misilay.pair.com
>X-Spam-Level: 
>X-Spam-Status: Yes, score=4.4 required=3.5 tests=MISSING_HB_SEP,
>MISSING_SUBJECT,NO_RECEIVED,NO_RELAYS,TO_CC_NONE autolearn=no
>version=3.1.3
>X-Spam-Report:
>* -0.0 NO_RELAYS Informational: message was not relayed via SMTP
>*  2.5 MISSING_HB_SEP Missing blank line between message header and 
> body
>*  1.7 MISSING_SUBJECT Missing Subject: header
>* -0.0 NO_RECEIVED Informational: message has no Received headers
>*  0.1 TO_CC_NONE No To: or Cc: header
>
>Thanks for any help you can give!
>
>Jessica
>
>


Re: MISSING_SUBJECT always matching

2006-08-21 Thread Jessica Perry Hekman
Well, that got me going in the right direction -- it sounded reasonable 
so I started mucking with some messages that had come in, and what I 
discovered is that all incoming messages were getting two copies of 
their "From:" lines written, one with a preceding ">". I imagine SA 
reaches that ">" and decides it's all done with headers. My next step 
would be to figure out who's putting in that ">From" line -- qmail? 
procmail? But that wouldn't seem to be this list's problem.
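
To locate the point where a header block stops looking like headers, something like the sketch below can be run over a raw message. It assumes a parser gives up at the first line that is neither a "Name: value" field nor a folded continuation; whether SpamAssassin behaves exactly that way is my guess from the symptoms, not something confirmed here. The addresses are placeholders, since the real ones are redacted above, and the function name is made up.

```python
import re

# A header field name is printable ASCII without ':' followed directly by ':'.
HEADER_FIELD = re.compile(r"^[!-9;-~]+:")

def first_non_header_line(lines):
    """Return the index of the first line that breaks the header block, or None."""
    for index, line in enumerate(lines):
        if line == "":
            return None                 # blank separator reached: headers are clean
        if index == 0 and line.startswith("From "):
            continue                    # mbox envelope line at the very top
        if index > 0 and line[0] in " \t":
            continue                    # folded continuation of the previous field
        if HEADER_FIELD.match(line):
            continue
        return index                    # e.g. a stray ">From " line
    return None

sample = [
    "From sender@example.org Mon Aug 21 13:54:02 2006",
    "Return-Path: <sender@example.org>",
    "X-Envelope-To: user@example.org",
    ">From sender@example.org Mon Aug 21 13:54:02 2006",
    "Return-Path: <sender@example.org>",
    "Subject: WELCOME",
    "",
    "body text",
]
bad_index = first_non_header_line(sample)
print(bad_index)
```

In the sample, everything after the stray ">From" line, including Subject:, would be treated as body, which matches the MISSING_SUBJECT and MISSING_HB_SEP hits in the report.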

Thanks very much!

Jessica

On Mon, Aug 21, 2006 at 06:57:27PM +0100, Justin Mason wrote:
> 
> hi Jessica --
> 
> I would suggest checking line endings -- that's a classic symptom
> of \r\n being used where other parts of the mail delivery pipeline
> are expecting \n.
> 
> --j.
> 
> Jessica Perry Hekman writes:
> >Hi all. I just started using spamassassin for the first time. It's 
> >marking everything as spam, because MISSING_SUBJECT is always matching, 
> >although the mail does have Subject: lines.
> >
> >I searched the archives, and found mail from someone else in 2005 with 
> >this problem. The suggestion there was that perhaps the headers were 
> >garbled so that SA wasn't parsing them properly. I looked at the 
> >incoming headers and see nothing wrong -- here are the headers for the 
> >mail confirming my subscription to this list (which was marked as spam 
> >due to MISSING_SUBJECT and MISSING_TO and other things):
> >
> From [EMAIL PROTECTED] Mon Aug 21 13:54:02 2006
> >Return-Path: <[EMAIL PROTECTED]>
> >Delivered-To: jphekman-arborius:[EMAIL PROTECTED]
> >X-Envelope-To: [EMAIL PROTECTED]
> From [EMAIL PROTECTED] Mon Aug 21 13:54:02 2006
> >Return-Path: <[EMAIL PROTECTED]>
> >Delivered-To: jphekman-arborius:[EMAIL PROTECTED]
> >X-Envelope-To: [EMAIL PROTECTED]
> >Received: (qmail 88561 invoked from network); 21 Aug 2006 13:54:02 -
> >Received: from localhost.pair.com (HELO misilay.pair.com) (127.0.0.1)
> >  by localhost.pair.com with SMTP; 21 Aug 2006 13:54:02 -
> >Received: from mail.apache.org (hermes.apache.org [209.237.227.199])
> >by misilay.pair.com (Postfix) with SMTP id 2F108C9397
> >for <[EMAIL PROTECTED]>; Mon, 21 Aug 2006 09:54:02 -0400 
> >(EDT)
> >Received: (qmail 59925 invoked by uid 500); 21 Aug 2006 13:54:01 -
> >Mailing-List: contact [EMAIL PROTECTED]; run by ezmlm
> >List-Help: 
> >List-Post: 
> >List-Subscribe: 
> >Date: 21 Aug 2006 13:54:01 -
> >Message-ID: <[EMAIL PROTECTED]>
> >From: [EMAIL PROTECTED]
> >To: [EMAIL PROTECTED]
> >Delivered-To: responder for users@spamassassin.apache.org
> >Received: (qmail 59916 invoked by uid 99); 21 Aug 2006 13:54:01 -
> >Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49)
> >by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 21 Aug 2006 06:54:01 
> >-0700
> >Received: from [66.92.76.133] (HELO pendaran.arborius.net) 
> >(66.92.76.133)
> >by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 21 Aug 2006 06:53:59 
> >-0700
> >Received: by pendaran.arborius.net (Postfix, from userid 1001)
> >id 0B547F2D21; Mon, 21 Aug 2006 09:53:39 -0400 (EDT)
> >Received: by pendaran.arborius.net (tmda-sendmail, from uid 1001);
> >Mon, 21 Aug 2006 09:53:38 -0400 (EDT)
> >MIME-Version: 1.0
> >Content-type: text/plain; charset=us-ascii
> >List-Unsubscribe: 
> >
> >Subject: WELCOME to users@spamassassin.apache.org
> >
> >
> >SA's headers after processing were:
> >
> >X-Spam-Flag: YES
> >X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on 
> >misilay.pair.com
> >X-Spam-Level: 
> >X-Spam-Status: Yes, score=4.4 required=3.5 tests=MISSING_HB_SEP,
> >MISSING_SUBJECT,NO_RECEIVED,NO_RELAYS,TO_CC_NONE autolearn=no
> >version=3.1.3
> >X-Spam-Report:
> >* -0.0 NO_RELAYS Informational: message was not relayed via SMTP
> >*  2.5 MISSING_HB_SEP Missing blank line between message header and 
> > body
> >*  1.7 MISSING_SUBJECT Missing Subject: header
> >* -0.0 NO_RECEIVED Informational: message has no Received headers
> >*  0.1 TO_CC_NONE No To: or Cc: header
> >
> >Thanks for any help you can give!
> >
> >Jessica
> >
> >


Re: MISSING_SUBJECT always matching

2006-08-21 Thread Mark Martinec
Jessica Perry Hekman writes:
> Hi all. I just started using spamassassin for the first time. It's
> marking everything as spam, because MISSING_SUBJECT is always matching,
> although the mail does have Subject: lines.

Btw, the last time the very same thing happened to me was because of an
unrelated mistake in the local.cf file. Always check the configuration with:

  spamassassin --lint


Mark


Re: FuzzyOcr mailing list

2006-08-21 Thread Bill Maidment

Hi
Thanks for a great addition to Spamassassin. Please keep it on the 
Spamassassin list (except for configuration help). I have found it very 
useful so far using the 2.2 beta1.
I have made a small change for my purposes, which was needed because the 
temporary files were not unique using mimedefang. I added a date/time 
stamp to the file names in the perl  module:


my $firstline = ($p->decode())[0];
-->my $ts = time;
-->my $tempfile = $tmppath . "/" . "spamassassin.$$" . $ts . ".focr";
-->my $errfile = $tmppath . "/" . "spamassassin.$$" . $ts . ".focr.err";
if ($firstline =~ /^\x47\x49\x46/) {

This is useful if the temp files get left behind after a failure like this:
Aug 21 20:38:21 b090lx4 mimedefang-multiplexor[26383]: Slave 1 stderr: giftopnm:
Aug 21 20:38:21 b090lx4 mimedefang-multiplexor[26383]: Slave 1 stderr: error reading magic number
Aug 21 20:38:21 b090lx4 mimedefang-multiplexor[26383]: Slave 1 stderr:
Aug 21 20:38:21 b090lx4 mimedefang-multiplexor[26383]: Slave 1 stderr: (null):
Aug 21 20:38:21 b090lx4 mimedefang-multiplexor[26383]: Slave 1 stderr: EOF / read error reading magic number
Aug 21 20:38:21 b090lx4 mimedefang-multiplexor[26383]: Slave 1 stderr:

BTW What causes this type of error?
Cheers
Bill

--
Bill Maidment
Maidment Enterprises Pty Ltd
www.maidment.com.au

si hoc non legere potes tu asinus es



SA logging options wrong uid Debian-exim sa-stats

2006-08-21 Thread Stefan Bauer

Hello List,

I am using Debian with SpamAssassin 3.1.1-1 and exim 4.62.

I would like to use sa-stats[1] with the SpamAssassin entries in
/var/log/exim4/mainlog.log, such as:


Aug 21 17:58:51 main spamd[4064]: spamd: result: . -1 - AWL,BAYES_00
scantime=2.3,size=5146,user=Debian-exim,uid=104,required_score=3.0,rhost=localhost.
localdomain,raddr=127.0.0.1,rport=49475,mid=<[EMAIL PROTECTED]>,rmid=
<[EMAIL PROTECTED]>,bayes=1.11668452262847e-11,autolearn=no

This works, but not very well. SpamAssassin logs to the file above, but
the user= field is always Debian-exim. How can I set up SpamAssassin to
log (or deliver) each message under the uid of the user who received it?


Running sa-stats only gives me stats[2] for the user Debian-exim,
which covers all mail.


So my question is: how can I get SA to handle the mail under the UID of
each user, so that I get usable logs?


[1] http://david.hexstream.co.uk/scripts/sa-stats/sa-stats.pl.html
[2] http://www.plzk.de/stats/spam

--
thanks in advance

Stefan Bauer

-->
www.plzk.de - www.plzk.com
---<


Bayes SQL Errors

2006-08-21 Thread Ryan Kather
I am having a few problems converting from Berkeley DB to MySQL w/ InnoDB.  I
have created the DB and tables, and updated local.cf.  Everything appears OK,
but when I attempt to restore my Berkeley DB backup with sa-learn --restore
filename, I get the following errors.

[21508] dbg: bayes: using username: somuser
[21508] dbg: bayes: database connection established
[21508] dbg: bayes: found bayes db version 3
[21508] dbg: bayes: unable to initialize database for someuser, aborting!
[21508] dbg: config: score set 1 chosen.
[21508] dbg: bayes: database connection established
[21508] dbg: bayes: found bayes db version 3
[21508] dbg: bayes: unable to initialize database for someuser, aborting!
[21508] dbg: bayes: database connection established
[21508] dbg: bayes: found bayes db version 3
[21508] dbg: bayes: using userid: 1
[21508] dbg: bayes: _put_token: SQL error: Duplicate entry '1-' for key 1
[21508] dbg: bayes: error inserting token for line: t 648 899 1156175812 
c0614089c0
[21508] dbg: bayes: _put_token: SQL error: Duplicate entry '1-' for key 1
[21508] dbg: bayes: error inserting token for line: t 253 160 1156151124 
90775ea219
[21508] dbg: bayes: _put_token: SQL error: Duplicate entry '1-' for key 1
.
bayes: encountered too many errors (20) while parsing token line, reverting to 
empty database and exiting
ERROR: Bayes restore returned an error, please re-run with -D for more 
information

I'm using SpamAssassin 3.1.4 with MySQL 5.0.24 on Intel EM64T (Gentoo 2006.0).  
I've tried reverting to MyISAM with no change.  Any ideas?
AWL w/ SQL seems to be working.  Thanks for any guidance.
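
For anyone reading the "error inserting token for line" messages above, the sketch below decodes one of the token lines. The field order (spam count, ham count, atime, hex-encoded token) is my assumption, based on the column names in the bayes_token table; `parse_token_line` is a hypothetical helper, not part of SpamAssassin.

```python
from typing import NamedTuple

class TokenLine(NamedTuple):
    spam_count: int
    ham_count: int
    atime: int
    token: bytes

def parse_token_line(line):
    """Split a 't ...' line from a Bayes backup into its (assumed) fields."""
    kind, spam, ham, atime, token_hex = line.split()
    if kind != "t":
        raise ValueError("not a token line: %r" % line)
    return TokenLine(int(spam), int(ham), int(atime), bytes.fromhex(token_hex))

# First failing line from the debug output above (it is wrapped there).
entry = parse_token_line("t 648 899 1156175812 c0614089c0")
print(len(entry.token), entry.token.hex())
```

The ten hex digits decode to five raw bytes, which matches the char(5) token column, and two of those bytes (0xc0 and 0x89) are not ASCII. Whether that is related to the "Duplicate entry" error is not established here.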

Configuration:
-
local.cf settings:
# SpamAssassin SQL Based Bayesian
bayes_store_module  Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn   DBI:mysql:somedatabase:localhost
bayes_sql_username  someuser
bayes_sql_password  somepassword
bayes_sql_override_username someuser

# SpamAssassin SQL Based AWL
auto_whitelist_factory  Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsnDBI:mysql:somedatabase:localhost
user_awl_sql_username   someuser
user_awl_sql_password   somepassword

SQL Schema:
CREATE TABLE awl (
  username varchar(100) NOT NULL default '',
  email varchar(200) NOT NULL default '',
  ip varchar(10) NOT NULL default '',
  count int(11) default '0',
  totscore float default '0',
  PRIMARY KEY  (username,email,ip)
) TYPE=InnoDB;

CREATE TABLE bayes_expire (
  id int(11) NOT NULL default '0',
  runtime int(11) NOT NULL default '0',
  KEY bayes_expire_idx1 (id)
) TYPE=InnoDB;

CREATE TABLE bayes_global_vars (
  variable varchar(30) NOT NULL default '',
  value varchar(200) NOT NULL default '',
  PRIMARY KEY  (variable)
) TYPE=InnoDB;

INSERT INTO bayes_global_vars VALUES ('VERSION','3');

CREATE TABLE bayes_seen (
  id int(11) NOT NULL default '0',
  msgid varchar(200) binary NOT NULL default '',
  flag char(1) NOT NULL default '',
  PRIMARY KEY  (id,msgid)
) TYPE=InnoDB;

CREATE TABLE bayes_token (
  id int(11) NOT NULL default '0',
  token char(5) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  atime int(11) NOT NULL default '0',
  PRIMARY KEY  (id, token),
  INDEX bayes_token_idx1 (token),
  INDEX bayes_token_idx2 (id, atime)
) TYPE=InnoDB;

CREATE TABLE bayes_vars (
  id int(11) NOT NULL AUTO_INCREMENT,
  username varchar(200) NOT NULL default '',
  spam_count int(11) NOT NULL default '0',
  ham_count int(11) NOT NULL default '0',
  token_count int(11) NOT NULL default '0',
  last_expire int(11) NOT NULL default '0',
  last_atime_delta int(11) NOT NULL default '0',
  last_expire_reduce int(11) NOT NULL default '0',
  oldest_token_age int(11) NOT NULL default '2147483647',
  newest_token_age int(11) NOT NULL default '0',
  PRIMARY KEY  (id),
  UNIQUE bayes_vars_idx1 (username)
) TYPE=InnoDB;



Re: Bayes SQL Errors

2006-08-21 Thread Ryan Kather
Interesting.  If I try to import my old Berkeley DB, it fails right away.  If I 
process a message, auto-learn complains about the duplicate key again, but it 
actually learns the tokens, and I can see them with sa-learn --dump magic.  

Which key is the duplicate?  I'm still unclear on how to resolve this.  I took 
the schema syntax directly from the bayes MySQL README for 3.1.4.  

>>> "Ryan Kather" <[EMAIL PROTECTED]> 08/21/06 04:28PM >>>
I am having a few problems converting from Berkeley DB to MySQL w/ InnoDB.  I 
have created the DB and tables, and updated local.cf.  Everything appears OK, 
but when I attempt to restore my Berkeley DB backup with sa-learn --restore 
filename, I get the following errors.  

[21508] dbg: bayes: using username: somuser
[21508] dbg: bayes: database connection established
[21508] dbg: bayes: found bayes db version 3
[21508] dbg: bayes: unable to initialize database for someuser, aborting!



Re: FuzzyOcr mailing list

2006-08-21 Thread Theo Van Dinter
On Mon, Aug 21, 2006 at 08:48:10PM +1000, Bill Maidment wrote:
> I have made a small change for my purposes, which was needed because the 
> temporary files were not unique using mimedefang. I added a date/time 
> stamp to the file names in the perl  module:
> 
> -->my $ts = time;
> -->my $tempfile = $tmppath . "/" . "spamassassin.$$" . $ts . ".focr";
> -->my $errfile = $tmppath . "/" . "spamassassin.$$" . $ts . ".focr.err";

FWIW, there's a function in M::SA::Util (secure_tmpfile) to create temp
files which would avoid this issue.
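
A rough illustration of the idea, sketched in Python rather than Perl: instead of building a name from the PID and a timestamp, ask the library to create a file whose name is guaranteed not to exist yet. This is not the FuzzyOcr code and not the M::SA::Util implementation; the function name make_scan_tempfiles is made up for the example, and only the "spamassassin." prefix and ".focr" suffixes are taken from the quoted snippet.

```python
import os
import tempfile


def make_scan_tempfiles(tmpdir=None):
    """Create a uniquely named work file and error file and return their paths.

    mkstemp() creates each file atomically, so two scanner processes that
    start in the same second (or share a PID across hosts) cannot collide.
    """
    fd, work_path = tempfile.mkstemp(prefix="spamassassin.", suffix=".focr", dir=tmpdir)
    os.close(fd)  # caller reopens by path; we only needed the unique name
    fd, err_path = tempfile.mkstemp(prefix="spamassassin.", suffix=".focr.err", dir=tmpdir)
    os.close(fd)
    return work_path, err_path


if __name__ == "__main__":
    work, err = make_scan_tempfiles()
    print(work, err)
    os.remove(work)
    os.remove(err)
```

The caller is responsible for deleting both files when the scan is finished.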

-- 
Randomly Generated Tagline:
"Engineering does not require science.  Science helps a lot but people
 built perfectly good brick walls long before they knew why cement works."
 - Alan Cox




Re: SA and MTA message filtering

2006-08-21 Thread Kenneth Porter
--On Friday, August 18, 2006 11:17 AM -0400 Sanford Whiteman 
<[EMAIL PROTECTED]> wrote:



> Three out of your four objectives are markedly off-topic: there's no
> reason for SA to ever see mail for unknown local recipients. Those
> messages should be rejected by the MTA, using either your text file or
> direct LDAP lookup: you should Google or post elsewhere for the
> specifics. There's a large archive of envelope-rejection methods for
> every popular MTA.


I like to use "tracking addresses" when registering with websites, by 
adding "+websitename" after my username (eg. 
[EMAIL PROTECTED]). I can then tell how a spammer 
got my address. Alas, a lot of web coders exclude "+" as a valid character 
in an email address, so I end up using a dot instead, and using sendmail's 
wildcard recipient feature to route unknown addresses to me.
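
To make the tagging scheme concrete, here is a small Python sketch that splits a tracking address into user, tag, and domain, accepting either "+" or "." as the separator. It only illustrates the naming convention; it is not sendmail configuration, the function name is invented for the example, and the example.com addresses are placeholders.

```python
def split_tracking_address(address, delimiters="+."):
    """Split 'user+tag@domain' (or 'user.tag@domain') into (user, tag, domain).

    The first delimiter found after the first character of the local part
    separates the user from the tag. An address without a tag returns an
    empty tag.
    """
    local, _, domain = address.rpartition("@")
    for i, ch in enumerate(local):
        if i > 0 and ch in delimiters:
            return local[:i], local[i + 1:], domain
    return local, "", domain


if __name__ == "__main__":
    print(split_tracking_address("kenneth+shop@example.com"))
    print(split_tracking_address("kenneth.shop@example.com"))
    print(split_tracking_address("kenneth@example.com"))
```

Note the obvious cost of using a dot: a normal address such as first.last would also be treated as user "first" with tag "last", which is presumably why "+" is the conventional choice.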


Anyone know how to get sendmail to recognize more than the "+" for this 
feature?





Custom Rule Filtering on X-Mailer Header Not Working

2006-08-21 Thread Kyle Harris




I'm having some difficulty getting a simple custom rule to
work based on the X-Mailer used.  Here is the custom rule:

header   SPAM_BAT X-Mailer =~ /The Bat!/i
describe SPAM_BAT Potential spam client (The Bat)
score    SPAM_BAT 8.0

Here is a portion of an example header that goes right through with no
spam hits, and by the way, I have a bunch of these:

Microsoft Mail Internet Headers Version 2.0
Received: {content removed for security reasons}
Received: {content removed for security reasons}
Received: {content removed for security reasons}
Received: {content removed for security reasons}
Date: Sat, 19 Aug 2006 15:52:23 -0100
From: "asdf" <[EMAIL PROTECTED]>
X-Mailer: The Bat! (v2.12.00) Personal
X-Priority: 3
Message-ID: <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: This transmission is an eyeopener
MIME-Version: 1.0
Content-Type: multipart/alternative;
    boundary="--O4VYWC6U0DPTOV4HHOAO5"
X-TRC-MailScanner-Information: Please contact the ISP for more information
X-TRC-MailScanner: Found to be clean
X-MailScanner-From: [EMAIL PROTECTED]
Return-Path: [EMAIL PROTECTED]
X-OriginalArrivalTime: 19 Aug 2006 18:10:19.0424 (UTC) FILETIME=[BAEFAA00:01C6C3BA]

O4VYWC6U0DPTOV4HHOAO5
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7bit

O4VYWC6U0DPTOV4HHOAO5--

My Setup:
SpamAssassin:  v3.14
Called via MailScanner:  v4.55.10
Perl:  5.8.5

Troubleshooting steps thus far:
*  Ran spamassassin -D --lint.  No errors.
*  Confirmed that the file with the above rule is being read by putting
   other rules in the file.  The other rules are found and executed.
*  Attempted to search the entire header using ALL instead of
   X-Mailer.  Same results.
*  Confirmed that the message in question is indeed getting routed
   through the MailScanner/SpamAssassin filter.
*  Confirmed that SpamAssassin doesn't show as timed out.
*  Stopped/restarted and even rebooted the computer (yes, I know that
   shouldn't be necessary, but I wanted to make sure) to make sure the
   rules are being read.

I know someone will mention that it is not necessarily a good idea to
filter on a particular mailer, but I have confirmed that none of my
clients use "The Bat!" and I am willing to take the chance.

Any ideas?
 


FW: Bayes SQL Errors

2006-08-21 Thread Michael Grey


Ryan,

I just did this myself the first time in the last week;

Be sure that all your operations on the DB are done as the user who is going
to be accessing it; ie: Spamassassin spamuser etc.

Not knowing the history of your install; 

In your SpamAssassin local.cf file you should have these lines, COMMENTED OUT
for now. You want SpamAssassin to use the Berkeley DB for the moment.
#   bayes_store_module  Mail::SpamAssassin::BayesStore::MySQL
#   bayes_sql_dsn   DBI:mysql:bayes:localhost:3306
#   bayes_sql_username  spamassassin
#   bayes_sql_password  spampassword
#   bayes_sql_override_username spamassassin

First be sure that your Berkeley DB is actually in the version 3.x format by running
'sa-learn --sync'
This will ensure that the DB format is 3.x compatible.

Next, do a sa-learn --backup >backup.txt.

Create the bayes DB in mysql, and then apply the tables using the templates (
that you obviously have ).

In mysql (as root) be sure to do :
-   grant all privileges on bayes.* to [EMAIL PROTECTED] identified
by 'spampassword'

Uncomment the bayes_* lines in the SpamAssassin local.cf, then su back to
spamassassin (or whatever user is going to be accessing the DB).

run 'sa-learn --restore ./backup.txt'  this places all the entries from
backup.txt into the mysql db. This should take a few minutes.
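
The sa-learn part of the procedure above can be summarised as a short checklist. The Python sketch below only builds the three command lines in order (it does not run them); the function name is invented for the example, and the local.cf edits and the MySQL setup still have to be done by hand between the second and third command, as described above.

```python
import shlex


def migration_commands(backup_file="backup.txt"):
    """Return the sa-learn commands for the DBM-to-SQL move, in order.

    Steps 1 and 2 run while local.cf still points at the Berkeley DB;
    step 3 runs after the bayes_* SQL lines have been uncommented.
    """
    quoted = shlex.quote(backup_file)
    return [
        "sa-learn --sync",                  # bring the existing DB up to date
        "sa-learn --backup > " + quoted,    # dump tokens to a text file
        "sa-learn --restore " + quoted,     # load the dump into the SQL store
    ]


if __name__ == "__main__":
    for number, command in enumerate(migration_commands(), start=1):
        print(number, command)
```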

From your errors, it looks like the import process into your DB got messed
up. As root, go into mysql> and drop the bayes DB and start again.

Good luck...

Michael Grey


-Original Message-
From: Ryan Kather [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 21, 2006 1:29 PM
To: 

expire_old_tokens: "Out of memory"/"called with negative strlen"

2006-08-21 Thread Fabian Peters

Hi,

I'm using the p5-Mail-SpamAssassin-3.1.4 on FreeBSD 6 with perl  
v5.8.8. I've got a "virtualmail" user that handles all mail on the  
server. So there is one single .spamassassin directory:


total 7304
-rw---  1 virtualmail  virtualmail   1.3M Aug 21 21:06 auto-whitelist
-rw---  1 virtualmail  virtualmail   4.6K Aug 21 21:06 bayes_journal
-rw---  1 virtualmail  virtualmail   2.5M Aug 21 21:06 bayes_seen
-rw---  1 virtualmail  virtualmail   5.4M Aug 21 21:06 bayes_toks
-rw-r--r--  1 virtualmail  virtualmail   1.1K Sep 27  2005 user_prefs

Since a couple of weeks this setup has started acting up:

spamd[26764]: bayes: expire_old_tokens: Out of memory during ridiculously large request at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/BayesStore/DBM.pm line 625


The same error occurs when I manually run sa-learn -D --force-expire:

[16376] dbg: bayes: expiry starting
[16376] dbg: locker: refresh_lock: refresh /var/maildirs/virtual/.spamassassin/bayes.lock
[16376] dbg: locker: refresh_lock: refresh /var/maildirs/virtual/.spamassassin/bayes.lock
[16376] dbg: bayes: expiry check keep size, 0.75 * max: 112500
[16376] dbg: bayes: token count: 231691, final goal reduction size: 119191
[16376] dbg: bayes: first pass? current: 1156194876, Last: 1155016110, atime: 2764800, count: 2729, newdelta: 63302, ratio: 43.6757053865885, period: 43200
[16376] dbg: bayes: can't use estimation method for expiry, unexpected result, calculating optimal atime delta (first pass)
[16376] dbg: bayes: expiry max exponent: 9
bayes: expire_old_tokens: Out of memory during ridiculously large request at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/BayesStore/DBM.pm line 625.
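
The numbers in this debug output fit together in a simple way, which the Python sketch below reproduces. The formulas are inferred from the logged values only (goal = token count minus keep size, ratio = goal divided by the token count of the previous expiry, newdelta = previous atime delta divided by that ratio); they are an assumption about what the log is reporting, not a copy of the SpamAssassin source, and the function and parameter names are invented for the example.

```python
def estimate_atime_delta(token_count, keep_size, last_atime_delta, last_reduce_count):
    """Recompute the 'first pass' estimate shown in the expiry debug log."""
    goal = token_count - keep_size            # tokens that should be removed
    ratio = goal / last_reduce_count          # how much more than last time
    new_delta = int(last_atime_delta / ratio) # shrink the age window accordingly
    return goal, ratio, new_delta


if __name__ == "__main__":
    # Values from the --force-expire run above.
    goal, ratio, new_delta = estimate_atime_delta(231691, 112500, 2764800, 2729)
    print(goal, ratio, new_delta)
```

With the values from the log this gives a goal of 119191 tokens and a new delta of 63302 seconds, matching the logged line. The estimate is then rejected ("can't use estimation method") and the code falls back to the full first-pass calculation, which is where the failure occurs.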


I've tried sa-learn -D --rebuild, but this results in the same error:

[16456] dbg: bayes: expiry starting
[16456] dbg: locker: refresh_lock: refresh /var/maildirs/virtual/.spamassassin/bayes.lock
[16456] dbg: locker: refresh_lock: refresh /var/maildirs/virtual/.spamassassin/bayes.lock
[16456] dbg: bayes: DB expiry: tokens in DB: 231734, Expiry max size: 15, Oldest atime: 1152252402, Newest atime: 1156194971, Last expire: 1155016110, Current time: 1156195117
[16456] dbg: bayes: expiry check keep size, 0.75 * max: 112500
[16456] dbg: bayes: token count: 231734, final goal reduction size: 119234
[16456] dbg: bayes: first pass? current: 1156195117, Last: 1155016110, atime: 2764800, count: 2729, newdelta: 63280, ratio: 43.6914620740198, period: 43200
[16456] dbg: bayes: can't use estimation method for expiry, unexpected result, calculating optimal atime delta (first pass)
[16456] dbg: bayes: expiry max exponent: 9
bayes: expire_old_tokens: Out of memory during ridiculously large request at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/BayesStore/DBM.pm line 625.


I've searched the archive and saw that others had the same problem  
before and also on FreeBSD, but there were no solutions to be  
found... Is there any way to fix this - short of throwing away the  
bayes_* files?


TIA

Fabian


Re: expire_old_tokens: "Out of memory"/"called with negative strlen"

2006-08-21 Thread Theo Van Dinter
On Mon, Aug 21, 2006 at 11:22:30PM +0200, Fabian Peters wrote:
> Since a couple of weeks this setup has started acting up:
> 
> bayes: expire_old_tokens: Out of memory during ridiculously large  
> request at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/ 
> BayesStore/DBM.pm line 625.
> 
> I've searched the archive and saw that others had the same problem  
> before and also on FreeBSD, but there were no solutions to be  
> found... Is there any way to fix this - short of throwing away the  
> bayes_* files?

Interesting.  Do you have a memory limit in place ala limit or ulimit?
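
One quick way to see which limits a process actually inherits is to ask from inside a process started the same way. The sketch below uses Python's standard resource module (Unix only) purely as an illustration; it reports the limits of the Python process itself, so to be meaningful it would have to be launched from the same environment that starts spamd or sa-learn. The function name is made up for the example.

```python
import resource


def describe_limits():
    """Return {label: (soft, hard)} for a few memory-related rlimits.

    Unlimited values are reported as the string 'unlimited'.
    """
    wanted = {"data": resource.RLIMIT_DATA, "stack": resource.RLIMIT_STACK}
    if hasattr(resource, "RLIMIT_AS"):  # not available on every platform
        wanted["address space"] = resource.RLIMIT_AS

    def show(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    limits = {}
    for label, which in wanted.items():
        soft, hard = resource.getrlimit(which)
        limits[label] = (show(soft), show(hard))
    return limits


if __name__ == "__main__":
    for label, (soft, hard) in describe_limits().items():
        print(label, "soft:", soft, "hard:", hard)
```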

I took a look at DBM.pm line 625, and there actually isn't any DB-request
there, though that line is in a loop going through each of the DB entries
which should be a very straightforward operation.

My only thought, other than a low memory limit, is that you may want to try
upgrading your libdb/DB_File installation.

-- 
Randomly Generated Tagline:
You're everywhere.  You're omnivorous.
 
-- Homer Simpson, to God
   There's No Disgrace Like Home




Re: a new kind of spam (with images)

2006-08-21 Thread jdow

From: "Spamassassin List" <[EMAIL PROTECTED]>


Stephane Bentebba wrote:

hi all,

i am more or less happy with my spamassassin configuration
works good for one year
but i have a problem with a new kind of spam which easily gets through it:
spam which has poor text, poor tokens, or none, and a subject that is always 
changing

the only thing which remains the same is the image incorporated in it
it always gets a very low score (below 3)
the subject on the image in the body is either "breaking news concerning..." 
or "we have a runner !"

would it be possible to find a solution ?
add / modify a test to look at the first bytes of an attachment and 
recognize the image ?
i can send you samples of this spam if you like... (prefer not to attach 
them)

Have a look at FuzzyOCR
http://wiki.apache.org/spamassassin/FuzzyOcrPlugin

Works very well for me - I'm using it in conjunction with ImageInfo, and 
since I started using them those image spams get through VERY rarely


They will also block off legit emails too


That's not likely with the Fuzzy OCR plugin unless you get a lot of
images in email with marketing words in them.

{^_^}


Re: a new kind of spam (with images)

2006-08-21 Thread jdow

From: "Spamassassin List" <[EMAIL PROTECTED]>


Spamassassin List wrote:

Stephane Bentebba wrote:

hi all,

i am more or less happy with my spamassassin configuration
works good for one year
but i have a problem with a new kind of spam which easily gets
through it:
spam which has poor text, poor tokens, or none, and a subject that is
always changing
the only thing which remains the same is the image incorporated in it
it always gets a very low score (below 3)
the subject on the image in the body is either "breaking news
concerning..." or "we have a runner !"
would it be possible to find a solution ?
add / modify a test to look at the first bytes of an attachment and
recognize the image ?
i can send you samples of this spam if you like... (prefer not to
attach them)

Have a look at FuzzyOCR
http://wiki.apache.org/spamassassin/FuzzyOcrPlugin

Works very well for me - I'm using it in conjunction with ImageInfo,
and since I started using them those image spams get through VERY rarely

They will also block off legit emails too

How so?


I wouldn't expect any from FuzzyOCR but ImageInfo certainly has the chance 
to block legit mail.


Sorry, I meant ImageInfo plugin.. I have many legit emails blocked by this 
plugin.


Reduce its score and perhaps use it in meta rules.
{^_^}


RE: Bayes SQL Errors

2006-08-21 Thread Ryan Kather
Michael, 

Thanks for the help, but it didn't seem to make a difference.  I have 
noticed some strange things, though.  See steps taken and responses inline.  

I have dropped the database completely, commented out the local.cf bayes_sql 
and bayes_store settings, and performed an sa-learn --sync.  At this point 
sa-learn --dump magic showed zero items in the Berkeley DB Bayes database.  I 
then restored my previous dump with sa-learn --restore backup.  Now I am able 
to see 270,000 tokens.  I performed another sa-learn --sync without issue.  

At this point I created the spam database; 

 - mysql -u root -h localhost -p
 - create database bayes;
 - grant all privileges on bayes.* to [EMAIL PROTECTED] identified by 'spampassword';
 - flush privileges;
 - quit;

I then imported my tables as MyISAM.

 - mysql -u root -h localhost -D bayes -p < bayes_mysql.sql

I verified I could login as amavis and see the tables as MyISAM.

 - mysql -u amavis -p 
 - use bayes;
 - show table status;

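+-------------------+--------+---------+------------+------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+
| Name              | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length  | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation       | Checksum | Create_options | Comment |
+-------------------+--------+---------+------------+------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+
| bayes_expire      | MyISAM |      10 | Fixed      |    0 |              0 |           0 | 2533274790395903 |         1024 |         0 |           NULL | 2006-08-21 17:44:49 | 2006-08-21 17:44:49 | NULL       | utf8_general_ci |     NULL |                |         |
| bayes_global_vars | MyISAM |      10 | Dynamic    |    1 |             20 |          20 |  281474976710655 |         2048 |         0 |           NULL | 2006-08-21 17:44:49 | 2006-08-21 17:44:49 | NULL       | utf8_general_ci |     NULL |                |         |
| bayes_seen        | MyISAM |      10 | Dynamic    |    0 |              0 |           0 |  281474976710655 |         1024 |         0 |           NULL | 2006-08-21 17:44:49 | 2006-08-21 17:44:49 | NULL       | utf8_general_ci |     NULL |                |         |
| bayes_token       | MyISAM |      10 | Fixed      |    0 |              0 |           0 | 9007199254740991 |         1024 |         0 |           NULL | 2006-08-21 17:44:49 | 2006-08-21 17:44:49 | NULL       | utf8_general_ci |     NULL |                |         |
| bayes_vars        | MyISAM |      10 | Dynamic    |    0 |              0 |           0 |  281474976710655 |         1024 |         0 |              1 | 2006-08-21 17:44:49 | 2006-08-21 17:44:49 | NULL       | utf8_general_ci |     NULL |                |         |
+-------------------+--------+---------+------------+------+----------------+-------------+------------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+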
5 rows in set (0.00 sec)

I archived the tokens I had previously imported: sa-learn --backup newbackup

I re-enabled the settings for bayes_sql and bayes_store in local.cf 

 - bayes_store_module  Mail::SpamAssassin::BayesStore::SQL
 - bayes_sql_dsn   DBI:mysql:bayes:localhost
 - bayes_sql_username  amavis
 - bayes_sql_password  spampassword
 - bayes_sql_override_username amavis

If I run an sa-learn -D --dump magic I get the following failure;

 - [24375] dbg: bayes: using username: amavis
 - [24375] dbg: bayes: database connection established
 - [24375] dbg: bayes: found bayes db version 3
 - [24375] dbg: bayes: unable to initialize database for amavis user, aborting!
 - [24375] dbg: config: score set 1 chosen.
 - [24375] dbg: bayes: database connection established
 - [24375] dbg: bayes: found bayes db version 3
 - [24375] dbg: bayes: unable to initialize database for amavis user, aborting!

I can process a message and learn it as ham, but there are still errors reported 
during the learning process.  

 - [24312] dbg: learn: auto-learn: currently using scoreset 1
 - [24312] dbg: learn: auto-learn: message score: 7.431, computed score for autolearn: 7.431
 - [24312] dbg: learn: auto-learn? ham=11, spam=12, body-points=2.798, head-points=6.086, learned-points=0
 - [24312] dbg: learn: auto-learn? yes, ham (7.431 < 11)
 - [24312] dbg: learn: initializing learner
 - [24312] dbg: learn: learning ham
 - [24312] dbg: eval: all '*From' addrs: [EMAIL PROTECTED] 
 - [24312] dbg: eval: all '*To' addrs: [EMAIL PROTECTED] 
 - [24312] dbg
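
For readers puzzling over the "auto-learn? yes, ham (7.431 < 11)" line in the log above: it reads like a plain threshold comparison, which the Python sketch below mimics. This is only an illustration of what the log line suggests. The real SpamAssassin logic applies further conditions that are not modelled here, the ">=" used for the spam side is an assumption, and the function name is invented.

```python
def autolearn_decision(score, ham_threshold, spam_threshold):
    """Return 'ham', 'spam', or None for a message score and two thresholds."""
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:  # assumption: comparison direction not shown in the log
        return "spam"
    return None


if __name__ == "__main__":
    # Thresholds and score as reported in the debug output above.
    print(autolearn_decision(7.431, ham_threshold=11, spam_threshold=12))
```

With the logged thresholds (ham=11, spam=12), a message scoring 7.431 falls below the ham threshold and is learned as ham, exactly as the log reports.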

Re: Blocking based on ALL IPs in the header

2006-08-21 Thread jdow

From: "Rob McEwen" <[EMAIL PROTECTED]>

Magnus Holmgren said:

It depends on the blacklist. Some, like Spamhaus SBL,
only list IP addresses known to be operated
by spammers (and not unsuspecting home users with
hijacked computers). SA scores mail with such IP
addresses in ANY Received line. For other lists, the
first hop is ignored unless it's the *only* hop.


The software this company uses for spam filtering is **not** SA. And it
treats all RBLs the same... ANY RBL that is set up in a user's system will
check against ALL IPs in the header.

Therefore, in this situation, if the solution is to disable RBLs which
target zombies, and ONLY keep RBLs like SBL, then that is like getting a
lobotomy to fix a headache.

Sure, the FP problem would go away, but the spam caught by RBL lookups would
decrease dramatically.

In contrast, if ONLY the sending server's IP were checked... and RBLs like
XBL were ALSO used, then the FP problem would ALSO go away, but without any
noticeable decrease in the percent of spam caught by RBL lookups.

You might ask why I posted this in the first place. Forgive me for
being so off-topic, but I have these guys at this big software company and
this big bank who seem to think I'm the one who has lost his mind. So I
was hoping for feedback to make sure that I'm not the one who is crazy
here!

Rob McEwen

<>
TOP SPAM RULES FIRED

RANK  RULE NAME          COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM

  18  HOST_EQ_D_D_D_D       45      0.01     0.06    19.32    0.08
  42  HOST_EQ_D_D_D_DB    1341      0.46     1.89     8.23    0.07

The first is the header rule. It's fairly good with few false alarms.
The second is the body rule. It's not as effective with the same
small false alarm rate.

If you mean what this rule catches (an all numeric helo):
header   HOST_EQ_D_D_D_D  X-Spam-Relays-Untrusted =~ /^[^\]]+ rdns=[^ ]+\d{1,3}[^0-9]\d{1,3}[^0-9]\d{1,3}[^0-9]\d{1,3}[^ ]+ /


then I'd say it was "pretty good but not perfect."
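
To see what the pattern above keys on, it can be exercised outside SpamAssassin. The Python sketch below applies the same regular expression to two made-up relay strings; the field layout of the samples only approximates what X-Spam-Relays-Untrusted contains, and the host names and addresses are placeholders. The pattern looks at the rdns= field of the first relay and wants four number groups embedded in the host name.

```python
import re

# Same expression as in the rule, written as a Python raw string.
HOST_EQ_D_D_D_D = re.compile(
    r"^[^\]]+ rdns=[^ ]+\d{1,3}[^0-9]\d{1,3}[^0-9]\d{1,3}[^0-9]\d{1,3}[^ ]+ "
)


def first_relay_has_numeric_rdns(relays):
    """True if the first relay's reverse DNS name embeds four number groups."""
    return HOST_EQ_D_D_D_D.search(relays) is not None


# Hypothetical sample values, not taken from real mail.
DYNAMIC = ("[ ip=192.0.2.10 rdns=adsl-192-0-2-10.example.net helo=friend "
           "by=mx.example.org ident= envfrom= intl=0 id=ABC123 auth= ]")
STATIC = ("[ ip=192.0.2.25 rdns=mail.example.com helo=mail.example.com "
          "by=mx.example.org ident= envfrom= intl=0 id=DEF456 auth= ]")

if __name__ == "__main__":
    print(first_relay_has_numeric_rdns(DYNAMIC))
    print(first_relay_has_numeric_rdns(STATIC))
```

The first sample, whose reverse DNS name looks like an address pool entry, matches; the second, with an ordinary mail host name, does not.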

I'd also say that a company using a nest of BLs with no ranking on their
performance is a dumb company. Banks are known to be dumber than dirt in
many instances. Greylisting on the basis of the BLs makes some sense.
Blocking outright makes no sense. Scoring systems such as SpamAssassin
give you a soft fail from which you can recover. It's like the commercial
power to the bank's computers failing. The SpamAssassin scoring allows you
to continue after a first line or even second or third line failure. Sort
by scores and check the low scores for potential mis-marked ham. Otherwise
the bank may be throwing away money.

{^_^}   Joanne Dow



Re: Custom Rule Filtering on X-Mailer Header Not Working

2006-08-21 Thread jdow

From: "Kyle Harris" <[EMAIL PROTECTED]>


I'm having some difficulty getting a simple custom rule to
work based on the X-Mailer used.  Here is the custom rule:

header SPAM_BAT X-Mailer =~ /The Bat!/i

header SPAM_BAT X-Mailer =~ /The Bat\!/i

Try that.
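
For what it is worth, "!" is not a regex metacharacter, so as far as the pattern itself goes the escaped and unescaped forms should behave the same. A quick check in Python (not Perl, so only an approximation of what SpamAssassin does) against the X-Mailer value from the original post:

```python
import re

# X-Mailer value copied from the sample headers in the original post.
X_MAILER = "The Bat! (v2.12.00) Personal"


def matches(pattern, value=X_MAILER):
    """Case-insensitive search, like the /i flag on the rule."""
    return re.search(pattern, value, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(matches(r"The Bat!"))
    print(matches(r"The Bat\!"))
```

Both forms match here, which suggests that if the escaped version behaves differently in practice, the cause lies somewhere other than the regex syntax.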
{^_^}



Re: Enumerating the robots?

2006-08-21 Thread jdow

From: "DAve" <[EMAIL PROTECTED]>


Loren Wilton wrote:
It was mentioned that several people are getting hammered by world-wide 
robot attacks.  I see from the little spam I get that there is a new 
spam sending tool for robots that is running a stock spam.  I suspect 
the traffic is a combination of distributing the new spam tool and 
sending out the new spam.


With all this traffic from robots, lots of people here must be getting 
quite a lot of information in their logs about connections from robots.  
I wonder if there would be value in a central database that attempts to 
enumerate the robots?


Most of them are probably on dynamic ip.  But if the sending IP and 
attempted connect time could be logged at many sites and combined, there 
would be fairly conclusive evidence that a given IP had been sending 
spam at a particular time.  Perhaps that could be submitted to at least 
some of the more responsible service providers, and they could do 
something to track it back to a customer and send them an email that 
their machine is infected. (Or possibly be even more proactive, I suppose.)


The database might also be usable in front door spam blocking.  Most 
people probably shouldn't be accepting direct connections from dynamic 
ips on someone else's network, especially if that ip has a recent 
history of sending spam (say in the last 6 hours or so).  It might be 
possible to make a server that could provide yes/no answers on whether 
the IP has sent spam in the last minute/hour/6 hours/day or so.


I'd think that such a database could be built almost automatically.  For 
instance, if you log the IPs of connection attempts that you reject for 
various problems, you could just harvest those IPs once an hour or so to 
some central site, no human judgement calls required.  If the mail is 
accepted and gets a high SA score, and you can still determine the 
sending IP, then that might be automatically harvested also.
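
A toy version of the lookup described above, sketched in Python: sites report (IP, time) sightings, and a query answers whether a given IP was seen within the last N seconds. Everything here (class name, method names, in-memory storage) is hypothetical and only meant to make the yes/no idea concrete; a real service would need persistence, expiry, and some protection against bogus reports.

```python
import time
from collections import defaultdict


class SightingLog:
    """In-memory record of when each IP was reported sending spam."""

    def __init__(self):
        self._sightings = defaultdict(list)

    def report(self, ip, when=None):
        """Record a sighting; 'when' is a Unix timestamp (default: now)."""
        self._sightings[ip].append(time.time() if when is None else when)

    def seen_within(self, ip, seconds, now=None):
        """True if 'ip' was reported during the last 'seconds' seconds."""
        now = time.time() if now is None else now
        return any(0 <= now - t <= seconds for t in self._sightings.get(ip, ()))


if __name__ == "__main__":
    log = SightingLog()
    log.report("192.0.2.10")
    print(log.seen_within("192.0.2.10", 6 * 3600))    # last 6 hours
    print(log.seen_within("198.51.100.7", 6 * 3600))  # never reported
```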


Thoughts?  Does something like this have any value?

   Loren


Something like http://dhsield.org, but limited to email instead of all 
ports?


Don't know. (Not going to click on THAT link. It looks like it might
lead to a typo squatter potentially with malware. {^_-}) But I suspect
the answer is yes.

{^_^}


Re: Feeding bayes outbounds

2006-08-21 Thread jdow

From: "Joe Zitnik" <[EMAIL PROTECTED]>


Our scanning program has the ability to archive all e-mail, both inbound
and outbound, which we have been doing for months now.  Given that your
outbound mail is almost certainly ham, the majority of its content is
going to be specific to our business sector, wouldn't feeding outbounds
through bayes manually be a win win situation?  Am I oversimplifying
things, or am I missing something with that logic?


If the terms in the outbound mail are likely to be the same as
acceptable terms on the inbound mail that may be true. If your
outbound mail you have captured is not all pure business it might
reduce the Bayes accuracy somewhat.

It might introduce a huge mismatch between ham and spam, also.

And it might introduce potential issues with email privacy on the
outgoing emails if you save them for a mass feed.

{^_^}


RE: Feeding bayes outbounds

2006-08-21 Thread Kurt Buff
| From: Joe Zitnik [mailto:[EMAIL PROTECTED]
|
| Our scanning program has the ability to archive all e-mail, 
| both inbound
| and outbound, which we have been doing for months now.  Given 
| that your
| outbound mail is almost certainly ham, the majority of its content is
| going to be specific to our business sector, wouldn't feeding 
| outbounds
| through bayes manually be a win win situation?  Am I oversimplifying
| things, or am I missing something with that logic?


1) As long as you're sure that you don't have trojaned/zombied machines
pushing out stuff through your gateways, yes. Since you've been archiving
for months, this sounds likely. However, it might do to have something in
place that checks for this kind of threat - don't know what that would look
like, but it's a thought.

2) Manually? What do you mean by that? Initiating the learning process on a
directory/mbox manually, or manually inspecting all of the messages before
pulling the trigger on the learning script, both, or something else? Given
my druthers, I'd prefer to do something automated...

Kurt


  



Re: How can I (we) get rid of this?

2006-08-21 Thread jdow

From: "Anders Norrbring" <[EMAIL PROTECTED]>


Hiya all!
I'm getting really sick of receiving 10-100 of the attached mails every 
day. Any suggestions on how to get rid of them?  Apparently my 
Amavis-new and SpamAssassin only tags them from 0 to 1.6 points.


FuzzyOcr
FuzzyOcr
FuzzyOcr

I tell you three times so it must be true. (Sorry Reverend Dodgson.)

Seriously visit http://wiki.apache.org/spamassassin/FuzzyOcrPlugin.
Downloading and installing is a breeze. (I'm using the one from the
14th. A mouse told me, accurately or not, the one from the 17th might
be troublesome.)

{^_^}



Re: How can I (we) get rid of this?

2006-08-21 Thread jdow

From: "Anders Norrbring" <[EMAIL PROTECTED]>


Stuart Johnston skrev:

Anders Norrbring wrote:

Hiya all!
I'm getting really sick of receiving 10-100 of the attached mails 
every day. Any suggestions on how to get rid of them?  Apparently my 
Amavis-new and SpamAssassin only tags them from 0 to 1.6 points.


FuzzyOCR, ImageInfo, SARE, sa-update.


I haven't looked at FuzzyOCR or ImageInfo at all, are they compatible 
with SA 2.64?


The world is not compatible with 2.64. Update if at ALL possible.
(Note the special issues that may exist with bayes files.)

{o.o}


Re: expire_old_tokens: "Out of memory"/"called with negative strlen"

2006-08-21 Thread Fabian Peters

Am 21.08.2006 um 23:30 schrieb Theo Van Dinter:


> On Mon, Aug 21, 2006 at 11:22:30PM +0200, Fabian Peters wrote:
> > Since a couple of weeks this setup has started acting up:
> >
> > bayes: expire_old_tokens: Out of memory during ridiculously large
> > request at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/
> > BayesStore/DBM.pm line 625.
> >
> > I've searched the archive and saw that others had the same problem
> > before and also on FreeBSD, but there were no solutions to be
> > found... Is there any way to fix this - short of throwing away the
> > bayes_* files?
>
> Interesting.  Do you have a memory limit in place ala limit or ulimit?


No.

> I took a look at DBM.pm line 625, and there actually isn't any DB-request
> there, though that line is in a loop going through each of the DB entries
> which should be a very straightforward operation.


Exactly.

> My only thought, other than a low memory limit, is that you may want to try
> upgrading your libdb/DB_File installation.


The DB_File is version 1.814 - which seems to be up to date?

Any more ideas?

cheers

Fabian

P.S.: One thing I didn't include in my original post: I copied  
the .spamassassin directory to my OSX machine (SA 3.0.4, perl 5.8.8)  
and ran sa-learn -D --force-expire:


debug: bayes: expiry check keep size, 0.75 * max: 112500
debug: bayes: token count: 230845, final goal reduction size: 118345
debug: bayes: First pass?  Current: 1156199218, Last: 1155016110, atime: 2764800, count: 2729, newdelta: 63755, ratio: 43.3657017222426, period: 43200
debug: bayes: Can't use estimation method for expiry, something fishy, calculating optimal atime delta (first pass)
debug: bayes: expiry max exponent: 9
bayes expire_old_tokens: panic: sv_setpvn called with negative strlen at /opt/local/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin/BayesStore/DBM.pm line 582.


This is the same place in the same loop as line 625 for version 3.1.4...


Re: How can I (we) get rid of this?

2006-08-21 Thread Richard

On 8/21/06 jdow wrote:
> Downloading and installing is a breeze

well, unless you're on OSX where you've got to find/build/install the
netpbm & gocr prereqs ...

 working 

- --

/"\
\ /  ASCII Ribbon Campaign
 X   against HTML email, vCards
/ \  & micro$oft attachments

[GPG] OpenMacNews at gmail dot com
fingerprint: 50C9 1C46 2F8F DE42 2EDB  D460 95F7 DDBD 3671 08C6


Re: How can I (we) get rid of this?

2006-08-21 Thread John Rudd


On Aug 21, 2006, at 3:41 PM, Richard wrote:



On 8/21/06 jdow wrote:

Downloading and installing is a breeze


well, unless you're on OSX where you've got to find/build/install the
netpbm & gocr prereqs ...

 working 



Do post details of how it goes along.  Considering that's what I'll be 
doing for my home system...




Scanning forwarded email

2006-08-21 Thread Chris Mills (Chrysalis)
Hi everyone, I have a problem and would appreciate any help possible. I'm a
novice to SA, but have been trying to find information on this without
success.

I am currently running SA 3.1.4 on a Fedora Linux server, hosting almost 100
domains and a couple thousand email accounts.
The server also runs sendmail, mailscanner, ensim, and I see procmail (none
of which I am really that familiar with, with perhaps the exception of
mailscanner which I have modified before).

Problem: when I first got the server, it had an older version of SA
installed, 2.x. At the time, messages sent either directly to mailboxes on
the server, or forwarded by the server to external email addresses via
aliases, were all being scanned.

Here's an example:

Message sent to [EMAIL PROTECTED] (an alias)
Which forwards to [EMAIL PROTECTED] (not scanned or flagged by SA)
and forwards to [EMAIL PROTECTED] (not scanned or flagged by SA, and flooded
with junk!)
and forwards to [EMAIL PROTECTED] (scanned and flagged, and downloaded
by my office mail client)

What I would like to do is get back the functionality that I used to have,
where any alias on our server (any domain) is automatically subject to being
scanned by SA before being forwarded.

The ONLY limitation with this is that there was a problem with Sendmail on
the same server which saw the deliverymode option resulting in undelivered
messages if set to "queueonly". It has to be set to "background". I don't
know if this is something that will be an issue preventing this
functionality from being activated.

Can someone please guide me through exactly what I need to do to get this
back online? My customers are giving me hell!!

Thanks,
Chris


Re: Enumerating the robots?

2006-08-21 Thread DAve

jdow wrote:

From: "DAve" <[EMAIL PROTECTED]>


Loren Wilton wrote:
It was mentioned that several people are getting hammered by 
world-wide robot attacks.  I see from the little spam I get that 
there is a new spam sending tool for robots that is running a stock 
spam.  I suspect the traffic is a combination of distributing the new 
spam tool and sending out the new spam.


With all this traffic from robots, lots of people here must be 
getting quite a lot of information in their logs about connections 
from robots.  I wonder if there would be value in a central database 
that attempts to enumerate the robots?


Most of them are probably on dynamic ip.  But if the sending IP and 
attempted connect time could be logged at many sites and combined, 
there would be fairly conclusive evidence that a given IP had been 
sending spam at a particular time.  Perhaps that could be submitted 
to at least some of the more responsible service providers, and they 
could do something to track it back to a customer and send them an 
email that their machine is infected. (Or possibly be even more 
proactive, I suppose.)


The database might also be usable in front door spam blocking.  Most 
people probably shouldn't be accepting direct connections from 
dynamic ips on someone else's network, especially if that ip has a 
recent history of sending spam (say in the last 6 hours or so).  It 
might be possible to make a server that could provide yes/no answers 
on whether the IP has sent spam in the last minute/hour/6 hours/day 
or so.


I'd think that such a database could be built almost automatically.  
For instance, if you log the IPs of connection attempts that you 
reject for various problems, you could just harvest those IPs once an 
hour or so to some central site, no human judgement calls required.  
If the mail is accepted and gets a high SA score, and you can still 
determine the sending IP, then that might be automatically harvested 
also.
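
The yes/no lookup described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `RecentSpamSources` and its methods are hypothetical, the store is a plain in-memory dictionary, and nothing here corresponds to an existing service or plugin.

```python
import time
from collections import defaultdict


class RecentSpamSources:
    """Hypothetical in-memory store of spam reports keyed by sending IP."""

    def __init__(self):
        # Maps an IP string to a list of report times (seconds since epoch).
        self._reports = defaultdict(list)

    def report(self, ip, when=None):
        """Record that `ip` was seen sending spam at time `when`."""
        self._reports[ip].append(time.time() if when is None else when)

    def sent_spam_within(self, ip, seconds, now=None):
        """Return True if `ip` was reported within the last `seconds`."""
        now = time.time() if now is None else now
        return any(0 <= now - t <= seconds for t in self._reports.get(ip, ()))


db = RecentSpamSources()
db.report("192.0.2.10", when=1000.0)
# Reported one hour before "now": inside a 6-hour window, outside a 1-minute one.
print(db.sent_spam_within("192.0.2.10", 6 * 3600, now=1000.0 + 3600))
print(db.sent_spam_within("192.0.2.10", 60, now=1000.0 + 3600))
```

A real deployment would need expiry of old reports and some protection against forged submissions; both are left out of this sketch.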


Thoughts?  Does something like this have any value?

   Loren


Something like http://dhsield.org, but limited to email instead of all 
ports?


Don't know. (Not going to click on THAT link. It looks like it might
lead to a typo squatter potentially with malware. {^_-}) But I suspect
the answer is yes.

{^_^}




Hmmm, dsheild, dhsield, dshield, six of one half dozen of the other ;^)

DAve

--
Three years now I've asked Google why they don't have a
logo change for Memorial Day. Why do they choose to do logos
for other non-international holidays, but nothing for
Veterans?

Maybe they forgot who made that choice possible.


Re: How can I (we) get rid of this?

2006-08-21 Thread DAve

jdow wrote:

From: "Anders Norrbring" <[EMAIL PROTECTED]>


Hiya all!
I'm getting really sick of receiving 10-100 of the attached mails 
every day. Any suggestions on how to get rid of them?  Apparently my 
Amavis-new and SpamAssassin only tags them from 0 to 1.6 points.


FuzzyOcr
FuzzyOcr
FuzzyOcr

I tell you three times so it must be true. (Sorry Reverend Dodgson.)

Seriously visit http://wiki.apache.org/spamassassin/FuzzyOcrPlugin.
Downloading and installing is a breeze. 


I really don't want to install X on my mailgateways. It would have to be 
as good as URIBL and SURBL before I would consider that.


Is there a way around the dependencies? The FreeBSD port shows the 
following, xorg-libraries-6.9.0, ghostscript-gnu-7.07_15, teTeX-3.0_1, 
tcl-8.4.13_1 (TCL?), and all their dependencies. Plus a lot more.


I'd need to look at configure --help.

DAve

--
Three years now I've asked Google why they don't have a
logo change for Memorial Day. Why do they choose to do logos
for other non-international holidays, but nothing for
Veterans?

Maybe they forgot who made that choice possible.


Formatting plugin report

2006-08-21 Thread John D. Hardin
Coders (if any):

Can anybody point me at a code sample showing how to get details into
the report SUMMARY tag from within a plugin?

Like the [IP address etc.] in this:

*  1.0 RBL_PSBL_01 RBL: Mail client listed by psbl.surriel.com
*  [64.8.111.2 listed in psbl.surriel.com]

I can't seem to figure it out.

Thanks!

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The difference is that Unix has had thirty years of technical
  types demanding basic functionality of it. And the Macintosh has
  had fifteen years of interface fascist users shaping its progress.
  Windows has the hairpin turns of the Microsoft marketing machine
  and that's all.    -- Red Drag Diva
---
 29 days until Talk Like a Pirate day



Re: How can I (we) get rid of this?

2006-08-21 Thread Richard

>>> Downloading and installing is a breeze
>>
>> well, unless you're on OSX where you've got to find/build/install the
>> netpbm & gocr prereqs ...
>>
>>  working 
>>
> 
> Do post details of how it goes along.  Considering that's what I'll be
> doing for my home system...

well, holy toledo!

atm/imho, not worth the effort :-/

netpbm is *not* a typical auto-tools build ... and the config is not
darwin/.dylib-ready.  yes, it's apparently available as part of fink &
darwinports, as well as perhaps (? dunno ...) a precompiled lib in the
Gallery distro.

that said, I get good enough results with ImageInfo + the rest of my
env, without adding this complexity.

so, at least for now, i'm SOL.

richard

- --

/"\
\ /  ASCII Ribbon Campaign
 X   against HTML email, vCards
/ \  & micro$oft attachments

[GPG] OpenMacNews at gmail dot com
fingerprint: 50C9 1C46 2F8F DE42 2EDB  D460 95F7 DDBD 3671 08C6


Re: Scanning forwarded email

2006-08-21 Thread jdow

From: "Chris Mills (Chrysalis)" <[EMAIL PROTECTED]>


Hi everyone, I have a problem and would appreciate any help possible. I'm a
novice to SA, but have been trying to find information on this without
success.

I am currently running SA 3.1.4 on a Fedora Linux server, hosting almost 100
domains and a couple thousand email accounts.
The server also runs sendmail, mailscanner, ensim, and I see procmail (none
of which I am really that familiar with, with perhaps the exception of
mailscanner which I have modified before).

Problem: when I first got the server, it had an older version of SA
installed, 2.x. At the time, messages sent either directly to mailboxes on
the server, or forwarded by the server to external email addresses via
aliases, were all being scanned.

Here's an example:

Message sent to [EMAIL PROTECTED] (an alias)
Which forwards to [EMAIL PROTECTED] (not scanned or flagged by SA)
and forwards to [EMAIL PROTECTED] (not scanned or flagged by SA, and flooded
with junk!)
and forwards to [EMAIL PROTECTED] (scanned and flagged, and downloaded
by my office mail client)

What I would like to do is get back the functionality that I used to have,
where any alias on our server (any domain) is automatically subject to being
scanned by SA before being forwarded.

The ONLY limitation with this is that there was a problem with Sendmail on
the same server which saw the deliverymode option resulting in undelivered
messages if set to "queueonly". It has to be set to "background". I don't
know if this is something that will be an issue preventing this
functionality from being activated.

Can someone please guide me through exactly what I need to do to get this
back online? My customers are giving me hell!!


That is not a SpamAssassin problem. That is your MTA or MDA screwing
you up. You made a change in that area and need to "unchange" it.

{^_^}


Re: How can I (we) get rid of this?

2006-08-21 Thread jdow

From: "DAve" <[EMAIL PROTECTED]>

jdow wrote:

From: "Anders Norrbring" <[EMAIL PROTECTED]>


Hiya all!
I'm getting really sick of receiving 10-100 of the attached mails 
every day. Any suggestions on how to get rid of them?  Apparently my 
Amavis-new and SpamAssassin only tags them from 0 to 1.6 points.


FuzzyOcr
FuzzyOcr
FuzzyOcr

I tell you three times so it must be true. (Sorry Reverend Dodgson.)

Seriously visit http://wiki.apache.org/spamassassin/FuzzyOcrPlugin.
Downloading and installing is a breeze. 


I really don't want to install X on my mailgateways. It would have to be 
as good as URIBL and SURBL before I would consider that.


Is there a way around the dependencies? The FreeBSD port shows the 
following, xorg-libraries-6.9.0, ghostscript-gnu-7.07_15, teTeX-3.0_1, 
tcl-8.4.13_1 (TCL?), and all their dependencies. Plus a lot more.


I have no idea. I'd install X, block it with the firewall and all
that, and run init level 3. It eats hard disk. But it doesn't touch
the run time system.


I'd need to look at configure --help.

DAve


{^_^}


animated GIF spam

2006-08-21 Thread Chip M.
While skimming thru my daily rejected spam pile, did a double take when a
GIF spam seemed to "blink" at me.  Thought it was a sw glitch at first...
then realized the sneaky Borg had adapted again.

Took a look at the frames in PaintShopPro's AnimationShop, and the first 
three are all but blank (wee bit of noise), followed by the payload.

Below are links to the raw message, and the extracted GIF:
http://Puffin.net/software/spam/samples/0001a_animated_gif.eml
http://Puffin.net/software/spam/samples/0001b_been.gif

Decoder/Chris, I'd view this as a compliment to your FuzzyOCR.  ;)

The good news is that ImageInfo should have no problem with this particular 
instance, as the initial width x height are "correct".

Time to recalibrate those phaser frequencies!  :)
- "Chip"




Re: animated GIF spam

2006-08-21 Thread John Rudd


On Aug 21, 2006, at 10:13 PM, Chip M. wrote:

While skimming thru my daily rejected spam pile, did a double take when a
GIF spam seemed to "blink" at me.  Thought it was a sw glitch at first...
then realized the sneaky Borg had adapted again.

Took a look at the frames in PaintShopPro's AnimationShop, and the first
three are all but blank (wee bit of noise), followed by the payload.

Below are links to the raw message, and the extracted GIF:
http://Puffin.net/software/spam/samples/0001a_animated_gif.eml
http://Puffin.net/software/spam/samples/0001b_been.gif

Decoder/Chris, I'd view this as a compliment to your FuzzyOCR.  ;)

The good news is that ImageInfo should have no problem with this particular
instance, as the initial width x height are "correct".

Time to recalibrate those phaser frequencies!  :)
- "Chip"



I also heard that interlaced gif spam is appearing now.

It'd be interesting to see how to counter them.

For animated, is there a clean break between "frames" of animation, 
something that netpbm or whatever can easily identify and break out 
into individual images?  It would be CPU intensive, but the right way 
to fight it might be to run the FuzzyOCR on each frame.  And/or have a 
setting for "maximum frames to process", and if the GIF goes over that 
number of frames, give it a huge spam score.  Or "add this score per 
frame", so that the number of frames increases the spam score directly, 
and automatically bail out if they cross a certain threshold (score 
from number of animation frames alone >= 20, then just return 20 ... or 
something; which saves you on processing the frames themselves).
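
The "add this score per frame" idea with an early bail-out might look something like the sketch below. The function name, the default values, and the choice to count only frames beyond the first are all assumptions made for illustration; none of this is part of FuzzyOCR or SpamAssassin.

```python
def frame_count_score(num_frames, score_per_frame=2.0, cap=20.0):
    """Score a GIF by its frame count alone.

    Returns (score, run_ocr). When the score reaches `cap`, the cap is
    returned and run_ocr is False, so the per-frame OCR can be skipped.
    """
    # Assumption: a single-frame (non-animated) GIF earns nothing here.
    extra_frames = max(num_frames - 1, 0)
    score = extra_frames * score_per_frame
    if score >= cap:
        return cap, False
    return score, True


print(frame_count_score(1))    # static image: no score, OCR still runs
print(frame_count_score(4))    # a few frames: small score, OCR still runs
print(frame_count_score(50))   # many frames: capped, OCR skipped
```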


For interlaced ... I have no idea.  Depends a lot on how the interlaced 
images are stored, I guess.  And whether or not netpbm can generate the 
final image for processing, instead of having to work on the interlaced 
data.






Re: animated GIF spam

2006-08-21 Thread Spamassassin List

While skimming thru my daily rejected spam pile, did a double take when a
GIF spam seemed to "blink" at me.  Thought it was a sw glitch at first...
then realized the sneaky Borg had adapted again.

Took a look at the frames in PaintShopPro's AnimationShop, and the first
three are all but blank (wee bit of noise), followed by the payload.

Below are links to the raw message, and the extracted GIF:
http://Puffin.net/software/spam/samples/0001a_animated_gif.eml
http://Puffin.net/software/spam/samples/0001b_been.gif

Decoder/Chris, I'd view this as a compliment to your FuzzyOCR.  ;)

The good news is that ImageInfo should have no problem with this particular
instance, as the initial width x height are "correct".


Yes ImageInfo got them well.



Re: How can I (we) get rid of this?

2006-08-21 Thread Anders Norrbring

jdow wrote:

From: "Anders Norrbring" <[EMAIL PROTECTED]>


Stuart Johnston wrote:

Anders Norrbring wrote:

Hiya all!
I'm getting really sick of receiving 10-100 of the attached mails 
every day. Any suggestions on how to get rid of them?  Apparently my 
Amavis-new and SpamAssassin only tags them from 0 to 1.6 points.


FuzzyOCR, ImageInfo, SARE, sa-update.


I haven't looked at FuzzyOCR or ImageInfo at all, are they compatible 
with SA 2.64?


The world is not compatible with 2.64. Update if at ALL possible.
(Note the special issues that may exist with bayes files.)

{o.o}



I've noticed.. :)
I had been hoping I'd be able to put together a completely new mail 
server for quite some time, but haven't been able to find the time 
needed.  As long as the running one is "working", I prefer not to mess 
too much with it.

I'll look into the SA upgrade though.

Thanks for all answers!

--

Anders Norrbring
Norrbring Consulting


Re: animated GIF spam

2006-08-21 Thread Chip M.
At 10:26 PM 8/21/2006 -0700, John Rudd wrote:
>I also heard that interlaced gif spam is appearing now.

Yes, I saw that post, however there wasn't a publicly available sample.
Any such would be much appreciated.

>It'd be interesting to see how to counter them.

Should be easy.  One approach is "pixel density".  What I've been doing is
reading JUST enough of the header to calculate the area (just like Dallas'
excellent ImageInfo plugin), then dividing by the total raw file size of
just the image (i.e. what one gets after base64 decoding just the GIF part),
less the size of the obvious parts of the header.  Works well, and is
blindingly fast.
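
A rough sketch of that calculation, for anyone who wants to experiment. The post does not say exactly which "obvious parts of the header" are subtracted, so this version assumes the 13-byte GIF header plus the global color table when one is present; the function name is made up for illustration and this is not the author's actual code.

```python
import struct


def gif_pixel_density(data):
    """Return (width * height) / (file size minus header) for raw GIF bytes."""
    if len(data) < 13 or data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # Logical screen descriptor: width and height are little-endian 16-bit.
    width, height, packed = struct.unpack("<HHB", data[6:11])
    header_size = 13
    if packed & 0x80:
        # Global color table present: 3 bytes per entry, 2^(N+1) entries.
        header_size += 3 * (2 ** ((packed & 0x07) + 1))
    payload = len(data) - header_size
    if payload <= 0:
        raise ValueError("truncated GIF file")
    return (width * height) / payload
```

Feed it the bytes obtained after base64-decoding just the GIF MIME part. Where to put the ham/spam cut-off is a tuning question the sketch does not try to answer.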

Ham generally has a much LOWER density, because it's typically clipart,
whereas spam is generally text, which compresses extremely well, resulting
in a much HIGHER density.  It's not fool proof, so I use a sliding scale,
and have had only one FP this month (from an idiot (redundant) recruiter to
one of my testers - the PNG misfiring was only half the points required to
reject, and the able idiot managed to do several other things rare in Ham).

The beauty is that the spammer can "easily" foil this by lowering the
density by adding more complexity, which increases the file size, so more
bandwidth is consumed. :)

Some stock spams do use a fancier font which scores lower, so I'm still 
considering other types of analysis as a backup.


Specifically to address animated GIFs, it would be very easy to "walk" the 
raw image, calculating each frame's pixel density, simply ignoring the 
obvious chaff frames.
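
Walking the frames without decoding any pixels could be sketched like this. It follows the GIF block layout (image descriptors, extensions, data sub-blocks) and reports one density per frame; the function names are hypothetical, and deciding which frames count as chaff is left to the caller. Malformed or truncated input will raise an exception rather than being handled gracefully.

```python
import struct


def _skip_sub_blocks(data, pos):
    """Skip a chain of GIF data sub-blocks; return (new_pos, payload_bytes)."""
    total = 0
    while True:
        size = data[pos]
        pos += 1
        if size == 0:               # zero-length block terminates the chain
            return pos, total
        total += size
        pos += size


def gif_frame_densities(data):
    """Return a (width, height, density) tuple for each frame in a GIF."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    packed = data[10]
    pos = 13                        # signature + logical screen descriptor
    if packed & 0x80:               # skip the global color table if present
        pos += 3 * (2 ** ((packed & 0x07) + 1))
    frames = []
    while pos < len(data):
        block_type = data[pos]
        pos += 1
        if block_type == 0x3B:      # trailer: end of file
            break
        elif block_type == 0x21:    # extension: label byte, then sub-blocks
            pos += 1
            pos, _ = _skip_sub_blocks(data, pos)
        elif block_type == 0x2C:    # image descriptor: one frame
            _left, _top, width, height, flags = struct.unpack(
                "<HHHHB", data[pos:pos + 9])
            pos += 9
            if flags & 0x80:        # skip the local color table if present
                pos += 3 * (2 ** ((flags & 0x07) + 1))
            pos += 1                # LZW minimum code size byte
            pos, compressed = _skip_sub_blocks(data, pos)
            density = (width * height) / compressed if compressed else float("inf")
            frames.append((width, height, density))
        else:
            raise ValueError("unexpected block type 0x%02X" % block_type)
    return frames
```

The length of the returned list is also the frame count, which is handy if a per-frame score is wanted as well.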

Tomorrow, I'll write some code to decompose the frames and see what sort of 
numbers I get.

>For interlaced ... I have no idea.  Depends a lot on how the interlaced 
>images are stored, I guess.

Yes, exactly.  Until there's samples, I'm not going to worry about it.

What we also need is a diverse Ham GIF corpus.  Does anyone know of one?
- "Chip"

P.S.  Dallas:  it never occurred to me to _JUST_ score the area.  My pixel 
density approach fails on multi-GIFs, so you saved my bacon there. ;)