Re: getting Bayes token data from spamassassin

2007-01-16 Thread Justin Mason

Michael Parker writes:
 Stuart Robinson wrote:
  Hello, all.
  
  On Mon, Jan 15, 2007 at 01:54:07AM -0800, Stuart Robinson wrote:
  I've searched around a bit, both on gmane and Google, but I haven't
  found much more information regarding your two points. What IS
  stored in the token field of the table bayes_token? And how is the
  SHA1 hash involved?
  A SHA1 hash is taken of the original token value, and the bottom 40
  bits are used as the token from then-on.  There is a plugin call
  which can be used to store raw token - hash value data, but
  otherwise the raw token information is lost after the message is
  processed.
  
  Where could I find more information about the plugin call that allows
  me to do this? 
 
 perldoc Mail::SpamAssassin::Plugin

In particular:

http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin.html#item_bayes_scan
http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin.html#item_bayes_learn

 You should also search the dev list from a couple of years ago at least.
 Lots of discussion about the change and why it was done including, if
 memory serves me correctly, a proof of concept plugin to save off the
 token values.

by the way, a nice, working plugin that does this would be quite useful on
the CustomPlugins wiki page, or contributed as an optional plugin...
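
Here is a minimal Python sketch of the hashing step Theo describes (SpamAssassin itself does this in Perl). It assumes that "bottom 40 bits" means the last 5 bytes of the 20-byte SHA1 digest and that the token is hashed as its UTF-8 bytes. The function name `bayes_token_hash` is hypothetical and is not part of any SpamAssassin API.

```python
import hashlib


def bayes_token_hash(raw_token: str) -> bytes:
    """Reduce a raw Bayes token to a 40-bit (5-byte) hashed token.

    Assumptions, not taken from SpamAssassin's source: the token is hashed as
    UTF-8 bytes, and "bottom 40 bits" means the last 5 bytes of the digest.
    """
    digest = hashlib.sha1(raw_token.encode("utf-8")).digest()  # 20 bytes
    return digest[-5:]  # keep only the low-order 40 bits


for token in ("viagra", "meeting"):
    hashed = bayes_token_hash(token)
    # Only these 5 bytes would be stored. SHA1 is one-way, so the raw string
    # cannot be recovered from them later unless it is recorded elsewhere.
    print(token, "->", hashed.hex(), int.from_bytes(hashed, "big"))
```

Because the hash cannot be reversed, the only way to keep the raw text is to record it when it is hashed. That is the job of the plugin call mentioned above.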

--j.


restart sequence after changing configuration

2007-01-16 Thread Mike Kenny

As I understand the situation when using amavis/spamassassin/postfix, the
flow of a message is that it is received by postfix, passed to amavisd,
from there to clamav, then to spamassassin, and then back to postfix. My
questions are:

1. Is this correct?
2. If I change spamassassin's local.cf or user_prefs, what needs to be
restarted, and in what sequence?
3. If I change amavisd.conf, what needs restarting/reloading, etc., and in what
sequence?

Thanks,

mike


Re: restart sequence after changing configuration

2007-01-16 Thread Robert Brooks

Mike Kenny wrote:
As I understand the situation when using amavis/spamassassin/postfix, the 
flow of a message is that it is received by postfix, passed to amavisd, 
from there to clamav, then to spamassassin, and then back to postfix. My 
questions are:

1. Is this correct?
2. If I change spamassassin's local.cf or user_prefs, 
what needs to be restarted, and in what sequence?
3. If I change amavisd.conf, what needs restarting/reloading, etc., and in 
what sequence?


amavisd-new loads the spamassassin perl modules itself; if you are 
running amavisd-new, you don't need to run spamd.


Restarting amavisd-new will cause local.cf to be re-read. However, much 
of user_prefs is not used by amavis, so you should look at the settings 
in amavisd.conf first.


You can restart amavisd-new without restarting postfix.


--
Robert Brooks,   Network Manager,  Cable & Wireless UK
[EMAIL PROTECTED]   http://wtg.cw.com/
Tel: +44 (0)20 7339 8600  Fax: +44 (0)20 7339 8601
-  What was your username again? - BOFH-


spamc -d failover question

2007-01-16 Thread Marc Perkel
If you use spamc -d host1,host2,host3 and it connects to host1, but you
have hit the max number of connections there, will it fail over to host2,
or will it just fail?


Thanks in advance.



Re: sa-update failing - I think dev just went live again?

2007-01-16 Thread Justin Mason

Jason Haar writes:
 Similar to the issue found Jan 1 2007, I am currently seeing
 
 # sa-update
 error: GPG validation failed!
 The update downloaded successfully, but it was not signed with a trusted GPG
 key.  Instead, it was signed with the following keys:
 
 24F434CE
 
 Perhaps you need to import the channel's GPG key?
 
 
 There is no 24F434CE key that I can find, and the thread titled
 "SA-UPDATE and recent branches/3.1 rules?" seems to imply this is a
 fault that can happen with some process on updates.spamassassin.org?

24F434CE is the active subkey of 5244EC45:

: jm 899...; gpg -v 494178.tar.gz.asc
gpg: armor header: Version: GnuPG v1.4.2 (SunOS)
gpg: assuming signed data in `494178.tar.gz'
gpg: Signature made Mon Jan  8 19:47:19 2007 GMT using RSA key ID 24F434CE
gpg: using subkey 24F434CE instead of primary key 5244EC45
gpg: using classic trust model
gpg: Good signature from updates.spamassassin.org Signing Key [EMAIL PROTECTED]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: 5E54 1DC9 59CB 8BAC 7C78  DFDC 4056 A61A 5244 EC45
 Subkey fingerprint: 0C2B 1D71 75B8 52C6 4B3C  DC71 6C55 3978 24F4 34CE
gpg: binary signature, digest algorithm SHA1

sounds like your sa-update key info got lost somehow?
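
The two IDs in that gpg output are tied to the fingerprints printed below them. For OpenPGP v4 keys, the 8-hex-digit short key ID is the low-order 32 bits (the last 8 hex digits) of the 160-bit fingerprint. This small Python check uses the fingerprints exactly as gpg printed them. The function name is illustrative.

```python
def short_key_id(fingerprint: str) -> str:
    """Return the 32-bit short key ID of an OpenPGP v4 key (last 8 hex digits)."""
    hexdigits = "".join(fingerprint.split()).upper()  # drop gpg's spacing
    if len(hexdigits) != 40:
        raise ValueError("expected a 40-hex-digit (160-bit) v4 fingerprint")
    return hexdigits[-8:]


# Fingerprints copied from the gpg -v output above.
PRIMARY = "5E54 1DC9 59CB 8BAC 7C78  DFDC 4056 A61A 5244 EC45"
SUBKEY = "0C2B 1D71 75B8 52C6 4B3C  DC71 6C55 3978 24F4 34CE"

print(short_key_id(PRIMARY))  # 5244EC45
print(short_key_id(SUBKEY))   # 24F434CE
```

This only confirms which fingerprint each ID abbreviates. The subkey's tie to 5244EC45 comes from the binding signature that gpg checked. The key still has to be present in the keyring sa-update actually uses, which is separate from your personal gpg keyring.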

--j.


Re: getting Bayes token data from spamassassin

2007-01-16 Thread Theo Van Dinter
On Tue, Jan 16, 2007 at 10:21:14AM +, Justin Mason wrote:
 by the way, a nice, working plugin that does this would be quite useful on
 the CustomPlugins wiki page, or contributed as an optional plugin...

The plugin itself is pretty trivial -- the question is: what to do with
the token information?  Should it be sent out to a flat file, kept in
a DBM, etc?  That's where the non-trivial stuff happens.

-- 
Randomly Selected Tagline:
... then you'll excuse me, but I'm in the middle of fifteen things, all of
 them annoying.
 - Ivanova, Babylon 5 (Midnight on the Firing Line)




RE: Checking for PTR record

2007-01-16 Thread Robert Swan
I use this header rule for the same thing:

header  LOCAL_INVALID_PTR2  Received =~ /from \S+ \(unknown /
score  LOCAL_INVALID_PTR2 0.8
describe LOCAL_INVALID_PTR2   Header contains no PTR2
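
The Python sketch below tests this rule's regular expression against sample headers. SpamAssassin evaluates the pattern as a Perl regex, but it behaves the same way in both languages. The sample Received lines are made up. They assume the receiving MTA is Postfix, which writes "unknown" in place of the client hostname when it cannot find or verify one for the connecting IP.

```python
import re

# Pattern body of LOCAL_INVALID_PTR2, unchanged from the rule above.
NO_PTR = re.compile(r"from \S+ \(unknown ")


def rule_hits(received_headers):
    """Roughly mimic a header rule on Received: SpamAssassin tests the pattern
    against all Received headers, so a hit in any one of them counts."""
    return any(NO_PTR.search(h) for h in received_headers)


# Hypothetical headers in Postfix's "from HELO (NAME [IP])" layout.
no_rdns = ["from mail.example.com (unknown [192.0.2.10])\n"
           "\tby mx.example.org (Postfix) with ESMTP id 1A2B3C"]
with_rdns = ["from mail.example.net (host7.example.net [198.51.100.7])\n"
             "\tby mx.example.org (Postfix) with ESMTP id 4D5E6F"]

print(rule_hits(no_rdns))    # True
print(rule_hits(with_rdns))  # False
```

The rule checks every Received header, not just the hop into your own server. It will therefore also fire when some upstream relay recorded an "unknown" client.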




Robert
 
 
 
 
 
 
Peace he would say instead of goodbyepeace my brother.
-----Original Message-----
From: Peter Smith [mailto:[EMAIL PROTECTED] 
Sent: Saturday, January 13, 2007 6:05 PM
To: users@spamassassin.apache.org
Subject: Checking for PTR record

Hi,

I'm interested in filtering mail relayed from hosts which have no reverse
DNS (PTR) record. A lot of MTAs support rejection of mail from such hosts,
but I feel this will reject too much genuine mail, so I'm looking to
approach the problem via SpamAssassin - perhaps score 1 or 2 for such
mail.

I was surprised that SpamAssassin doesn't already have a rule for this,
or that a plugin has not been written; or perhaps I'm looking in the wrong
place? I'm aware of the tests SpamAssassin performs for PTR records for
hotmail, excite, mail.com etc., but there don't seem to be any general
rules.

Thanks,
Pete


Re: A bit OT Question

2007-01-16 Thread Dave Williss



Chris wrote:
 On Monday 15 January 2007 7:07 pm, John D. Hardin wrote:
  On Mon, 15 Jan 2007, Chris wrote:
   I keep seeing the below bounces in spam reports I'm sending out. I know
   the ordb is dead, who is using it, Earthlink or corp.mailsecurity.net.au?
   
    [EMAIL PROTECTED]
   SMTP error from remote mailer after RCPT
   TO:[EMAIL PROTECTED]:
   host corp.mailsecurity.net.au [67.18.110.234]:
   451 4.7.1 Temporary lookup failure of 209.86.89.61 at
   relays.ordb.org: retry timeout exceeded
  
  corp.mailsecurity.net.au is using it.
  
  They should notice it pretty quickly now that *all* of their inbound
  mail is being TMPFAILed... :)
  
  spam is way down this week. Did you do something to the server?
 
 Thanks John, I shot an email off to their tech contact letting them know about 
 relays.ordb.org being shut down, hopefully it will be fixed.

You sent them an email to tell them their email is down?  Good luck with 
that :-)




Re: sa-update failing - I think dev just went live again?

2007-01-16 Thread Jason Haar
Justin Mason wrote:

 24F434CE is the active subkey of 5244EC45:

 ...
 sounds like your sa-update key info got lost somehow?

   
Yeah. This is a CentOS-4 server I installed yesterday. Looks like
something went wrong with it.

The error message says:

Perhaps you need to import the channel's GPG key?  For example:

wget http://spamassassin.apache.org/updates/GPG.KEY
gpg --import GPG.KEY


That doesn't fix the problem. However, replacing the last line with:

sa-update --import GPG.KEY

does fix it. Perhaps that should be changed?

Anyway, all better now - thanks!

-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1



ClamAV plugin with 3.1.17

2007-01-16 Thread Brent Kennedy
I just upgraded to 3.1.7 and I just wanted to make sure that clamav is still
run via the pm file and not somehow enabled via a .pre file.

Thanks

-Brent





RE: ClamAV plugin with 3.1.17

2007-01-16 Thread Brent Kennedy
Whoops, I meant the .cf file, not the pm file.

I just wanted to make sure this is still accurate:
http://wiki.apache.org/spamassassin/ClamAVPlugin

 

-----Original Message-----
From: Brent Kennedy [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 16, 2007 3:18 PM
To: users@spamassassin.apache.org
Subject: ClamAV plugin with 3.1.17

I just upgraded to 3.1.7 and I just wanted to make sure that clamav is still
run via the pm file and not somehow enabled via a .pre file.

Thanks

-Brent






Re: sa-update failing - I think dev just went live again?

2007-01-16 Thread Daryl C. W. O'Shea

Jason Haar wrote:
 Justin Mason wrote:
  24F434CE is the active subkey of 5244EC45:
 
  ...
  sounds like your sa-update key info got lost somehow?
 
 Yeah. This is a CentOS-4 server I installed yesterday. Looks like
 something went wrong with it.
 
 The error message says:
 
 Perhaps you need to import the channel's GPG key?  For example:
 
 wget http://spamassassin.apache.org/updates/GPG.KEY
 gpg --import GPG.KEY
 
 That doesn't fix the problem. However, replacing the last line with:
 
 sa-update --import GPG.KEY
 
 does fix it. Perhaps that should be changed?

It was fixed a while ago; there just hasn't been a release since it was 
fixed.

Daryl


Re: getting Bayes token data from spamassassin

2007-01-16 Thread Stuart Robinson

 On Tue, Jan 16, 2007 at 10:21:14AM +, Justin Mason wrote:
  by the way, a nice, working plugin that does this would be quite useful on
  the CustomPlugins wiki page, or contributed as an optional plugin...
 
 The plugin itself is pretty trivial -- the question is: what to do with
 the token information?  Should it be sent out to a flat file, kept in
 a DBM, etc?  That's where the non-trivial stuff happens.

Couldn't the raw tokens just be kept in the same database by adding an
additional column to the table bayes_token that isn't indexed? That
wouldn't affect performance too much, would it?

++
| Stuart Robinson|
| Email: stuart at zapata dot org|
| Homepage: http://www.zapata.org/stuart |
++



Re: getting Bayes token data from spamassassin

2007-01-16 Thread Theo Van Dinter
On Tue, Jan 16, 2007 at 02:02:01PM -0800, Stuart Robinson wrote:
 Couldn't the raw tokens just be kept in the same database by adding an
 additional column to the table bayes_token that isn't indexed? That
 wouldn't affect performance too much, would it?

Besides requiring a new data layout for the DBM files, it would be space-wise
inefficient.

This issue was discussed to death a few years ago.  If you want this kind of
thing on your specific server, the plugin call allows you to write whatever
you want to deal with it.  I'd probably keep a new table w/ hash-to-raw-token
mappings, or some kind of DBM.
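
Here is a minimal sketch of the DBM option, written in Python rather than the Perl a real plugin would use. It assumes the plugin has a mapping from each 5-byte hashed token to its raw token string. By assumption, this mirrors the hashed-to-raw token hash that the bayes_learn/bayes_scan hooks are documented to receive. `record_raw_tokens`, `lookup_raw_token`, and the file name are hypothetical.

```python
import dbm


def record_raw_tokens(path, toks):
    """Store hashed-token -> raw-token pairs in a DBM file, creating it if needed.

    `toks` maps each 5-byte hashed token (bytes) to its raw token (str).
    Distinct raw tokens can share a 40-bit hash; the last one written wins.
    """
    with dbm.open(path, "c") as db:
        for hashed, raw in toks.items():
            db[hashed] = raw.encode("utf-8")


def lookup_raw_token(path, hashed):
    """Return the raw token recorded for `hashed`, or None if it was never seen."""
    with dbm.open(path, "r") as db:
        value = db.get(hashed)
    return value.decode("utf-8") if value is not None else None


if __name__ == "__main__":
    import hashlib
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bayes_raw_tokens")  # hypothetical file name
        # Illustrative hashing only: last 5 bytes of SHA1, per the thread.
        toks = {hashlib.sha1(t.encode()).digest()[-5:]: t for t in ("viagra", "meeting")}
        record_raw_tokens(path, toks)
        for hashed in toks:
            print(hashed.hex(), "->", lookup_raw_token(path, hashed))
```

Keeping the mapping in a separate file leaves the existing Bayes layout untouched. That addresses Theo's objection to adding a column to bayes_token.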

-- 
Randomly Selected Tagline:
Do not meddle in the affairs of wizards, for you are crunchy and good
 with ketchup.  - Unknown




Re: getting Bayes token data from spamassassin

2007-01-16 Thread Stuart Robinson
Thanks. Once I have this all figured out, I will write up something and
put it on my homepage and post a link to it here.

   A SHA1 hash is taken of the original token value, and the bottom 40
   bits are used as the token from then-on.  There is a plugin call
   which can be used to store raw token - hash value data, but
   otherwise the raw token information is lost after the message is
   processed.
   
   Where could I find more information about the plugin call that allows
   me to do this? 
  
  perldoc Mail::SpamAssassin::Plugin
 
 In particular:
 
 http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin.html#item_bayes_scan
 http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin.html#item_bayes_learn
 
  You should also search the dev list from a couple of years ago at least.
  Lots of discussion about the change and why it was done including, if
  memory serves me correctly, a proof of concept plugin to save off the
  token values.
 
 by the way, a nice, working plugin that does this would be quite useful on
 the CustomPlugins wiki page, or contributed as an optional plugin...

++
| Stuart Robinson|
| Email: stuart at zapata dot org|
| Homepage: http://www.zapata.org/stuart |
++



looking for repository of spam email content

2007-01-16 Thread Andrea Lisabeth Civan
Can anyone recommend a site that provides examples of content from spam 
email? I am a researcher looking for
content from the 50 most recent or most prevalent spams. Is a site 
like http://spamosphere.wordpress.com/ the best source for this? Thanks! 
Andrea


Re: bayes poisoning

2007-01-16 Thread Chris Purves

maillist wrote:
I see a few emails every-now-and-then about bayes poisoning, and am 
wondering what it means.  From what I understand, it is some message 
that gets learned (only through autolearn?) that has certain 
characteristics that throw the bayes system off.




From what I've seen there are generally two ways it is referred to:

1. random text or phrases thrown into spam to make spam and ham look 
more alike:  This is an imagined problem.

2. spam incorrectly learned as ham or ham incorrectly learned as spam: 
Enough of these (either from manual or auto learning) and your Bayes 
database will be useless.
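
Point 2 can be illustrated with a toy calculation. The sketch uses a simplified per-token frequency ratio. SpamAssassin's actual Bayes computation also applies Robinson's smoothing and chi-squared combining across tokens, which this sketch omits. All counts are hypothetical.

```python
def token_spam_prob(spam_hits, ham_hits, nspam, nham):
    """Toy spamminess of one token: its relative frequency in learned spam
    versus learned ham (near 0 = hammy, near 1 = spammy, 0.5 = no evidence)."""
    s = spam_hits / nspam
    h = ham_hits / nham
    if s + h == 0:
        return 0.5  # token never seen in either class
    return s / (s + h)


# Hypothetical corpus: 1000 spam and 1000 ham learned; the token "invoice"
# occurred in 5 of the spam and 50 of the ham, so it leans hammy.
before = token_spam_prob(5, 50, 1000, 1000)

# Now 100 ham messages containing "invoice" are wrongly learned as spam.
after = token_spam_prob(5 + 100, 50, 1000 + 100, 1000)

print(f"before: {before:.2f}, after: {after:.2f}")  # before: 0.09, after: 0.66
```

After the mislearning, a token that used to count as evidence of ham counts as evidence of spam. Enough such errors across many tokens is what makes the database useless.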


--
Chris



RE: looking for repository of spam email content

2007-01-16 Thread Michael Scheidell
Don't worry, you just posted to a mailing list that is mirrored
everywhere.

You will get your spam shortly. ;-)

-
This email has been scanned and certified safe by SpammerTrap(tm)
For Information please see http://www.spammertrap.com


This score makes no sense

2007-01-16 Thread Chris
I just got this in my inbox. If clamav scores 10, how can this be marked as 
not spam?

X-Spam-Virus: Yes (Html.Phishing.Fake.Sanesecurity.06080201)
 X-Spam-Seen: Tokens 75
 X-Spam-New: Tokens 126
 X-Spam-ASN: AS14492 66.70.0.0/17
 X-Spam-Remote: Host localhost.localdomain
 X-Spam-Checker-Version: SpamAssassin 3.1.7 (2006-10-05) on 
cpollock.localdomain
 X-Spam-Hammy: Tokens 8
 X-Spam-Status: No, score=4.5 required=5.0 tests=BAYES_05,CLAMAV,
HTML_IMAGE_ONLY_16,HTML_MESSAGE,MIME_HTML_ONLY,REPLY_TO_EMPTY 
autolearn=disabled version=3.1.7
 X-Spam-Spammy: Tokens 1
 X-Spam-Pyzor: Reported 0 times.
 X-Spam-DCC: sonic.net cpollock 1156; Body=65 Fuz1=65 Fuz2=216
 X-Spam-Untrusted: Relays [ ip=66.70.114.30 rdns=postmaster.costapacific.net 
helo=postmaster.costapacific.net by=mx-nebolish.atl.sa.earthlink.net 
ident= envfrom= intl=0 id=1h6ZgC39e3Nl3490 auth= ]
 X-Spam-Level: 
 X-Spam-RBL: Results dns:www.costapacific.net [66.70.114.35]
 Status: U
 Return-Path: [EMAIL PROTECTED]
 Received: from pop.earthlink.net [209.86.93.201]
by localhost with POP3 (fetchmail-6.2.5)
for [EMAIL PROTECTED] (single-drop); Tue, 16 Jan 2007 19:07:36 -0600 
(CST)
 Received: from postmaster.costapacific.net ([66.70.114.30])
by mx-nebolish.atl.sa.earthlink.net (EarthLink SMTP Server) with SMTP 
id 1h6ZgC39e3Nl3490
for [EMAIL PROTECTED]; Tue, 16 Jan 2007 20:07:06 -0500 (EST)
 Received: (qmail 6432 invoked by uid 80); 17 Jan 2007 00:56:51 -
 Date: 17 Jan 2007 00:56:51 -
 Message-ID: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: Your TD Canada EasyWeb Online Account is about to expire!
 From: TD Canada [EMAIL PROTECTED]
 Reply-To: 
 MIME-Version: 1.0
 Content-Type: text/html
 Content-Transfer-Encoding: 8bit
 X-ELNK-Info: sbv=0; sbrc=.0; sbf=00; sbw=000;
 X-SenderIP: 66.70.114.30
 X-ASN: ASN-14492
 X-CIDR: 66.70.0.0/17
 X-UID: 25573
 X-Length: 3148

-- 
Chris
KeyID 0xE372A7DA98E6705C
http://learn.to/quote




Which is more efficient: two regexp's or one regexp with alternation?

2007-01-16 Thread Kelly Jones

If I want to block subjects matching foo or bar, is it more
efficient to write two regexps or a single foo|bar regexp?

I'd think a single regexp is more efficient, but SpamAssassin ships w/
rule-sets that have multiple rules. Given how many spams people get,
even a small improvement in efficiency would help?

Or are there multiple rules because the scores are different and have
to be optimized differently?

--
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.


Re: This score makes no sense

2007-01-16 Thread Theo Van Dinter
On Tue, Jan 16, 2007 at 07:14:05PM -0600, Chris wrote:
 I just got this in my inbox. If clamav scores 10, how can this be marked as 
 not spam?
 
  X-Spam-Status: No, score=4.5 required=5.0 tests=BAYES_05,CLAMAV,
 HTML_IMAGE_ONLY_16,HTML_MESSAGE,MIME_HTML_ONLY,REPLY_TO_EMPTY 
 autolearn=disabled version=3.1.7

My guess is that CLAMAV doesn't score 10.  Doing a quick calculation, it seems
to score ~4.5.

-- 
Randomly Selected Tagline:
Programming is like sex. One mistake and you have to support it for
 the rest of your life. - Michael Sinz




Re: Which is more efficient: two regexp's or one regexp with alternation?

2007-01-16 Thread Theo Van Dinter
On Tue, Jan 16, 2007 at 06:23:48PM -0700, Kelly Jones wrote:
 If I want to block subjects matching foo or bar, is it more
 efficient to write two regexps or a single foo|bar regexp?

It'll be more efficient to do a single regexp.

 Or are there multiple rules because the scores are different and have
 to be optimized differently?

Yes.  Rules have different hit rates, so they need to get different
scores because it's better for overall efficacy.
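
The Python sketch below illustrates this trade-off. SpamAssassin compiles its rules as Perl regexes, so nothing here measures speed there. The rule names, patterns, and scores are hypothetical. The combined pattern scans the subject once, but it can only ever contribute one score. Separate rules let each alternative be scored according to its own hit rate.

```python
import re

# Hypothetical rules: separate patterns can carry separate scores.
SEPARATE = [
    ("LOCAL_SUBJ_FOO", re.compile(r"foo"), 1.0),
    ("LOCAL_SUBJ_BAR", re.compile(r"bar"), 2.5),
]
# One alternation: a single pass over the subject, but a single score.
COMBINED = ("LOCAL_SUBJ_FOO_BAR", re.compile(r"foo|bar"), 1.5)


def score_separate(subject):
    """Sum the scores of every separate rule whose pattern matches."""
    return sum(score for _name, pat, score in SEPARATE if pat.search(subject))


def score_combined(subject):
    """Score of the single alternation rule: all or nothing."""
    _name, pat, score = COMBINED
    return score if pat.search(subject) else 0.0


for subj in ("cheap foo", "bar deals", "foo and bar", "hello"):
    print(f"{subj!r}: separate={score_separate(subj)}, combined={score_combined(subj)}")
```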

-- 
Randomly Selected Tagline:
Illiterate?  Write to us for a free brochure.




Re: This score makes no sense

2007-01-16 Thread Chris
On Tuesday 16 January 2007 8:29 pm, Theo Van Dinter wrote:
 On Tue, Jan 16, 2007 at 07:14:05PM -0600, Chris wrote:
  I just got this in my inbox. If clamav scores 10, how can this be marked
  as not spam?
 
   X-Spam-Status: No, score=4.5 required=5.0 tests=BAYES_05,CLAMAV,
  HTML_IMAGE_ONLY_16,HTML_MESSAGE,MIME_HTML_ONLY,REPLY_TO_EMPTY
  autolearn=disabled version=3.1.7

 My guess is that CLAMAV doesn't score 10.  Doing a quick calculation, it
 seems to score ~4.5.
Odd, because if I run it through sa-learn --spam, spamassassin -r and 
spamassassin -t it comes out as:

Content analysis details:   (19.3 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.6 REPLY_TO_EMPTY         Reply-To: is empty
 0.0 HTML_MESSAGE           BODY: HTML included in message
 5.0 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 1.]
 0.0 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.5 HTML_IMAGE_ONLY_16     BODY: HTML: images with 1200-1600 bytes of words
 2.2 DCC_CHECK              Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
  10 CLAMAV                 Clam AntiVirus detected a virus
 1.0 SAGREY                 Adds 1.0 to spam from first-time senders

Guess it's just another mystery of life as to why it didn't get the 10 points
the first time for the clamav hit.
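
Theo's quick calculation can be reproduced from the report above. This sketch assumes the per-rule scores shown by spamassassin -t were also in effect for the first pass. The report shows no BAYES_05 score, so the sketch can only solve for the combined contribution of CLAMAV and BAYES_05.

```python
# Per-rule scores copied from the `spamassassin -t` report above.
REPORT = {
    "REPLY_TO_EMPTY": 0.6, "HTML_MESSAGE": 0.0, "BAYES_99": 5.0,
    "MIME_HTML_ONLY": 0.0, "HTML_IMAGE_ONLY_16": 0.5, "DCC_CHECK": 2.2,
    "CLAMAV": 10.0, "SAGREY": 1.0,
}

# First pass, from the X-Spam-Status header (total rounded to one decimal).
FIRST_TOTAL = 4.5
FIRST_HITS = ["BAYES_05", "CLAMAV", "HTML_IMAGE_ONLY_16",
              "HTML_MESSAGE", "MIME_HTML_ONLY", "REPLY_TO_EMPTY"]
UNKNOWN = {"BAYES_05", "CLAMAV"}  # scores not pinned down by the first pass

report_total = round(sum(REPORT.values()), 1)
known = sum(REPORT[rule] for rule in FIRST_HITS if rule not in UNKNOWN)
clamav_plus_bayes05 = round(FIRST_TOTAL - known, 1)

print(report_total)          # 19.3, matching the report header
print(clamav_plus_bayes05)   # 3.4
# If CLAMAV really added 10 on the first pass, BAYES_05 would have had to add:
print(round(clamav_plus_bayes05 - 10.0, 1))  # -6.6
```

Because X-Spam-Status rounds the total, the 3.4 figure is only accurate to about 0.05.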

-- 
Chris
KeyID 0xE372A7DA98E6705C
http://learn.to/quote




Re: A bit OT Question

2007-01-16 Thread Chris
On Tuesday 16 January 2007 8:57 am, Dave Williss wrote:

  corp.mailsecurity.net.au is using it.
 
  They should notice it pretty quickly now that *all* of their inbound
  mail is being TMPFAILed... :)
 
  spam is way down this week. Did you do something to the server?
 
  Thanks John, I shot an email off to their tech contact letting them know
  about relays.ordb.org being shut down, hopefully it will be fixed.

 You sent them an email to tell them their email is down?  Good luck with
 that :-)

Yep, and they not only replied, but the problem is fixed. I sent the link to 
the Slashdot article about relays.ordb.org going down.

-- 
Chris
KeyID 0xE372A7DA98E6705C
http://learn.to/quote

