Re: ver 3.0 opinions

2004-10-31 Thread Tuc at Beach House
> > But shouldn't it have carried my database over from my previous install? 
> > I'd been using it for at least 6 months on different versions before this
> > upgrade.  Did it 'forget' it all?  Do I need to totally retrain it?  I'm
> > using a stock install from FreeBSD ports, no local/global overrides.
> 
> Looks like somebody didn't read the UPGRADE doc...
> 
>  Due to the database format change, you will want to do something like
>   this when upgrading:
> 
I did see that, but I also saw :

- The Bayesian storage modules have been completely re-written and now
  include Berkeley DB (DBM) storage as well as SQL based storage (see
  sql/README.bayes for more information).  In addition, a new format
  has been introduced for the bayes database that stores tokens in fixed
  length hashes (Bayes v3).

**
  All DBM databases should be automatically
  converted to this new format the first time they are opened for write.
**

  You can manually perform the upgrade by running "sa-learn --sync"
  from the command line.

So I thought the rest of the instructions were if I just wanted to
do it before it did it itself.

Thanks, Tuc


Re: ver 3.0 opinions

2004-10-31 Thread Faisal N. Jawdat
> Looks like somebody didn't read the UPGRADE doc...
>
>  Due to the database format change, you will want to do something like
>   this when upgrading:

I read it and followed the directions and didn't see any problem for a 
couple days and then suddenly the spam level jumped substantially.  
Upon further investigation it looked like the bayes dbs had gotten 
corrupted and I was seeing low tok counts like the original poster had 
reported.  Another friend saw this happen when he first upgraded, 
although I can't speak for his direction-following on the upgrade.

If I see more of this I'll try and isolate it and file an appropriately 
detailed bug report, but I don't think it's necessarily accurate to 
immediately write this off as user error.

-faisal


Re: ver 3.0 opinions

2004-10-31 Thread Ed Kasky
On Sat, 30 Oct 2004, Tuc at Beach House wrote:

> > > himinbjorg% sa-learn --dump magic
> > > 0.000          0          3          0  non-token data: bayes db version
> > > 0.000          0        175          0  non-token data: nspam
> > > 0.000          0      73501          0  non-token data: nham
> > > 0.000          0    1027341          0  non-token data: ntokens
> > 
> > I think 175 spam messages is not nearly enough for Bayes to be
> > adequately trained.  Also, the ratio of spam to ham (~0.2%) looks a
> > bit odd.  If it reflects roughly equal time periods over which you
> > received the messages, it suggests you might be missing (or perhaps
> > misclassifying) some of your spam.
> > 
> But shouldn't it have carried my database over from my previous install? 
> I'd been using it for at least 6 months on different versions before this
> upgrade.  Did it 'forget' it all?  Do I need to totally retrain it?  I'm
> using a stock install from FreeBSD ports, no local/global overrides.

Looks like somebody didn't read the UPGRADE doc...

 Due to the database format change, you will want to do something like
  this when upgrading:

  - stop running spamassassin/spamd (i.e., you don't want it to be running
    during the upgrade)
  - run "sa-learn --rebuild"; this will sync your journal.  If you skip
    this step, any data from the journal will be lost when the DB is
    upgraded.
  - upgrade SA to 3.0.0
  - run "sa-learn --sync", which will cause the db format to be upgraded.
    If you want to see what is going on, you can add the "-D" option.
  - test the new database by running some sample mails through
    SpamAssassin, and/or at least running "sa-learn --dump" to make sure
    the data looks valid.
  - start running spamassassin/spamd again
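
For anyone who wants the steps as a checklist, here is a minimal Python
sketch. It only assembles and prints the sa-learn commands named in the
steps; it does not run them, because the package upgrade has to happen
between the first and second command. The function names (upgrade_plan,
print_plan) are made up for illustration, and stopping and restarting
spamd is left out since that command depends on the system.

```python
def upgrade_plan(debug=False):
    """Return (when, command) pairs for the sa-learn steps in the UPGRADE notes."""
    sync = ["sa-learn", "--sync"]
    if debug:
        sync.insert(1, "-D")  # "-D" shows what the conversion is doing
    return [
        ("before upgrading, with the old version still installed",
         ["sa-learn", "--rebuild"]),
        ("after upgrading to 3.0.0", sync),
        ("before starting spamd again", ["sa-learn", "--dump", "magic"]),
    ]


def print_plan(plan):
    """Print each step as a comment line followed by the command to type."""
    for when, command in plan:
        print(f"# {when}")
        print(" ".join(command))


if __name__ == "__main__":
    print_plan(upgrade_plan(debug=True))
```

The order matters more than the tooling: the journal sync has to run under
the old version, and the format conversion under the new one.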
 
. . . . . . . . . . . . . . .
Randomly generated quote:
"My great concern is not whether you have failed, but whether you
are content with your failure." - Abraham Lincoln



Re: ver 3.0 opinions

2004-10-31 Thread Tuc at Beach House
> > himinbjorg% sa-learn --dump magic
> > 0.000          0          3          0  non-token data: bayes db version
> > 0.000          0        175          0  non-token data: nspam
> > 0.000          0      73501          0  non-token data: nham
> > 0.000          0    1027341          0  non-token data: ntokens
> 
> I think 175 spam messages is not nearly enough for Bayes to be
> adequately trained.  Also, the ratio of spam to ham (~0.2%) looks a
> bit odd.  If it reflects roughly equal time periods over which you
> received the messages, it suggests you might be missing (or perhaps
> misclassifying) some of your spam.
> 
But shouldn't it have carried my database over from my previous install? 
I'd been using it for at least 6 months on different versions before this
upgrade.  Did it 'forget' it all?  Do I need to totally retrain it?  I'm
using a stock install from FreeBSD ports, no local/global overrides.

Thanks, Tuc



Re: ver 3.0 opinions

2004-10-30 Thread Theodore Heise


On Fri, 29 Oct 2004, "Tuc at Beach House" <[EMAIL PROTECTED]> said:
> > > The reason I joined was that I recently upgraded my FreeBSD box from 2.64
> > > to 3.0.1_1 (Not sure what about it makes it _1, but that's ok)
> > >
> > >   As soon as I did, the amount of spam I started getting as
> > > good deliveries SKYROCKETED. I thought maybe it was just me, but it
> > > appears at least *1* other person has seen this.


> > If you're logged in as the same user SpamAssassin's running as,
> > what does "sa-learn --dump magic" tell you?


On Sat, 30 Oct 2004, Tuc at Beach House wrote:
>   The one I notice getting the worst spam (since it's my personal
> primary email address):
>
> himinbjorg% sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0        175          0  non-token data: nspam
> 0.000          0      73501          0  non-token data: nham
> 0.000          0    1027341          0  non-token data: ntokens

I think 175 spam messages is not nearly enough for Bayes to be
adequately trained.  Also, the ratio of spam to ham (~0.2%) looks a
bit odd.  If it reflects roughly equal time periods over which you
received the messages, it suggests you might be missing (or perhaps
misclassifying) some of your spam.
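
To check the ratio without doing the arithmetic by hand, here is a rough
Python sketch. It assumes the "sa-learn --dump magic" output is
whitespace-separated with the value in the third column, as in the listing
quoted in this thread; parse_dump_magic is a made-up helper name, not part
of SpamAssassin. For the counts quoted it comes out at about 0.24% spam
per ham.

```python
def parse_dump_magic(text):
    """Map each 'non-token data' name to its integer value."""
    stats = {}
    for line in text.splitlines():
        fields, sep, name = line.partition("non-token data:")
        parts = fields.split()
        if not sep or len(parts) < 3:
            continue  # not a magic line, or columns are missing
        stats[name.strip()] = int(parts[2])  # value sits in the third column
    return stats


# Counts copied from the dump quoted in this thread.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        175          0  non-token data: nspam
0.000          0      73501          0  non-token data: nham
0.000          0    1027341          0  non-token data: ntokens
"""

stats = parse_dump_magic(SAMPLE)
ratio = stats["nspam"] / stats["nham"]
print(f"spam per ham: {ratio:.2%}")
```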

-- 
Theodore (Ted) Heise <[EMAIL PROTECTED]> Bloomington, IN, USA


Re: ver 3.0 opinions

2004-10-30 Thread Tuc at Beach House
> 
> 
> On Fri, 29 Oct 2004 17:49:07 -0400 (EDT), "Tuc at Beach House"
> <[EMAIL PROTECTED]> said:
> > The reason I joined was that I recently upgraded my FreeBSD box from 2.64
> > to 3.0.1_1 (Not sure what about it makes it _1, but that's ok)
> > 
> > As soon as I did, the amount of spam I started getting as
> > good deliveries SKYROCKETED. I thought maybe it was just me, but it
> > appears at least *1* other person has seen this. 
> 
> :-) well, it depends on a *lot* of possibilities, and you haven't given
> us a lot of information to work with.
>
Sorry, I was just joining the thread and wasn't sure if people
were saying a lot about what the problem was, or just saying "In general I 
see this".
>
> You should probably look at the
> scores that SA is giving your spam and try to figure out what's
> different.
>
Just seems like it's less. Hitting 4.6/4.7 out of 5, instead of the
higher numbers I'm used to.
>
> Is Bayes working for you?
>
Looks like it, yes. I'm seeing BAYES_ at times.
>
> You might also want to adjust the
> default scores - version 3.0 lowered the Bayes scores quite a bit. More
> than it should have, IMHO.
>
Is there somewhere I can find what the old ones are? Do I just
find the CVS of 50_scores.cf or re-download the 2.64 and copy the BAYES_
into my local.cf ?
>
> Are your network tests still working? 
>
I think so. I see RCVD_IN_BL_SPAMCOP, RCVD_IN_SBL type things.
>
> If
> you're logged in as the same user SpamAssassin's running as, what does
> "sa-learn --dump magic" tell you? 
>
The one I notice getting the worst spam (since it's my personal
primary email address):

himinbjorg% sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        175          0  non-token data: nspam
0.000          0      73501          0  non-token data: nham
0.000          0    1027341          0  non-token data: ntokens
0.000          0 1076532302          0  non-token data: oldest atime
0.000          0 1099138682          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count


This is an email address I used to use and just got too much spam
so I stopped using it :

himinbjorg% sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       2292          0  non-token data: nspam
0.000          0     288387          0  non-token data: nham
0.000          0     135841          0  non-token data: ntokens
0.000          0 1098946805          0  non-token data: oldest atime
0.000          0 1099139048          0  non-token data: newest atime
0.000          0 1099138692          0  non-token data: last journal sync atime
0.000          0 1099088153          0  non-token data: last expiry atime
0.000          0     141462          0  non-token data: last expire atime delta
0.000          0      54603          0  non-token data: last expire reduction count

Thanks, Tuc


Re: ver 3.0 opinions

2004-10-29 Thread snowjack

On Fri, 29 Oct 2004 17:49:07 -0400 (EDT), "Tuc at Beach House"
<[EMAIL PROTECTED]> said:
> The reason I joined was that I recently upgraded my FreeBSD box from 2.64
> to 3.0.1_1 (Not sure what about it makes it _1, but that's ok)
> 
>   As soon as I did, the amount of spam I started getting as
> good deliveries SKYROCKETED. I thought maybe it was just me, but it
> appears at least *1* other person has seen this. 

:-) well, it depends on a *lot* of possibilities, and you haven't given
us a lot of information to work with. You should probably look at the
scores that SA is giving your spam and try to figure out what's
different. Is Bayes working for you? You might also want to adjust the
default scores - version 3.0 lowered the Bayes scores quite a bit. More
than it should have, IMHO. Are your network tests still working? If
you're logged in as the same user SpamAssassin's running as, what does
"sa-learn --dump magic" tell you? 
--
  
  snowjack(a)fastmail.fm



Re: ver 3.0 opinions

2004-10-29 Thread Tuc at Beach House
> 
> On Thu, 28 Oct 2004, Jeff Ramsey wrote:
> 
> > Is version 3 really any better at stopping spam than 2.63? I'm running 2.63 
> > and my friend who owns an ISP just upgraded to ver 3, and he claims that 
> > 2.63 
> > stopped more spam.
> >
> 
>   My experience is that (aside from spamd memory issues), v3 is much 
> better at catching spam than 2.63 was (out of the box).
> 
>   With 2.64, we averaged about 10 or so spams below threshold (5.0). 
> Now it's about 1, and some days, none :)  Worth the upgrade IMO.
> 
Sorry to jump in the middle of this convo. I just joined this list
after using it for a few months. The reason I joined was that I recently
upgraded my FreeBSD box from 2.64 to 3.0.1_1 (Not sure what about it makes
it _1, but that's ok)

As soon as I did, the amount of spam I started getting as
good deliveries SKYROCKETED. I thought maybe it was just me, but it appears
at least *1* other person has seen this. 

Was there a major change in the rules or something? My install is/was
out of the ports collection of FreeBSD which I didn't think had been "tainted"
any. Should I maybe clear my db files?

Any help is appreciated, otherwise I'll fall back to the previous
version.

Thanks, Tuc


Re: ver 3.0 opinions

2004-10-29 Thread Jon Trulson
On Thu, 28 Oct 2004, Bart Schaefer wrote:
> On Thu, 28 Oct 2004 15:21:59 -0700, Jeff Ramsey <[EMAIL PROTECTED]> wrote:
> > Is version 3 really any better at stopping spam than 2.63?
>
> [...]
> Using it in local only mode, though, I've found it not very different.
> The spams that get through 3.x that do not get through 2.6x are
> generally (a) those that match BAYES_99, which by itself in the
> default configuration is no longer a large enough score to make me
> happy, or

	True.  Some spam we get is solely BAYES_99.  I've bumped it back up 
to 5.2 (like in 2.6x).

--
Jon Trulson    mailto:[EMAIL PROTECTED]
ID: 1A9A2B09, FP: C23F328A721264E7 B6188192EC733962
PGP keys at http://radscan.com/~jon/PGPKeys.txt
#include 
"I am Nomad." -Nomad


Re: ver 3.0 opinions

2004-10-29 Thread Jon Trulson
On Thu, 28 Oct 2004, Jeff Ramsey wrote:
> Is version 3 really any better at stopping spam than 2.63? I'm running 2.63
> and my friend who owns an ISP just upgraded to ver 3, and he claims that 2.63
> stopped more spam.

	My experience is that (aside from spamd memory issues), v3 is much 
better at catching spam than 2.63 was (out of the box).

	With 2.64, we averaged about 10 or so spams below threshold (5.0). 
Now it's about 1, and some days, none :)  Worth the upgrade IMO.

--
Jon Trulson    mailto:[EMAIL PROTECTED]
ID: 1A9A2B09, FP: C23F328A721264E7 B6188192EC733962
PGP keys at http://radscan.com/~jon/PGPKeys.txt
#include 
"I am Nomad." -Nomad


Re: ver 3.0 opinions

2004-10-29 Thread Jeff Ramsey
Thanks for the help and info. I'll tell my friend why his 3.0 install 
is letting more spam through and he has autolearn turned on, so his 
should get better.

As for me, I'll upgrade to at least version 2.64. I use lots of the 
custom rules - antidrug.cf, etc. If I upgrade SA to 3.0, can I still 
use my SQL custom rules score table to alter the scoring for the rules, 
which I think would give me the best of both worlds?

Thanks,
Jeff Ramsey
Tubafor Mill, Inc.


Re: ver 3.0 opinions

2004-10-29 Thread snowjack
On Thu, 28 Oct 2004 16:19:13 -0700, "Bart Schaefer"
<[EMAIL PROTECTED]> said:
> On Thu, 28 Oct 2004 15:21:59 -0700, Jeff Ramsey <[EMAIL PROTECTED]>
> wrote:
> > Is version 3 really any better at stopping spam than 2.63?
> 
> Version 3 stops different spam than 2.63, in my experience so far. 
> E.g. it's better at catching the drug spam but not as good at the
> "earn cash for making phone calls" spam.

I would say that the *default* config of 3.X is significantly more
effective than the default config of 2.63 or 2.64. But I think after
some tweaking of 2.64 it's probably just as effective as 3.0, once
you've added in the SpamCopURI patch, antidrug.cf, some of the other
SARE custom rulesets. Both of them depend to a large extent on how much
care you put into setting it up, training BAYES properly, etc.

> Using it in local only mode, though, I've found it not very different.
>  The spams that get through 3.x that do not get through 2.6x are
> generally (a) those that match BAYES_99, which by itself in the
> default configuration is no longer a large enough score to make me
> happy, or (b) would have been tagged as spam except that the AWL
> "smoothed" them down to just below the threshold.

Yup. But I wouldn't turn off AWL if I were you. I think it's a very nice
feature, and has probably prevented a few false positives for me. Yes,
occasionally it will pull the score the wrong way across the threshold,
but if it's doing that, you're better off figuring out why this person's
messages get an *average* score on the wrong side of your threshold
anyway. If you get that fixed, then the AWL will stop pulling messages
the wrong way across your threshold. 
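
Here is a rough Python sketch of that "pulling across the threshold"
effect. It is a simplified model, not the actual AWL code: it assumes the
adjustment moves a message's score a fixed fraction (factor, taken here as
0.5) toward the sender's historical mean. The function name awl_adjust and
all the scores are made up for illustration.

```python
THRESHOLD = 5.0


def awl_adjust(score, sender_mean, factor=0.5):
    """Move this message's score part of the way toward the sender's mean."""
    return score + (sender_mean - score) * factor


# A spammy message from a sender whose past mail averaged 2.0:
# it starts above the threshold and is pulled under it.
pulled_down = awl_adjust(7.0, sender_mean=2.0)

# A clean-looking message from a sender whose past mail averaged 6.0:
# it starts under the threshold and is pulled up to it.
pulled_up = awl_adjust(4.0, sender_mean=6.0)

print(pulled_down, pulled_down >= THRESHOLD)
print(pulled_up, pulled_up >= THRESHOLD)
```

In both cases the surprise comes from the sender's mean sitting on the
"wrong" side, which is why fixing the average is the better cure than
turning the AWL off.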

I do customize my BAYES scores because I'm not very happy with the
defaults. I find that a significant portion of spam manages to reduce
its Bayes probability to 40-60% by including large chunks of innocent
text at the end of the message. So I add the lines below to my
local.cf, because most of my ham scores below 10%, while the messages
that hit Bayes' 40-60% range are more than 95% spam. This still catches
those spams which hit BAYES_99, and for that rare ham that hits BAYES_99
(I've never seen one, but I suppose they must be out there), the AWL or
another whitelist rule will hopefully pull it back under five points.

score BAYES_00 -4.9
score BAYES_01 -2.0
score BAYES_10 -1.5
score BAYES_20 -1.0
score BAYES_30 -0.5
score BAYES_40 0.1
score BAYES_44 0.7
score BAYES_50 1.0
score BAYES_56 1.5
score BAYES_60 2.1
score BAYES_70 3.1
score BAYES_80 4.2
score BAYES_90 4.9
score BAYES_99 5.4
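
To see what such scores do to a borderline message, here is a small Python
sketch. It reads "score NAME VALUE" lines like the ones in the list (real
score lines can carry more than one value; this sketch ignores that) and
adds a made-up 3.0 points from other rules. parse_scores and OTHER_RULES
are names invented for illustration.

```python
def parse_scores(conf_text):
    """Collect single-value 'score NAME VALUE' lines into a dict."""
    scores = {}
    for line in conf_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "score":
            scores[parts[1]] = float(parts[2])
    return scores


# Three of the score lines from the list, copied as-is.
LOCAL_CF = """\
score BAYES_50 1.0
score BAYES_60 2.1
score BAYES_99 5.4
"""

THRESHOLD = 5.0
OTHER_RULES = 3.0  # hypothetical total from non-Bayes rules on one message

scores = parse_scores(LOCAL_CF)
for rule in ("BAYES_50", "BAYES_60", "BAYES_99"):
    total = scores[rule] + OTHER_RULES
    verdict = "spam" if total >= THRESHOLD else "not spam"
    print(rule, round(total, 1), verdict)
```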
--
  
  snowjack(a)fastmail.fm



Re: ver 3.0 opinions

2004-10-28 Thread Matt Kettler
At 06:21 PM 10/28/2004, Jeff Ramsey wrote:
> Is version 3 really any better at stopping spam than 2.63? I'm running
> 2.63 and my friend who owns an ISP just upgraded to ver 3, and he claims
> that 2.63 stopped more spam.

As far as an "out of the box" configuration goes, I'd say 3.0 is orders of 
magnitude better than the 2.63 version. (Actually why ANYONE is still 
running 2.63 is a mystery to me. 2.63 is vulnerable to a DoS attack, and 
you should be running 2.64 if you want your servers to stay in the 2.6x 
series.)

I'd also say that if you're a default-rules non-bayes user, 3.0 is going to 
be a HUGE improvement.

However, for someone who's got bayes going and all the add-on rules and 
packages they can find (antidrug, surbl, rulesemporium, etc) they're likely 
to experience some reduced hits when they upgrade to 3.x.

The main reason here is that SA 3.0 has a lot of these add-ons integrated, 
and some of the scores that came out of the GA are less aggressive than 
the ones some authors assign when they hand-score their rules.

The less aggressive scoring is going to cause more spam misses, but it's 
also going to reel in the FP rate. The scores are now balanced with the 
scores of the other rules, not adding score on top of an already balanced 
system.

Go figure, if you add rules to a balanced system without rebalancing, the 
average score of all messages, spam and nonspam, goes up. You catch more 
spam, and you catch more FPs.
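
A toy Python illustration of that effect, with entirely made-up numbers:
five messages scored by a balanced rule set, then the same five with one
hand-scored add-on rule stacked on top. Nothing here comes from real SA
scores; tally, MESSAGES and ADDON_SCORE are hypothetical names.

```python
THRESHOLD = 5.0
ADDON_SCORE = 1.5  # hypothetical hand-assigned score for one add-on rule

# (score from the balanced rule set, is it spam?, does the add-on rule hit?)
MESSAGES = [
    (6.2, True, True),
    (4.1, True, True),
    (3.0, True, False),
    (4.0, False, True),
    (1.0, False, False),
]


def tally(messages, addon_score):
    """Return (spam caught, false positives) at THRESHOLD."""
    caught = false_positives = 0
    for base, is_spam, addon_hits in messages:
        score = base + (addon_score if addon_hits else 0.0)
        if score >= THRESHOLD:
            if is_spam:
                caught += 1
            else:
                false_positives += 1
    return caught, false_positives


print("balanced only:", tally(MESSAGES, 0.0))
print("with add-on:  ", tally(MESSAGES, ADDON_SCORE))
```

The add-on catches one more spam and also tags one ham, because it raises
every message it hits by the same amount regardless of what the other
rules already contributed.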

SA 3.0 is also significantly less aggressive in bayes scoring, and I think 
this is largely a reflection of the increased accuracy of the rules picking 
up a lot of the slack which 2.5x and 2.6x left for bayes to take care of. 
Let's face it. 2.5x and 2.6x had pretty lousy default rule sets, but the 
power of bayes made the system still catch spam pretty well.

2.6x without any bayes or add-on rules is pretty much hopelessly 
ineffective against current spam. With bayes or add-ons it works pretty 
well. With both it works very well at catching spam, but also has a much 
higher FP ratio than the SA devs are willing to accept for a shipping 
stable release. (The general rule for the scoring system is FPs are 100 
times worse than FNs)
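
Put as arithmetic, the 100-to-1 rule looks something like the Python
sketch below. The counts are invented purely to show the trade-off, and
weighted_cost is a made-up name, not how the SA scoring run is actually
implemented.

```python
def weighted_cost(false_positives, false_negatives, fp_weight=100):
    """Count each false positive as fp_weight false negatives."""
    return fp_weight * false_positives + false_negatives


# Hypothetical results for two setups over the same batch of mail.
aggressive = weighted_cost(false_positives=2, false_negatives=10)
conservative = weighted_cost(false_positives=0, false_negatives=80)

print("aggressive:  ", aggressive)
print("conservative:", conservative)
```

Under this weighting the setup that misses eight times as much spam still
comes out ahead, because it never tags a legitimate message.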




Re: ver 3.0 opinions

2004-10-28 Thread Bart Schaefer
On Thu, 28 Oct 2004 15:21:59 -0700, Jeff Ramsey <[EMAIL PROTECTED]> wrote:
> Is version 3 really any better at stopping spam than 2.63?

Version 3 stops different spam than 2.63, in my experience so far. 
E.g. it's better at catching the drug spam but not as good at the
"earn cash for making phone calls" spam.

If you use full network tests, I suspect 3.001 (did I get enough
zeroes? too many?) is actually better than 2.63/2.64.

Using it in local only mode, though, I've found it not very different.
 The spams that get through 3.x that do not get through 2.6x are
generally (a) those that match BAYES_99, which by itself in the
default configuration is no longer a large enough score to make me
happy, or (b) would have been tagged as spam except that the AWL
"smoothed" them down to just below the threshold.

I confess with some embarrassment that I haven't yet looked into how
to turn off the AWL in spamd.  Statement (b) above comes from running
the same messages through "spamassassin -t" and having them marked as
spam with the only difference in the latter case being the absence of
an AWL hit.


Re: ver 3.0 opinions

2004-10-28 Thread Gavin Cato
I noticed a significant improvement with 3.0 - especially with drugs related
messages.


On 29/10/04 8:21 AM, "Jeff Ramsey" <[EMAIL PROTECTED]> wrote:

> Is version 3 really any better at stopping spam than 2.63? I'm running
> 2.63 and my friend who owns an ISP just upgraded to ver 3, and he
> claims that 2.63 stopped more spam.
> 
> Thanks,
> Jeff Ramsey
> Tubafor Mill, Inc.
> 




ver 3.0 opinions

2004-10-28 Thread Jeff Ramsey
Is version 3 really any better at stopping spam than 2.63? I'm running 
2.63 and my friend who owns an ISP just upgraded to ver 3, and he 
claims that 2.63 stopped more spam.

Thanks,
Jeff Ramsey
Tubafor Mill, Inc.