Re: ver 3.0 opinions
> > But shouldn't it have carried my database over from my previous install?
> > I'd been using it for at least 6 months on different versions before this
> > upgrade. Did it 'forget' it all? Do I need to totally retrain it? I'm
> > using a stock install from FreeBSD ports, no local/global overrides.
>
> Looks like somebody didn't read the UPGRADE doc...
>
> Due to the database format change, you will want to do something like
> this when upgrading:

I did see that, but I also saw:

  - The Bayesian storage modules have been completely re-written and now
    include Berkeley DB (DBM) storage as well as SQL based storage (see
    sql/README.bayes for more information). In addition, a new format has
    been introduced for the bayes database that stores tokens in fixed
    length hashes (Bayes v3). ** All DBM databases should be automatically
    converted to this new format the first time they are opened for
    write. ** You can manually perform the upgrade by running
    "sa-learn --sync" from the command line.

So I thought the rest of the instructions only applied if I wanted to do the conversion myself before SpamAssassin did it on its own.

Thanks, Tuc
Re: ver 3.0 opinions
> Looks like somebody didn't read the UPGRADE doc...
>
> Due to the database format change, you will want to do something like
> this when upgrading:

I read it and followed the directions, and didn't see any problem for a couple of days; then the spam level suddenly jumped substantially. Upon further investigation, it looked like the bayes dbs had gotten corrupted, and I was seeing low token counts like the original poster had reported. Another friend saw this happen when he first upgraded, although I can't speak for his direction-following on the upgrade.

If I see more of this I'll try to isolate it and file an appropriately detailed bug report, but I don't think it's necessarily accurate to immediately write this off as user error.

-faisal
Re: ver 3.0 opinions
On Sat, 30 Oct 2004, Tuc at Beach House wrote:

> > > himinbjorg% sa-learn --dump magic
> > > 0.000          0          3          0  non-token data: bayes db version
> > > 0.000          0        175          0  non-token data: nspam
> > > 0.000          0      73501          0  non-token data: nham
> > > 0.000          0    1027341          0  non-token data: ntokens
> >
> > I think 175 spam messages is not nearly enough for Bayes to be
> > adequately trained. Also, the ratio of ham to spam (~0.3%) looks a
> > bit odd. If it reflects roughly equal time periods over which you
> > received the messages, it suggests you might be missing (or perhaps
> > misclassifying) some of your spam.
>
> But shouldn't it have carried my database over from my previous install?
> I'd been using it for at least 6 months on different versions before this
> upgrade. Did it 'forget' it all? Do I need to totally retrain it? I'm
> using a stock install from FreeBSD ports, no local/global overrides.

Looks like somebody didn't read the UPGRADE doc...

Due to the database format change, you will want to do something like this when upgrading:

- stop running spamassassin/spamd (ie: you don't want it to be running
  during the upgrade)
- run "sa-learn --rebuild", this will sync your journal. if you skip this
  step, any data from the journal will be lost when the DB is upgraded.
- upgrade SA to 3.0.0
- run "sa-learn --sync", which will cause the db format to be upgraded.
  if you want to see what is going on, you can add the "-D" option.
- test the new database by running some sample mails through
  SpamAssassin, and/or at least running "sa-learn --dump" to make sure
  the data looks valid.
- start running spamassassin/spamd again

. . . . . . . . . . . . . . .
Randomly generated quote:
"My great concern is not whether you have failed, but whether you are content with your failure." - Abraham Lincoln
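The upgrade order quoted from the UPGRADE doc can be sketched as a small checklist runner. This is only an illustration, not part of SpamAssassin: the `UPGRADE_STEPS` list and the `run_steps` helper are made-up names, the three `sa-learn` invocations are the ones quoted in the steps above (with `--dump magic` as used elsewhere in this thread), and stopping, upgrading and restarting are left as manual steps because they depend on the platform.

```python
import shlex
import subprocess

# (description, command) pairs in the order given by the UPGRADE doc.
# command is None for steps that have to be done by hand.
UPGRADE_STEPS = [
    ("stop spamassassin/spamd", None),
    ("sync the journal into the old-format DB", ["sa-learn", "--rebuild"]),
    ("upgrade SpamAssassin to 3.0.0", None),
    ("convert the DB to the new format, with debug output", ["sa-learn", "--sync", "-D"]),
    ("check that the converted data looks valid", ["sa-learn", "--dump", "magic"]),
    ("start spamassassin/spamd again", None),
]


def run_steps(steps, dry_run=True):
    """Return a log of the steps; only run commands when dry_run is False."""
    log = []
    for number, (description, command) in enumerate(steps, start=1):
        if command is None:
            log.append(f"{number}. MANUAL: {description}")
            if not dry_run:
                # Wait for the operator so later commands never run early.
                input(f"{description}, then press Enter to continue ")
            continue
        log.append(f"{number}. RUN: {shlex.join(command)}  # {description}")
        if not dry_run:
            subprocess.run(command, check=True)  # abort on the first failure
    return log


if __name__ == "__main__":
    # Dry run: prints the plan without touching any database.
    for entry in run_steps(UPGRADE_STEPS):
        print(entry)
```

Running it as the user that owns the Bayes database matters for the real (non-dry-run) case, since `sa-learn` operates on the calling user's files.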
Re: ver 3.0 opinions
> > himinbjorg% sa-learn --dump magic
> > 0.000          0          3          0  non-token data: bayes db version
> > 0.000          0        175          0  non-token data: nspam
> > 0.000          0      73501          0  non-token data: nham
> > 0.000          0    1027341          0  non-token data: ntokens
>
> I think 175 spam messages is not nearly enough for Bayes to be
> adequately trained. Also, the ratio of ham to spam (~0.3%) looks a
> bit odd. If it reflects roughly equal time periods over which you
> received the messages, it suggests you might be missing (or perhaps
> misclassifying) some of your spam.

But shouldn't it have carried my database over from my previous install? I'd been using it for at least 6 months on different versions before this upgrade. Did it 'forget' it all? Do I need to totally retrain it? I'm using a stock install from FreeBSD ports, no local/global overrides.

Thanks, Tuc
Re: ver 3.0 opinions
On Fri, 29 Oct 2004, "Tuc at Beach House" <[EMAIL PROTECTED]> said:

> > > The reason I joined was that I recently upgraded my FreeBSD box from 2.64
> > > to 3.0.1_1 (Not sure what about it makes it _1, but thats ok)
> > >
> > > As soon as I did, the amount of spam I started getting as
> > > good deliveries SKYROCKETED. I thought maybe it was just me, but it
> > > appears at least *1* other person has seen this.

> > If you're logged in as the same user SpamAssassin's running as,
> > what does "sa-learn --dump magic" tell you?

On Sat, 30 Oct 2004, Tuc at Beach House wrote:

> The one I notice getting the worst spam (Since its my personal
> primary email address):
>
> himinbjorg% sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0        175          0  non-token data: nspam
> 0.000          0      73501          0  non-token data: nham
> 0.000          0    1027341          0  non-token data: ntokens

I think 175 spam messages is not nearly enough for Bayes to be adequately trained. Also, the ratio of ham to spam (~0.3%) looks a bit odd. If it reflects roughly equal time periods over which you received the messages, it suggests you might be missing (or perhaps misclassifying) some of your spam.

--
Theodore (Ted) Heise <[EMAIL PROTECTED]> Bloomington, IN, USA
Re: ver 3.0 opinions
> On Fri, 29 Oct 2004 17:49:07 -0400 (EDT), "Tuc at Beach House"
> <[EMAIL PROTECTED]> said:
> > The reason I joined was that I recently upgraded my FreeBSD box from 2.64
> > to 3.0.1_1 (Not sure what about it makes it _1, but thats ok)
> >
> > As soon as I did, the amount of spam I started getting as
> > good deliveries SKYROCKETED. I thought maybe it was just me, but it
> > appears at least *1* other person has seen this.
>
> :-) well, it depends on a *lot* of possibilities, and you haven't given
> us a lot of information to work with.

Sorry, I was just joining the thread and wasn't sure whether people were saying a lot about what the problem was, or just saying "In general I see this".

> You should probably look at the
> scores that SA is giving your spam and try to figure out what's
> different.

The scores just seem lower. Spam is hitting 4.6/4.7 out of 5, instead of the higher numbers I'm used to.

> Is Bayes working for you?

Looks like it, yes. I'm seeing BAYES_ at times.

> You might also want to adjust the
> default scores - version 3.0 lowered the Bayes scores quite a bit. More
> than it should have, IMHO.

Is there somewhere I can find what the old ones are? Do I just find the CVS copy of 50_scores.cf, or re-download 2.64 and copy the BAYES_ lines into my local.cf?

> Are your network tests still working?

I think so. I see RCVD_IN_BL_SPAMCOP, RCVD_IN_SBL type things.

> If you're logged in as the same user SpamAssassin's running as, what does
> "sa-learn --dump magic" tell you?

The one I notice getting the worst spam (since it's my personal primary email address):

himinbjorg% sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        175          0  non-token data: nspam
0.000          0      73501          0  non-token data: nham
0.000          0    1027341          0  non-token data: ntokens
0.000          0 1076532302          0  non-token data: oldest atime
0.000          0 1099138682          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

This is an email address I used to use; it just got too much spam, so I stopped using it:

himinbjorg% sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       2292          0  non-token data: nspam
0.000          0     288387          0  non-token data: nham
0.000          0     135841          0  non-token data: ntokens
0.000          0 1098946805          0  non-token data: oldest atime
0.000          0 1099139048          0  non-token data: newest atime
0.000          0 1099138692          0  non-token data: last journal sync atime
0.000          0 1099088153          0  non-token data: last expiry atime
0.000          0     141462          0  non-token data: last expire atime delta
0.000          0      54603          0  non-token data: last expire reduction count

Thanks, Tuc
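For anyone who wants to check these counters mechanically, here is a rough sketch that reads `sa-learn --dump magic` output and reports the learned spam and ham counts. The function names are invented for this example. The 200-message minimum is an assumption taken from SpamAssassin's documented defaults for `bayes_min_spam_num` and `bayes_min_ham_num`, so check your own version and config before relying on it. The sample text is the first four lines of the primary-address dump above.

```python
import re

# First four lines of the primary-address dump, columns re-spaced.
PRIMARY_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        175          0  non-token data: nspam
0.000          0      73501          0  non-token data: nham
0.000          0    1027341          0  non-token data: ntokens
"""

# ASSUMPTION: documented defaults of bayes_min_spam_num / bayes_min_ham_num.
ASSUMED_MIN_LEARNED = 200

# Columns: probability, spam count, value, atime, then the label.
LINE_RE = re.compile(r"^\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data:\s+(.+?)\s*$")


def parse_dump_magic(text):
    """Map each 'non-token data' label to the integer in the third column."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


def training_report(values, minimum=ASSUMED_MIN_LEARNED):
    """Summarise how much spam and ham the database has learned."""
    nspam = values.get("nspam", 0)
    nham = values.get("nham", 0)
    return {
        "nspam": nspam,
        "nham": nham,
        "spam_per_ham": nspam / nham if nham else None,
        "enough_spam": nspam >= minimum,
        "enough_ham": nham >= minimum,
    }


if __name__ == "__main__":
    print(training_report(parse_dump_magic(PRIMARY_DUMP)))
```

In practice the text would come from running `sa-learn --dump magic` as the mailbox owner rather than from a pasted string.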
Re: ver 3.0 opinions
On Fri, 29 Oct 2004 17:49:07 -0400 (EDT), "Tuc at Beach House" <[EMAIL PROTECTED]> said:

> The reason I joined was that I recently upgraded my FreeBSD box from 2.64
> to 3.0.1_1 (Not sure what about it makes it _1, but thats ok)
>
> As soon as I did, the amount of spam I started getting as
> good deliveries SKYROCKETED. I thought maybe it was just me, but it
> appears at least *1* other person has seen this.

:-) well, it depends on a *lot* of possibilities, and you haven't given us a lot of information to work with.

You should probably look at the scores that SA is giving your spam and try to figure out what's different.

Is Bayes working for you? You might also want to adjust the default scores - version 3.0 lowered the Bayes scores quite a bit. More than it should have, IMHO.

Are your network tests still working?

If you're logged in as the same user SpamAssassin's running as, what does "sa-learn --dump magic" tell you?

--
snowjack(a)fastmail.fm
Re: ver 3.0 opinions
> On Thu, 28 Oct 2004, Jeff Ramsey wrote:
>
> > Is version 3 really any better at stopping spam than 2.63? I'm running 2.63
> > and my friend who owns an ISP just upgraded to ver 3, and he claims that
> > 2.63 stopped more spam.
>
> My experience is that (aside from spamd memory issues), v3 is much
> better at catching spam than 2.63 was (out of the box).
>
> With 2.64, we averaged about 10 or so spams below threshold (5.0).
> Now it's about 1, and some days, none :) Worth the upgrade IMO.

Sorry to jump into the middle of this convo. I just joined this list after using SpamAssassin for a few months. The reason I joined was that I recently upgraded my FreeBSD box from 2.64 to 3.0.1_1 (not sure what about it makes it _1, but that's ok).

As soon as I did, the amount of spam I started getting as good deliveries SKYROCKETED. I thought maybe it was just me, but it appears at least *1* other person has seen this.

Was there a major change in the rules or something? My install is/was out of the FreeBSD ports collection, which I didn't think had been "tainted" at all. Should I maybe clear my db files? Any help is appreciated; otherwise I'll fall back to the previous version.

Thanks, Tuc
Re: ver 3.0 opinions
On Thu, 28 Oct 2004, Bart Schaefer wrote:

> On Thu, 28 Oct 2004 15:21:59 -0700, Jeff Ramsey <[EMAIL PROTECTED]> wrote:
> > Is version 3 really any better at stopping spam than 2.63?
>
> [...]
> Using it in local only mode, though, I've found it not very different.
> The spams that get through 3.x that do not get through 2.6x are
> generally (a) those that match BAYES_99, which by itself in the
> default configuration is no longer a large enough score to make me
> happy, or

True. Some spam we get is solely BAYES_99. I've bumped it back up to 5.2 (like in 2.6x).

--
Jon Trulson    mailto:[EMAIL PROTECTED]
ID: 1A9A2B09, FP: C23F328A721264E7 B6188192EC733962
PGP keys at http://radscan.com/~jon/PGPKeys.txt
#include "I am Nomad." -Nomad
Re: ver 3.0 opinions
On Thu, 28 Oct 2004, Jeff Ramsey wrote:

> Is version 3 really any better at stopping spam than 2.63? I'm running 2.63
> and my friend who owns an ISP just upgraded to ver 3, and he claims that
> 2.63 stopped more spam.

My experience is that (aside from spamd memory issues), v3 is much better at catching spam than 2.63 was (out of the box).

With 2.64, we averaged about 10 or so spams below threshold (5.0). Now it's about 1, and some days, none :) Worth the upgrade IMO.

--
Jon Trulson    mailto:[EMAIL PROTECTED]
ID: 1A9A2B09, FP: C23F328A721264E7 B6188192EC733962
PGP keys at http://radscan.com/~jon/PGPKeys.txt
#include "I am Nomad." -Nomad
Re: ver 3.0 opinions
Thanks for the help and info. I'll tell my friend why his 3.0 install is letting more spam through and he has autolearn turned on, so his should get better. As for me, I'll upgrade to at least version 2.64. I use lots of the custom rules - antidrug.cf, etc. If I upgrade SA to 3.0, can I still use my SQL custom rules score table to alter the scoring for the rules, which I think would give me the best of both worlds? Thanks, Jeff Ramsey Tubafor Mill, Inc.
Re: ver 3.0 opinions
On Thu, 28 Oct 2004 16:19:13 -0700, "Bart Schaefer" <[EMAIL PROTECTED]> said:

> On Thu, 28 Oct 2004 15:21:59 -0700, Jeff Ramsey <[EMAIL PROTECTED]>
> wrote:
> > Is version 3 really any better at stopping spam than 2.63?
>
> Version 3 stops different spam than 2.63, in my experience so far.
> E.g. it's better at catching the drug spam but not as good at the
> "earn cash for making phone calls" spam.

I would say that the *default* config of 3.X is significantly more effective than the default config of 2.63 or 2.64. But I think that 2.64, after some tweaking, is probably just as effective as 3.0 once you've added in the SpamCopURI patch, antidrug.cf, and some of the other SARE custom rulesets. Both of them depend to a large extent on how much care you put into setting them up, training BAYES properly, etc.

> Using it in local only mode, though, I've found it not very different.
> The spams that get through 3.x that do not get through 2.6x are
> generally (a) those that match BAYES_99, which by itself in the
> default configuration is no longer a large enough score to make me
> happy, or (b) would have been tagged as spam except that the AWL
> "smoothed" them down to just below the threshold.

Yup. But I wouldn't turn off AWL if I were you. I think it's a very nice feature, and has probably prevented a few false positives for me. Yes, occasionally it will pull the score the wrong way across the threshold, but if it's doing that, you're better off figuring out why this person's messages get an *average* score on the wrong side of your threshold anyway. If you get that fixed, then the AWL will stop pulling messages the wrong way across your threshold.

I do customize my BAYES scores because I'm not very happy with the defaults. I find that a significant portion of spam manages to reduce its Bayes probability to 40-60% by including large chunks of innocent text at the end of the message.

So I add the lines below to my local.cf, because most of my ham scores below 10%, while the messages that hit Bayes' 40-60% range are more than 95% spam. This still catches those spams which hit BAYES_99, and for that rare ham that hits BAYES_99 (I've never seen one, but I suppose they must be out there), the AWL or another whitelist rule will hopefully pull it back under five points.

score BAYES_00 -4.9
score BAYES_01 -2.0
score BAYES_10 -1.5
score BAYES_20 -1.0
score BAYES_30 -0.5
score BAYES_40 0.1
score BAYES_44 0.7
score BAYES_50 1.0
score BAYES_56 1.5
score BAYES_60 2.1
score BAYES_70 3.1
score BAYES_80 4.2
score BAYES_90 4.9
score BAYES_99 5.4

--
snowjack(a)fastmail.fm
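A quick sanity check of a score table like the one above can be sketched as follows. The helper names are invented for this example, and the 5.0 threshold is the default required score mentioned elsewhere in the thread. The check confirms that the scores never decrease as the Bayes probability bucket rises, and lists which rules reach the threshold on their own.

```python
# Required score mentioned elsewhere in the thread as the default threshold.
THRESHOLD = 5.0

# The local.cf overrides from the message above, in bucket order.
BAYES_SCORES = {
    "BAYES_00": -4.9, "BAYES_01": -2.0, "BAYES_10": -1.5, "BAYES_20": -1.0,
    "BAYES_30": -0.5, "BAYES_40": 0.1, "BAYES_44": 0.7, "BAYES_50": 1.0,
    "BAYES_56": 1.5, "BAYES_60": 2.1, "BAYES_70": 3.1, "BAYES_80": 4.2,
    "BAYES_90": 4.9, "BAYES_99": 5.4,
}


def local_cf_lines(scores):
    """Render the table as 'score RULE VALUE' lines for local.cf."""
    return [f"score {rule} {value}" for rule, value in scores.items()]


def is_non_decreasing(scores):
    """True if each bucket scores at least as high as the one before it."""
    values = list(scores.values())
    return all(a <= b for a, b in zip(values, values[1:]))


def rules_reaching_threshold(scores, threshold=THRESHOLD):
    """Rules whose score alone is at or above the spam threshold."""
    return [rule for rule, value in scores.items() if value >= threshold]


if __name__ == "__main__":
    print("\n".join(local_cf_lines(BAYES_SCORES)))
    print("monotonic:", is_non_decreasing(BAYES_SCORES))
    print("spam on their own:", rules_reaching_threshold(BAYES_SCORES))
```

With this table only BAYES_99 crosses the threshold by itself, which matches the stated intent of relying on the AWL or a whitelist rule to rescue the rare ham that lands there.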
Re: ver 3.0 opinions
At 06:21 PM 10/28/2004, Jeff Ramsey wrote:

> Is version 3 really any better at stopping spam than 2.63? I'm running 2.63
> and my friend who owns an ISP just upgraded to ver 3, and he claims that
> 2.63 stopped more spam.

As far as an "out of the box" configuration goes, I'd say 3.0 is orders of magnitude better than the 2.63 version.

(Actually why ANYONE is still running 2.63 is a mystery to me. 2.63 is vulnerable to a DoS attack, and you should be running 2.64 if you want your servers to stay in the 2.6x series.)

I'd also say that if you're a default-rules non-bayes user, 3.0 is going to be a HUGE improvement.

However, someone who's got bayes going and all the add-on rules and packages they can find (antidrug, surbl, rulesemporium, etc) is likely to experience some reduced hits when they upgrade to 3.x. The main reason here is that SA 3.0 has a lot of these add-ons integrated, and some of the scores that came out of the GA are less aggressive than the ones some authors assign when they hand-score their rules.

The less aggressive scoring is going to cause more spam misses, but it's also going to reel in the FP rate. The scores are now balanced with the scores of the other rules, not adding score on top of an already balanced system. Go figure, if you add rules to a balanced system without rebalancing, the average score of all messages, spam and nonspam, goes up. You catch more spam, and you catch more FPs.

SA 3.0 is also significantly less aggressive in bayes scoring, and I think this is largely a reflection of the increased accuracy of the rules picking up a lot of the slack which 2.5x and 2.6x left for bayes to take care of.

Let's face it. 2.5x and 2.6x had pretty lousy default rule sets, but the power of bayes made the system still catch spam pretty well. 2.6x without any bayes or add-on rules is pretty much hopelessly ineffective against current spam. With bayes or add-ons it works pretty well.

With both it works very well at catching spam, but also has a much higher FP ratio than the SA devs are willing to accept for a shipping stable release. (The general rule for the scoring system is that FPs are 100 times worse than FNs.)
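The "FPs are 100 times worse than FNs" rule of thumb can be made concrete with a tiny weighted-cost sketch. Everything here apart from the 100:1 weight is hypothetical: the function name and the message counts are invented purely to show why a less aggressive score set can come out ahead even though it misses more spam.

```python
# From the message above: one false positive is treated as 100 false negatives.
FP_WEIGHT = 100


def weighted_cost(false_positives, false_negatives, fp_weight=FP_WEIGHT):
    """Total cost of a scoring setup under the stated FP:FN weighting."""
    return fp_weight * false_positives + false_negatives


# HYPOTHETICAL counts for the same mail sample under two score sets.
aggressive = weighted_cost(false_positives=3, false_negatives=20)
conservative = weighted_cost(false_positives=1, false_negatives=60)

if __name__ == "__main__":
    print("aggressive:", aggressive)
    print("conservative:", conservative)
```

Under these made-up numbers the conservative set lets three times as much spam through and still has half the weighted cost, which is the trade-off the message describes.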
Re: ver 3.0 opinions
On Thu, 28 Oct 2004 15:21:59 -0700, Jeff Ramsey <[EMAIL PROTECTED]> wrote:

> Is version 3 really any better at stopping spam than 2.63?

Version 3 stops different spam than 2.63, in my experience so far. E.g. it's better at catching the drug spam but not as good at the "earn cash for making phone calls" spam.

If you use full network tests, I suspect 3.001 (did I get enough zeroes? too many?) is actually better than 2.63/2.64.

Using it in local only mode, though, I've found it not very different. The spams that get through 3.x that do not get through 2.6x are generally (a) those that match BAYES_99, which by itself in the default configuration is no longer a large enough score to make me happy, or (b) would have been tagged as spam except that the AWL "smoothed" them down to just below the threshold.

I confess with some embarrassment that I haven't yet looked into how to turn off the AWL in spamd. Statement (b) above comes from running the same messages through "spamassassin -t" and having them marked as spam with the only difference in the latter case being the absence of an AWL hit.
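The "smoothing" in case (b) can be illustrated with a little arithmetic. This sketch assumes the adjustment described in the AWL documentation, where the final score moves part of the way from the message's own score toward the sender's long-term mean, with a factor that defaults to 0.5; treat both the formula and the default as assumptions to verify against your installed version. The function name and the example scores are hypothetical.

```python
THRESHOLD = 5.0  # default required score mentioned in the thread


def awl_adjusted(score, sender_mean, factor=0.5):
    """ASSUMED AWL formula: regress the score toward the sender's mean."""
    return score + (sender_mean - score) * factor


# HYPOTHETICAL message: rules alone say 6.0, but this sender's earlier
# mail averaged 3.0, so the adjusted score lands below the threshold.
raw_score = 6.0
adjusted = awl_adjusted(raw_score, sender_mean=3.0)

if __name__ == "__main__":
    print("raw:", raw_score, "spam" if raw_score >= THRESHOLD else "ham")
    print("adjusted:", adjusted, "spam" if adjusted >= THRESHOLD else "ham")
```

The same arithmetic works in the other direction: a sender whose history averages well above the threshold pulls a borderline message up, which is why the sender's average is the thing worth investigating.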
Re: ver 3.0 opinions
I noticed a significant improvement with 3.0 - especially with drug-related messages.

On 29/10/04 8:21 AM, "Jeff Ramsey" <[EMAIL PROTECTED]> wrote:

> Is version 3 really any better at stopping spam than 2.63? I'm running
> 2.63 and my friend who owns an ISP just upgraded to ver 3, and he
> claims that 2.63 stopped more spam.
>
> Thanks,
> Jeff Ramsey
> Tubafor Mill, Inc.
ver 3.0 opinions
Is version 3 really any better at stopping spam than 2.63? I'm running 2.63 and my friend who owns an ISP just upgraded to ver 3, and he claims that 2.63 stopped more spam.

Thanks,
Jeff Ramsey
Tubafor Mill, Inc.