Re: dealing with SPF and external authenticated users
>> What would be the correct way of dealing with this situation? As a workaround I have used whitelist_from_rcvd [EMAIL PROTECTED], which seems to be a great workaround, because I have rules in Postfix that do not allow external users who do NOT authenticate to send messages with my own domain, not even to my local users.

> There's nothing wrong with that solution, since you have Postfix set up to refuse mail to local addresses from un-auth'd users.

I implemented a similar setup a while ago, and it turned out that some legit (although suspicious-looking) mails from eBay were blocked. I had to whitelist eBay there. This particular user is no longer there, so I don't know whether eBay has revised these mails since.

Wolfgang Hamann
Another URL obfuscation
I found this obfuscated URL in a drug spam: A href=3Dhttp://gozifo .upze5otbbutzanbb655k685ys5nn%2Eridgykh= comFONT SIZE=3D2/FONT Larry R.
Re: Another URL obfuscation
On Tuesday, January 10, 2006, 6:17:38 AM, Larry Rosenbaum wrote:
> I found this obfuscated URL in a drug spam:
> A href=3Dhttp://gozifo .upze5otbbutzanbb655k685ys5nn%2Eridgykh= comFONT SIZE=3D2/FONT

Good grief, does any mail client actually parse that as a functional URI?

Jeff C.
--
Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
RE: rules better than bayes?
> -----Original Message-----
> From: jo3 [mailto:[EMAIL PROTECTED]]
> Sent: Monday, January 09, 2006 2:28 PM
> To: users@spamassassin.apache.org
> Subject: rules better than bayes?
>
> Hi, This is an observation, please take it in the spirit in which it is intended, it is not meant to be flame bait. After using spamassassin for six solid months, it seems to me that the bayes process (sa-learn [--spam | --ham]) has only very limited success in learning about new spam. Regardless of how many spams and hams are submitted, the effectiveness never goes above the default level which, in our case here, is somewhere around 2 out of 3 spams correctly identified. By the same token, after adding the third party rule, airmax.cf, the effectiveness went up to 99 out of 100 spams correctly identified.

I have long said that IMHO, I do not think bayes is worth it. Left unattended, it isn't as good. A simple rule can take out a lot of spam. Some may say rule writing is more complicated than training bayes. Maybe. Not so much the rule writing, but the figuring out what to look for and testing for FPs.

I do not run Bayes for our company. Obviously I'm partial to URIBL.com and SARE rules ;) I get about 98% of spam caught, and few FPs.

This is going to sound like tooting our own horn, but so be it. Before SARE, Bayes was cool. After SARE, I see no need.

Chris Santerre
SysAdmin and SARE/URIBL ninja
http://www.uribl.com
http://www.rulesemporium.com
Re: Another URL obfuscation
* Jeff Chan wrote (10/01/2006 15:42):
> On Tuesday, January 10, 2006, 6:17:38 AM, Larry Rosenbaum wrote:
>> I found this obfuscated URL in a drug spam:
>> A href=3Dhttp://gozifo .upze5otbbutzanbb655k685ys5nn%2Eridgykh= comFONT SIZE=3D2/FONT
>
> Good grief, does any mail client actually parse that as a functional URI?

Yes. In your e-mail, my Thunderbird created a clickable link to http://gozifo

My IE gives a DNS error when it tries that address. My Firefox redirects to
http://www.google.com/search?btnI=I%27m+Feeling+Lucky&ie=UTF-8&oe=UTF-8&q=gozifo
which in turn redirects to
http://www.vojir.com/other/basic-myebol.html
which gives a 404 error. It's probably possible to turn this (mis)feature off in Firefox, but there it is by default.

I have no idea whether this is the original intention of the obfuscation. I would guess not, and if it's viewed as HTML to start with, that might make a difference.

Chris
RE: rules better than bayes?
At 10:50 AM 1/10/2006, Chris Santerre wrote:
> I have long said that IMHO, I do not think bayes is worth it. Left unattended, it isn't as good. A simple rule can take out a lot of spam. Some may say rule writing is more complicated than training bayes. Maybe. Not so much the rule writing, but the figuring out what to look for and testing for FPs.

Interesting.. For me, BAYES_99 is right between SURBL and URIBL in terms of hits. (And has 98.91% of URIBL's total hits.) I find it completely indispensable.

I rarely train manually, except at initial setup where I feed it a good base learning. (The autolearner can sometimes go awry if you don't train some mail manually before letting it go.)

On a day to day basis I mostly feed automatically with a cronjob that collects mail via spamtraps and hamtraps. I have that coupled with autolearning that's set a bit differently than the defaults. (IMNSHO, having a ham learning threshold that's positive is suicide, but I also have a large number of small negative-score rules so I can keep my threshold at -0.01 and actually autolearn some ham.)

This setup is near zero maintenance, and highly effective. I can't see why it wouldn't be worth it. It's almost as good as turning on URIBLs and not much more work. It's certainly much less work than rule writing. The last time I bothered to tinker with my bayes was before Christmas.
Re: SA 3.10 skipping some emails or errors in log??
On Mon, 09 Jan 2006 21:45:11 -0500, you wrote:
> On 09/01/2006 7:36 PM, George R. Kasica wrote:
>> Jan 9 15:31:07 eagle spamd[8420]: spamd: processing message [EMAIL PROTECTED] for mail:561
>> Jan 9 15:34:55 eagle spamd[8715]: __alarm__
>> Jan 9 15:35:01 eagle spamd[8715]: __alarm__
>> Jan 9 15:35:01 eagle spamd[8311]: prefork: child states: BBBIB
>> Jan 9 15:35:02 eagle spamd[8719]: spamd: processing message [EMAIL PROTECTED] for mail:561
>> Jan 9 15:35:12 eagle spamd[8311]: tcp timeout at /usr/local/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/SpamdForkScaling.pm line 195.
>> Jan 9 15:35:12 eagle spamd[8311]: tcp timeout at /usr/local/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/SpamdForkScaling.pm line 195.
>> Jan 9 15:35:12 eagle spamd[8311]: prefork: select returned undef! recovering
>> Jan 9 15:35:48 eagle spamd[8712]: spamd: clean message (0.0/5.0) for mail:561 in 186.2 seconds, 14503 bytes.
>> Jan 9 15:35:48 eagle spamd[8712]: spamd: result: . 0 - HTML_MESSAGE scantime=186.2,size=14503,user=mail,uid=561,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=54421,mid=[EMAIL PROTECTED],autolearn=disabled
>
> Please see, and comment on, bug 4696:
> http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4696
>
> Daryl

Daryl:

Just curious as to the estimate for how long it will be until the problem is corrected? Right now, with the way SA 3.1 is operating here, it is almost worthless: it is catching and scanning about 20% of the spam, due to the bug causing difficulties, I'm assuming?

George

===[George R. Kasica]===  +1 262 677 0766
President                 +1 206 374 6482 FAX
Netwrx Consulting Inc.    Jackson, WI USA
http://www.netwrx1.com
[EMAIL PROTECTED]
ICQ #12862186
RE: rules better than bayes?
Hi Matt,

I'm interested in how your setup compares to mine. I also find Bayes very useful, but I haven't gotten it to work as well as what you've described.

> Interesting.. For me, BAYES_99 is right between SURBL and URIBL in terms of hits. (And has 98.91% of URIBL's total hits.) I find it completely indispensable.

Are you using a single site-wide database, or is this a per-user setup?

> I rarely train manually, except at initial setup where I feed it a good base learning. (The autolearner can sometimes go awry if you don't train some mail manually before letting it go.)

The trouble I had with the autolearner was that some spammers would send innocuous mail through to raise their scores until Bayes decided they were ok, then start spamming. That was a couple of versions back; does that sort of thing no longer work?

> On a day to day basis I mostly feed automatically with a cronjob that collects mail via spamtraps and hamtraps. I have that coupled with autolearning that's set a bit differently than the defaults. (IMNSHO, having a ham learning threshold that's positive is suicide, but I also have a large number of small negative-score rules so I can keep my threshold at -0.01 and actually autolearn some ham.)

I'd love to make my Bayesian database more effective. Is there a doc somewhere that describes how you tuned it to your environment?
Re: rules better than bayes?
Aaron Grewell wrote:
> Hi Matt,
> I'm interested in how your setup compares to mine. I also find Bayes very useful, but I haven't gotten it to work as well as what you've described.
>
>> Interesting.. For me, BAYES_99 is right between SURBL and URIBL in terms of hits. (And has 98.91% of URIBL's total hits.) I find it completely indispensable.
>
> Are you using a single site-wide database, or is this a per-user setup?

Single site-wide.. I use MailScanner, which does not support per-user, but I'm not really looking for it.

>> I rarely train manually, except at initial setup where I feed it a good base learning. (The autolearner can sometimes go awry if you don't train some mail manually before letting it go.)
>
> The trouble I had with the autolearner was that some spammers would send innocuous mail through to raise their scores until Bayes decided they were ok, then start spamming. That was a couple of versions back; does that sort of thing no longer work?

Erm, that really shouldn't affect the bayes autolearner.. perhaps you are thinking of the AWL? I don't run the AWL for this very reason.

>> On a day to day basis I mostly feed automatically with a cronjob that collects mail via spamtraps and hamtraps. I have that coupled with autolearning that's set a bit differently than the defaults. (IMNSHO, having a ham learning threshold that's positive is suicide, but I also have a large number of small negative-score rules so I can keep my threshold at -0.01 and actually autolearn some ham.)
>
> I'd love to make my Bayesian database more effective. Is there a doc somewhere that describes how you tuned it to your environment?

Not really.. but it's not hard.

Spamtraps and hamtraps:
---
1) Create a secret hamtrap email account. Subscribe this account to newsletters and news feeds that your users typically subscribe to. Do not post this address around, and don't use hamtrap as the account name; it's too obvious.

2) Create a spamtrap account, or several of them. Carefully seed this out in the body of some Usenet and mailing list postings.

3) Create a cron job that auto-feeds the above mail to sa-learn. Simple example fragment of the script I use (it keeps a rotating archive of the past 5 learning sessions):

    #!/bin/sh
    cd /var/spool/training/ || exit 1
    if [ -f /var/spool/mail/spamtrap ]; then
        echo "learning spam mailbox - spamtrap"
        # move the mailbox out of the spool before learning from it
        mv /var/spool/mail/spamtrap .
        /usr/bin/sa-learn --spam --mbox spamtrap
        # drop the oldest archive and shift the others up one slot
        rm -f spam/spamtrap.alearn5.gz
        mv spam/spamtrap.alearn4.gz spam/spamtrap.alearn5.gz
        mv spam/spamtrap.alearn3.gz spam/spamtrap.alearn4.gz
        mv spam/spamtrap.alearn2.gz spam/spamtrap.alearn3.gz
        gzip spam/spamtrap.alearn1
        mv spam/spamtrap.alearn1.gz spam/spamtrap.alearn2.gz
        # the mailbox just learned becomes the newest archive
        mv spamtrap spam/spamtrap.alearn1
    fi

4) Carefully monitor the data being fed for a while (two weeks or so) to make sure there's no pollution. After it's established you can monitor it less often.

Autolearn adjustment:
---
1) Add this to your local.cf:

    bayes_auto_learn_threshold_nonspam -0.01

2) Create a bayes_hamlearning.cf file. Create several simple body text rules with catch phrases from your normal nonspam. Assign these rules very small negative scores (-0.01 to -0.1). This is generally easier in a corporate environment, but it can be done in academic too.

    body LOCAL_THESIS /\bThesis\b/i
    score LOCAL_THESIS -0.01

You have to keep the scores small, as you don't want to use these to whitelist spam mail. You merely want to make mail that would otherwise score 0 earn a small negative score if it's got some of these phrases in it.

It's not perfect, but it's better than blindly learning everything under 0.5. I feel learning as ham should be earned, not a default for not hitting any rules at all.

The problem is this requires some customization. This can't be a default setup of SA, as the catch phrases vary from place to place, and if there were a default set of them spammers would be sure to always include them, making them pointless. You'd effectively have the same thing as the current default: by avoiding spam rules and existing bayes tokens they can get a message learned.
Re: rules better than bayes?
Aaron Grewell wrote:
> Hi Matt,
> I'm interested in how your setup compares to mine. I also find Bayes very useful, but I haven't gotten it to work as well as what you've described.
>
>> Interesting.. For me, BAYES_99 is right between SURBL and URIBL in terms of hits. (And has 98.91% of URIBL's total hits.) I find it completely indispensable.
>
> Are you using a single site-wide database, or is this a per-user setup?

I'm not Matt, but I'm running a very similar setup which works very well, so I thought I would comment also. I'm running a single sitewide database. All mail is processed under my spamd user.

>> I rarely train manually, except at initial setup where I feed it a good base learning. (The autolearner can sometimes go awry if you don't train some mail manually before letting it go.)
>
> The trouble I had with the autolearner was that some spammers would send innocuous mail through to raise their scores until Bayes decided they were ok, then start spamming. That was a couple of versions back; does that sort of thing no longer work?

I rarely train manually as well. The only ones I train (and it's only because there is nothing else to train) are spam which are correctly identified as such but have autolearn=no because they did not meet the autolearn criteria. These almost always have BAYES_99 and a score of 20 or so, but most likely did not have enough header points to be autolearned.

I didn't even start by training my database manually. I started from scratch and let the autolearner do its thing. I have never had to correct what it did because it was always right. The poison that spammers like to include in messages doesn't appear to have any effect on the overall outcome of the bayes score. I don't really know why this is; it just works.

NOTE: to operate in this fashion I believe it is imperative that you change the autolearn thresholds. The defaults are dangerous! (At least in 2.64, which I still run.) I have mine set as such:

    bayes_auto_learn_threshold_nonspam -0.1
    bayes_auto_learn_threshold_spam 10.0

To this date (I've been running over 2 years) I have yet to see the autolearner misclassify. Most bayes hits are at the far extremes (BAYES_99 and BAYES_00) with only a few in the 80-90 range.

>> On a day to day basis I mostly feed automatically with a cronjob that collects mail via spamtraps and hamtraps. I have that coupled with autolearning that's set a bit differently than the defaults. (IMNSHO, having a ham learning threshold that's positive is suicide, but I also have a large number of small negative-score rules so I can keep my threshold at -0.01 and actually autolearn some ham.)
>
> I'd love to make my Bayesian database more effective. Is there a doc somewhere that describes how you tuned it to your environment?

I doubt there is anything that specific, and if there were, it most likely wouldn't help you in your situation. There are general tuning notes on the SA website and such, but you really just have to try and see what works and what doesn't in your setup. What works well for one person may not work at all for someone else.

-Jim
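To make the effect of those two thresholds concrete, here is a deliberately simplified sketch of the decision they control. It is not SpamAssassin's code: the function name is invented, the exact handling of a score that lands exactly on a threshold is an assumption, and the extra criteria Jim alludes to (such as needing enough header points before a spam is learned) are left out entirely.

```python
from typing import Optional


def autolearn_decision(score: float,
                       nonspam_threshold: float = -0.1,
                       spam_threshold: float = 10.0) -> Optional[str]:
    """Return "ham", "spam", or None (do not learn) for a message score.

    Defaults are the values quoted in this thread, not SpamAssassin's
    shipped defaults. Boundary behaviour is assumed, not verified.
    """
    if score < nonspam_threshold:
        return "ham"    # clearly below the ham threshold: learn as ham
    if score >= spam_threshold:
        return "spam"   # clearly above the spam threshold: learn as spam
    return None         # the wide middle band is never autolearned
```

With these settings a message scoring 0.0 (it hit no rules at all) is not learned as ham, which is the point both Matt and Jim make: ham learning has to be earned by a negative score.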
Re: rules better than bayes?
Aaron Grewell wrote:
> The trouble I had with the autolearner was that some spammers would send innocuous mail through to raise their scores until Bayes decided they were ok, then start spamming. That was a couple of versions back; does that sort of thing no longer work?

Are you sure this is Bayes-related? Bayes looks at the entire message, not just the sender. All I'd expect this tactic to do would be to make future innocuous mail look more innocuous -- it shouldn't have any significant impact on spammy mail from the same source, since the content will be different.

--
Kelson Vibber
SpeedGate Communications, www.speed.net
RE: Another URL obfuscation
Chris Lear wrote:
> http://gozifo
> My IE gives a DNS error when it tries that address. My Firefox redirects to
> http://www.google.com/search?btnI=I%27m+Feeling+Lucky&ie=UTF-8&oe=UTF-8&q=gozifo
> which in turn redirects to
> http://www.vojir.com/other/basic-myebol.html
> which gives a 404 error. It's probably possible to turn this (mis)feature off in Firefox, but there it is by default.

Go to about:config and set the keyword.enabled pref to false.
Or change the keyword.URL pref to another URL of your choosing.

--
Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902
Hispanic Business Inc./HireDiversity.com
Software Engineer
OT: Using ldap_routing in sendmail to verify GroupWise Recipients before SA
I'm trying to configure sendmail to perform recipient verification by using ldap_routing, in order to reduce the number of messages that need to be scanned by Guinevere and SpamAssassin. Our configuration is similar to the setup discussed in comp.mail.sendmail, readable here:
http://www.issociate.de/board/post/266566/check_users_and_forward_to_an_other_mail_server.html

I've basically been successful in setting this up in a test environment as follows:

    FEATURE(`ldap_routing',`null',`ldap -1 -T<TMPF> -v mail -k mail=%0',`bounce')dnl
    LDAPROUTE_DOMAIN(hfcc.edu)dnl
    LDAPROUTE_DOMAIN(hfcc.net)dnl
    LDAPROUTE_DOMAIN(henryford.cc.mi.us)dnl
    LDAPROUTE_DOMAIN(mail.henryford.cc.mi.us)dnl
    define(`confLDAP_DEFAULT_SPEC', `-h hostname -b o=org -s sub')dnl

There's only one problem. We have multiple domains (as you can see above), and yet each user only has one domain in their mail attribute. I don't need to route, just verify existence and drop non-matches. I can't find any documentation on the parameters for ldap_routing, except that -v and -k are required fields, plus a couple of examples here and there.

So here's my question: it's apparent that %0 is the recipient's email address. If there were an easy way to check only the lhs of the address, I could compare it against a different attribute and it would match all possible domains, and that would be good enough. I don't know enough about sendmail rule hacking to do this, but I'm sure it can be done.
RE: rules better than bayes?
> Erm, that really shouldn't affect the bayes autolearner.. perhaps you are thinking of the AWL? I don't run the AWL for this very reason.

Oh yeah. I was thinking of the AWL. NM.

> The problem is this requires some customization. This can't be a default setup of SA as the catch phrases vary from place to place, and if there was a default set of them spammers would be sure to always include them, making them pointless. You'd effectively have the same thing as the current default, by avoiding spam rules and existing bayes tokens they can get a message learned.

That all makes sense. I'll give it a shot. Thanks!

-Aaron
RE: rules better than bayes?
> Im not matt, but running a very similar setup which works very well so i thought i would comment also. Im running a single sitewide database. All mail is processed under my spamd user.

OK, that's basically what I'm doing too.

> I rarely train manually as well.
> NOTE: to operate in this fashion i believe it is imperative that you change the autolearn thresholds. The defaults are dangerous! (atleast in 2.64 which i still run). I have mine set as such:
>
>     bayes_auto_learn_threshold_nonspam -0.1
>     bayes_auto_learn_threshold_spam 10.0

OK, Matt said something similar about the thresholds. Mine are default, so that may be part of the issue. Thanks for the feedback!

-Aaron
Re: rules better than bayes?
Bayes wouldn't be much good if not for the rules, which give it a basic compass as to what is spam and what is not. The rules are, in large part, what makes Bayes work.
Getting Exim to read SA MySQL AWL Database to reduce load
Has anyone tried to get Exim to read the MySQL database of SA? The reason I'm asking is that I'm thinking that under load conditions Exim could read the AWL database and bypass SA on matches with very high scores (just rejecting them) and messages with very low scores (just accepting them bypassing SA). The idea is to give overloaded servers better performance during rush hour and let the learner work when things are slower. Anyone done this? -- Marc Perkel - [EMAIL PROTECTED] Spam Filter: http://www.junkemailfilter.com My Blog: http://marc.perkel.com
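The shortcut Marc describes boils down to a three-way decision on a sender's history. A rough sketch follows, assuming the AWL keeps a running total score and a message count per sender; the function name, both cut-off values and the minimum-history requirement are hypothetical and would need tuning per site. How Exim would actually query the database is not shown.

```python
def awl_triage(totscore: float, count: int,
               reject_at: float = 15.0, accept_at: float = -2.0,
               min_count: int = 5) -> str:
    """Decide whether a message can skip SpamAssassin based on sender history.

    Returns "reject", "accept", or "scan". All thresholds are made-up
    illustration values, not recommendations.
    """
    if count < min_count:
        return "scan"          # too little history to trust the average
    mean = totscore / count    # average score of this sender's past mail
    if mean >= reject_at:
        return "reject"        # consistently very spammy sender
    if mean <= accept_at:
        return "accept"        # consistently very clean sender
    return "scan"              # anything in between gets a full scan
```

Note that the "accept" branch has the same weakness Matt mentions elsewhere in this thread as his reason for not running the AWL: a sender who first builds up a clean history could then get spam through unscanned.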
Re: rules better than bayes?
Matt Kettler writes:
> At 10:50 AM 1/10/2006, Chris Santerre wrote:
>> I have long said that IMHO, I do not think bayes is worth it. Left unattended, it isn't as good. A simple rule can take out a lot of spam. Some may say rule writing is more complicated than training bayes. Maybe. Not so much the rule writing, but the figuring out what to look for and testing for FPs.
>
> Interesting.. For me, BAYES_99 is right between SURBL and URIBL in terms of hits. (And has 98.91% of URIBL's total hits.) I find it completely indispensable.

The thing is, Bayes is a tool for personalization -- and as such, its effectiveness varies widely depending on what *you* do with it.

For what it's worth, I've *never* trained my current Bayes DB, and have been running with it for about 6 months I think. I get BAYES_00 on most ham, and BAYES_99 on most spam.

But the 4 letters that matter with Bayes are: YMMV ;)

--j.
Re: rules better than bayes?
Good evening, Justin, all,

On Tue, 10 Jan 2006, Justin Mason wrote:
> Matt Kettler writes:
>> At 10:50 AM 1/10/2006, Chris Santerre wrote:
>>> I have long said that IMHO, I do not think bayes is worth it. Left unattended, it isn't as good. A simple rule can take out a lot of spam. Some may say rule writing is more complicated than training bayes. Maybe. Not so much the rule writing, but the figuring out what to look for and testing for FPs.
>>
>> Interesting.. For me, BAYES_99 is right between SURBL and URIBL in terms of hits. (And has 98.91% of URIBL's total hits.) I find it completely indispensable.
>
> The thing is, Bayes is a tool for personalization -- and as such, its effectiveness varies widely depending on what *you* do with it.
>
> For what it's worth, I've *never* trained my current Bayes DB, and have been running with it for about 6 months I think. I get BAYES_00 on most ham, and BAYES_99 on most spam.
>
> But the 4 letters that matter with Bayes are: YMMV

Isn't that an OTCBB ticker symbol? I heard they're about to go through the _roof_!!

/me ducks...

Cheers,
- Bill

---
We don't want an election without a paper trail...all three owners of the companies who make these machines are donors to the Bush administration. Is this not corruption?
-- Gore Vidal
(Courtesy of http://www.laweekly.com/ink/03/52/features-cooper.php)
--
William Stearns ([EMAIL PROTECTED]). Mason, Buildkernel, freedups, p0f, rsync-backup, ssh-keyinstall, dns-check, more at: http://www.stearns.org
--
Re: Another URL obfuscation
> A href=3Dhttp://gozifo .upze5otbbutzanbb655k685ys5nn%2Eridgykh= comFONT SIZE=3D2/FONT

Ooooh, cute! Breaks a lot of regex scanners that are looking for the end of the href record! First time I've seen those in HTML; I've been seeing them in plain text for a week or two.

Loren
Re: rules better than bayes? Certainly better than mine.
Andrew Donkin wrote:
> Jim Maul [EMAIL PROTECTED] writes:
>> NOTE: to operate in this fashion i believe it is imperative that you change the autolearn thresholds. The defaults are dangerous! (atleast in 2.64 which i still run). I have mine set as such:
>>
>>     bayes_auto_learn_threshold_nonspam -0.1
>>     bayes_auto_learn_threshold_spam 10.0
>
> Matt agreed. Aaron was going to change to something similar.
>
> Before reading this thread, I did the opposite. I changed my nonspam threshold from -0.2 to the default 0.1 because I thought (mistakenly perhaps) that the Bayes database's spam:ham ratio was far too high. Incoming mail is about 3:1, but the Bayes database was more like 20:1. See:
>
>     3           bayes db version
>     1491805     nspam
>     75795       nham
>     1081029     ntokens
>     1136779207  oldest atime
>     1136925099  newest atime
>     1136925026  last journal sync atime
>     1136838312  last expiry atime
>     43200       last expire atime delta
>     25087       last expire reduction count

I started autolearning with the defaults and then quickly changed my thresholds as mentioned before. Our server here doesn't see a lot of spam (hell, it doesn't even see a lot of mail in total), so our ratios are obviously going to be different. Mine shows:

    2           0  non-token data: bayes db version
    26378       0  non-token data: nspam
    54313       0  non-token data: nham
    147479      0  non-token data: ntokens
    1134172970  0  non-token data: oldest atime
    1136925620  0  non-token data: newest atime
    1136925554  0  non-token data: last journal sync atime
    1136232703  0  non-token data: last expiry atime
    2060396     0  non-token data: last expire atime delta
    34608       0  non-token data: last expire reduction count

> In particular, a message from James Keating of this list received this report from Bayes:
>
>     X-Spam-Bayes-ham: 0.011-8--5h-0s--19d--SpamAssassin,
>         0.026-3--2h-0s--19d--autolearn,
>         0.029-203--156h-39s--19d--5.0,
>         0.031-7--5h-1s--19d--spamassassin,
>         0.050-4162--3796h-1707s--0d--i'm
>     X-Spam-Bayes-spam: 1.000-149--0h-6920s--1d--HX-Accept-Language:en-us,
>         1.000-27--0h-1229s--18d--H*UA:Thunderbird,
>         1.000-24--0h-1083s--18d--H*u:Thunderbird,
>         1.000-16--0h-718s--0d--H*RU:sk:cpe-24-,
>         1.000-13--0h-594s--11d--H*r:sk:cpe-24-
>
> ...implying that User-agent: Thunderbird was in a thousand spams but no hams. And that Accept-Language:en-us was in 6900 spams and no hams!
>
> So, I'm thinking that my Bayes is hosed again. Will a hamtrap help me here?

I'm not sure. I've never seen this report before, and I certainly don't have the same message to compare against what it scored on my system here. Have you noticed bayes misclassifying messages because of this, or are you speaking theoretically? A huge ratio alone does not imply a problem; it's the results that matter.

> I'm CCing you, Jim, because my last two posts to the list vanished without a trace.

Not a problem. Just not sure how much help I am in this situation...

-Jim
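For anyone wanting to check figures like these rather than eyeball them, here is a small sketch that pulls the ham and spam counts out of one entry of the token report and computes the spam:ham ratio from the dump values. The field layout is inferred only from the examples quoted in this thread (the `Nh` and `Ns` fields read as ham and spam counts, as Andrew reads them); the second number in each entry is skipped because its meaning is not stated here. The names `TokenStat`, `parse_entry` and `spam_to_ham_ratio` are invented for the example.

```python
import re
from typing import NamedTuple


class TokenStat(NamedTuple):
    prob: float      # token probability as reported
    ham: int         # number of hams the token was seen in
    spam: int        # number of spams the token was seen in
    age_days: int    # the "Nd" field
    token: str


# prob-<ignored>--<ham>h-<spam>s--<age>d--<token>
_ENTRY = re.compile(r"^(\d+\.\d+)-\d+--(\d+)h-(\d+)s--(\d+)d--(.+)$")


def parse_entry(entry: str) -> TokenStat:
    """Parse one entry such as '1.000-27--0h-1229s--18d--H*UA:Thunderbird'."""
    m = _ENTRY.match(entry.strip())
    if m is None:
        raise ValueError(f"unrecognised token entry: {entry!r}")
    prob, ham, spam, age, token = m.groups()
    return TokenStat(float(prob), int(ham), int(spam), int(age), token)


def spam_to_ham_ratio(nspam: int, nham: int) -> float:
    """Ratio of learned spam to learned ham from the dump's nspam/nham lines."""
    return nspam / nham
```

Applied to the numbers above, Andrew's database comes out at roughly 19.7 spams per ham (his "more like 20:1"), while Jim's holds about half as many spams as hams.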
Re: SA 3.10 skipping some emails or errors in log??
On Tue, 10 Jan 2006 18:58:37 -0500, you wrote:
> On 10/01/2006 11:29 AM, George R. Kasica wrote:
>> On Mon, 09 Jan 2006 21:45:11 -0500, you wrote:
>>> Please see, and comment on, bug 4696:
>>> http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4696
>>
>> Just curious as to the estimate for how long it will be until the problem is corrected? Right now with the way SA 3.1 is operating here it is almost worthless, catching and scanning about 20% of the spam due to the bug causing difficulties I'm assuming?
>
> If you can get a strace -ftp PID of the parent spamd process while this happens (along with a matching debug log) and *attach* it to the bug, I'm sure Justin would take a look at it. I haven't been able to reproduce it myself, so I haven't looked at it further.

Daryl:

Not a programmer here, but with a little direction I think I can get the info. I'm assuming the following here:

    strace -ftp PID

where PID is the PID of the parent spamd process, correct?

As to the debug log, how would I go about that? Is it the info I provided earlier, just doing it over again to match with the strace output?

George

George, MR. Tibbs, Nazarene, Ginger/The Beast Kasica (8/1/88-3/19/01, 1/17/02-)
Jackson, WI USA
[EMAIL PROTECTED]
http://www.netwrx1.com/georgek
ICQ #12862186
Re: SA 3.10 skipping some emails or errors in log??
On 10/01/2006 8:17 PM, George R. Kasica wrote:
> On Tue, 10 Jan 2006 18:58:37 -0500, you wrote:
>> If you can get a strace -ftp PID of the parent spamd process while this happens (along with a matching debug log) and *attach* it to the bug, I'm sure Justin would take a look at it. I haven't been able to reproduce it myself, so I haven't looked at it further.
>
> Daryl:
> Not a programmer here, but with a little direction I think I can get the info. I'm assuming the following here:
> strace -ftp PID
> where PID is the PID of the parent spamd process correct?

Yeah, PID is the process ID of the parent spamd process. Also, you can redirect the output to a file with normal redirection, or just specify an output file with the -o option, a la:

    strace -ftp PID -o /path/to/output/file

> As to debug log, how would I go about that? Is it the info I provided earlier just doing it over again to match with strace output?

Yeah. You might want to add -Dprefork as one of the options to your spamd call though.

Daryl
Re: rules better than bayes?
Chris Santerre wrote:
> I have long said that IMHO, I do not think bayes is worth it. Left unattended, it isn't as good. A simple rule can take out a lot of spam. Some may say rule writing is more complicated than training bayes. Maybe. Not so much the rule writing, but the figuring out what to look for and testing for FPs.
>
> I do not run Bayes for our company. Obviously I'm partial to URIBL.com and SARE rules ;) I get about 98% of spam caught, and few FPs.
>
> This is going to sound like tooting our own horn, but so be it. Before SARE, Bayes was cool. After SARE, I see no need.

I think SARE and bayes are complementary:
- SARE will detect new spam once the ninjas have found the corresponding rules.
- bayes will detect new spam if it resembles previous spam.

That said, I don't use SA/Bayes (I use dspam on a per-user basis, while SA is site-wide).
Re: rules better than bayes?
From: Chris Santerre [EMAIL PROTECTED]
> -----Original Message-----
> From: jo3 [mailto:[EMAIL PROTECTED]
>
>> Hi, This is an observation, please take it in the spirit in which it is intended, it is not meant to be flame bait. After using spamassassin for six solid months, it seems to me that the bayes process (sa-learn [--spam | --ham]) has only very limited success in learning about new spam. Regardless of how many spams and hams are submitted, the effectiveness never goes above the default level which, in our case here, is somewhere around 2 out of 3 spams correctly identified. By the same token, after adding the third party rule, airmax.cf, the effectiveness went up to 99 out of 100 spams correctly identified.
>
> I have long said that IMHO, I do not think bayes is worth it. Left unattended, it isn't as good. A simple rule can take out a lot of spam. Some may say rule writing is more complicated than training bayes. Maybe. Not so much the rule writing, but the figuring out what to look for and testing for FPs.
>
> I do not run Bayes for our company. Obviously I'm partial to URIBL.com and SARE rules ;) I get about 98% of spam caught, and few FPs.
>
> This is going to sound like tooting our own horn, but so be it. Before SARE, Bayes was cool. After SARE, I see no need.

Autolearning Bayes is not really very good, based on what people here seem to say.

I do note that I raised my BAYES_99 score to 5. If BAYES_99 hits, the odds that the message is spam are so high that it's silly to give BAYES_99 a low score, theoretical nonsense notwithstanding. If you apply the wrong statistical theory with the wrong conceptual criteria, the math or theory may be good but the results are trash.

For an existing spam database the rule setup that exists is probably quite good. If 99 hits, then other rules probably hit as well. This leads to artificially lowering the 99 score. Then when a new technique comes along that Bayes can recognize but nothing else does, the message floats on through.

At least on this system 99 misses once in 2000 to 1 times. Most of those times other very light whitelisting rules let the messages come through. Probably the right score for more general use would be 4.95 or something, such that if any other rule hits it's dinged as spam. It depends on your spam tolerance compared to your tolerance for sorting spam by score and looking at the few that are marginal.

Anyway, making that ONE change made the already good results I was getting with SARE and BAYES combined quite a bit better. Missed spam went down almost a factor of 10 and tagged ham went up by about 1 in 10,000 or less. (I can't remember the last time I got a ham marked as spam on the sole basis of BAYES_99 with a score of 5 that I had to fetch out of the spam folder.)

I take this as a proof of concept that penalizing a rule for being too good is ridiculous on its face, statistical theories notwithstanding. I maintain this is a positive indication that either the criteria, the chosen statistical approach, or both are wrong.

It might be entertaining to set up stock BAYES on your system, Chris, with all BAYES scores being very, very low, 0.01 or something. Then run the SARE version of sa_stats.pl to see what the goodness of each BAYES level really is. From that you can guesstimate some scores that might improve your system. I'd be really interested to see how autolearn BAYES can really perform when it's used in your sort of environment. I know for my environment it's silly to use it, due to the automated mis-learning on marginal messages. (Either it learns wrong or not at all on the most critical portion of the email load, the marginal messages.)

{^_^} Joanne steps down off her soapbox yet again.
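The difference between scoring BAYES_99 at 5 and at 4.95 is easy to see with a little arithmetic against a required score of 5.0 (the `required_score=5.0` seen in the spamd log earlier in this archive). The sketch below just sums rule scores and compares; `is_tagged` and `SOME_OTHER_RULE` are placeholder names for the example, and `LOCAL_THESIS` is borrowed from Matt's small negative-score rule.

```python
from typing import Dict, List


def is_tagged(rule_scores: Dict[str, float], hits: List[str],
              required: float = 5.0) -> bool:
    """True if the summed scores of the rules that hit reach the required score."""
    total = sum(rule_scores[name] for name in hits)
    return total >= required


# BAYES_99 at 5.0: enough on its own, but a tiny negative rule undoes it.
at_five = {"BAYES_99": 5.0, "LOCAL_THESIS": -0.01, "SOME_OTHER_RULE": 0.1}
print(is_tagged(at_five, ["BAYES_99"]))                      # True
print(is_tagged(at_five, ["BAYES_99", "LOCAL_THESIS"]))      # False (4.99)

# BAYES_99 at 4.95: not enough alone, but any other positive hit tips it over.
at_495 = {"BAYES_99": 4.95, "LOCAL_THESIS": -0.01, "SOME_OTHER_RULE": 0.1}
print(is_tagged(at_495, ["BAYES_99"]))                       # False
print(is_tagged(at_495, ["BAYES_99", "SOME_OTHER_RULE"]))    # True (5.05)
```

The second case is the situation Joanne describes, where very light whitelisting rules let a BAYES_99 message slip just under the threshold.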
Re[2]: OT Humor: was rules better than bayes?
Hello William,

Tuesday, January 10, 2006, 11:37:35 AM, you wrote:

But the 4 letters that matter with Bayes are: YMMV

Isn't that an OTCBB Ticker symbol? I heard they're about to go through the _roof_!!

Your Mileage May Vary, Inc. I hear they're cornering the market on automotive fuel saving devices and technologies. Today's stock prices ranged from 2.159 to 2.259, or 2.359 for preferred stock.
Re: rules better than bayes?
From: Jim Maul [EMAIL PROTECTED]

Chris Santerre wrote:

-Original Message- From: jo3 [mailto:[EMAIL PROTECTED] Sent: Monday, January 09, 2006 2:28 PM To: users@spamassassin.apache.org Subject: rules better than bayes?

Hi, This is an observation, please take it in the spirit in which it is intended, it is not meant to be flame bait. After using spamassassin for six solid months, it seems to me that the bayes process (sa-learn [--spam | --ham]) has only very limited success in learning about new spam. Regardless of how many spams and hams are submitted, the effectiveness never goes above the default level which, in our case here, is somewhere around 2 out of 3 spams correctly identified. By the same token, after adding the third party rule, airmax.cf, the effectiveness went up to 99 out of 100 spams correctly identified.

I have long said that IMHO, I do not think bayes is worth it. Left unattended, it isn't as good. A simple rule can take out a lot of spam. Some may say rule writing is more complicated then training bayes. Maybe. Not so much the rule writing, but the figuring out what to look for and testing for FPs. I do not run Bayes for our company. Obviously I'm partial to URIBL.com and SARE rules ;) I get about 98% of spam caught, and little FPs. This is going to sound like tooting our own horn, but so be it. Before SARE, Bayes was cool. After SARE, I see no need.

I always feel I have to point out the flip side to this just to offer another opinion. While I certainly don't have a NEED for bayes at our facility, I do run it, complete with autolearn. We have very low volume (5k msgs/day) but it works so well I rarely ever have to think about it. For us, 96% of the time bayes alone is enough to say whether a message is ham/spam. Add all the other tests on top of this (uribl, razor, a few sare) and there's easily a 20 point difference between ham and spam.

Jim, can you back that up with a run of the SARE version of sa_stats.pl? I'd love to see your record with that setup for the highest and lowest ranking BAYES scores.

{^_^}
Re: rules better than bayes?
From: Matt Kettler [EMAIL PROTECTED]

At 10:50 AM 1/10/2006, Chris Santerre wrote:

I have long said that IMHO, I do not think bayes is worth it. Left unattended, it isn't as good. A simple rule can take out a lot of spam. Some may say rule writing is more complicated then training bayes. Maybe. Not so much the rule writing, but the figuring out what to look for and testing for FPs.

Interesting.. For me, BAYES_99 is right between SURBL and URIBL in terms of hits. (And has 98.91% of URIBL's total hits) I find it completely indispensable.

It's number 1 here on scoring spam, 83.22% for 0.05% of ham, and I can't remember the last ham scoring on 99 that hit the spam folder. 99 has a score of 5 here because it does, all alone, tag spam that no other rule hits. XBL is the best BL here at the moment, 55.50% for 0.04% of hits on ham.

I rarely train manually, except at initial setup where I feed it a good base learning. (the autolearner can sometimes go awry if you don't train some mail manually before letting it go.)

I manually learn, particularly on spam not marked as spam that has a low BAYES score and some meat in it. (I don't bother with content free spam. Those very quickly score higher due to BL hits that pop up like magic.)

{^_^}
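For anyone who wants to see what figures like "83.22% for 0.05% of ham" measure, here is a rough Python sketch of a per-rule hit-rate tally: the share of spam and the share of ham that each rule fired on. It is not a reimplementation of sa_stats.pl. The function name rule_hit_rates, the input shape, and the sample messages are all assumptions made up for illustration.

```python
from collections import Counter


def rule_hit_rates(messages):
    """Compute per-rule hit rates.

    messages: iterable of (is_spam, rule_names) pairs (hypothetical format).
    Returns {rule: (percent_of_spam_hit, percent_of_ham_hit)}.
    """
    spam_total = ham_total = 0
    spam_hits, ham_hits = Counter(), Counter()
    for is_spam, rules in messages:
        if is_spam:
            spam_total += 1
            spam_hits.update(set(rules))  # count each rule once per message
        else:
            ham_total += 1
            ham_hits.update(set(rules))
    rates = {}
    for rule in set(spam_hits) | set(ham_hits):
        spam_pct = 100.0 * spam_hits[rule] / spam_total if spam_total else 0.0
        ham_pct = 100.0 * ham_hits[rule] / ham_total if ham_total else 0.0
        rates[rule] = (spam_pct, ham_pct)
    return rates


# Made-up sample: four spams and two hams with the rules that hit each one.
sample = [
    (True, ["BAYES_99", "URIBL_BLACK"]),
    (True, ["BAYES_99"]),
    (True, ["RCVD_IN_XBL"]),
    (True, ["BAYES_99", "RCVD_IN_XBL"]),
    (False, ["BAYES_00"]),
    (False, []),
]

rates = rule_hit_rates(sample)
for rule in sorted(rates):
    spam_pct, ham_pct = rates[rule]
    print(f"{rule:12s} spam {spam_pct:6.2f}%  ham {ham_pct:6.2f}%")
```

A rule with a high spam percentage and a ham percentage near zero is the kind that can carry a score close to the threshold on its own.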
Re: SA 3.10 skipping some emails or errors in log??
On Tue, 10 Jan 2006 20:56:48 -0500, you wrote:

On 10/01/2006 8:17 PM, George R. Kasica wrote:

On Tue, 10 Jan 2006 18:58:37 -0500, you wrote:

If you can get a strace -ftp PID of the parent spamd process while this happens (along with a matching debug log) and *attach* it to the bug, I'm sure Justin would take a look at it. I haven't been able to reproduce it myself, so I haven't looked at it further.

Daryl: Not a programmer here, but with a little direction I think I can get the info. I'm assuming the following here: strace -ftp PID where PID is the PID of the parent spamd process correct?

Yeah PID is the process ID of the parent spamd process. Also, you can redirect the output to a file with normal redirection, or just specify an output file with the -o option, ala:

strace -ftp PID -o /path/to/output/file

As to debug log, how would I go about that? Is it the info I provided earlier just doing it over again to match with strace output?

Yeah. You might want to add -Dprefork as one of the options to your spamd call though.

It's running now. I will hopefully have some items to upload soon.

George

George, MR. Tibbs, Nazarene, Ginger/The Beast Kasica(8/1/88-3/19/01, 1/17/02-) Jackson, WI USA [EMAIL PROTECTED] http://www.netwrx1.com/georgek ICQ #12862186