Re: uri(bl) checks don't detect URLs with capitalized Http
In an older episode (Thursday 14 April 2005 00:54), Theo Van Dinter wrote: In this case, however, it's not clear if he's running something like a Fedora RPM version of SpamAssassin where he could just go ahead and update at will, or if it's something like Barracuda/etc, where you really can't just go changing things on your own. The flip side of that of course is that you'll have vendor support who you can call and make requests of. ;) at home, i am running debian linux with SpamAssassin version 3.0.2 running on Perl version 5.8.4, or as debian puts it: Installed: 3.0.2-1 Candidate: 3.0.2-1 == nothing newer available in debian. at work, we do use a vendors' pre-installed SpamAssassin version 3.0.2 running on Perl version 5.8.5 and Fedora Core release 3 (Heidelberg), rpms built by the vendor: spamassassin-3.0.2-1 spamassassin-tools-3.0.2-1 i will have to find out exactly which modifications have been applied to which source by the vendor. anyway, further modifications are possible both from the vendor or us, the owners. learning at home how to apply the fix at all will make it easier to be able to judge / request / apply necessary changes at work.
Re: uri(bl) checks don't detect URLs with capitalized Http
[EMAIL PROTECTED] wrote:

In an older episode (Thursday 14 April 2005 00:54), Theo Van Dinter wrote:

In this case, however, it's not clear if he's running something like a Fedora RPM version of SpamAssassin where he could just go ahead and update at will, or if it's something like Barracuda/etc, where you really can't just go changing things on your own. The flip side of that of course is that you'll have vendor support who you can call and make requests of. ;)

at home, i am running debian linux with SpamAssassin version 3.0.2 running on Perl version 5.8.4, or as debian puts it:
Installed: 3.0.2-1
Candidate: 3.0.2-1
== nothing newer available in debian.

at work, we do use a vendors' pre-installed SpamAssassin version 3.0.2 running on Perl version 5.8.5 and Fedora Core release 3 (Heidelberg), rpms built by the vendor:
spamassassin-3.0.2-1
spamassassin-tools-3.0.2-1

i will have to find out exactly which modifications have been applied to which source by the vendor. anyway, further modifications are possible both from the vendor or us, the owners. learning at home how to apply the fix at all will make it easier to be able to judge / request / apply necessary changes at work.

Here's the diff:
http://svn.apache.org/viewcvs.cgi/spamassassin/trunk/lib/Mail/SpamAssassin/PerMsgStatus.pm?rev=148873&r1=125891&r2=148873&makepatch=1&diff_format=u
Re: Recommendation on SARE rules to add.
Hello Robert,

Tuesday, April 12, 2005, 10:24:54 PM, you wrote:

RM> SA 3.0
RM> I was wondering if anybody had a recommendation for a initial SARE set
RM> of rules to add. I am not exactly satisfied with my amount of FN's
RM> currently. Any ideas would be appreciated.

First -- I'm in full agreement with all of the other suggestions/considerations offered that I've seen. And since I haven't seen any specific rule set files, I'll offer my suggestions there:

70_sare_evilnum0.cf
70_sare_genlsubj0.cf
70_sare_header0.cf
70_sare_html0.cf
70_sare_uri0.cf

The rules above are created, selected, and regularly rechecked to avoid any/all hits against ham. They should be safe for everyone.

70_sare_specific.cf
70_sare_oem.cf
70_sare_spoof.cf
70_sare_unsub.cf
70_sare_random.cf
72_sare_redirect_post3.0.0.cf
88_FVGT_Tripwire.cf

These aren't quite as safe, but still should be suitable for the great majority of systems.

70_sare_adult.cf
70_sare_bayes_poison_nxm.cf
72_sare_bml_post25x.cf
chickenpox.cf
weeds_2.cf

A little bit more risky, and might FP if one of your users runs an adult book store, is a mortgage broker, or likes to *em*pha*size* words, etc.

70_sare_evilnum1.cf
70_sare_genlsubj1.cf
70_sare_header1.cf
70_sare_html1.cf
70_sare_uri1.cf

Like the first set, but a little bit more risky. Will hit ham, but should not cause FPs.

If you are located in the USA/England/Canada/Australia, and do not receive foreign-language non-spam, then you can also benefit from

70_sare_genlsubj_eng.cf
70_sare_header_eng.cf
70_sare_html_eng.cf
70_sare_uri_eng.cf

I guess we really should put SARE guidelines like this onto a page linked to http://wiki.apache.org/spamassassin/CustomRulesets -- I'll get that started, after I've put my income taxes to bed...

Bob Menschel
Re[4]: Arithmetic score for replaced O's and I's?
Hello mewolf1, Tuesday, April 12, 2005, 6:37:15 PM, you wrote: mgn In an older episode (Wednesday 13 April 2005 02:57), Robert Menschel wrote: Send me your t1r3d, h0m3|ess, hun6ry, un\/\/anted [EMAIL PROTECTED], and I'|| f1nd a 600D horme 4 them... (Not the entire spam emails, please -- just the obfuscations.) mgn Robert, I just sent you obfuscations privately off list, is that mgn what you meant? Perfect. I built rules for them last night and mass-checked them this morning. I'll run a few passes to refine them, then have other SARE ninja's mass-check to get broader results, and then we'll fine tune for performance, and hopefully have something published before end of month. Other contributions more than welcome. Bob Menschel
Re: report_safe doesn't seem to work since FC3 upgrade
The problem with your setup is with spamass-milter, not SpamAssassin. The problem with your lack of responses is that you started your thread by replying to a message in the thread titled --username flag. I don't know what your problem has to do with the --username option, but I guess the people reading that thread don't know about your problem. I'd suggest next time you send a new message (thus starting a new thread) rather than hijacking an existing thread. Daryl
Re: SA randomly sucking up huge amounts of memory
Hello Dennis, Wednesday, April 13, 2005, 1:24:27 PM, you wrote: DS A week or two ago, SA started randomly sucking up huge amounts of memory DS in one or more of the spamd children. ... DS I have managed to catch 3 of the messages that it has hung on. DS ... I've got some suspicions, but would need to actually have the emails to verify them. Any chance you can zip or tar.gz them up (so they don't tie up my SA system) and mail them to me? Bob Menschel
RE: Need for a new rule?
-----Original Message-----
From: Stuart Johnston [mailto:[EMAIL PROTECTED]
Sent: 13 April 2005 21:42
To: Andreas Davour
Cc: users@spamassassin.apache.org
Subject: Re: Need for a new rule?

Andreas Davour wrote:

The following message have many characteristics in common with much spam I've been getting lately. It's about investments, often shares, stock options or oil. One odd thing about those messages is that they all, like the one quoted below, have the letter 'l' substituted for the pipe character i.e. '|'.

Here we have a large number of obfuscated word rules, including a number that are related to stocks and shares. We need to be careful, as we do receive legitimate 'forrrward loooking statements' (obfuscated in case you don't like the phrase), so we tend to have things like

(?!millions?)m[1i|][l1|][l1|][l1|][0o]n[5s]? (not checked)

The basic rule is that real people don't try to hide what they are saying.

There does exist a problem with other companies who use profanity filters. The sender beats their profanity filter by obfuscating the word, and we catch it because they obfuscated!
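The pattern quoted above can be tried outside SpamAssassin before turning it into a rule. The following is only a sketch: it assumes a grep built with PCRE support (the -P option, as in GNU grep), and the variable and function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: test words against the obfuscation pattern from the message above.
# Assumes grep supports -P (Perl-compatible regexes); names are hypothetical.
OBFUSCATED_MILLION='(?!millions?)m[1i|][l1|][l1|][l1|][0o]n[5s]?'

is_obfuscated_million() {
    # Exit status 0 when the pattern matches somewhere in the argument.
    printf '%s\n' "$1" | grep -qP "$OBFUSCATED_MILLION"
}

for word in 'mill1on' 'm1||l0n5' 'millions'; do
    if is_obfuscated_million "$word"; then
        echo "$word: obfuscated"
    else
        echo "$word: not matched"
    fi
done
```

As written, the fifth character class does not contain a plain 'i', so the correctly spelled word cannot match even without the lookahead; the (?!millions?) guard acts as a safety net if the classes are widened later.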
Re: yet another Sendmail filter for SpamAssassin daemon spamd
On Wednesday 13 April 2005 09:57 am, Eugene Kurmanin wrote:

5. Copy SPAM to the defined mailbox;
6. Reject SPAM at the DATA stage, if SPAM score is greater than defined value;
7. Log all activities to syslog.

Well, if you are going to reject, why also accept and copy to mailbox? Is there more than one threshold, so that you can reject if it gets a really bad score (like 20 or 30), and reject but still copy to mailbox if the score is less?

--
John Andersen
sa-learn doesn't learn
Hi,

I am trying to set up Bayes classifying for the first time using sa-learn. It looks like it is working but doesn't actually seem to be... Here is the output:

[raph]$ sa-learn --showdots --mbox --spam .thunderbird/gmnjx6hf.default/Mail/mail.plus.net/Junk
.
Learned from 870 message(s) (1025 message(s) examined).
[raph]$ sa-learn --showdots --mbox --ham .thunderbird/gmnjx6hf.default/Mail/mail.plus.net/Inbox
..
Learned from 2390 message(s) (2578 message(s) examined).

Now when I do spamassassin -D --lint I get

[...]
debug: bayes: 5790 tie-ing to DB file R/O /home/raph/.spamassassin/bayes_toks
debug: bayes: 5790 tie-ing to DB file R/O /home/raph/.spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: using /home/raph/.spamassassin for user state dir
debug: bayes: Not available for scanning, only 1 spam(s) in Bayes DB < 200
debug: bayes: 5790 untie-ing
debug: bayes: 5790 untie-ing db_toks
debug: bayes: 5790 untie-ing db_seen
debug: Score set 1 chosen.
debug: MIME PARSER START
debug: main message type: text/plain
debug: parsing normal part
debug: added part, type: text/plain
debug: MIME PARSER END
debug: bayes: 5790 tie-ing to DB file R/O /home/raph/.spamassassin/bayes_toks
debug: bayes: 5790 tie-ing to DB file R/O /home/raph/.spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: bayes: Not available for scanning,
Re: sa-learn doesn't learn
Just to reply to my own message. It seems to make a crucial difference which order you run the spam and ham tests in! I reran the spam test and it now says I have (from sa-learn --dump magic)

[...]
0.000 0 881 0 non-token data: nspam
0.000 0 1524 0 non-token data: nham
[...]

So the number of spam has increased to roughly what it should be, but the number of ham has decreased by 1000! Can anyone explain this? It looks like a bug, as surely the order of execution shouldn't matter?!

Raphael

Raphael Clifford wrote:

Hi,

I am trying to set up Bayes classifying for the first time using sa-learn. It looks like it is working but doesn't actually seem to be... Here is the output:

[raph]$ sa-learn --showdots --mbox --spam .thunderbird/gmnjx6hf.default/Mail/mail.plus.net/Junk
.
Learned from 870 message(s) (1025 message(s) examined).
[raph]$ sa-learn --showdots --mbox --ham .thunderbird/gmnjx6hf.default/Mail/mail.plus.net/Inbox
..
Learned from 2390 message(s) (2578 message(s) examined).

Now when I do spamassassin -D --lint I get

[...]
debug: bayes: 5790 tie-ing to DB file R/O /home/raph/.spamassassin/bayes_toks
debug: bayes: 5790 tie-ing to DB file R/O /home/raph/.spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: using /home/raph/.spamassassin for user state dir
debug: bayes: Not available for
Re: sa-learn doesn't learn
Raphael Clifford wrote:

Just to reply to my own message. It is seems to make a crucial difference which order to run the spam and ham tests in! I reran the spam test and it now says I have

Typo: "spam test" above should be "sa-learn command for the spam folder".

(from sa-learn --dump magic)

[...]
0.000 0 881 0 non-token data: nspam
0.000 0 1524 0 non-token data: nham
[...]

Raphael
RE: sa-learn doesn't learn
From past experience, I would suggest you check the dependencies on the 3 files that are created by sa-learn. It sounds like it was able to update bayes_toks but not one of the other files. (Can't remember which.)

First off, run sa-learn --rebuild. I seem to recall this was needed after running sa-learn (may be wrong).

What do you see when you type 'sa-learn --dump magic'? Execute this command after each learning stage to see the effect your learning has had. If you see no change, run the rebuild command, then do it again and see if there is a change.

If it doesn't work, post your results.

R

-----Original Message-----
From: Raphael Clifford [mailto:[EMAIL PROTECTED]
Sent: 14 April 2005 09:52
To: users@spamassassin.apache.org
Subject: sa-learn doesn't learn

Hi,

I am trying to set up Bayes classifying for the first time using sa-learn. It looks like it is working but doesn't actually seem to be... Here is the output:

[raph]$ sa-learn --showdots --mbox --spam .thunderbird/gmnjx6hf.default/Mail/mail.plus.net/Junk
.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .
Learned from 870 message(s) (1025 message(s) examined).
[raph]$ sa-learn --showdots --mbox --ham .thunderbird/gmnjx6hf.default/Mail/mail.plus.net/Inbox
.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
Learned from 2390 message(s) (2578 message(s) examined).
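Checking the counters after each learning stage, as suggested above, is easier if the two numbers are pulled out of the dump. A minimal sketch, assuming the count is the third whitespace-separated column of `sa-learn --dump magic` output and the counter name is the last one; the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: extract the nspam/nham counters from `sa-learn --dump magic` output.
# Assumption: the count is column 3 and the counter name is the final column.
bayes_count() {
    # $1 is "nspam" or "nham"; the dump output is read from stdin.
    awk -v name="$1" '$NF == name { print $3 }'
}

# Typical use after each learning run (needs SpamAssassin installed):
#   sa-learn --dump magic | bayes_count nspam
#   sa-learn --dump magic | bayes_count nham
```

Running both lines before and after each sa-learn invocation shows directly which counter moved, which is what the spam-then-ham ordering question in this thread turns on.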
RCVD_IN_SORBS_WEB
Why is the weighting for RCVD_IN_SORBS_WEB scores 0 0 0 then 0.007?

I know there is probably a good reason for this low a score, but could someone explain it to me please, as I have one very irate user who likes nothing better than to pick holes in spamassassin, which in turn is a headache for me. Apparently 1 spam every week is still not good enough protection for him.

thanks
ronan
Re: report_safe doesn't seem to work since FC3 upgrade
Chris Harvey wrote: The problem with your setup is with spamass-milter, not SpamAssassin. And people exclusively ask questions about SA on here? Never ever one on the milter? Perhaps I should have been a little more verbose -- I wasn't saying not to ask your question here. It's not a SpamAssassin config issue it's likely a config issue with the spamass-milter. Your maillog paste doesn't show the Subject: header being modified, it only shows the X-Spam headers being added. This may be a symptom of the '-m' option being present in the call to whatever the spamass-milter executable is. As for the lack of encapsulation, in the 60 seconds or so of looking through the spamass-milter documentation to find the -m option info, I didn't see any mention of encapsulation -- I don't think it's possible with this milter. Daryl
RE: report_safe doesn't seem to work since FC3 upgrade
Your maillog paste doesn't show the Subject: header being modified, it only shows the X-Spam headers being added. This may be a symptom of the '-m' option being present in the call to whatever the spamass-milter executable is. Yes exactly. I see the milter doing *some* work, i.e. adding x-header information, but not doing other work such as changing the subject line as it was doing. As for the lack of encapsulation, in the 60 seconds or so of looking through the spamass-milter documentation to find the -m option info, I didn't see any mention of encapsulation -- I don't think it's possible with this milter. I think I may downgrade the milter as far back as I can and see if that fixes it. If it does, then we know that this specific version ignores the SA local.cf commands to change subject and report safe.
Re: Arithmetic score for replaced O's and I's?
Robert Menschel wrote: Hello mewolf1, Tuesday, April 12, 2005, 6:37:15 PM, you wrote: mgn In an older episode (Wednesday 13 April 2005 02:57), Robert Menschel wrote: Send me your t1r3d, h0m3|ess, hun6ry, un\/\/anted [EMAIL PROTECTED], and I'|| f1nd a 600D horme 4 them... (Not the entire spam emails, please -- just the obfuscations.) mgn Robert, I just sent you obfuscations privately off list, is that mgn what you meant? Perfect. I built rules for them last night and mass-checked them this morning. I'll run a few passes to refine them, then have other SARE ninja's mass-check to get broader results, and then we'll fine tune for performance, and hopefully have something published before end of month. Other contributions more than welcome. Bob Menschel Something that tries to catch those weird table obfuscations would be great ;) Something like i posted a while back in the Extra Sare rules for meds thread. I dont know if this is possible or not but... -Jim
RE: report_safe doesn't seem to work since FC3 upgrade
I think I may downgrade the milter as far back as I can and see if that fixes it. If it does, then we know that this specific version ignores the SA local.cf commands to change subject and report safe.

Looks like I may have a different answer. Am testing it now.

- Check your /etc/rc.d/init.d/spamass-milter file. The RPM distributed by RedHat apparently puts -m in EXTRA_FLAGS by default. Make sure you file a bugreport with them so they can fix it.

http://savannah.nongnu.org/support/?func=detailitem&item_id=103990
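A quick way to check for the flag described above is sketched here. It assumes the init script assigns the flags on a single line of the form EXTRA_FLAGS="..." (the exact layout of the RedHat script is not shown in this thread), and the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: report whether an init script puts -m in EXTRA_FLAGS.
# Assumption: the flags are assigned on one line, e.g. EXTRA_FLAGS="-m -r 15".
has_m_flag() {
    # $1: path to the init script.
    # Take the assignment, drop quotes, split into words, look for a bare -m.
    sed -n 's/^[[:space:]]*EXTRA_FLAGS=//p' "$1" |
        tr -d "\"'" |
        tr -s ' \t' '\n' |
        grep -qx -- '-m'
}

script=/etc/rc.d/init.d/spamass-milter
if [ -r "$script" ] && has_m_flag "$script"; then
    echo "$script: -m is set in EXTRA_FLAGS"
fi
```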
Re: sa-learn - bayes training...
Kevin,

your assumption is correct: user accounts are on the server and spamc is used. I already have the central DB set up using bayes_path in local.cf. I think what you are saying confirms what I suspected, but it's still not 100% clear. Even though I have a central DB, all users must train it individually, is that it?

For example, if UserA populates the shared folders respectively with ham and spam from messages he/she received, and UserB trains the central DB against those msgs, it will have no effect for UserA? All users must individually train the central DB even though they train using the same msgs from the same shared folders?

Sorry if I seem a little dense, but I think I'm getting it. I hope!

Jean

Kevin Peuhkurinen writes:

Jean Caron wrote:

Folks, I searched the archive, tried different things, yet I need to ask a few questions. I'm running SA 3.0.2 with Qmail/QQ 1.25, and procmail, on linux. Works great. Bayes auto-learns ok, and I run sa-learn from a dedicated user every night for ham and spam. My logs show how many msgs were inspected and how many were learned. So far so good.

Here's the part I'm unsure of: I have one centralized bayes DB owned by this dedicated user. This user runs sa-learn against two shared folders, one for ham and one for spam. All users (only a handful) may populate the shared folders. Many thousand msgs have gone through sa-learn. I thought this was all too easy...

My problem is bayes does not seem to have any effect whatsoever on the amount of spam delivered to INBOXes. I keep receiving these low score spam msgs still. I now suspect this centralized DB, updated by this user alone, may not produce the expected results. I've read in the archive that individual users should run cron jobs against their own ham and spam folders. The issue with this is that only one user has an actual shell defined on the system, so the others can't run cron.

Then again, that's just a suspicion. I may be wrong, and something else may be missing or mis-configured, and that's why I'm posting this... I'm a little confused. I don't understand how bayes works exactly, so I can't come to any helpful conclusion about my setup. Can anyone see through this and help me understand what is happening?

Thanks in advance,
Jean

Jean,

I'm not entirely sure, based on the information you provided, how spamd is getting called, but I'm quite sure that your setup is not doing what you expect it to. I'm guessing, since you say that you are using procmail, that you have user accounts set up on the server itself and that spamc is being called as individual users from .forward files. If this is the case, then each user will have a .spamassassin/ directory in their home which will contain their own personal Bayes database.

Your problem is that you have one particular user who runs sa-learn, so only their Bayes DB is being trained (other than through the auto-learning feature, that is, which is updating the individual databases).

One easy option you can consider is the use of a global Bayes DB for all your users instead of each of them having their own personal DB. Bayes tends to be less effective with global rather than personal databases, but only if the individual users are able to do their own training. You could do this fairly easily by setting the bayes_path option in your /etc/mail/spamassassin/local.cf file and having it point to the .spamassassin/ directory of the user who is doing all the sa-learn training.

Hope that helps.

Kevin
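One detail worth checking when pointing bayes_path at a shared location: the value is used as a filename prefix rather than a directory, with suffixes such as _toks and _seen appended (the debug output elsewhere in this archive shows both bayes_toks and .spamassassin_toks). The sketch below only illustrates that naming; the function name and the /home/trainer path are hypothetical.

```shell
#!/bin/sh
# Sketch only: list the database files implied by a bayes_path value.
# The path below is a made-up example, not taken from the thread.
bayes_db_files() {
    # $1: the bayes_path value, treated as a filename prefix.
    for suffix in toks seen; do
        printf '%s_%s\n' "$1" "$suffix"
    done
}

bayes_db_files /home/trainer/.spamassassin/bayes
```

With a prefix of /home/trainer/.spamassassin/bayes the files land inside the training user's .spamassassin/ directory; a prefix that stops at the directory name would put them next to it instead.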
Bayes Problems
I am having one heck of a time getting Bayes working with SpamAssassin. I am using postfix 2.2.2 and SA 3.00.2. Postfix is being run as the user postfix. SA is being run as postdrop. The following is the output from the syslog.

spamd[22065]: debug: plugin: Mail::SpamAssassin::Plugin::Hashcash=HASH(0xa8b6820) implements 'parse_config'
spamd[22065]: debug: bayes: 22065 tie-ing to DB file R/O /home/postdrop/.spamassassin_toks
spamd[22065]: debug: bayes: 22065 tie-ing to DB file R/O /home/postdrop/.spamassassin_seen
spamd[22065]: debug: bayes: found bayes db version 3
spamd[22065]: debug: bayes: Not available for scanning, only 35 ham(s) in Bayes DB < 200
spamd[22065]: debug: bayes: 22065 untie-ing
spamd[22065]: debug: bayes: 22065 untie-ing db_toks
spamd[22065]: debug: bayes: 22065 untie-ing db_seen
spamd[22065]: debug: Score set 1 chosen.
spamd[22065]: debug: MIME PARSER START
spamd[22065]: debug: main message type: text/plain
spamd[22065]: debug: parsing normal part
spamd[22065]: debug: added part, type: text/plain
spamd[22065]: debug: MIME PARSER END
spamd[22065]: debug: using /tmp/spamd-22065-init/.spamassassin for user state dir
spamd[22065]: debug: bayes: no dbs present, cannot tie DB R/O: /tmp/spamd-22065-init/.spamassassin/bayes_toks
spamd[22065]: debug: metadata: X-Spam-Relays-Trusted:

Unfortunately I have tinkered with this too much, so I really can not list what I have or have not tried. Any input would be appreciated.

Thank you,
Tom
RE: Bayes Problems
[clipped for brevity]...

The source of your problem is indicated by

spamd[22065]: debug: bayes: Not available for scanning, only 35 ham(s) in Bayes DB < 200

To use Bayes with SA, you need a minimum of 200 HAM and SPAM messages learned into the db.

Hope this helps.

-Joe K.
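The minimum described above applies to each counter separately, which is why 35 ham is not enough regardless of how much spam has been learned. A minimal sketch of that check, assuming the two counts have already been read from the Bayes database; the function name is made up, and the default of 200 is taken from the reply above.

```shell
#!/bin/sh
# Sketch only: decide whether Bayes has enough training to be used for scanning.
bayes_ready() {
    # $1: nspam, $2: nham, $3: optional minimum (200 per the reply above).
    min=${3:-200}
    [ "$1" -ge "$min" ] && [ "$2" -ge "$min" ]
}

if bayes_ready 881 35; then
    echo "Bayes available for scanning"
else
    echo "Bayes not available: need at least 200 spam and 200 ham"
fi
```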
RE: yet another Sendmail filter for SpamAssassin daemon spamd
John Andersen wrote: On Wednesday 13 April 2005 09:57 am, Eugene Kurmanin wrote: 5. Copy SPAM to the defined mailbox; 6. Reject SPAM at the DATA stage, if SPAM score is greater than defined value; 7. Log all activities to syslog. Well if you are going to reject, why also accept and copy to mailbox. I can think of situations where you would reject (in order to not assume responsibility for the final delivery of the mail) but still want a copy of what you rejected for forensic purposes. Most of them have to do with espionage :) Matthew.van.Eerde (at) hbinc.com 805.964.4554 x902 Hispanic Business Inc./HireDiversity.com Software Engineer perl -emap{y/a-z/l-za-k/;print}shift Jjhi pcdiwtg Ptga wprztg,
Mailbox disabled rejection scripting
Anyone doing any automated methods for catching large numbers of these rejects and then adding the host into a sendmail access db similar to vispan? ruleset=check_rcpt, arg1=[EMAIL PROTECTED], relay=[211.150.242.139], reject=550 5.2.1 [EMAIL PROTECTED]... Mailbox disabled for this recipient
Re: Still Stuck. bayes
Thank you for the detailed reply. I made all of the changes you suggested. They were very good, and I will have to see how well they work now. I just had one more question, about your last statement:

You don't want to sa-learn 200 messages just to learn 5

I guess it would be doing that in the Inbox and Spam directory all the time. I am sure some users, myself included, don't always file messages as quickly as they should from their inbox, so they would end up relearning all of that mail multiple times, or at least running it through sa-learn multiple times. Is there a problem doing this, and if so, is there a better solution for learning ham? (By the way, I changed one of the moves to move data from MissedSpam to Trash instead of the spam box, so that eliminates learning those messages twice.)

Thank you a bundle for looking at my script.

Peter

Bowie Bailey wrote:

From: Peter Marshall [mailto:[EMAIL PROTECTED]

I got this book (slightly outdated) called Spamassassin (by O'Reilly). Anyway, it says if you are going to sa-learn a bunch of directories in Maildir format you should do the following:

sa-learn --no-rebuild --spam mail/spam
sa-learn --no-rebuild ...blah.
sa-learn --no-rebuild --ham ...blah blah
sa-learn --rebuild

So I give that a go, and it gives messages to use sync and no-sync.

Right, sync and no-sync are the correct options.

If I leave out the --no-sync options ... it gives no output .. (I assume this means nothing got learned.) Here is my script.

You need to run sync once. It doesn't need to be run for each mailbox.

Do I need to sync? I am going to be running this for every user on the box (as that user of course) in a cron job.

Each user will need to sync after he learns all of his directories.

---The Script
#!/bin/sh
# Inbox
/usr/bin/sa-learn --no-sync --ham --dir ~/Maildir
# Spam Box
/usr/bin/sa-learn --no-sync --spam --dir ~/Maildir/.Spam
# Missed Spam
/usr/bin/sa-learn --no-sync --spam --dir ~/Maildir/.Spam.MissedSpam
# Not Spam
/usr/bin/sa-learn --sync --ham --dir ~/Maildir/.Spam.NotSpam
## Clean up spam Directories.
if [ `\ls ~/Maildir/.Spam.MissedSpam/cur |wc -l` -ne 0 ]; then
mv ~/Maildir/.Spam.MissedSpam/cur/* ~/Maildir/.Spam
else
echo Nothing to move in MissedSpam - cur
fi
if [ `\ls ~/Maildir/.Spam.NotSpam/cur |wc -l` -ne 0 ]; then
mv ~/Maildir/.Spam.NotSpam/cur/* ~/Maildir/cur
else
echo Nothing to move in NotSpam - cur
fi
---

What I noticed immediately is that the directories you are learning from are not the ones holding the message files. Try learning from the 'cur' directories. For example:

/usr/bin/sa-learn --no-sync --ham --dir ~/Maildir/cur
/usr/bin/sa-learn --no-sync --spam --dir ~/Maildir/.Spam/cur
/usr/bin/sa-learn --no-sync --spam --dir ~/Maildir/.Spam.MissedSpam/cur
/usr/bin/sa-learn --sync --ham --dir ~/Maildir/.Spam.NotSpam/cur

Also, after you do the learning, you are moving the messages to the wrong place. That first 'mv' line should look like this:

mv ~/Maildir/.Spam.MissedSpam/cur/* ~/Maildir/.Spam/cur

All of this brings up another question... What is the intended mail flow here? I'm a bit confused by the way you are moving messages around. Normally, after you learn a message, you should move it to a place where it won't be learned next time. Otherwise, the messages will continue to pile up and sa-learn will have to wade through more and more messages each time you run it. You don't want sa-learn to have to process 200 messages just to learn from 5 of them.

Bowie

--
Peter Marshall, BCS
System Administrator, CARIS
CARIS 2005 - Mapping a Seamless Society
10th International User Group Conference and Educational Sessions
Halifax, NS, Canada
E-mail [EMAIL PROTECTED] for more.
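Putting the corrections from this exchange together (learn from the cur directories, sync once, move corrected messages into a cur directory) might look like the sketch below. It is an illustration rather than anyone's actual script: the folder names follow the script quoted above, while the SA_LEARN variable and the function names are assumptions added so the learner can be swapped out for a dry run.

```shell
#!/bin/sh
# Sketch only: learn from the cur/ directories, sync once, then file the
# corrected messages. SA_LEARN and the function names are hypothetical.
SA_LEARN=${SA_LEARN:-/usr/bin/sa-learn}

move_all() {
    # Move every file from directory $1 into directory $2.
    # Does nothing (and prints nothing) when $1 is empty.
    for f in "$1"/*; do
        [ -e "$f" ] || continue
        mv "$f" "$2"/
    done
}

learn_maildir() {
    md=$1   # Maildir root, e.g. "$HOME/Maildir"

    # Queue up all learning with --no-sync, then sync the journal once.
    "$SA_LEARN" --no-sync --ham  --dir "$md/cur" &&
    "$SA_LEARN" --no-sync --spam --dir "$md/.Spam/cur" &&
    "$SA_LEARN" --no-sync --spam --dir "$md/.Spam.MissedSpam/cur" &&
    "$SA_LEARN" --no-sync --ham  --dir "$md/.Spam.NotSpam/cur" &&
    "$SA_LEARN" --sync || return 1

    # Empty the correction folders so they are not re-read next run.
    move_all "$md/.Spam.MissedSpam/cur" "$md/.Spam/cur"
    move_all "$md/.Spam.NotSpam/cur" "$md/cur"
}

# Typical use from a per-user cron job:
#   learn_maildir "$HOME/Maildir"
```

The inbox and .Spam folders are still rescanned on every run in this layout. Moving missed spam to Trash instead, as mentioned in the reply above, is a one-line change to the first move_all call.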
Re: RCVD_IN_SORBS_WEB
Ronan McGlue wrote:

why is the weighting for RCVD_IN_SORBS_WEB scores 0 0 0 then 0.007... I know there is probably a good reason for this low a score but could someone explain it to me please as I have one very irate user who likes nothing better than to pick holes in spamassassin, which in turn is a headache for me.

Looking at statistics.txt, it's got a low overall hitrate, and while its S/O is fairly good, it does in fact hit some nonspam. Without combing the entire mass-check results of the corpus, it would be impossible to determine the cause. However, I suspect that those few nonspams were also being hit by other rules, and the perceptron was forced to compromise the score of this rule in order to avoid FPs.

Remember, SA's score evolver will accept 100 FNs before it will accept 1 FP. Which really is a good thing. FPs hurt, lots. FNs are a nuisance, but they don't cause loss of mail. Since it's got that policy, the perceptron will try very hard to avoid the FP. Even if it means letting some spam slip by, it's better than tagging a bunch of legitimate mail.
Bayes question
I apologize if this has been asked before, but I need some clarification. If I have autolearn for ham set to 0, and the default BAYES_00 score assigns mail a negative value, and a spam message comes through with enough good text in it to give it a BAYES_00 and therefore a negative value BUT it is not a message that has been learned before, is there the potential for that mail to be learned as ham based on the negative BAYES score assigned it? If nothing else, I just wrote the king of all run on sentences.
Re: Still Stuck. bayes
Peter Marshall wrote: You don't want to sa-learn 200 messages just to learn 5 I guess it would be doing that in the Inbox and Spam Directory all the time. I am sure some users, myself included, don't always file messsages as quick as they should from their inbox .. so they would end up relearning all of that mail multiple times ... well .. at least running it through sa-learn multiple times. Is there a problem doing this, and if so, is there a better solution for learning ham ? It's slower, since sa-learn has to look through all the old messages to find the new ones, but it shouldn't mess up the training. It's just efficiency. If your system has the resources to handle it, don't worry. -- Kelson Vibber SpeedGate Communications www.speed.net
Re: RCVD_IN_SORBS_WEB
Paolo Cravero as2594 wrote: Same goes for who asks to unblock certain messages. They are told they can decide to have spam pass through (periodical automatic quarantine unlock, actually). In less than a day they usually beg to restore their antispam protection (and who cares for that job-unrelated mailing list!). That reminds me of a customer we had who asked us to disable all spam filtering on his account. A few months later he cancelled because he was receiving too much spam. A definite *headdesk* moment. -- Kelson Vibber SpeedGate Communications www.speed.net
Re: Bayes question
Joe Zitnik wrote:

I apologize if this has been asked before, but I need some clarification. If I have autolearn for ham set to 0, and the default BAYES_00 score assigns mail a negative value, and a spam message comes through with enough good text in it to give it a BAYES_00 and therefore a negative value BUT it is not a message that has been learned before, is there the potential for that mail to be learned as ham based on the negative BAYES score assigned it?

No. It's 100% impossible, as the bayes autolearner makes its judgments based on the score the message would have gotten if bayes was disabled. Avoiding that kind of self-feedback is exactly why this is done.

(Note that calculating the score as if bayes was disabled also involves calculating the score using scoreset 0 or 1 instead of 2 or 3.)

The autolearner also ignores any userconf flagged rules, such as white and blacklists.
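The decision described above can be pictured as a comparison of the non-Bayes score against two thresholds. This is a deliberately simplified sketch, not SpamAssassin's actual code: the function name is made up, the handling of scores exactly on a threshold is an assumption, and any further conditions the real autolearner applies are left out. The ham threshold of 0 comes from the question; the spam threshold of 12 is assumed here for illustration.

```shell
#!/bin/sh
# Sketch only: simplified autolearn decision based on the score computed
# with the Bayes rules left out. Boundary handling is an assumption.
autolearn_decision() {
    # $1: score without Bayes rules, $2: ham threshold, $3: spam threshold.
    awk -v s="$1" -v lo="$2" -v hi="$3" 'BEGIN {
        if (s + 0 < lo + 0)       print "ham"
        else if (s + 0 >= hi + 0) print "spam"
        else                      print "no"
    }'
}

# A spam whose other rules add up to 4.2: BAYES_00 may pull the final score
# down, but the decision only sees 4.2, so nothing is learned.
autolearn_decision 4.2 0 12
```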
0 Hits on blatant spam
I've been getting a lot of leak-through with 3.02 lately and I thought this one was interesting, particularly that there are plenty of rules that look for a certain word that rhymes with truck (YKWIM), but no header rules that look for the word with an ing on the end of it. I only see one body rule in 20_porn.cf that looks for this string in message bodies, but it scores pretty low. I have a hunch that this word might be somewhat common in ham, but rarely in the subject or anywhere else in the headers of ham... here's a link to the message text:

http://www.timuel.com/badmessage.txt

Also, I can't find a complete list of what rules that I used in 2.64 were obsoleted by the 3.x series. Perhaps this would be good wiki fodder. I will post the rules that I am left with after my migration to 3.02 (below my sig) and anyone who feels up to it can correct me. =]

Thanks...

--
Tim Wesemann

== Rules that were left after upgrade to 3.02 ===
10_misc.cf 20_anti_ratware.cf 20_body_tests.cf 20_compensate.cf 20_dnsbl_tests.cf 20_drugs.cf 20_fake_helo_tests.cf 20_head_tests.cf 20_html_tests.cf 20_meta_tests.cf 20_phrases.cf 20_porn.cf 20_ratware.cf 20_uri_tests.cf 23_bayes.cf 25_body_tests_es.cf 25_hashcash.cf 25_spf.cf 25_uribl.cf 30_text_de.cf 30_text_fr.cf 30_text_nl.cf 30_text_pl.cf 50_scores.cf 60_whitelist.cf
70_sare_bayes_poison_nxm.cf 70_sare_genlsubj0.cf 70_sare_genlsubj1.cf 70_sare_header0.cf 70_sare_header1.cf 70_sare_html0.cf 70_sare_html1.cf 70_sare_oem.cf 70_sare_random.cf 70_sare_specific.cf 70_sare_spoof.cf 70_sare_unsub.cf 70_sare_uri0.cf 70_sare_uri1.cf 70_sc_top200.cf 72_sare_redirect_post3.0.0.cf
88_FVGT_Bayes_Poison.cf 88_FVGT_body.cf 88_FVGT_subject.cf 88_FVGT_uri.cf 99_FVGT_Tripwire.cf 99_FVGT_meta.cf 99_sare_adult.cf 99_sare_biz_market_learn_post25x.cf 99_sare_fraud_post25x.cf
antidrug.cf backhair.cf bogus-virus-warnings.cf cheat.cf chickenpox.cf evilnumbers.cf languages mangled.cf mime_validate.cf mr_wiggly.cf random.current.cf rnd_uc_char.cf rolex.cf useless.cf weeds.cf wordword.cf x_headers.cf
Re: 0 Hits on blatant spam
Tim Wesemann wrote:
> I've been getting a lot of leak-through with 3.02 lately, and I thought this one was interesting: there are plenty of rules that look for a certain word that rhymes with truck (YKWIM), but no header rules that look for the word with an "ing" on the end of it. I only see one body rule in 20_porn.cf that looks for this string in message bodies, but it scores pretty low. I have a hunch that this word might be somewhat common in ham, but rare in the subject or anywhere else in the headers of ham...
> here's a link to the message text: http://www.timuel.com/badmessage.txt

It looks like you've mangled the headers a bit, making SA unable to do DNSBL tests correctly when I test locally. However, it looks like that message should hit SBL+XBL: 84.130.193.118 is listed.

It also should have hit several DUL tests, but that only works if your trusted_networks is working correctly. If your MX server is NATed, make sure you've got trusted_networks set up right so SA applies the DNSBLs properly. If it's not NATed, you might be OK, but make sure SA trusts all your mailservers, and nothing more or less than all your mailservers.

DULs are applied to the most recent (i.e., first if you work backwards in time through the Received chain) untrusted host delivering to a trusted host. If SA doesn't trust the right hosts, then these tests will miss their mark.

> Also, I can't find a complete list of what rules that I used in 2.64 were obsoleted by the 3.x series.

I can tell you for certain that antidrug.cf is obsoleted by the standard 20_drugs.cf. Remove the old outdated file.

> == Rules that were left after upgrade to 3.02 ===
> 10_misc.cf
> <snip>
> 60_whitelist.cf

Please distinguish between rules that were left after the upgrade and rules that were installed by it. All of the above should be in $PREFIX/share/spamassassin, and would have been installed by SA. All of the below should be in /etc/mail/spamassassin, and would have been untouched by the upgrade.

> 70_sare_bayes_poison_nxm.cf
> <snip>
> mime_validate.cf

Side note - I'd delete mime_validate.cf. It doesn't work correctly. The author assumes that rawbody tests are run on the truly raw message body, which is not true. The message has already been base64 and QP decoded, so the rules will misfire on properly encoded mail which contains properly encoded Unicode or binary attachments. Every rule in that ruleset requires considerable rework. It's an interesting experiment, but it's unfortunately written without consideration for what SA does to normalize the message content before feeding it to rules.
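
To make the point about decoding concrete, here is a small Python sketch. It is not SpamAssassin code, and I am assuming a rule of the "raw body must be 7-bit clean" kind purely for illustration; the actual rules in mime_validate.cf may check different things. The sample text and the HIGH_BIT pattern are made up. It shows that such a check is quiet on the base64 text as transmitted, but fires once the part has been decoded, which is the form the message is in by the time the rules see it.

```python
import base64
import re

# Hypothetical check: "a properly encoded body should contain no high-bit bytes".
HIGH_BIT = re.compile(rb"[\x80-\xff]")

text = "Grüße aus Köln"                             # legitimate non-ASCII content
encoded = base64.b64encode(text.encode("utf-8"))    # the part as transmitted
decoded = base64.b64decode(encoded)                 # the part after decoding

wire_hit = bool(HIGH_BIT.search(encoded))       # base64 output is plain ASCII
decoded_hit = bool(HIGH_BIT.search(decoded))    # UTF-8 bytes have the high bit set

print("hit on transmitted form:", wire_hit)
print("hit on decoded form:", decoded_hit)
```

The mail in this sketch is encoded correctly, yet a check that assumes it is looking at the transmitted form would still flag it after decoding.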
RE: Rules to identify simplified and traditional chinese character sets
This header was missed by your rule example. Does anyone have any ideas why it was missed? Thanks in advance.

Header:

--=_alternative 00390FDE48256FE2_=
Content-Type: text/html; charset=gb2312
Content-Transfer-Encoding: base64

Rules:

rawbody CHINESE_WL_1_B /\bgb2312\b/i
describe CHINESE_WL_1_B Whitelist Simplified Chinese mimepart

full CHINESE_WL_1_C /^Content-Type:\s+gb2312\b/im
describe CHINESE_WL_1_C Whitelist Simplified Chinese mimepart

-----Original Message-----
From: Loren Wilton [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 12, 2005 5:45 PM
To: users@spamassassin.apache.org
Subject: Re: Rules to identify simplified and traditional chinese character sets

> This code fragment illustrates how I do this for Internet headers:
>
> header CHINESE_WL_1 Content-Type =~ /gb2312/i
> describe CHINESE_WL_1 White list Simplified Chinese
>
> Does anyone know how to create a rule to detect these codes in a mime header?

There was talk on the dev list a while back of being able to test the items in MIME headers. I'm not clear on whether anything ever came of that. In any case, you can run a 'full' rule to look for the headers and find them. Perhaps something like (untested):

full CHINESE_xxx /^Content-Type:\s+gb2312\b/im

Loren
Re: Rules to identify simplified and traditional chinese character sets
Johnson, Robert F wrote:
> This header was missed by your rule example. Does anyone have any ideas why it was missed? Thanks in advance.
>
> Header:
>
> --=_alternative 00390FDE48256FE2_=
> Content-Type: text/html; charset=gb2312
> Content-Transfer-Encoding: base64
>
> Rules:
>
> rawbody CHINESE_WL_1_B /\bgb2312\b/i
> describe CHINESE_WL_1_B Whitelist Simplified Chinese mimepart
>
> full CHINESE_WL_1_C /^Content-Type:\s+gb2312\b/im
> describe CHINESE_WL_1_C Whitelist Simplified Chinese mimepart

It was not detected by the rawbody rule because the MIME part headers are stripped before rawbody rules run, so the rule never gets a chance to see this text at all.

It wasn't detected by the full rule because the pattern has no way to skip over the text between Content-Type: and gb2312 (here, "text/html; charset="). It's looking for gb2312 directly after Content-Type:, with nothing in between but whitespace.

Might I suggest this instead:

full CHINESE_WL_1_D /^Content-Type:.{0,30}\bgb2312\b/im
describe CHINESE_WL_1_D Whitelist Simplified Chinese mimepart
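
For anyone who wants to see the difference between the two patterns outside of SA, here is a quick Python check. It is only a sketch: Python's re module is close to Perl's regex syntax for these two expressions but is not what SA runs, and the sample string is just the MIME part header from this thread. Perl's /im flags are mapped to re.IGNORECASE | re.MULTILINE.

```python
import re

# The MIME part header quoted in this thread.
sample = (
    "--=_alternative 00390FDE48256FE2_=\n"
    "Content-Type: text/html; charset=gb2312\n"
    "Content-Transfer-Encoding: base64\n"
)

flags = re.IGNORECASE | re.MULTILINE  # equivalent of Perl's /im

# CHINESE_WL_1_C: gb2312 must follow "Content-Type:" after whitespace only.
original = re.compile(r"^Content-Type:\s+gb2312\b", flags)

# CHINESE_WL_1_D: allows up to 30 arbitrary characters in between.
suggested = re.compile(r"^Content-Type:.{0,30}\bgb2312\b", flags)

original_hit = original.search(sample) is not None
suggested_hit = suggested.search(sample) is not None

print("CHINESE_WL_1_C matches:", original_hit)
print("CHINESE_WL_1_D matches:", suggested_hit)
```

The .{0,30} window is deliberately bounded. Since . does not match a newline without the /s flag, the match also stays on the Content-Type line itself, although a charset parameter folded onto a continuation line would be missed.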
Re: SpamAssassin and Horde
I checked trusted_networks and I guess that is not it. The Received: header on emails sent from IMP 4.x is:

Received: from 200-102-255-31.smace701.dsl.brasiltelecom.net.br
    (200-102-255-31.smace701.dsl.brasiltelecom.net.br [200.102.255.31])
    by domain.tld (Horde) with HTTP for [EMAIL PROTECTED];
    Thu, 14 Apr 2005 14:19:04 -0300

This is the IP of the computer the user was using to send the mail. Something is very wrong here. Why does IMP 4.x take the user's IP and send it as the HELO? This does not happen with IMP 3.x. I guess I have two options: hack the IMP code to send localhost in the HELO, or make SpamAssassin ignore the IMP headers. Any ideas?

Full headers:

Return-Path: [EMAIL PROTECTED]
Delivered-To: [EMAIL PROTECTED]
Received: from domain.tld (localhost.localdomain [127.0.0.1])
    by odi.com.br (Postfix) with ESMTP id 1C14D19072
    for [EMAIL PROTECTED]; Thu, 14 Apr 2005 14:19:05 -0300 (BRT)
Received: by odi.com.br (Postfix, from userid 48)
    id E617919071; Thu, 14 Apr 2005 14:19:04 -0300 (BRT)
Received: from 200-102-255-31.smace701.dsl.brasiltelecom.net.br
    (200-102-255-31.smace701.dsl.brasiltelecom.net.br [200.102.255.31])
    by webmail.domain.tld (Horde) with HTTP for [EMAIL PROTECTED];
    Thu, 14 Apr 2005 14:19:04 -0300
Message-ID: [EMAIL PROTECTED]
Date: Thu, 14 Apr 2005 14:19:04 -0300
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Teste Testando testado
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
User-Agent: Internet Messaging Program (IMP) H3 (4.0)
X-AV-Checked: ClamAV using ClamSMTP
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on odi.com.br
X-Spam-Level: *
X-Spam-Status: Yes, score=5.7 required=5.0 tests=AWL,BAYES_00,
    HELO_DYNAMIC_HCC,HELO_DYNAMIC_IPADDR2,NO_REAL_NAME autolearn=no
    version=3.0.1
X-Spam-Report:
    *  0.0 NO_REAL_NAME From: does not include a real name
    *  3.5 HELO_DYNAMIC_IPADDR2 Relay HELO'd using suspicious hostname (IP addr 2)
    *  3.7 HELO_DYNAMIC_HCC Relay HELO'd using suspicious hostname (HCC)
    * -2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
    *      [score: 0.]
    *  1.0 AWL AWL: From: address is in the auto white-list

----- Original Message -----
From: Matt Kettler [EMAIL PROTECTED]
To: Angelo Ayres Camargo [EMAIL PROTECTED]
Cc: users@spamassassin.apache.org
Sent: Tuesday, April 12, 2005 2:18 PM
Subject: Re: SpamAssassin and Horde

Angelo Ayres Camargo wrote:
> Hello,
>
> Mail sent from Horde IMP is being tagged as spam. This was discussed here before, but searching the archives I found no solution. Does anyone have any idea how to make mail from Horde/IMP not be tagged as spam?
>
> Angelo

Angelo,

First, I assume you mean the thread with subject: Confused about HELO_DYNAMIC_*

At the end of that thread we concluded it had nothing to do with IMP whatsoever. Instead, it was a NATed mailserver triggering the broken trust path problem.

If your inbound MX mailserver is NATed such that its IP is in a reserved range (i.e., 10.*, 192.168.*, 172.16.*, etc.) you MUST declare trusted_networks manually. If you don't, ALL mail originating at dialup accounts that appear in the Received: headers will be heavily penalized. That includes mail sent via IMP by dialup users, but is not IMP specific. Mail sent by a dialup user through even their own ISP's sendmail server will be subject to the same problems.

See the wiki for details: http://wiki.apache.org/spamassassin/TrustPath
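
The trust-path idea can be sketched in a few lines of Python. This is a deliberately simplified model, not SpamAssassin's actual logic: the function name first_untrusted_relay is made up, the relay list is assumed to have already been extracted from the Received: headers (most recent first), and the addresses are documentation/example ranges rather than real hosts. It only illustrates why trusting too many or too few hosts changes which relay the dialup and dynamic-host tests are applied to.

```python
import ipaddress


def first_untrusted_relay(relay_ips, trusted_networks):
    """Return the most recent relay IP that is not inside any trusted network.

    relay_ips: relay addresses taken from the Received: chain, most recent first.
    trusted_networks: CIDR strings, playing the role of trusted_networks.
    """
    nets = [ipaddress.ip_network(n) for n in trusted_networks]
    for ip in relay_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in nets):
            return ip
    return None  # every relay in the chain is trusted


# Hypothetical chain: NATed MX <- sender's ISP mail server <- dialup user.
chain = ["10.0.0.5", "192.0.2.10", "198.51.100.77"]

# Trusting exactly our own server: the ISP's mail server is the host checked.
correct = first_untrusted_relay(chain, ["10.0.0.0/8"])

# Trusting one hop too many: the dialup user's address is checked instead.
too_broad = first_untrusted_relay(chain, ["10.0.0.0/8", "192.0.2.0/24"])

print("checked with correct trust:", correct)
print("checked with over-broad trust:", too_broad)
```

In the second case the dialup address ends up being treated as the host that delivered directly to a trusted server, which is the kind of situation where legitimate mail from dialup users gets penalized.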