Re: Spam and the Internet [Was: xxxl spam]
Matt Kettler wrote:
> ...snip...
> Here's one, if you want to see it:
> http://mywebpages.comcast.net/mkettler/spam.jpg
>
> There's pretty close to zero chance that anyone in the US is going to hop
> on a plane and fly to Guatemala to buy ordinary lawn care products from a
> small store. But that's the kind of ads I'm getting.

But they've got heart-shaped pancake molds... you wouldn't fly to Guatemala for that? And at Q.29?! What a bargain!

(heh, I couldn't resist)
Re: Non-English languages
Kenneth Porter wrote:
> ...snip...
> To those of you who've successfully learned 2nd and 3rd languages as an
> adult, what do you recommend for accomplishing that?

Kenneth,

I started learning Japanese when I was 30 (I feel so old saying it like that). Anyway, I started with a teach-yourself-Japanese book and a computer program to help. After that I took courses after work at my local community college. *THEN* I moved to Japan and really started to learn :p

I've learned a number of programming languages since I was young. I applied the same techniques to learning Japanese (specifically with reading/writing, or typing as the case may be) and made sure I had good reference materials handy. I also got involved with the Japanese communities on iVisit, which helped a lot.

alan
RE: Filter check request
Owen Mehegan wrote:
> I've upgraded to SA 3.1.1 and now both messages hit solidly as spam. I
> also don't see the ALL_TRUSTED mistake, so I'm guessing that was caused
> by the trust code mismatch you mentioned. Thanks!

The trust path, IMO, is too important to be left to chance. It is used behind the scenes in all sorts of ways. Read the wiki and the Mail::SpamAssassin::Conf man page, then specify trusted_networks and internal_networks manually so you know they are right.

http://wiki.apache.org/spamassassin/TrustPath
http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html#network_test_options

Sorry about the long URL; you may have to paste it back together in the browser.

-- Bowie
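For example, a minimal local.cf sketch (the netblock below is an example assumption; substitute the addresses of your own MXes and LAN):

```
# local.cf -- declare the trust path explicitly instead of letting SA guess
# 192.168.0/24 is a placeholder; use your real MX/LAN addresses
trusted_networks  192.168.0/24
internal_networks 192.168.0/24
```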
Managing Spamassassin Data
Hi all,

After having spamd exit on me a couple of times (still no idea why), I decided to put spamd under daemontools control (run file below). While this has resulted in the stability I was looking for, I am now presented with a number of growing log/SpamAssassin files, i.e.:

/service/spamd/razor-agent.log
/root/.spamassassin/auto-whitelist
/root/.spamassassin/bayes_journal
/root/.spamassassin/bayes_seen
/root/.spamassassin/bayes_toks

My question breaks into several parts:

1. The startup script from the FreeBSD port for spamd ran the service as root. Is there any reason not to switch spamd to the qpsmtpd user/group?
2. Is there a way I can put the razor-agent.log into multilog? If not, how do I rotate this log file?
3. My experience with Bayes and AWL on amavisd-new is that these files will only grow. What is the proper approach to pruning this data?
4. I am considering using an external MySQL database for my Bayes database. How much should I expect this to trash my server, and what is the proper approach to pruning this data?

Thanks in advance,
Max

#!/bin/sh
exec 2>&1 \
sh -c '
    exec \
    /usr/local/bin/spamd \
        -x \
        --socketpath=/var/run/spamd/spamd \
        -s stderr
'

-- Max Clark http://www.clarksys.com
Re: Managing Spamassassin Data
Max Clark wrote:
> Hi all,
>
> After having spamd exit on me a couple of times (still no idea why), I
> decided to put spamd under daemontools control (run file below). While
> this has resulted in the stability I was looking for, I am now presented
> with a number of growing log/spamassassin files - i.e.:
>
> /service/spamd/razor-agent.log
> /root/.spamassassin/auto-whitelist
> /root/.spamassassin/bayes_journal
> /root/.spamassassin/bayes_seen
> /root/.spamassassin/bayes_toks
>
> My question breaks into several parts;
>
> 1. The startup script from the FreeBSD port for spamd ran the service as
> root - is there any reason not to switch spamd to the qpsmtpd user/group?

I would not start it as qpsmtpd, as spamd needs privileges to bind its port. However, if you're not doing multi-user Bayes, you can start spamd with -u.

Also, be aware: the above Bayes DB in root's home directory should never be used by spamd. Spamd always setuids itself to nobody if it finds itself running as root when called to scan mail. Normally spamd gets started as root, then setuids to match the userid that calls spamc. However, if root is calling spamc, SA winds up using this safety net and setuiding to nobody.

If you start spamd with -u, it will setuid to the specified user, without regard for what user called spamc. This should show up as a bunch of spamds that start as root and switch to the -u user when scanning.

> 2. Is there a way I can put the razor-agent.log into multilog? If not,
> how do I rotate this log file?

Can't say as I know. This log is generated by the razor tools themselves, so man razor-agent.conf would be the reference here. However, judging from the docs, it only supports plain dumb file access, so there's probably no safe way to rotate it short of killing off SA.

http://razor.sourceforge.net/docs/doc.php?type=pod&name=razor-agent.conf

> 3. My experience with Bayes and AWL on amavisd-new is that these files
> will only grow, what is the proper approach to pruning this data?
bayes_journal and bayes_toks should prune on their own during opportunistic expiry. However, you can run spamassassin --force-expire to force this. bayes_seen does not get pruned by expiry, but it can be safely deleted in SA 3.1.0 and SA will re-create it (or so the devels claim; I've not tested this).

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=2975

The auto-whitelist can be pruned using the check_whitelist script from the tools directory of the tarball, passing it the --clean parameter. (Note: most distro packages do not install this tool, so just grab it from a tarball download if you don't have it.)

usage: check_whitelist [--clean] [--min n] [dbfile]

--min specifies the minimum number of hits a given AWL entry needs to be considered worth keeping. It defaults to 2 if not specified (this prunes all one-off entries).

> 4. I am considering using an external MySQL database for my Bayes
> database - how much should I expect this to trash my server, and what is
> the proper approach to pruning this data?

Dunno, I'm not a SQL-Bayes user.
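The --clean/--min semantics can be sketched roughly like this (a toy model of the pruning idea, not the real check_whitelist code; the entry format here is an assumption):

```python
# Toy model of AWL pruning: keep only entries seen at least `min_hits` times.
# Real AWL entries live in a Berkeley DB keyed by sender/IP; this dict stands in.
def prune_awl(entries, min_hits=2):
    """Drop every entry with fewer than min_hits hits (the one-off entries)."""
    return {key: hits for key, hits in entries.items() if hits >= min_hits}

awl = {
    "alice@example.com|10.0.0.1": 1,  # one-off: pruned at the default threshold
    "bob@example.com|10.0.0.2": 5,
    "carol@example.com|10.0.0.3": 2,
}
pruned = prune_awl(awl)  # min_hits defaults to 2, matching check_whitelist
```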
Re: Managing Spamassassin Data
> 2. Is there a way I can put the razor-agent.log into multilog? If not,
> how do I rotate this log file?

For myself on FreeBSD (I installed from source, not from the port, so adjust your configs as necessary), I use the newsyslog facility (/etc/newsyslog.conf) to rotate the log files with the nightly checks.

The maillog is rotated nightly:

/var/log/maillog    640  120  *  @T00  JC

So I added another entry for my spam log:

/var/log/spam.log   640  120  *  @T00  JC

I've added several log files to the file to auto-rotate, such as named's, and it works like a charm.

My relevant config bits. How I start spamd:

/usr/local/bin/spamd --daemonize --username spamd --max-children=20 --min-spare=5 --pidfile /home/spamd/spamd.pid -s local5

(Notice the local5 at the end, which selects the local5 syslog facility.)

The relevant syslog.conf line:

local5.*    /var/log/spam.log

Hope this helps.
-Gary
Re: Managing Spamassassin Data
Max Clark wrote:
> 2. Is there a way I can put the razor-agent.log into multilog? If not,
> how do I rotate this log file?

Set up a cron job to run 'find / -name razor-agent.log | xargs rm -f'. <g>

I've found razor is a little indiscriminate about where it spews this log file; I've found it in some truly bizarre places. That may have something to do with the razor-agents version I'm running (don't even recall which; it's probably a little old and outdated).

If at all possible, look into disabling it entirely. IIRC it's just a transcript of what razor does when it's called to scan a message.

-kgd
Re: Managing Spamassassin Data
On Mon, Apr 17, 2006 at 05:44:57PM -0400, Kris Deugau wrote:
> > 2. Is there a way I can put the razor-agent.log into multilog? If not,
> > how do I rotate this log file?
>
> Set up a cron job to run 'find / -name razor-agent.log | xargs rm -f'. <g>

Alternately, put the following in razor-agent.conf:

debuglevel = 0

The log still gets created, but nothing ever goes into it.

-- 
Randomly Generated Tagline:
You tell 'em Cemetery, You are so grave.
non-fuzzy body parts in subject: missed
I have been receiving a spate of short messages that don't seem to trigger enough default rules to be knocked out. Investigating, I noticed a discrepancy [bug?] in the rules. One particular email refers to the uniquely male body part starting with P; let's call it MBP for purposes of discussion. It gets hit by a '20' rule for body parts in the message body, but I noticed it doesn't get anything for the subject:

Want a Bigger MBP?

A '25_replace' rule is present for fuzzy MBPs, but doesn't seem to catch unfuzzy ones. So I guess my questions are:

1) Should 'fuzzy' rules match non-fuzzy targets as well as fuzzy ones?

2) Should there be some normalization adjustment for short messages? I'm thinking a scale factor rather than an absolute score to add, reflecting the general idea that short messages are not bad per se, but if you are scoring on the bad side, a multiplier (e.g. 1.1 or 1.2) would increase the score of a message that is already being sized up as bad. Does SA support any multiplier-type rules? Should it, or rather, do people feel this is a good idea? I.e., a

RULENAME *1.1 (0, *1.1, 0, *1.1)

type format?

-l
Re: non-fuzzy body parts in subject: missed
Linda Walsh wrote:
> I have been receiving a spate of short messages that don't seem to
> trigger enough default rules to be knocked out. I was investigating and
> noticed a discrepancy [bug?] in the rules. One particular email refers to
> the uniquely Male-Body-Part starting w/P, let's call it MBP for purposes
> of discussion. It gets hit by a '20' rule for body parts in the message
> body, but I noticed it doesn't get anything for the subject:

Yes it does. The text of the subject line will match against any body rule; SA prepends it so we don't have to have a massive duplication of rules to cover both body and subject.

> Want a Bigger MBP?
>
> A '25_replace' rule is present for fuzzy MBPs, but doesn't seem to catch
> unfuzzy ones. So I guess questions might be:
>
> 1) Should 'fuzzy' rules match non-fuzzy targets as well as fuzzy ones?

IMHO, no. I think there should be two rules with separate scores. In the above example the scores would be pretty much the same. However, consider the word viagra: an obfuscation is a clear sign of spam, while the un-obfuscated word is a weaker sign, because it could be a joke or a conversation with a medical discussion of some form.

> 2) Should there be some normalization adjustment for short messages? I'm
> thinking a scale factor rather than an absolute score to add, reflecting
> the general idea that short messages are not bad, but if you are scoring
> on the bad side, a multiplier (ex. 1.1 or 1.2) would increase the score
> of a message that is already being sized up as bad. Does SA support any
> multiplier type rules?

No.

> Should it, or rather, do people feel this is a good idea?

I don't feel that would be a good idea. Bear in mind this would also make a good message (i.e. one at -1.0) more good. It just doesn't make sense to me to have something which merely acts as a score amplifier instead of a score adjustment. Performing any kind of GA run to establish a reasonable multiplier value for these would be a logistical nightmare.
You also get into an issue of order of operations. Does the multiplier apply to the running score as of the moment the rule hits? Or, after the total message score is calculated, do you make a second pass and factor in all the multipliers, taking a slight performance hit for the extra calculation run?
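The two orderings really do disagree. A toy sketch (hypothetical multiplier-type rules, not real SA syntax or internals) makes the difference concrete:

```python
# Hypothetical scoring with multiplier-type rules, to show the ordering problem.
# Each rule is (name, additive_score, multiplier); plain rules use multiplier 1.0.
rules = [("BODY_HIT", 2.0, 1.0), ("SHORT_MSG", 0.0, 1.2), ("URI_HIT", 3.0, 1.0)]

def score_in_order(rules):
    """Apply each multiplier to the running total at the moment its rule hits."""
    total = 0.0
    for _name, add, mult in rules:
        total = total * mult + add
    return total

def score_post_pass(rules):
    """Sum all additive scores first, then apply every multiplier at the end."""
    total = sum(add for _name, add, _mult in rules)
    for _name, _add, mult in rules:
        total *= mult
    return total
```

With these made-up numbers, the in-order pass multiplies only the 2.0 already accumulated when SHORT_MSG hits (giving 5.4), while the post-pass multiplies the full 5.0 total (giving 6.0), so rule evaluation order changes the verdict.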
Adding RBLs
We are currently testing SA 3.1.0, as our installation may end up being quite large. For several years we have run our own DNSBL lists and would like to incorporate them into SA. Most are IPv4 sets, but we do have one RHBL list. Unfortunately, we have not been successful in getting the rules to fire. We have tried adding them both to a variation of 20_dnsbl_tests.cf and to local.cf. Here is a sample of the type of rule we are loading:

## Local RBL
header   RLBL_OSH_RBL eval:check_rbl('rblos', 'rbl.onshore.com.')
describe RLBL_OSH_RBL rbl.onshore.com
tflags   RLBL_OSH_RBL net
header   RLBL_OSH_RBL eval:check_rbl_results_for('rblos', '127.0.0.4')
describe RLBL_OSH_RBL Host in rbl.onshore.com
tflags   RLBL_OSH_RBL net
score    RLBL_OSH_RBL 3.0

spamassassin -D --lint shows no errors; however, the rules don't seem to get called or fire when we send a test mail from a host listed in the RBL. Other DNSBLs (e.g. SpamCop, SORBS) seem to work fine. Any help would be appreciated.

TIA - sjk

Steven Kent
onShore Networks
http://www.onshore.com
fingerprint: pub 1024D/D2779F66 2004-04-07
Re: non-fuzzy body parts in subject: missed
Matt Kettler wrote:
> Yes it does.. the text of the subject line will match against any body
> rule. SA pre-pends this so we don't have to have a massive duplication of
> rules to cover both body and subject.

Ah, didn't know that. Different tools, different lingo for message, message header, message body.

> > Want a Bigger MBP? A '25_replace' rule is present for fuzzy MBPs, but
> > doesn't seem to catch unfuzzy ones. So I guess questions might be:
> > 1) should 'fuzzy' rules match non-fuzzy targets as well as fuzzy ones?
>
> IMHO, no. I think there should be two rules with separate scores. In the
> above example the scores would be pretty much the same.

I agree on keeping the rules separate; I just didn't know the Subject was included in the body.

> However consider the word viagra, an obfuscation is a clear sign of spam.
> Un-obfuscated is a less strong sign of spam in this case, because it
> could be a joke or a conversation with a medical discussion of some form.

Agreed.

> > Should it, or rather, do people feel this is a good idea?
>
> I don't feel that would be a good idea. Bear in mind this would also make
> a good message (ie: one at -1.0) be more good. It just doesn't make sense
> to me to have something which merely acts as a score amplifier instead of
> a score adjustment.

I realized it would increase goodness as well, but I guess I didn't see that as much of an issue if the multiplier was applied last.

> Performing any kind of GA to establish a reasonable multiplier value for
> these would be a logistical nightmare.

:-) True, but that doesn't mean SA couldn't support a post-multiplier! :-) I can see its use would be somewhat limited, though, as I'm not sure under what other conditions one would want such a scaling, so its loss in one circumstance seems minor. Sometimes I get overfocused on a problem and blow up its severity in my mind. Uh, maybe I can blame it on the original spam's intent of increasing small problems? ;^?

Feedback is good! :-)
Tnx,
Linda
Re: Adding RBLs
stevek wrote:
> We are currently testing SA 3.1.0 - as our installation may end up being
> quite large. For several years we have run our own dnsrbl lists and would
> like to incorporate them into SA. Most are IPv4 sets, but we do have one
> RHBL list. Unfortunately, we have not been successful in getting the
> rules to fire. We have tried adding them to both a variation of
> 20_dnsbl_tests.cf and to the local.cf. Here is a sample of the type of
> rule we are loading:
>
> ## Local RBL
> header   RLBL_OSH_RBL eval:check_rbl('rblos', 'rbl.onshore.com.')
> describe RLBL_OSH_RBL rbl.onshore.com
> tflags   RLBL_OSH_RBL net
> header   RLBL_OSH_RBL eval:check_rbl_results_for('rblos', '127.0.0.4')
> describe RLBL_OSH_RBL Host in rbl.onshore.com
> tflags   RLBL_OSH_RBL net
> score    RLBL_OSH_RBL 3.0

Two things:

First, rename the first grouping of header, describe and tflags to __RLBL_OSH_RBL. Note the double underscore at the beginning. This naming is key if you don't want the base rule to fire off with a score of 1.0 whenever the RBL returns anything at all.

You cannot ever have two rules with the same name. If this happens, the second declaration overwrites the first. This is very much intentional, as it allows local configurations to patch the default rulesets, if they so desire, by overwriting a rule with a different version. Basically, using the same name causes your second group of three lines to overwrite and destroy the first three, preventing the rule from running because there is no check_rbl call left.

Second, I'd also suggest changing check_rbl_results_for to check_rbl_sub. check_rbl_results_for is deprecated, and present for backward compatibility only.

> spamassassin -D --lint shows no errors; however the rules don't seem to
> get called, or fire when we send a test mail from a host listed in the
> RBL. Other dnsrbls -- ie. spamcop, sorbs -- seem to work fine. Any help
> would be appreciated.

YW.
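Put together, a corrected ruleset would look roughly like this (the 'rblos' set name, zone, and score are carried over from the original post; this is a sketch, not tested config):

```
## Local RBL
# __ prefix makes the base lookup an unscored sub-rule
header   __RLBL_OSH_RBL eval:check_rbl('rblos', 'rbl.onshore.com.')
describe __RLBL_OSH_RBL rbl.onshore.com
tflags   __RLBL_OSH_RBL net

# Scored rule matching a specific return address, via check_rbl_sub
header   RLBL_OSH_RBL   eval:check_rbl_sub('rblos', '127.0.0.4')
describe RLBL_OSH_RBL   Host in rbl.onshore.com
tflags   RLBL_OSH_RBL   net
score    RLBL_OSH_RBL   3.0
```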