Re: configure question
--As of January 17, 2015 4:20:36 PM -0700, Michael Williamson is alleged to have said:

> to both /etc/mail/spamassassin/local.cf and /home/username/.spamassassin/user_prefs, I check the file permissions to be readable by all. I restart it
>
> # service spamassassin restart

--As for the rest, it is mine.

That's calling some script from /etc/rc.d/init.d, if I remember CentOS correctly. Would you be able to look at/post that script? I suspect that it's probably setting the location of the config files via options, so if we can figure out what it's doing then we can figure out what needs to be changed.

Daniel T. Staal

---
This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law.
---
Re: regex: chars to escape besides @
--As of January 5, 2015 4:38:03 PM -0800, John Hardin is alleged to have said:

> On Mon, 5 Jan 2015, Bowie Bailey wrote:
>> On 1/5/2015 4:13 PM, John Hardin wrote:
>>> On Mon, 5 Jan 2015, Bowie Bailey wrote:
>>>> You can avoid having to escape the slash (/) by using a different separator for the regex. This can avoid leaning toothpick syndrome. For example:
>>>>
>>>> m#http://match/this/url/#
>>>
>>> Ouch. # won't work for that (in SA at least) as it comments out the rest of the RE.
>>
>> Ack! Forgot about that minor difference with SA. # is my general go-to character for that in normal Perl scripts. This should illustrate the same point with the minor improvement of actually *working* in SA:
>>
>> m^http://match/this/url/^
>
> I tend to avoid using symbols that are syntactically significant in REs for that purpose. In your example, you can't then anchor the RE at the beginning of the URL because ^ has been repurposed as the RE delimiter.

--As for the rest, it is mine.

Since we've already established this is Perl... I like to use braces. Perl handles them (and brackets or parens) specially: Open with the opening brace and you close with the closing brace. I think Perl will parse for balance as well, but I haven't checked at the moment.

m{http://match/this/url}

In general though I do tend to stick with slashes unless it's going to be a problem; it's just more common and easier for people to recognize.

Daniel T. Staal
Re: BAYES_999=0.2 how to set this score higher?
--As of November 4, 2014 10:39:56 AM -0800, motty cruz is alleged to have said:

> Hello, I would like to set the BAYES_999=0.2 score higher than 0.2; I am searching for the file but I can't find it in /usr/local/etc/mail/spamassassin (am using FreeBSD)

--As for the rest, it is mine.

Another poster already answered your question, but you should also keep in mind that BAYES_999 is an *additive* score - anything that hits it hits BAYES_99 as well, so really the score for BAYES_999 should just be the *additional* amount of likelihood that such a mail is spam.

Daniel T. Staal
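To make the "additive" point concrete, here is a minimal sketch of the arithmetic. The score values and the probability cutoffs (0.99 and 0.999) are assumptions chosen for illustration; check the rule files of your own install for the real ones. The function names are hypothetical and not part of SpamAssassin.

```python
# Illustrative scores only (assumed); your rule files define the real values.
SCORES = {"BAYES_99": 3.5, "BAYES_999": 0.2}


def bayes_rules_hit(probability):
    """Return the Bayes rules a message hits, assuming overlapping ranges."""
    hits = []
    if probability >= 0.99:    # assumed lower bound for BAYES_99
        hits.append("BAYES_99")
    if probability >= 0.999:   # assumed lower bound for BAYES_999
        hits.append("BAYES_999")
    return hits


def bayes_score(probability, scores=SCORES):
    """Sum the scores of every Bayes rule that hit."""
    return sum(scores[rule] for rule in bayes_rules_hit(probability))
```

Under these assumptions a message at probability 0.9995 collects both scores, so raising BAYES_999 to, say, 3.5 would effectively double the Bayes contribution rather than replace it.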
Re: procmail (was Re: Spam messages bypassing SA)
--As of October 27, 2014 8:29:52 PM +0100, Robert Schetterer is alleged to have said:

> by the way http://www.exploit-db.com/exploits/34896/ always have a shellshock patched system these days with postfix/procmail

--As for the rest, it is mine.

Interesting. I dug a bit further out of curiosity.

Postfix is irrelevant in this - Procmail is what needs to be looked at. More specifically, the rules that are being used; running procmail in and of itself doesn't allow this to be exploited, it's only if you have a procmail rule that sticks info into the environment (not uncommon) that it happens. The default shell is the recipient's login shell - though that can be overridden in procmailrc.

I wouldn't rule out other LDAs from having similar problems without proof - but it's something to be aware of.

Daniel T. Staal
Re: .link TLD spammer haven?
--As of October 25, 2014 12:45:31 AM +0200, Reindl Harald is alleged to have said:

> On 25.10.2014 at 00:42, RW wrote:
>> On Fri, 24 Oct 2014 21:31:51 +0200 Reindl Harald wrote:
>>> On 24.10.2014 at 21:20, Quanah Gibson-Mount wrote:
>>>> --On Thursday, October 23, 2014 11:56 PM +0100 Martin Gregorie
>>>>> Thanks for that. I've now installed it and have been running tests against my spam corpus to make sure that this subrule:
>>>>>
>>>>> uri __MG_LTD1 /\.link/i
>>>>>
>>>>> was now working correctly. It's hit all the stuff I thought it should, but my subrule turned out to be deficient because it will also hit any URI containing .linkedin, so anybody who has copied it should rewrite that rule so it looks like this:
>>>>>
>>>>> uri __MG_LTD1 /(\.link$|\.link\/)/i
>>>>
>>>> Even with that change, it always hits mail from linkedin
>>>
>>> logical, the second part of the or is not terminated and defeats the first one and so the whole purpose of the or
>>
>> In the second part the \.link has to be followed by a '/'
>
> thanks, i stand corrected but then it should not catch linkedin

If it does it's behaving oddly. Still, I might try this instead:

uri __MG_LTD1 /\.link\b/i

That should be faster and more general than the second one above, and shouldn't grab linkedin either. (Unless of course they've decided to set up a .link address...)

Daniel T. Staal
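The three patterns from this thread can be compared side by side. This is only a sketch: it uses Python's `re` module, which treats these particular constructs the same way Perl does, and the sample URIs are made up for illustration rather than taken from any corpus. It says nothing about how SpamAssassin extracts or normalises URIs before matching.

```python
import re

# Patterns transcribed from the thread (case-insensitive, like the /i flag).
ORIGINAL = re.compile(r"\.link", re.I)
ANCHORED = re.compile(r"(\.link$|\.link/)", re.I)
BOUNDARY = re.compile(r"\.link\b", re.I)

# Hypothetical sample URIs.
SAMPLES = [
    "http://www.linkedin.com/in/someone",
    "http://spam.link",
    "http://spam.link/offer",
    "http://spam.link:8080/offer",
]


def hits(pattern, uris=SAMPLES):
    """Return one boolean per URI: does the pattern match anywhere in it?"""
    return [bool(pattern.search(uri)) for uri in uris]
```

With these samples only the original pattern matches the linkedin URI, and only the `\b` version also catches the URI with an explicit port, which is one sense in which it is "more general" than the anchored form.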
Re: Spam messages autolearned as ham
--As of September 25, 2014 11:13:16 AM -0400, Deeztek Support is alleged to have said:

>> You *did* keep your initial Bayes training corpora, right?
>
> I have an account that I have used to sign up for everything under the sun over the past 10 years. It's a goldmine for spam. I figured I'd use that to train the Bayes.

--As for the rest, it is mine.

If it's not the same types of spam as your main mail accounts, it's pretty much useless for bayes training. Check. ;)

Also: Make sure you train enough ham. Bayes needs to learn what's *different* about spam and ham.

Daniel T. Staal
Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)
--As of September 9, 2014 3:45:33 AM +0200, Karsten Bräckelmann is alleged to have said:

> This incidence is part of the initial round of IANA accepting generic TLDs. There's hundreds in this wave, and some are abused early. This is moonshine registration, nothing like new TLDs being accepted in the coming years.
>
> Or is it? Will new generic TLDs in the future be abused like that, too? How frequently will that happen? Is it worth being able to react to it quickly? How long will URIBLs take to list them? How long will it take for the average MUA to even linkify them?
>
> Opinions? Discussion in here, or should I move this to dev?

--As for the rest, it is mine.

New TLDs will always be abused...

Anyway, personal opinion: Spamassassin is currently structured to have code and rules as separate things. Putting this in the code blurs that - it's a rule. Unless there is a major performance penalty, I would move it to be with the rest of the rules. It should make maintenance easier and clearer.

Daniel T. Staal
Re: drop of score after update tonight
--As of August 25, 2014 7:06:32 PM +0200, Reindl Harald is alleged to have said:

>> masscheck tries to ensure spams score at least 5 points, but doesn't care beyond that
>
> yes, but given that the intention is to flag message above 5 with [SPAM] and reject messages above 7 which is the intention running SA as milter the reduced score matters

--As for the rest, it is mine.

Who sets that policy? Is it something you could think about changing (if it's a problem)?

Did the percentage of spam flagged vs. rejected change overall? Every time the rules update some rules will be scored higher and some lower, so figuring out each individual case is going to be pointless, but if the overall percentages remain stable your system hasn't actually changed how it operates.

Daniel T. Staal
Re: drop of score after update tonight
--As of August 25, 2014 7:49:39 PM +0200, Reindl Harald is alleged to have said:

> On 25.08.2014 at 19:35, Daniel Staal wrote:
>> --As of August 25, 2014 7:06:32 PM +0200, Reindl Harald is alleged to have said:
>>
>>>> masscheck tries to ensure spams score at least 5 points, but doesn't care beyond that
>>>
>>> yes, but given that the intention is to flag message above 5 with [SPAM] and reject messages above 7 which is the intention running SA as milter the reduced score matters
>>
>> Who sets that policy? Is it something you could think about changing (if it's a problem).
>
> finally i do that - which values needs to be found out and honestly seeing that change i am unsure how to set score limits for both (flag and reject) to prevent too much messages passing through and at the same time if such a large change happens introduce false positives from one day to another

Based on a quick check of my email, if you consider 'flagged' as non-spam (but possible), then I'd probably set flag at 3 or 4, and reject (as spam) at 5. Personally I use a 'probably spam' and 'definitely spam' system (both are set aside), with cutoffs at 5 and 10, respectively.

But part of the point is that 7.5 to 5.3 is *not* a large change, as far as spamassassin is concerned. 5.1 to 4.9 would be a large change. ;)

I have rarely ever had a false positive with spamassassin - I get maybe two-three a year. I get that in false negatives a day, when things are working well. (Which amounts to about 1% of the spam I get as false negative.)

> i admit not have that much experience but want to avoid major mistakes in the setup as good as possible before going live

My advice: Don't over-think it. Spamassassin normally does a good job, with base settings and things turned on. Train your bayes well, and watch for new things, but in general don't try messing with a lot of settings unless you have problems with a live mail stream.

>> Did the percentage of spam flagged vs. rejected change overall?
>
> i am at early testing of SA and there is no active mail flow since i am about finish admin backends and how to generate config files for SA/ClamAV/Postfix which is now at a nearly well, for my private domain as public test good enough
>
>> Every time the rules update some rules will be scored higher and some lower, so figuring out each individual case is going to be pointless, but if the overall percentages remain stable your system hasn't actually changed how it operates
>
> as said - i am about implement SA, saw the message from the update cronjob the first time for some days and looked a bit deeper if things changed

And I think you ended up over-thinking it. It was marked as spam before, it's marked as spam now. Some other emails would probably have scored higher than they used to.

We've actually had a long break in updates - usually they are multiple times a week, if not every day, but it's been around a month since they last updated. Rules probably changed scores more than normal - but it still scored the mail as spam.

Daniel T. Staal
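The point about which score changes matter can be sketched with the two-tier cutoffs mentioned above (5 and 10). The function name and labels are hypothetical, and treating a score exactly at the cutoff as spam is an assumption of this sketch.

```python
def classify(score, probable=5.0, definite=10.0):
    """Bucket a SpamAssassin score using two cutoffs (assumed inclusive)."""
    if score >= definite:
        return "definitely spam"
    if score >= probable:
        return "probably spam"
    return "ham"
```

A drop from 7.5 to 5.3 stays in the same bucket, while a drop from 5.1 to 4.9 crosses a cutoff. Only the second changes what happens to the message. With a reject cutoff at 7 instead of 10, the first drop would cross a line too, which is why the choice of cutoffs matters more than any individual score.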
Re: [Spam] Re: Bayes training via inotify (incron)
--As of August 25, 2014 4:00:15 AM +, Eric Wong is alleged to have said:

> Daniel Staal dst...@usa.net wrote:
>> Good points, but inotify might still be overkill. `ls maildir/cur/ | grep ',.*S'` will give you all messages that have been seen in the mailbox, so you can run on a periodic schedule fairly easily. I'm not sure whether you need the immediate notification inotify gives.
>
> I used to use `find' in a similar way you use `ls', but that redundantly trains old ham messages. That's slow for large ham folders, but fine for spam, though (combined with `rm'). But maybe training ham is overkill?

I'm not sure about that. I've never actually found it worth the effort to set up, personally. I archive the old spam into another folder, but basically the same idea.

You could use `ls -t` + `head` to only get new files... (I was mostly pointing out that the info is in the filename.) Or you could resort to a script I wrote ages ago that simplifies some of that. ;)

https://github.com/DanStaal/Arcfind

(I really should finish cleaning it up for CPAN at some point...)

>> inotify won't work for me - I'm on a BSD where inotify doesn't exist - but it's an interesting approach.
>
> Yeah, I wonder if there's something like incron for kqueue, since I know kqueue supports FS notifications.

You probably could do it using famd, I think... (Though it's a bit less widespread.)

Daniel T. Staal
Re: Bayes training via inotify (incron)
--As of August 23, 2014 3:22:13 AM +0200, Karsten Bräckelmann is alleged to have said:

> On Fri, 2014-08-22 at 17:32 -0700, Ian Zimmerman wrote:
>> Isn't inotify a bit of overkill for this? If you have a dedicated maildir for training, you know that anything in maildir/new is, uh, new. So you process it and move it to maildir/cur. What am I missing?
>
> The new/ directory is for delivery, messages moved will end up in cur/.
>
> Training on messages in new/ means training solely on classification. These messages have not been seen by a human, and he's most likely not even aware there's new mail at all.
>
> Messages moved (copied) into dedicated (ham|spam) learning folders will be placed in cur/. Thus, training on content in dedicated learning folders' new/ dirs won't work, because human reviewed mail does not go there. And training on new/ dirs in general is like overriding all of the precaution measures of SA auto-learning, and blindly train anything and everything above or below the required_score threshold.
>
> Besides, moving messages from new/ to cur/ is the IMAP server's duty. No third-party script should ever mess with that.

--As for the rest, it is mine.

Good points, but inotify might still be overkill. `ls maildir/cur/ | grep ',.*S'` will give you all messages that have been seen in the mailbox, so you can run on a periodic schedule fairly easily. I'm not sure whether you need the immediate notification inotify gives.

That said: It's still an interesting and possibly useful approach. My current system is that I have a 'misfiled spam' folder, and I train on everything in it every night. (And auto clean it out every night as well.) I let autolearn take care of normal ham. (The occasional misfiled ham I've always handled manually, as they are so few it's never been worth automating.)

inotify won't work for me - I'm on a BSD where inotify doesn't exist - but it's an interesting approach.

Daniel T. Staal
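Since the "seen" state lives in the filename, a periodic job only has to parse names. The sketch below assumes the usual Maildir convention of a `:2,` marker followed by single-letter flags, with `S` meaning seen. The function names and sample filenames are hypothetical.

```python
def maildir_flags(filename):
    """Return the set of flag letters after the ':2,' info marker, if any."""
    _, marker, info = filename.rpartition(":2,")
    return set(info) if marker else set()


def seen_messages(filenames):
    """Keep only the filenames whose flags include 'S' (seen)."""
    return [name for name in filenames if "S" in maildir_flags(name)]


# Hypothetical directory listing of maildir/cur/.
LISTING = [
    "1408760000.M1P100.host:2,S",
    "1408760001.M2P101.host:2,",
    "1408760002.M3P102.host,S=2048:2,R",
    "1408760003.M4P103.host:2,RS",
]
```

Parsing the flag section explicitly is a little stricter than the `grep ',.*S'` one-liner, which would also match a name that merely carries a `,S=` size field before the flags (the third sample above).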
Re: Second step with SA
--As of August 15, 2014 1:23:37 PM +0200, Antony Stone is alleged to have said:

> On Friday 15 August 2014 at 13:05:26 (EU time), Timothy Murphy wrote:
>> 1) What is the simplest way to reject mail in chinese, russian and turkish?
>
> http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html#language_options
>
> snip
>
>> I guess 1% of email from Brazil might be legit, but losing it is a small sacrifice. I guess I could look at the sites - there may be only a couple. What is the easiest way to define email from a given site as spam?
>
> http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html#whitelist_and_blacklist_options

Both of these links are out of date. For the whitelist/blacklist it probably doesn't matter too much, but the language option in the first has been discontinued entirely.

The correct links for the current version of Spamassassin are:

http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html#language_options
http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html#whitelist_and_blacklist_options

Daniel T. Staal
Re: Opinions needed on what to consider spam
--As of August 13, 2014 11:25:26 AM -0400, David F. Skoll is alleged to have said:

> I believe that unsubscribing is safe. If the list owner is legitimate, unsubscribing will work. If the list owner is a spammer, he/she already has your email address and I don't believe spammers track the validity of addresses anyway.
>
> (Safe doesn't mean effective, of course!)
>
> The only case in which unsubscribing is dangerous is if you unsubscribe from a previously-unknown address. That'll get you added to spammers' lists.

--As for the rest, it is mine.

There is a third case I've seen on occasion, that hasn't been discussed: Unsubscribe via web. Many legitimate sites use it - to unsubscribe you click a link and go to a web site, which gives some option to unsubscribe. (Often from multiple lists, or something similar.)

But these are *not* safe if the mail isn't 'legitimate': I have also seen the link go to a site filled with malware; the unsubscribe link then is the real attack.

I'm still split on unsubscribe-via-email, but I don't consider it actively hazardous. Unsubscribe-via-web can be.

Daniel T. Staal
Re: Spam Assassin - does it work or not?
--As of August 11, 2014 10:00:34 AM -0400, David F. Skoll is alleged to have said:

> On Mon, 11 Aug 2014 06:45:24 -0700 Andy a...@opticaltoys.com wrote:
>> If I'm sounding like a leech, that's because in this case I would very much like to be. :o)
>
> I have fired paying customers for behaving like you. It's even worse to abuse a community of free software users and authors.
>
> Paid spam filtering is cheap. If the spam filtering you receive from your hosting provider is inadequate, either switch providers or pay for spam filtering from someone else.

--As for the rest, it is mine.

He's being polite, and trying to understand if he's getting good service from his hosting provider, while dealing with a product that's above his technical level. I really don't see what the problem is.

Daniel T. Staal
Re: I need professional help
--As of July 13, 2014 7:56:38 PM +0200, Antony Stone is alleged to have said:

> On Sunday 13 July 2014 at 19:52:57, Pat Traynor wrote:
>> On Sun, 13 Jul 2014, Antony Stone wrote:
>>> Have you been able to identify whether the unsolicited mail which has been thus detected is:
>>> - genuine email (possibly of a marketing variety, but still deliberately sent) from your hosting customers
>>
>> It's absolutely not from MY customers. I don't let anyone relay their outgoing email through me.
>
> On Sunday 13 July 2014 at 16:35:14, Pat Traynor wrote:
>> I run a web server, and for many of my hosting customers, I'll forward their email to other mail servers.
>
> Now I'm confused.

--As for the rest, it is mine.

That's incoming mail - mail to known and enumerated email addresses. He'll forward mail *to* his customers, but not *from* his customers.

To the original poster: If you want to hire someone, might I suggest a site like oDesk? https://www.odesk.com/

Daniel T. Staal
Re: More text/plain questions
--As of July 7, 2014 5:20:01 PM -0400, Kevin A. McGrail is alleged to have said:

> On 7/7/2014 5:09 PM, Philip Prindeville wrote:
>> On Jul 7, 2014, at 7:15 AM, Kevin A. McGrail kmcgr...@pccc.com wrote:
>>> On 7/7/2014 2:28 AM, John Wilcock wrote:
>>>> On 05/07/2014 19:08, Philip Prindeville wrote:
>>>>> As for encoding a cyrillic small a: there are many ways to do this. iso-8859-4, utf-8, jp2212, gb2312, win1252, etc. I don't think this would be very efficient - there are just too many charsets possible.
>>>>
>>>> Normalising the input message to UTF-8 before body checks would help somewhat with that. I seem to remember there's been talk of doing this.
>>>
>>> Yes, or utf-16... I think that will be necessary to keep SA effective in the modern world sooner than later.
>>
>> Okay, but… if the message body is non-ASCII and the CTE is 8bit or base64 and no explicit charset has been given, how do you know which translation to perform?
>>
>> I get a lot of Han SPAM in GB2312 where the charset is never specified (apparently it's a national default in China, despite the requirements stated in RFC-2045 and -2046).
>
> Sorry, I haven't even started delving into the devilish details but I know it's looming as a needed feature.

--As for the rest, it is mine.

Just to start the discussion: I'd say default to UTF-8 if not otherwise specified and can't be worked out. (How hard to work on 'working it out' is a question, of course.) It's the growing standard, as far as I can tell.

Even if it's wrong in a particular case, it would probably be useful: It would give rule writers something to work with.

Daniel T. Staal
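The "default to UTF-8" suggestion could look roughly like the sketch below. It is an illustration of the idea only, not how SpamAssassin handles charsets. The function name and its parameters are hypothetical, and no attempt is made at 'working it out' by sniffing the bytes.

```python
def decode_body(raw, declared_charset=None, fallback="utf-8"):
    """Decode message bytes with the declared charset, else fall back to UTF-8.

    Undecodable bytes become U+FFFD, so rule writers still get *some* text.
    """
    charset = declared_charset or fallback
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that don't fit it: use the fallback.
        return raw.decode(fallback, errors="replace")
```

An undeclared GB2312 body comes out as replacement characters here. That is wrong as text but stable, which is the "something to work with" part of the argument.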
Re: getting tons of SPAM
--As of July 1, 2014 7:39:43 PM -0500, Steve Bergman is alleged to have said:

> On 07/01/2014 05:07 PM, motty cruz wrote:
>> If it needs to be *instant*, have them visit a web page to enter service requests.
>
> Because there's no way that web-based email forms can be abused. Please. The whole delay thing is about the ridiculous greylisting kluge. There are plenty of other spam avoidance kluges which don't involve significant delay.
>
> I really can't believe what I'm hearing here. It has little to nothing to do with reality. Spam is a problem. But you don't have to make your users wait hours for important emails by making your mail servers play hard to get games with each other. This is just silly.
>
> If I forwarded this conversation to my email users, they'd be ROTFL over what the experts are saying about the tool they use daily. It has problems. But long delays would be unacceptable. And http can't really replace all its functionality. Web email forms are slow, limiting, and annoying.

--As for the rest, it is mine.

95+% of the time, email is immediate, true. But it is not uncommon for mail to be delayed for hours or days either, even without greylisting. It happens in the wild all the time, even (especially...) with the big providers. Email is also not 100% reliable: It is a best-effort service and can and does drop messages on occasion. (With varying degrees of notification: By the spec, notification should always happen, but experience says that causes backscatter, so it's not always by the spec.)

If you need an immediate, reliable communication method email will appear to work - but will randomly fail, and there will be *nothing you can do about it.* If that's what your users are expecting you are doing a *disservice* to your users, because it *won't work.* There are solutions that will, which have higher overhead costs than email. A password-protected web form is better - it won't fail silently. Or there are specialist messaging protocols. But if your users are expecting email to be that solution you are going to give yourself headaches.

Now, if 'most of the time' immediate communication is enough, that's fine. It may not be worth it for you to implement a higher reliability protocol - they cost time and money. (I used to work for a company whose sole product was a 100% reliable communication protocol.) But don't complain when it fails, because it will, and both you and the users need to expect that.

Daniel T. Staal
Re: getting tons of SPAM
--As of July 1, 2014 9:40:05 PM -0500, Steve Bergman is alleged to have said:

>> 95+% of the time, email is immediate, true.
>
> More like 99%+ of the time. When it's not, I hear about it.
>
>> But it is not uncommon for mail to be delayed for hours or days either,
>
> It's uncommon enough that when it does happen I get a phone call about a user not being able to receive email.

It's common enough that I saw it every day in my last job. 99.9% of the time the users didn't notice, or care. On the other hand there were the times I had to show them the log files showing exactly when we got and sent the message, and had to have a talk about expectations. (Nearly always the message had gone through our system in seconds.)

>> even without greylisting.
>
> Greylisting is an ugly hack that I'm hesitant to even dignify by making it the topic of serious conversation.

I won't defend it. I've never used it. ;)

> I'm not at all sure what you're talking about regarding email vs web form reliability. What are the links in that chain? The email client can malfunction in some way. But then again, so can a browser. The sending server can malfunction in some way. But so can the web proxy. The WAN link can go down on the sending side. But then, that can happen with both web and email. The receiving side's WAN can go down too. But in the case of a mail server it tries and tries and tries to get the message through as quickly as possible. The browser and proxy server certainly don't. They just drop it if anything goes wrong.

I only said that it won't fail silently: If you are depending on it for immediate communications, you'll know when you didn't get that, while with email it'll be hidden. Maybe 'better' wasn't the right word: It's a trade-off. If you want the message to go through, email is set up to keep trying. If you want the message to go *now*, the web form will tell you if it did (making the assumption that the form returns a 'message delivered' screen once it has delivered the message), and the user can try for another form of communication if it fails.

> You tell me that email is unreliable. And yet anyone can see that it *is* quite reliable, until you, as a mail admin, foolishly introduce the self-DOSing technique of greylisting, and fall on your own sword. You can go on about how it makes sense to fall on your sword. But I'm a realist, and not buying it.

As I said: I've never used greylisting. I have seen mail queues regularly holding messages for hours or days. Email is fairly reliable - but I wouldn't let a user treat it as 100% reliable and immediate, because I know it isn't. Better a few hard conversations about expectations and options than lost business due to using the wrong tool for the job.

> I'll also be typing this post up, putting a stamp on it, and mailing it. It might reach you there faster. ;-)

Not faster, but probably more reliable. ;)

> How many people here actually use greylisting and don't get complaints? Our ISP, who previously handled our email certainly didn't introduce any noticeable delays. And nobody ever got a noticeable amount of spam, or reported to me a missed or late email.

Then they didn't notice them. In the normal course of things, most mail gets through in seconds, and most of the delays are in the range of minutes to hours - short enough that people don't see them unless they are paying close attention. (And they may not be checking mail that often anyway.)

> Amazing, IMO. But it was obviously done without the ridiculous and unacceptable practice of greylisting. I want to achieve the results that Windstream does.

You probably can. ;) But I'm sure Windstream didn't get you every piece of mail immediately after it was sent - just as soon as they could after they got it.

I'm not even saying I like greylisting - I'm just saying you should work to set user expectations to reality, which is that email sometimes takes time to get delivered and (rarely) gets lost. If something is absolutely time-critical, they should treat email as a backup, not the primary form of communication. If it can spare an hour or two on occasion, email's fine.

Daniel T. Staal
Re: SA without procmail?
--As of June 20, 2014 2:05:04 PM +0100, Timothy Murphy is alleged to have said:

On Thursday, June 19, 2014 11:52:59 PM Ian Zimmerman wrote:

Axb Dovecot's Sieve is your friend. (replaces procmail)

Not really, not in this context. OP is using procmail merely as a LDA. And in that capacity, it is replaced by the LDA that comes with dovecot. On my debian system, it is /usr/lib/dovecot/dovecot-lda.

Thanks for the response. (I am the OP.) Did you mean that procmail _can_ be replaced by dovecot-lda, or that that is done _automatically_?

Can be, as you are seeing.

On my CentOS-6.5 system, I have /usr/libexec/dovecot/dovecot-lda but I don't see any evidence that it is replacing procmail. I get procmail by appending mailbox_command = /usr/bin/procmail -f- -a $USER to /etc/postfix/main.cf. Is there something similar I could append instead to use dovecot-lda? Incidentally, nobody really answered my original query - I don't see why SA couldn't divert spam to a spam-folder, instead of adding a header? That would seem much simpler to me.

Mostly because it's designed as a filter: It's not operating on a file, it's operating on a message. That message may be from a file, from the mail system, or from some mail store. It might be going to any of the above. 'Divert spam to a spam-folder' can mean a *lot* of different things, under different circumstances. It can mean writing a file to a folder, it can mean appending to a file, it can mean inserting into a database, etc. And what if (like me) you want some spam in one folder and some in another? Or something else?

At the end of the day, delivering mail to the user is the job of the LDA, and it's best to let it do its job. SA does the 'simple' thing and provides the LDA with the information it needs. Or whatever part of the mail system it's talking to - SA doesn't have to be used on an end system either, it can be part of a filter in the middle of processing/forwarding mail.
Writing to a spam-folder makes one use-case simpler, but only the one, and it makes many others harder. (And you'd still have to work out what is meant by 'folder'.) Making it an option just makes SA more complicated, especially if you try to cover all possible cases. As a filter SA is simple to use, implement, and deploy. It's usable in a wide variety of situations, including ones the devs never thought of or heard about. Writing to a folder would be limiting and complex.

Daniel T. Staal
--- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
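As an illustration of the split described in the message above (SA tags, the LDA delivers), here is a minimal sketch of the delivery-side decision. It is written in Python rather than procmail or Sieve syntax. X-Spam-Flag and X-Spam-Status are headers SpamAssassin adds in its default configuration; the folder names and the cut-off of 10 are assumptions invented for this example.

```python
import email
import re

HIGH_SPAM_CUTOFF = 10.0  # hypothetical cut-off, not a SpamAssassin default


def pick_folder(raw_message: str) -> str:
    """Choose a destination folder from the headers SpamAssassin added."""
    msg = email.message_from_string(raw_message)
    if (msg.get("X-Spam-Flag") or "").strip().upper() != "YES":
        return "INBOX"
    # X-Spam-Status looks like "Yes, score=12.3 required=5.0 tests=..."
    match = re.search(r"score=(-?\d+(?:\.\d+)?)", msg.get("X-Spam-Status") or "")
    score = float(match.group(1)) if match else 0.0
    # Folder names are placeholders; a real LDA could just as well append
    # to an mbox file or insert into a database at this point.
    return "highspam" if score >= HIGH_SPAM_CUTOFF else "checkspam"


tagged = (
    "X-Spam-Flag: YES\n"
    "X-Spam-Status: Yes, score=12.3 required=5.0 tests=BAYES_99\n"
    "Subject: buy now\n"
    "\n"
    "body\n"
)
print(pick_folder(tagged))
```

The filter only has to emit headers; what 'folder' means stays entirely on the delivery side, so it can change without touching SA.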
Re: Bareword found where operator expected at /usr/local/bin/sa-heatu line 227, near s/... //r
Please try to keep responses on-list; other people may have better answers than I do. ;)

--As of June 16, 2014 10:58:29 AM +1000, Tom Robinson is alleged to have said:

On 14/06/14 05:22, Christoph (Stucki) von Stuckrad wrote:

Hi! - and Sorry, all my tries to post did bounce. Seemingly our updated mailsystem changed something. So directly to you and may be somebody can post it, if it's useful.

On Fri, 13 Jun 2014, Tom Robinson wrote: ...[errormessage]... and Daniel Staal ...[tried split into four lines, which will work]...

Try changing

227 {printf %s, ((localtime $twas) =~ s/... //r =~ s/:.. / /r);}

to:

{printf(%s, ((localtime $twas) =~ s/... //r =~ s/:.. / /r));}

There exist cases of ambiguity calling a function with a list of parameters which themselves are lists and need (...). Then you fix the interpretation of the parameter list by the extra pair of (...) around ALL the parameters of printf. Hope this helps, as I have not tested it, but experienced the same problem many times in debug prints :-)

Hi Christoph, Thanks for looking at this. I tried your suggestion but it didn't help. :-\ I also tried Daniel's suggestion:

{ my $temp = localtime $twas; $temp =~ s/:.. / /; $temp =~ s/... //; printf %s, $temp; }

Which does allow the script to run. Daniel, you said that your fix *should* be equivalent. How will I know? Does any one else use this script? Where can I log a bug report?

Well, to be absolutely certain, you'd need to run it through B::Deparse and read the output, but I don't think you'll need to go quite that far... Mostly that was me saying 'I'm coding in the email client - no suitability for anything is guaranteed'. I don't see any reason why it should be different in any cases - but I haven't researched all possible cases. The only real difference is that I am using a temporary variable - and even there I suspect Deparse would show that Perl is using one anyway.
If you do file a bug report someplace, mention that they should take a look at the POSIX module and strftime - I have the *strong* suspicion that the whole convoluted mess could be replaced with one function call. Daniel T. Staal --- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
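A note on the thread above, plus a sketch of the strftime idea. One explanation nobody raised: the /r (non-destructive) substitution modifier was added in Perl 5.14, and the reported interpreter is 5.8.8. That would account both for the compile error and for the temporary-variable version working. The sketch below is in Python rather than Perl and uses UTC instead of local time so the result is repeatable. It mimics the two substitutions from line 227 and compares them with building the string directly from the time fields. The function names are made up for this example. strftime in Perl's POSIX module takes a comparable format string.

```python
import re
import time


def via_substitutions(epoch: int) -> str:
    """Mimic the two chained substitutions from line 227."""
    # time.asctime gives "Thu Jan  1 00:00:00 1970", the same layout as
    # Perl's localtime in scalar context.
    text = time.asctime(time.gmtime(epoch))
    text = re.sub(r"... ", "", text, count=1)   # drop the day of week
    return re.sub(r":.. ", " ", text, count=1)  # drop the seconds


def via_strftime(epoch: int) -> str:
    """Build the same string directly from the broken-down time."""
    tm = time.gmtime(epoch)
    # The day is padded by hand because %e is not available everywhere.
    return f"{time.strftime('%b', tm)} {tm.tm_mday:2d} {time.strftime('%H:%M %Y', tm)}"


print(via_substitutions(0))
print(via_strftime(0))
```

Both calls print the same text for the same timestamp, assuming the default C locale for month names.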
Rule Update!
I just wanted to say that my sa-update cronjob finally succeeded in updating the rules tonight. Congrats and thanks to everyone who's been working on getting the update server back up and running; it appears you've succeeded. ;) Daniel T. Staal --- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
Re: Bareword found where operator expected at /usr/local/bin/sa-heatu line 227, near s/... //r
--As of June 13, 2014 8:21:50 AM -0400, Joe Quinn is alleged to have said: On 6/12/2014 10:27 PM, Tom Robinson wrote: Hi, Sorry to bother you with this. As referenced on the ApacheSpamAssassin Wiki for AutoWhiteList (https://wiki.apache.org/spamassassin/AutoWhitelist) I downloaded the Truxoft version of the sa-heatu utility (http://truxoft.com/resources/sa-heatu.v4.02.tar.gz ) but when I run it I get these errors: Bareword found where operator expected at /usr/local/bin/sa-heatu line 227, near s/... //r Bareword found where operator expected at /usr/local/bin/sa-heatu line 227, near s/:.. / /r syntax error at /usr/local/bin/sa-heatu line 227, near s/... //r Execution of /usr/local/bin/sa-heatu aborted due to compilation errors. I'm running a CentOS 5.10, 32bit system. My version of perl is: # perl -version This is perl, v5.8.8 built for i386-linux-thread-multi ---8---snip*--- I fetched a version of sa-heatu from git hub as well but it is the same file (diff shows no differences and I get the same errors when running). Here is a snippet of the code in context: 224 if ($count ($opt_verbose || ($opt_verboseHits $count$opt_verboseHits) || ($opt_showUpdates $prtu))) { 225 printf $fmt, $totscore/$count, $totscore,$count, $email, $ip, $reason; 226 if (!$opt_NoTimes (($twas||0)!=0)) 227 {printf %s, ((localtime $twas) =~ s/... //r =~ s/:.. / /r);} # don't include d-o-w, and drop seconds as that implies precision 228 } Not being a perl expert I'm not sure exactly what is wrong here. Can anyone please help determine the issue? Kind regards, Tom /r is not a valid regex modifier, and gets parsed as a bareword - see http://perldoc.perl.org/perlre.html#Modifiers --As for the rest, it is mine. That's not it: /r is a valid *substitution* modifier: http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators I'm not sure what the problem is though. It *looks* ok to me. It might be worth breaking line 227 into four lines just to see if that can show the problem better. 
It should be equivalent to: { my $temp = localtime $twas; $temp =~ s/:.. / /; $temp =~ s/... //; printf %s, $temp; } (Note the /r is needed in the original because `localtime $twas` isn't something you can assign to.) I'm not entirely certain on which order the strung-together substitutions are evaluated, or if it matters. Daniel T. Staal --- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
Re: Bareword found where operator expected at /usr/local/bin/sa-heatu line 227, near s/... //r (fwd)
Got this off-list, might be helpful. Forwarded Message Date: June 13, 2014 9:22:21 PM +0200 From: Christoph (Stucki) von Stuckrad stu...@math.fu-berlin.de To: Tom Robinson tom.robin...@motec.com.au, Daniel Staal dst...@usa.net Subject: Re: Bareword found where operator expected at /usr/local/bin/sa-heatu line 227, near s/... //r Hi! - and Sorry, all my tries to post did bounce. Seemingly our updated mailsystem changed something. So directly to you and may be somebody can post it, if it's useful. On Fri, 13 Jun 2014, Tom Robinson wrote: ...[errormessage]... and Daniel Staal ...[tied split into four lines, which will work]... Try changing 227 {printf %s, ((localtime $twas) =~ s/... //r =~ s/:.. / /r);} to: {printf(%s, ((localtime $twas) =~ s/... //r =~ s/:.. / /r));} There exist cases of ambiguity calling a function with a list of parameters which themselves are lists and need (...). Then you fix the interpretation of the parameter list by the extra pair of (...) around ALL the parameters of printf. Hope this helps, as I have not tested it, but experienced the same problem many times in debug prints :-) Stucki -- Christoph von Stuckrad * * |nickname |Mail stu...@mi.fu-berlin.de \ Freie Universitaet Berlin |/_*|'stucki' |Tel(Mo.,Mi.):+49 30 838-75 459| Mathematik Informatik EDV |\ *|if online| (Di,Do,Fr):+49 30 77 39 6600| Takustr. 9 / 14195 Berlin * * |on IRCnet|Fax(home): +49 30 77 39 6601/ -- End Forwarded Message --
Re: Operations on headers in UTF-8
--As of June 11, 2014 4:25:31 AM +0200, Karsten Bräckelmann is alleged to have said:

On Tue, 2014-06-10 at 21:22 -0400, Daniel Staal wrote:

--As of June 11, 2014 2:45:25 AM +0200, Karsten Bräckelmann is alleged to have said:

Worse, enabling charset normalization completely breaks UTF-8 chars in the regex. At least in my ad-hoc --cf command line testing.

--As for the rest, it is mine.

This sounds like something where `use feature 'unicode_strings'` might have an effect

Possibly.

enabling normalization is probably setting the internal utf8 flag on incoming text, which could change the semantics of the regex matching.

Nope. *digging into code* This option mainly affects rendered textual parts and headers, treating them with Encode::Detect. More complex than just setting an internal flag. What exactly made the ad-hoc regex rules fail is beyond the scope of tonight's code-diving.

Right. And as a side-effect, Encode::Detect (as documented in Encode) is probably setting the utf8 flag on the Perl string. Note I mean internal to *perl*, not one of the modules or code. The utf8 flag affects what semantics perl uses when it compares strings, including in regexes.

If that's the case, it raises the question of whether we want Spamassassin to require Perl 5.12 (which includes that feature) - the current base version is 5.8.1. Unicode support has been evolving in Perl; 5.8 supports it generally, but there were bugs. I think 5.12 got most of them, but I'm not sure. (And of course it's not the current version of Perl.)

The normalize_charset option requires Perl 5.8.5. All the ad-hoc rule testing in this thread has been done with SA 3.3.2 on Perl 5.14.2 (debian 7.5). So this is not an issue of requiring a more recent Perl version.

`use feature 'unicode_strings'`, as a feature, only tangentially cares about what version of Perl you are running.
Yes, you need a new enough version to use it, but since features are not enabled by default, any effect they might have doesn't occur unless they are requested.

While of course something to potentially improve on itself, the topic of charset normalization is just a by-product explaining the original issue: Header rules and string encoding, with a grain of charset encoding salt.

True. I was just thinking aloud as it were, and wondering if an explanation could be found for breaking UTF-8 strings in the regex.

Daniel T. Staal
--- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
Re: Operations on headers in UTF-8
--As of June 11, 2014 2:45:25 AM +0200, Karsten Bräckelmann is alleged to have said:

Worse, enabling charset normalization completely breaks UTF-8 chars in the regex. At least in my ad-hoc --cf command line testing.

--As for the rest, it is mine.

This sounds like something where `use feature 'unicode_strings'` might have an effect - enabling normalization is probably setting the internal utf8 flag on incoming text, which could change the semantics of the regex matching.

If that's the case, it raises the question of whether we want Spamassassin to require Perl 5.12 (which includes that feature) - the current base version is 5.8.1. Unicode support has been evolving in Perl; 5.8 supports it generally, but there were bugs. I think 5.12 got most of them, but I'm not sure. (And of course it's not the current version of Perl.)

Daniel T. Staal
--- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
Spamd not scoring messages
--As of May 22, 2014 3:04:04 PM +0200, Tom Hendrikx is alleged to have said:

Hi, After checking the results of sa-update and doing some manual dns queries, it seems that last rule updates were done more than a month ago. This used to be an almost daily process, even when there were only score changes due to masschecks. Any specific reason for no new updates? Something we can assist with?

--As for the rest, it is mine.

This actually brings up an issue I've been tracking and trying to isolate. I still haven't isolated it, but I'll bring it up here in case anyone can help. My system only restarts spamd when the rules have been updated. This break has brought to light an issue where spamd - after running over 24 hours - stops actually *scoring* messages. It still logs that it's *processing* them, but no score is applied - either in header or in logging.

As I've said, I'm still trying to isolate the exact causes: other activity on the box seems to be involved, so there may be load issues, or something weird going on with FreeBSD's Jail system. (The issue seems to go away if I stop my CPAN smoker jail, though I'm not 100% sure of that. In theory there should be no way the two processes interact - they aren't even using the same perl or kernel.) The exact time to failure is also in question - it doesn't seem to happen in under 24 hours, but how long over that is a question. (This is complicated by the fact that spamd has occasionally restarted itself inside the testing period.) I *do* know it isn't a 3.4 issue - it was occurring before I upgraded.

I'll admit I haven't been working too hard on isolating it - mostly just as I do other things on the box I've been noticing if the behavior changes. It's even possible that some of what I think is 'normal' uncaught spam is part of this - my main notice is the mornings when I wake up and find 50+ spam emails in my inboxes.

Daniel T. Staal
--- This email copyright the author.
Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
Re: Unexpected missing rule name, failure of spams/spamd to output X-Spam headers
--As of May 23, 2014 11:23:44 PM +0100, Martin Gregorie is alleged to have said: This morning SA 3.3.2 was working as expected on my SA test box when I amended a rule to recognise a new spam variant. The test box is running a fully patched (as of last Friday) copy of Fedora 20. Then I did my normal weekly yum upgrade. Shortly after that I got some new spam which I ran a test on using my normal spamc/spamd test system on the SA test box. To my surprise, no X-Spam headers at all were added to it. --As for the rest, it is mine. Two quick questions: Does it happen to *every* message passed to spamc, and does restarting spamd solve it? This sounds similar to the behavior I was mentioning in a post earlier, and am having trouble tracking down. Restarting mitigates in my case. Daniel T. Staal --- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
Re: sa-learn from a cronjob?
--As of April 20, 2014 12:14:37 PM -0700, Dan Mahoney, System Admin is alleged to have said: Most of my users aren't command-line friendly. I'd like to basically have my IMAP server default to handing out two imap mailboxes that get auto-crontabbed to training bayes. Ideally, I'd also like to make it so that things dropped in the learn_spam folder are deleted, and stuff in the learn_ham folder (mistake-based training) are de-tagged and moved back to the inbox. Alternatively, a single learned folder would do. Perl's Mail::Box seems like a heavy tool for this simple task. Does anyone else have any recommendations? --As for the rest, it is mine. You might find this script helpful: https://github.com/DanStaal/Arcfind I wrote it ages ago for my own use to help in doing basically what you are asking for. I found that my IMAP server had a bad habit of auto-deleting newly emptied directories, so I wanted to always leave at least one message in the 'learn as spam' folder. I use it with Maildir folders: the invocation is usually along the lines of 'mv `arcfind /mail/source/dir/cur/` /mail/dest/dir/cur/' It doesn't feed to spamassassin itself, but a separate cronjob of 'sa-learn' works just fine. Daniel T. Staal (I'm planning on putting it on CPAN as well, though I'm still considering the name and I need to fix some of the docs. The main page README is correct, I just don't have the module versions documented fully yet.) --- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
Re: BAYES_999 strange behavior
--As of February 20, 2014 1:56:18 PM -0500, Kevin A. McGrail is alleged to have said: People have hard_coded BAYES_999 entries as well. I recommend forwarding the announcement from John to the other mailing lists you are aware of these discussions. --As for the rest, it is mine. I intend to, as soon as I'm sure what's going to happen. ;) I just don't want people who've fixed their scores to be penalized. I know that doesn't help people who copied your block re-defining the rules entirely, but nothing really helps them. (Besides telling them not to do that unless they know what they are doing.) Daniel T. Staal --- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
Re: BAYES_999 of score 1.0 (default)
--As of February 17, 2014 2:54:11 PM +, RW is alleged to have said: On Mon, 17 Feb 2014 09:09:33 -0500 Kevin A. McGrail wrote: On 2/17/2014 8:43 AM, Matus UHLAR - fantomas wrote: Hello, seems after last rule update we've got new rule BAYES_99 in 72_scores.cf but without score (and thus default 1.0) in 50_scores.cf. ... a mistake happened apparently? I'll look and see. I've never tried to promote a bayes rule so it might need to bypass sandbox. I have spam that's already hitting BAYES_999 with the default 1.0 score. --As for the rest, it is mine. Same here - it's causing a fair amount of FNs, as I have BAYES_99 set with a 4.7 score, so this is lowering the spam score for a lot of mail. Daniel T. Staal --- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
Re: New expensive Regexps
--As of February 6, 2014 5:32:47 PM -0800, Dave Warren is alleged to have said:

On 2014-02-06 17:17, John Hardin wrote:

On Thu, 6 Feb 2014, Kevin A. McGrail wrote:

I've discussed it with Alex a bit but one of my next ideas for the Rules QA process is the following: - we measure and report on metrics for the rules that are promoted such as rank (existing), computational expense, time spent on rule.

I assume meta rules would combine the expense of their components?

Sounds interesting! How about if one or more components were called by more than one meta-rule? It's perhaps not entirely fair to divide it evenly, since that might imply that removing the metarule would kill off that CPU usage. Perhaps documenting the cost of the individual components, summing them, with a flag to indicate that some or all of the components are shared? That sounds overly complex, but it at least gives the enterprising rule author or server administrator the ability to understand what is happening.

--As for the rest, it is mine.

I would probably give the meta-rule no cost - add up the cost of the components if you want it. (With the understanding that all no-cost rules are meta rules.)

Another option would be to give meta rules *negative* cost - the number is the size of the cost of the sub-rules, the negative indicates that it is a meta rule.

Just thoughts on options.

Daniel T. Staal
--- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
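To make the two reporting ideas from the message above concrete, here is a small sketch: meta rules report the negated sum of their sub-rule costs, and a helper flags sub-rules shared with other meta rules. Every rule name and cost figure is invented for the example. It also assumes meta rules combine only plain rules, which real rule sets do not guarantee.

```python
# Hypothetical measured costs (milliseconds per message) for plain rules.
RULE_COST = {"BODY_A": 1.5, "HEADER_B": 0.25, "URI_C": 2.0}

# Hypothetical meta rules and the sub-rules each one combines.
META_RULES = {
    "META_AB": ["BODY_A", "HEADER_B"],
    "META_AC": ["BODY_A", "URI_C"],
}


def reported_cost(rule: str) -> float:
    """Plain rules report their own cost; meta rules the negated sum."""
    if rule in META_RULES:
        return -sum(RULE_COST[name] for name in META_RULES[rule])
    return RULE_COST[rule]


def shared_components(rule: str) -> list:
    """Sub-rules of a meta rule that some other meta rule also uses."""
    others = {
        name
        for meta, parts in META_RULES.items()
        if meta != rule
        for name in parts
    }
    return [name for name in META_RULES[rule] if name in others]


for name in ("BODY_A", "META_AB", "META_AC"):
    print(name, reported_cost(name))
print(shared_components("META_AB"))
```

The sign alone tells a reader that a figure belongs to a meta rule. The shared list shows which part of that figure would remain even if the meta rule were removed.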
RE: Rules always triggering.
--As of January 13, 2007 7:17:46 AM -0500, Dave Koontz is alleged to have said:

Just a wild stab here, run a lint check on all your rules. I once fat fingered a rule in my local.cf file and got similar hit results as you are describing here.

--As for the rest, it is mine.

I fixed a couple of things, but the issue is still there. Current lint output:

[24241] warn: config: failed to parse line, skipping: auto_learn 1
[24241] warn: config: failed to parse line, skipping: safe_reporting 0
[24241] warn: config: failed to parse line, skipping: use_terse_report 0
[24241] warn: config: failed to parse line, skipping: subject_tag *** Warning: Junk Mail ***
[24241] warn: config: failed to parse line, skipping: rewrite_subject 0
[24241] warn: config: warning: score set for non-existent rule FAKE_HELO_YAHOO
[24241] warn: config: warning: score set for non-existent rule HABEAS_SWE
[24241] warn: config: warning: score set for non-existent rule FAKE_HELO_USA_NET
[24241] warn: lint: 8 issues detected, please rerun with debug enabled for more information

(Yes, I've built this config over a long period of time...)

I'm liking the idea that this is an issue with Perl on Darwin expecting a different line ending. I just need to figure out how to _verify_ that.

Daniel T. Staal
--- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
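One way to start on the verification mentioned above is to look at the raw bytes of the config and mail files and count which line-ending styles they actually contain. This is a sketch only, and the function name is made up. It checks what is in the files, not how Perl on Darwin interprets them.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,  # LF not preceded by CR
        "cr": data.count(b"\r") - crlf,  # CR not followed by LF
    }


# Made-up sample with one ending of each style.
sample = b"score FOO 1.0\nscore BAR 2.0\r\nscore BAZ 3.0\r"
print(count_line_endings(sample))
```

For a real file, read it in binary mode (open(path, "rb")) so that no line-ending translation happens before the count.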
Re: Rules always triggering.
--As of January 12, 2007 12:40:00 PM -0800, John D. Hardin is alleged to have said:

On Fri, 12 Jan 2007, Daniel T. Staal wrote:

I am scanning mail via a procmail recipe

Anything in that configuration that you can think of that would mess up those headers? I can post a set if you would like.

There are procmail flags that allow passing only the message body text to the filter program. What's the procmail rule that you're using to call spamc?

--As for the rest, it is mine.

:0fw
| spamc

Daniel T. Staal
--- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
Re: Rules always triggering.
--As of January 12, 2007 7:08:18 PM -0600, Shane Williams is alleged to have said:

System is Darwin, running Postfix. The sign-up message for this list got those rules triggered. (_Everything_ triggers them.)

This is just a guess, but is it possible that OS X's use of carriage returns is making the message look to spamassassin as if it's a single line of text?

--As for the rest, it is mine.

I said Darwin, not OS X, though I recognize it is a small distinction. ;) The mail files are all saved to my Maildir folders with unix line endings. In general Darwin handles files in the format it receives them, and unix-tools create unix-files.

...But it does raise the question of what _Perl_ thinks the line endings are... Hmm.

Daniel T. Staal
--- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
Skipping Resent-From for blacklist.
I've got a problem where one of the message boards I was on has been hijacked and taken over by spammers. They are sending out short notifications of new board topics, all of which contain nothing but spam. Bayes hits these, but at the moment nothing else is. They do have the nice distinguishing characteristic though that they are being sent *from* the message board's email address. So, time for a blacklist. Which would be nice, except that all of my mail is forwarded from a commercial service to my personal mailserver. When this happens, the mailservice puts in a 'Resent-From' header, with my own public address. This obviously is screwing up my attempts to blacklist these spams. (Since if SA sees a 'Resent-From' it ignores all other froms in the headers for basic whitelists and blacklists.) Any ideas on how to get around this, using SA? (I can of course just filter those using procmail, but I'd rather SA did my spam filtering.) Daniel T. Staal --- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
Re: sa-learn and Caught spams
--As of September 28, 2006 11:05:35 AM -0700, Kelson is alleged to have said: Daniel Staal wrote: Depends on the setup. For instance, given the explanations above, I'll start a system to automatically learn from my 'checkspam' folder, but not my 'highspam' folder. I have procmail automatically sort my spam by score, so I can pay extra attention to low-scoring spam. (Which is more likely to be ham which was misplaced than the high-scoring spam.) So, since I *already* have them separated out, I can avoid the double-check. ;) But the final score alone doesn't determine whether something gets autolearned. As Matt pointed out, there are a number of different factors, including the mix of head/body tests and the current Bayes score -- and it acts on what the score would have been if Bayes had been disabled. So unless you've filtered on the autolearn=(ham|spam|no) tag in the X-Spam-Status header, you could be missing some high-scoring spam that hasn't already been learned. You could probably filter your training folder to remove any messages where X-Spam-Status contains autolearn=spam (assuming, of course, that your server takes full control of that header). That should be relatively fast and cut down on the resources used to identify duplicates. --As for the rest, it is mine. Just as an update, since I'm seeing something interesting... As an experiment, I set procmail to copy all the 'highspam' that I get that *doesn't* get autolearned to a separate folder, and have been attempting to train on that folder daily. I say 'attempting' because despite these *only* being the emails that had 'autolearn=no' and were definitely spam, in three days sa-learn has yet to see any useful tokens in one of these messages. Generally, upon examination, these messages already are receiving bayes scores of 99% or better, so it appears that the tokens found are already fully scored. (Though not all of them have had such high bayes scores.) 
I'll be keeping it up for a while; three days isn't much of a test, after all. But at this point it appears extra training on messages with scores over 10 (my high-spam cut-off) doesn't actually do anything. All relevant tokens are already learned, at least in a fully-trained and well-tuned system. Spam emails scored less than 10 do have a number of messages each day that have useful tokens, on my system. Which is to be expected, after all. Just thought this might be of interest. Daniel T. Staal --- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
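The filtering step discussed in this thread (skip anything already marked autolearn=spam before handing it to sa-learn) can be sketched as below, in Python rather than as a procmail recipe. The function names are made up. It also assumes, as Kelson notes, that the server has full control of the X-Spam-Status header.

```python
import email
import re
from typing import Optional

AUTOLEARN_RE = re.compile(r"\bautolearn=(\w+)")


def autolearn_state(raw_message: str) -> Optional[str]:
    """Return the autolearn value from X-Spam-Status, or None if absent."""
    msg = email.message_from_string(raw_message)
    match = AUTOLEARN_RE.search(msg.get("X-Spam-Status") or "")
    return match.group(1) if match else None


def needs_manual_training(raw_message: str) -> bool:
    """True unless the message was already autolearned as spam."""
    return autolearn_state(raw_message) != "spam"


# Made-up message with a folded X-Spam-Status header.
not_learned = (
    "X-Spam-Status: Yes, score=15.2 required=5.0 tests=BAYES_99,\n"
    "\tURIBL_BLACK autolearn=no version=3.1.0\n"
    "Subject: example\n"
    "\n"
    "body\n"
)
print(autolearn_state(not_learned), needs_manual_training(not_learned))
```

Messages for which needs_manual_training returns True are the ones that would go to the separate training folder.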
Re: SpamAssassin MX Gateway Server
--As of September 30, 2006 12:32:41 PM -0500, Russ B. is alleged to have said:

Basically, anything that arrives over 15 in score, will have that SPAM-STATUS header embedded, so it does NOT run SpamAssassin on this server, and just puts it in the Caught-Spam. If it has LOWER than a score of 15 from the MX, then the MX server didn't put a header on it, so it's processed here and filed here.

Why do that? Because my users on the sendmail server farm have a whole variety of score choices they are using, so I want their specific score to be utilized - but by making the score on the MX 15, I'm saving the sendmail server from a WHOLE LOT of processing, and nobody's going to have a default score over 15... so that's a safe number?

--As for the rest, it is mine.

Just as a thought: Since you are running procmail on them anyway, it should be possible to have a script in there that reads the desired score and uses the score count Spamassassin embeds in the 'X-Spam-Level:' header to filter. It wouldn't reformat the mail (at least not without a lot of work), but you could at least file it differently...

Daniel T. Staal
--- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---
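A rough sketch of the script idea in the message above: count the marker characters in X-Spam-Level (by default one '*' per whole point of score) and compare the count with a per-user threshold. Where the user's threshold is stored is not covered here, and the function names and sample values are assumptions for illustration.

```python
import email


def spam_level(raw_message: str) -> int:
    """Whole-number score taken from the run of '*' in X-Spam-Level."""
    msg = email.message_from_string(raw_message)
    return (msg.get("X-Spam-Level") or "").count("*")


def is_spam_for_user(raw_message: str, user_threshold: int) -> bool:
    """Compare the tagged level with one user's chosen threshold."""
    return spam_level(raw_message) >= user_threshold


# Made-up message tagged by the MX with a level of 7.
tagged = "X-Spam-Level: *******\nSubject: example\n\nbody\n"
print(spam_level(tagged))
print(is_spam_for_user(tagged, 5), is_spam_for_user(tagged, 9))
```

As the message says, this only changes where the mail is filed; it does not rewrite the message the way SA itself would.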
Re: sa-learn and Caught spams
--As of September 27, 2006 5:43:28 PM -0700, Kelson is alleged to have said:

Daniel T. Staal wrote:

True. So... Optimal is obviously to train, once and correctly, on all messages. Sending a message through that has been trained will consume *some* resources, but less than one that still needs to be learned. So the exact balance is a complicated question. ;)

I just train on everything. If it's already learned from a message, it takes a few resources for it to recognize that, but almost certainly less time than it would have taken me to separate them out.

--As for the rest, it is mine.

Depends on the setup. For instance, given the explanations above, I'll start a system to automatically learn from my 'checkspam' folder, but not my 'highspam' folder. I have procmail automatically sort my spam by score, so I can pay extra attention to low-scoring spam. (Which is more likely to be ham which was misplaced than the high-scoring spam.) So, since I *already* have them separated out, I can avoid the double-check. ;)

Anyway, I just knew that there was an automatic system, and at the very least there is *some* load to re-learning, even if a full analysis is skipped. It would be interesting to see how much it actually is, compared to an easy filter. If I find time, I may try to figure out a good test.

Daniel T. Staal
--- This email copyright the author. Unless otherwise noted, you are expressly allowed to retransmit, quote, or otherwise use the contents for non-commercial purposes. This copyright will expire 5 years after the author's death, or in 30 years, whichever is longer, unless such a period is in excess of local copyright law. ---