Re: Moving Spam to Junk Folder
On Thu, 3 Sep 2020, David B Funk wrote:

On Thu, 3 Sep 2020, bobby wrote:

I am following this tutorial: https://www.linuxbabe.com/redhat/spamassassin-centos-rhel-block-email-spam. I followed the steps in "Move Spam into the Junk Folder". When I send an email from a blacklisted e-mail address, I get a bounce e-mail from my e-mail server. Here is what is in my spamass-milter file:

    EXTRA_FLAGS="-m -r 8 -R NO_SPAM -i 127.0.0.1 -g sa-milt -- --max-size=512"

I would prefer it to go into my Junk folder. How can I make this happen?

Bobby,

You need to read the spamass-milter documentation to understand what those options are doing. That "-r 8" tells spamass-milter to return an 'SMFIS_REJECT' status to postfix if the spam score is over 8. This causes postfix to refuse to accept the message at all (sort of like when somebody tries to send a message to a bogus recipient). So if postfix never lets spam get in the front door, it cannot be delivered to any kind of "Junk Folder".

You probably want either the -b or -B option, which allows you to specify an address that tagged mail gets sent to. It's particularly useful in combination with the -r option so that you can get a sense of what's being rejected outright.

-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Why the new changes need to be "depricated" forever
On Tue, 21 Jul 2020, Loren Wilton wrote: You note that "gay" has a different meaning today. As far as I know, the words "black" and "white" were not systematically used to refer to skin colors before about 1963, when a movement was set afoot in the USA to replace "negro" with "black" and "caucasian" with "white". As I mentioned in a post on July 14, black and white to refer to races and skin color (and also red and yellow) gained traction at least as far back as the European Enlightenment, when it was all the rage to classify things, and most Enlightenment writers are explicitly racist in their descriptions and classification. But these terms are used going back thousands of years as well. My post from the 14th includes several links you might find informative. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave
Immanuel Kant, while disagreeing with Buffon that someone could return to "normal" just by moving to a different climate, agreed that "the Negroes, and in general all the other species of men [are] naturally inferior to the whites" (Hume) or that "the Negroes of Africa have by nature no feeling that rises above the trifling" https://books.google.com/books?id=eem1AQAAQBAJ=PA9=PA9#v=onepage=false

-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave
I would argue that welcome is better than allow in many contexts, including SpamAssassin. After all, w.*list isn't just used to indicate something is allowed, but to indicate that we actively want to receive the email in question (by lowering its score). You allow a maintenance worker into your apartment, but you welcome a friend On Tue, 14 Jul 2020, Kevin A. McGrail wrote: Yeah, allow/deny is more logical but using them requires all acronyms to change. After some trial and error, we dialed in the changes to welcome and block which also keeps other terminology like RBL, DNSBL, WLBL, etc. consistent so there is less upheaval. Regards, KAM -- Kevin A. McGrail Member, Apache Software Foundation Chair Emeritus Apache SpamAssassin Project https://www.linkedin.com/in/kmcgrail - 703.798.0171 On Tue, Jul 14, 2020 at 10:08 AM Marc Roos wrote: > I like the change from whitelist/blacklist to allowlist/blocklist because it is more descriptive. Allow/deny list sounds more logical. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave
On Fri, 10 Jul 2020, Axb wrote:

On 7/10/20 8:31 PM, Bill Cole wrote:

The SpamAssassin Project has a particular self-interest in attracting contributors from a diversity of cultures, because we are always at risk of mislabelling a pattern of letters or words as 'spammy' when in fact it is entirely normal in a cultural context other than those of the existing contributors to the project.

From what I see, until now, only two ppl of the SpamAssassin project have supported this motion and intend to impose this quatsch to the rest of the world. Voices against these changes have been politely ignored.

The danger of judging the world only by what is within your sight is that your field of vision is limited, and there are any number of explanations for why what you see is not representative of the whole. Maybe those who agree feel no need to comment. Maybe a lot of people on either side of the issue want to avoid adding more noise to a list that's about SpamAssassin. Maybe a lot of people recognized this wasn't a "motion" or a request for comment at all, but rather notice of a change to code. Or, as you yourself mention, maybe a lot of people are just politely ignoring the negative voices.
Re: Rule for detecting two email addresses in From: field.
I use a plugin that detects mismatches, but tries to be a little smart about what counts as a mismatch (like making sure the mismatch isn't really just that one address is from a subdomain of the other's domain, or someone carelessly using the "@" character in the name part of the From header). https://github.com/enkidushane/sa-frommismatch On Fri, 4 Oct 2019, Philip wrote: Morning List, Lately I'm getting a bunch of emails that are showing up with two email addresses in the From: field. From: "Persons Name " When you look in your mail client (Outlook, Thunderbird) it's showing only "Persons Name " Is there a way I can mark From: that has 2 email addresses in it as spam? Pro's Cons? Phil -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Whitelist rcvd IP
I believe the "whitelist_from_rcvd" option, which is now in SpamAssassin core, functions the same as the old Mail::SpamAssassin::Plugin::WhitelistRcvdIP module, though with a slightly different syntax. If you really want to use it as a blanket whitelist for a certain IP address or range, the first parameter can be specified as *@*. Whether that's advisable, I'll leave to others to comment. Also, the old WhitelistRcvdIP plugin is about 12 years old, and I see no development since then, so I'd be reluctant to use it. On Wed, 12 Jun 2019, Emanuel Gonzalez wrote: Hello, I have the need to mark certain IP addresses as secure, only for receiving mail, but I can not find information about it. In a publication they advise using the module called Mail :: SpamAssassin :: Plugin :: WhitelistRcvdIP but I can not find it. Any ideas.? Regards, -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: New URL shortener
I knew that URL looked familiar. I added it and a few others last year, and was going to add the one mentioned earlier in the week but got distracted by how to get my fork in sync with Steve's. That said, I think it's tough for even a handful of people to keep up with all the new shorteners.

On Thu, 6 Jun 2019, Amir Caspi wrote:

On Jun 6, 2019, at 9:03 PM, Kenneth Porter wrote:

I'm seeing a lot of fake DHL delivery notices using the shortener smarturl.it. I suggest adding it to __URL_SHORTENER.

FWIW there is a long list of url shorteners as part of the DecodeShortURLs plugin (sadly, no longer maintained), here:

https://github.com/smfreegard/DecodeShortURLs/blob/master/DecodeShortURLs.cf

It includes the one you just mentioned as well as a whole bunch of others.

Kevin, perhaps DecodeShortURLs should become part of the default SA distribution? It works really well in general, with the exception of a few outstanding bugs that are fairly minor and likely easily fixable by someone who knows what they're doing. The original owner no longer maintains this, having moved to rspamd.

Cheers.
--- Amir

-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Having trouble getting Spamassassin to work on Ubuntu Server 18.10
I'd suggest running spamassassin directly from the command line with the -D and --lint options to see if that provides more detail about what exactly is going wrong. This is going to give you a lot of output so you'll probably want to run it like:

    spamassassin -D --lint 2>&1 | less

On Sun, 10 Feb 2019, Ken Wright wrote:

I've been trying to set up an email server and I want to use Spamassassin to prevent it from becoming Spam Central. I've installed SA and spamass-milter, but when I try to restart it after customizing the config files, I get this:

    Job for spamassassin.service failed because the control process exited with error code.
    See "systemctl status spamassassin.service" and "journalctl -xe" for details.

So I checked journalctl and got this:

    -- Unit spamassassin.service has begun starting up.
    Feb 08 02:19:31 grace spamd[6289]: logger: removing stderr method
    Feb 08 02:19:32 grace spamd[6314]: Timeout::_run: check: no loaded plugin implements 'check_main': cannot scan!
    Feb 08 02:19:32 grace spamd[6314]: Check that the necessary '.pre' files are in the config directory.
    Feb 08 02:19:32 grace spamd[6314]: At a minimum, v320.pre loads the Check plugin which is required.
    Feb 08 02:19:32 grace spamd[6289]: child process [6314] exited or timed out without signaling production of a PID file: exit 255 at /usr/sbin/spamd line 3034.
    Feb 08 02:19:32 grace systemd[1]: spamassassin.service: Control process exited, code=exited status=255
    Feb 08 02:19:32 grace systemd[1]: spamassassin.service: Failed with result 'exit-code'.
    Feb 08 02:19:32 grace systemd[1]: Failed to start Perl-based spam filter using text analysis.
    -- Subject: Unit spamassassin.service has failed

At a friend's suggestion I also checked the mail.log and got this:

    Feb 8 02:19:25 grace spamd[6144]: logger: removing stderr method
    Feb 8 02:19:26 grace spamd[6172]: Timeout::_run: check: no loaded plugin implements 'check_main': cannot scan!
    Feb 8 02:19:26 grace spamd[6172]: Check that the necessary '.pre' files are in the config directory.
    Feb 8 02:19:26 grace spamd[6172]: At a minimum, v320.pre loads the Check plugin which is required.
    Feb 8 02:19:26 grace spamd[6144]: child process [6172] exited or timed out without signaling production of a PID file: exit 255 at /usr/sbin/spamd line 3034.

Yes, v320.pre loads the Mail::SpamAssassin::Plugin::Check module, which is installed and up to date. I've just about run out of ideas. Anyone have any? Sorry this is so long, but I didn't want to omit any pertinent information.

Ken Wright, pulling his hair out.

-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: SpamSender with 2 @-signs in the address
On Mon, 3 Dec 2018, Alan Hodgson wrote: On Mon, 2018-12-03 at 13:17 -0600, sha...@shanew.net wrote: Yeah, I see all these same things. Better to test against From:addr rather than the full From: Perhaps something like: From:addr =~ /\@[^\s]+\@/ Of course, there might still be legit cases of that kind of usage. The problem though for phishes is that some user agents (ie. Outlook) only display the quoted user-friendly part of the address, not the rest of the From: header. So phishers specifically put a fake @domainbeingphished.com in quotes so your users will see that. There were several different plugins started about a year ago to detect that sort of thing. I know of: https://github.com/enkidushane/sa-frommismatch https://github.com/fmbla/spamassassin-fromnamespoof and I think someone has implemented some of this in a regex rule, but I don't recall off the top of my head who that was. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: SpamSender with 2 @-signs in the address
Yeah, I see all these same things. Better to test against From:addr rather than the full From: Perhaps something like: From:addr =~ /\@[^\s]+\@/ Of course, there might still be legit cases of that kind of usage. On Mon, 3 Dec 2018, Alan Hodgson wrote: On Mon, 2018-12-03 at 11:15 -0700, Grant Taylor wrote: I don't think the multiple @ signs have worked in a very long time. So I see no reason not to add score based on multiple @ signs. Or if there is a legitimate use for it, it should be extremely rare and the false positive rate should be acceptable. I've been watching these for a while, and unfortunately there are a lot of customer-service type systems that send From: addresses with quoted @domain addresses in them. Many of them do "user@address via" , but not all. And then there are the messages with 2 different From: addresses within <>'s in them. I see those from Gmail sometimes. And I see quite a few messages where the actual sender address is given in quotes and then followed by the same address in <>'s. So you will definitely get false positives just looking at @'s. I've excluded the ones with " via" in them and add a bunch of extra points if they come from phishy countries or have .doc or .pdf attachments, and that hits fewer fps. And I'm only scoring if the domain parts don't match. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
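To see what the suggested From:addr pattern would and would not catch, here is a minimal Python sketch. It only mirrors the regex from the message above; the sample addresses are made up for illustration, and SpamAssassin itself evaluates the rule in Perl against its own parsed From:addr value.

```python
import re

# Python equivalent of the pattern /\@[^\s]+\@/ suggested for From:addr:
# two "@" signs with one or more non-whitespace characters between them.
TWO_AT = re.compile(r"@[^\s]+@")


def has_two_at_signs(addr: str) -> bool:
    """Return True if addr contains two '@' joined by non-whitespace text."""
    return TWO_AT.search(addr) is not None


# Hypothetical sample values, not taken from real mail.
samples = [
    "alice@example.com",
    "alice@bank.example@evil.example",
    "Events @ GA",
]
for sample in samples:
    print(f"{sample!r}: {has_two_at_signs(sample)}")
```

As the thread notes, a hit on this pattern is only a hint; legitimate senders can produce it too, so it is probably better used as one input to a meta rule than scored on its own.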
Re: Could not retrieve sendmail macro "auth_type"!.
I would double-check that the macro appears in sendmail.cf. Maybe the apt-get update ignores your sendmail.mc and just replaces the sendmail.cf directly?

On Sun, 2 Sep 2018, Michael Grant wrote:

I'm running spamassassin on several debian systems using sendmail and using spamass-milter. I'm seeing this error in my mail logs on one I updated yesterday:

    Sep 1 08:21:01 debian spamass-milter[536]: Could not retrieve sendmail macro "auth_type"!. Please add it to confMILTER_MACROS_ENVRCPT for better spamassassin results

I definitely have this macro in my sendmail.mc file:

    define(`confMILTER_MACROS_ENVRCPT',`r, v, Z, {auth_type}, {greylist}, {auth_ssf}')dnl

Furthermore on 2 other nearly identical systems I don't have this warning message. I only started seeing this warning message when I ran updates yesterday. I only get it on inbound mail. The main packages are all the same version from one system to the other:

    dpkg -l | g 'sendmail|spamass|milter'
    ii  libmilter1.0.1:amd64  8.15.2-11   amd64  Sendmail Mail Filter API (Milter)
    ii  sa-compile            3.4.1-8     all    Tools for compiling SpamAssassin rules into C
    ii  sendmail              8.15.2-11   all    powerful, efficient, and scalable Mail Transport Agent (metapackage)
    ii  sendmail-base         8.15.2-11   all    powerful, efficient, and scalable Mail Transport Agent (arch independent files)
    ii  sendmail-bin          8.15.2-11   amd64  powerful, efficient, and scalable Mail Transport Agent
    ii  sendmail-cf           8.15.2-11   all    powerful, efficient, and scalable Mail Transport Agent (config macros)
    ii  spamass-milter        0.4.0-1+b1  amd64  milter for filtering mail through spamassassin
    ii  spamassassin          3.4.1-8     all    Perl-based spam filter using text analysis
    ii  spamc                 3.4.1-8     amd64  Client for SpamAssassin spam filtering daemon

The sendmail.mc is also the same (with differences being things like hostnames). The only difference I know of is that one system was updated via apt yesterday and the other is a couple of months old. Anyone else seeing this? What other change might have caused this?

Michael Grant

-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Anti Phish Rules
On Thu, 26 Apr 2018, David Jones wrote:

    header   __BAD_FROM_NAME  From:name =~ /(^chase$|chase\.com|Internal Revenue Service|banking|Bank of America|American Express|Wells Fargo|NavyFederal|Geico|E-fax|Share.oint|UPS Delivery|FedEx|PayPal|Apple Support|USAA|.ropbox|Dro.box)/i
    meta     BAD_FROM_NAME    __BAD_FROM_NAME && !ALL_TRUSTED
    describe BAD_FROM_NAME    Displayed From contains bad information to trick the recipients
    score    BAD_FROM_NAME    4.0

People named Chase may not care for that first item in the grouping.

-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
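The concern about the first alternative can be checked directly. The sketch below copies the regex from the quoted rule into Python (with re.IGNORECASE standing in for the /i flag) and tries it on a few made-up display names. It is only an illustration of the pattern; it does not model the ALL_TRUSTED exemption in the meta rule or how SpamAssassin extracts From:name.

```python
import re

# Regex copied from the __BAD_FROM_NAME rule quoted above.
BAD_FROM_NAME = re.compile(
    r"(^chase$|chase\.com|Internal Revenue Service|banking|Bank of America|"
    r"American Express|Wells Fargo|NavyFederal|Geico|E-fax|Share.oint|"
    r"UPS Delivery|FedEx|PayPal|Apple Support|USAA|.ropbox|Dro.box)",
    re.IGNORECASE,
)


def hits(display_name: str) -> bool:
    """Return True if the display name matches the rule's pattern."""
    return BAD_FROM_NAME.search(display_name) is not None


# Hypothetical display names for illustration.
for name in ["Chase", "Chase Smith", "PayPal Support", "Investment Banking Weekly"]:
    print(f"{name!r}: {hits(name)}")
```

A sender whose display name is just "Chase" matches because of the anchored `^chase$` alternative, while "Chase Smith" does not. The unanchored "banking" alternative is another place where legitimate names could be caught.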
Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
You might take a look at https://developers.google.com/url-shortener/v1/getting_started

1 million requests per day is the default limit.

On Wed, 14 Mar 2018, Rob McEwen wrote:

On 2/20/2018 9:42 PM, Rob McEwen wrote:

Google might easily start putting captchas in the way or otherwise consider such lookups to be abusive and/or mistake them for malicious bots...

This prediction turned out to be 100% true. Even though others have mentioned that they have been able to do high-volume lookups with no problems... And granted I wasn't implementing a multi-server or multi-ip lookup strategy... But I don't think I was doing nearly as many lookups as others have claimed that they were able to do.

I took a batch of 55,000 spams that I had collected from the past 4 weeks where those spams were maliciously using the Google shortener as a way to get their spam delivered via hiding their spammy domain names from spam filters. I started checking those by looking up the redirect from Google's redirector, but without actually visiting the site that the redirector was pointing to. Please note that I was doing the lookups one-at-a-time, not starting the next lookup until the last lookup had completed.

After about ONLY 1,400 lookups, ALL of my following lookups started hitting captchas. See attached screenshot. Also, other than not sending from multiple IPs, I was otherwise doing everything correctly to make my script look/act like a regular browser.

I'll try spreading it out between multiple IPs in order to try to avoid rate limits... However... This is still cause for concern about high-volume lookups in high production systems... those may have to be implemented a little more carefully if they're going to do these kinds of lookups! Just because small or medium production systems are able to do this... Or just because somebody went out of their way to get more sophisticated with it to get it to work out... doesn't mean that it's going to work in high production systems that are trying to use "canned" software or plugins.

This is a particular challenge for anti-spam blacklists because they typically process a very high volume of spams. Hopefully, the randomness of the ones I process as they come in... will be sufficiently spread out to avoid rate limiting? It was my hope to start processing these live with my own DNSBL engine, so that I could start blacklisting the domains that they redirect to... In those cases where they were not already blacklisted... Now I'm going to have to deal with constantly trying to make sure that I'm not hitting this captcha, along with implementing some other strategies to hopefully prevent that.

But this brings up a whole other issue... That is more of a policy or legal issue... is Google basically making a statement that automated lookups are not welcome? Or are considered abusive?

(btw, I could have collected orders of magnitude more than 55,000 of THESE types of spams, but this was merely what was left over in an after-the-fact search of my archives, after a lot of otherwise redundant spams had already been purged from my system.)

PS - Once I gather this information, I will submit more details about the results of this testing. But what is shocking right now is that less than four tenths of 1% of these redirect URLs have been terminated, even though they average two weeks old, with some almost a month old.

-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
Just FYI, it does add 3.0 points as soon as it sees any chaining at all. The other 5.0 points get added at 10 redirections. That said, I think your guess is right that redirections start to look really suspicious after just 3 or 4.

On Sat, 3 Mar 2018, @lbutlr wrote:

On Feb 26, 2018, at 09:55, sha...@shanew.net wrote:

This is why the DecodeShortURLs plugin has an explicit limit of 10 lookups (and penalizes such with a total of 8 points).

I’d guess more than one redirect is highly suspicious and more than two is probably a waste of time, just score 5.0 and be done with it. Has anyone done any analysis on multi-redirects?

-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
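The scoring described here (3.0 once any chaining is seen, a further 5.0 at the 10-lookup limit) can be modelled roughly as below. This is a sketch of the idea only, not the DecodeShortURLs plugin's code: the dictionary stands in for real HTTP redirect lookups, every key is assumed to be a shortener URL, "chaining" is assumed to mean more than one hop, and all names are hypothetical.

```python
MAX_LOOKUPS = 10       # lookup limit mentioned in the thread
CHAIN_SCORE = 3.0      # added as soon as any chaining is seen
MAX_CHAIN_SCORE = 5.0  # added when the lookup limit is reached


def follow(url, redirects, max_lookups=MAX_LOOKUPS):
    """Follow redirects from url, stopping after max_lookups hops.

    redirects is a dict mapping a short URL to its target; it replaces
    real network lookups so the sketch is self-contained.
    Returns (final_url, lookups_done, score).
    """
    lookups = 0
    while url in redirects and lookups < max_lookups:
        url = redirects[url]
        lookups += 1

    score = 0.0
    if lookups > 1:             # one shortener pointed at another
        score += CHAIN_SCORE
    if lookups >= max_lookups:  # gave up at the limit (possible loop or trap)
        score += MAX_CHAIN_SCORE
    return url, lookups, score


# A redirect loop never resolves, so the cap is what ends the walk.
print(follow("a", {"a": "b", "b": "a"}))
```

The hard cap is the point of the design: without it, a spammer-controlled loop of redirects could keep a scanner busy indefinitely.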
Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
On Mon, 26 Feb 2018, David B Funk wrote: Just be careful how you do that "expand redirections until no more redirections" or you may get caught in a spammer trap. This is why the DecodeShortURLs plugin has an explicit limit of 10 lookups (and penalizes such with a total of 8 points). DecodeShortURLs has been on my list of must-have plugins for years, so I was a little surprised it took so long for someone to mention it in this thread. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam
On Thu, 15 Feb 2018, RW wrote:

On Thu, 15 Feb 2018 11:56:55 -0600 (CST) sha...@shanew.net wrote:

So, the sample size doesn't matter when calculating the probability of a message being spam based on individual tokens, but it can matter when we bring them all together to make a final calculation.

It's not a matter of how they combine, smaller counts just lead to less accurate token probabilities. I'm not saying that it doesn't matter how much you train, I'm saying that if you have enough spam and enough ham Bayes is insensitive to the ratio.

I agree that past a certain minimum threshold, the ratio doesn't matter much. But as I understand it, larger sample size makes a difference. I haven't checked the math in the Bayes plugin, but it explicitly mentions using the "chi-square probability combiner" which is described at http://www.linuxjournal.com/print.php?sid=6467

Maybe I'm misunderstanding what that article describes, but I'm pretty sure what it boils down to is that when the occurrence of a token is too small (he uses the phrase "rare words") it can lead to probabilities at the extremes (like a token that occurs only once and is in spam, so its probability is 1). The way to address these extremely low or extremely high probabilities is to use the Fisher calculation (which is described in the second page of the article).

Maybe this is where I'm making a logical leap that I shouldn't, but I think that "non-rare words" increasingly outnumber "rare words" as the sample size of messages (and thus tokens) increases.

-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
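For readers who want to see the two ideas from that article side by side, here is a small Python sketch: a smoothed per-token probability that pulls rarely seen tokens toward a neutral prior, and a Fisher (chi-square) combination of the per-token values. This follows the general approach as commonly described, not SpamAssassin's Bayes plugin, which has not been checked here; the function names and the `strength` and `prior` defaults are assumptions chosen for illustration.

```python
import math


def smoothed(n_seen, raw_prob, strength=1.0, prior=0.5):
    """Pull a token's raw spam probability toward prior when n_seen is small."""
    return (strength * prior + n_seen * raw_prob) / (strength + n_seen)


def chi2q(x, dof):
    """P(chi-square with dof degrees of freedom >= x); dof must be even."""
    assert dof % 2 == 0
    m = x / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combined_score(probs):
    """Fisher-style combination of per-token probabilities into one score.

    Values near 1 look like spam, near 0 like ham, near 0.5 are uncertain.
    Each probability must be strictly between 0 and 1.
    """
    n = len(probs)
    spamminess = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + spamminess - hamminess) / 2.0


# A token seen once, only in spam, is not trusted as much as one seen 100 times.
print(smoothed(1, 1.0), smoothed(100, 1.0))
print(combined_score([0.99] * 5), combined_score([0.5] * 3))
```

With these defaults, a token seen once (only in spam) gets 0.75 rather than 1.0, while the same raw probability backed by 100 sightings stays close to 1. That is one concrete way in which sample size feeds into the final result.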
Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam
On Thu, 15 Feb 2018, RW wrote:

On Thu, 15 Feb 2018 00:01:18 +0100 Reindl Harald wrote:

On 14.02.2018 at 23:07, RW wrote:

My point is that an imbalance doesn't create a bias

wrong - what you tried to say was "doesn't necessarily create a bias" - but in fact when the imbalance is too big *it does*

simply think about how bayes works makes that clear: each word a token with ham/spam counter - when you have 1 million of one type and 1 of the other type guess how that counter start to get biased

As I said, Bayes is based on frequencies. If a token occurs in 10% of ham and 0.5% of spam based on 10,000 hams and 10,000 spams, what do you think is likely to happen to those percentages with 10,000 hams and 1,000,000 spams?

Perhaps it would help to state Bayes' formula explicitly. The probability that a message is spam given a specific token is equal to:

    (the probability of a token occurring in spam)
    times (the probability that a message is spam)
    divided by (the probability of that token occurring in all messages)

The important feature in this formula is that every value being operated on is a probability, so if a given token occurs in .5% of 10,000 spams, we would expect it to occur in .5% of 100,000 or 1,000,000. If that assumption is true, and the .5% probability doesn't change, the resulting calculated probability also doesn't change.

For actual spam detection, this is complicated by the fact that we end up with a whole stack of calculated probabilities for each token (including the probabilities that a message is non-spam given specific tokens), and we have to take all of them into account to calculate a final probability. In this process, it's not unusual that some individual calculated probabilities "matter" more than others, and one basis for how much weight a particular probability gets is how much we can trust that probability.

Here's where the 10,000 vs. 1,000,000 comes into play, because we can rely on the .5% probability out of 1,000,000 samples more than we can the .5% probability out of 10,000 samples, and both of those are better than a .5% probability out of 100 samples (that said, the difference in trust increases more between 100 samples and 10,000 samples than from 10,000 samples to 1,000,000 samples due to diminishing returns).

So, the sample size doesn't matter when calculating the probability of a message being spam based on individual tokens, but it can matter when we bring them all together to make a final calculation.

-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
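The formula can be worked through with the numbers from the thread (a token in 0.5% of spam and 10% of ham). The Python sketch below is only an illustration: the function name is made up, and it assumes a fixed prior P(spam) of 0.5 rather than one estimated from the corpus. If the prior were instead taken from the spam-to-ham ratio of the training set, the result would shift with that ratio.

```python
def p_spam_given_token(spam_hits, n_spam, ham_hits, n_ham, prior_spam=0.5):
    """Bayes' formula for P(spam | token) from per-class token frequencies.

    prior_spam is an assumed, fixed P(spam); it is NOT derived from the
    relative sizes of the spam and ham corpora.
    """
    p_token_given_spam = spam_hits / n_spam
    p_token_given_ham = ham_hits / n_ham
    # Probability of the token occurring in any message.
    p_token = (p_token_given_spam * prior_spam
               + p_token_given_ham * (1.0 - prior_spam))
    return p_token_given_spam * prior_spam / p_token


# Token in 0.5% of spam and 10% of ham, with two very different corpus sizes.
small = p_spam_given_token(50, 10_000, 1_000, 10_000)
large = p_spam_given_token(5_000, 1_000_000, 1_000, 10_000)
print(small, large)
```

Both calls give the same value (1/21, roughly 0.048), because only the per-class frequencies enter the calculation. What the larger corpus buys is more confidence that 0.5% is the true frequency, which is the point made in the following paragraphs.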
Re: From name containing a spoofed email address
Just a hunch, but did you make sure to add the "$self->register..." line inside the "sub new {" block with all the others in HeaderEval.pm?

On Fri, 26 Jan 2018, Chris wrote:

On Mon, 2018-01-22 at 10:05 -0500, Rupert Gallagher wrote:

This is my current solution for a problem that has been discussed many times in this list. I wrote it last year, and it serves me well. Feel free to use it, if you find it useful.

This part goes into your local.cf:

    header   __F_DM1 eval:from_domains_mismatch()
    header   __F_DM2 From:addr =~ /\@(pec|legalmail|telecompost)(\.[^\.]+)?\.it/
    meta     F_DM ( __F_DM1 && ! __F_DM2 )
    describe F_DM From:name domain mismatches From:addr domain
    priority F_DM -1
    score    F_DM 5.0

This part goes into the general HeaderEval.pm:

    $self->register_eval_rule("from_domains_mismatch");

    [...]

    sub from_domains_mismatch {
      my ($self, $pms) = @_;
      my $temp;
      $temp = $pms->get('From:addr');
      $temp =~ /@(.+)/;
      my $fromAddrDomain;
      $fromAddrDomain = "$1";
      $temp = $pms->get('From:name');
      $temp =~ /@([^\@\"\s]+)/;
      my $fromNameDomain;
      $fromNameDomain = "$1";
      dbg("from_domains_mismatch: fromNameDomain=$fromNameDomain, fromAddrDomain=$fromAddrDomain");
      if ( $fromNameDomain eq "" ) {
        return 0; # all well
      } else {
        if( $fromNameDomain eq $fromAddrDomain ) {
          return 0; # all well, they match
        } else {
          return 1; # mismatch, possibly spam
        }
      }
    }

R.G.

Just for the heck of it I added the above to my SpamAssassin setup at home. However my syslog shows:

    rules: failed to run __F_DM1 test, skipping: (Can't locate object method "from_domains_mismatch" via package "Mail: [...]:SpamAssassin::PerMsgStatus" at (eval 1816) line 19.)

I did restart SA after adding this. SA version 3.4.1

-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
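To make the logic of the quoted eval function easier to follow outside SpamAssassin, here is a rough Python rendering of the same comparison. It is a sketch, not a drop-in replacement: it takes the display name and address as plain strings instead of reading them from a message, it treats a failed match as "no domain" rather than relying on a leftover capture, and it lowercases both domains before comparing (the Perl version compares them as-is).

```python
import re


def from_domains_mismatch(from_name: str, from_addr: str) -> bool:
    """Return True if from_name contains a domain that differs from from_addr's."""
    # Domain in the display name: text after "@" up to whitespace, quote, or "@".
    name_match = re.search(r'@([^@"\s]+)', from_name)
    if name_match is None:
        return False  # no address-like text in the display name: all well

    # Domain of the real address: everything after the first "@".
    addr_match = re.search(r"@(.+)", from_addr)
    addr_domain = addr_match.group(1) if addr_match else ""

    # Assumption: domains are compared case-insensitively.
    return name_match.group(1).lower() != addr_domain.lower()


# Hypothetical example values.
print(from_domains_mismatch("bob@usaa.com", "bgef@gmail.com"))  # mismatch
print(from_domains_mismatch("Bob Smith", "bob@example.com"))    # nothing to compare
```

Like the original, this only reports a mismatch; deciding which mismatches are legitimate (mailing lists, "via" senders and so on) is left to other rules.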
Re: From name containing a spoofed email address
Just to add to the confusion, uh, I mean options. Here's what I've got so far. I'm using it in production currently, but it's still very young code, so use it at your own risk.

https://github.com/enkidushane/sa-frommismatch/

I purposely avoided using uri_to_domain because it's in flux right now, but I might go back and add a version check to make use of it.

As I mentioned to Paul privately, seeing others' code strengthens my opinion that the hard part here is recognizing when an email address / domain actually needs to be checked. For instance, I require "@" to be immediately followed by a valid domain character. This avoids false positives on things like "Events @ GA" (example from my email stream); on the other hand, it would miss something like "Bob @ usaa.com".

If you try out my plugin, be warned that you will likely get false positives on yahoogroups.com. I have yet to decide whether detection of exceptions like this should be happening in the plugin, or via some meta combination of rules. If you hit other false positives, I'd be interested to hear about them.

On Mon, 22 Jan 2018, Alex wrote:

Hi,

This part goes into the general HeaderEval.pm: $self->register_eval_rule("from_domains_mismatch"); [...]

I'd like to try this, but this is not in the current 3.4.2 svn.

-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
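The trade-off described above can be shown with a tiny Python sketch. It is not code from the sa-frommismatch plugin; it assumes that a "valid domain character" means an ASCII letter or digit, and the function name is made up for the example.

```python
import re

# Assumption: a "valid domain character" is an ASCII letter or digit.
AT_THEN_DOMAIN_CHAR = re.compile(r"@[A-Za-z0-9]")


def name_looks_like_address(from_name: str) -> bool:
    """True if the display name has an '@' directly followed by a domain character."""
    return AT_THEN_DOMAIN_CHAR.search(from_name) is not None


for name in ["Events @ GA", "Bob @ usaa.com", "bob@usaa.com"]:
    print(f"{name!r}: {name_looks_like_address(name)}")
```

"Events @ GA" is skipped, which avoids the false positive, but "Bob @ usaa.com" is skipped as well, which is exactly the miss the message mentions. Loosening the pattern to allow whitespace after the "@" would trade one problem for the other.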
Re: From name containing a spoofed email address
I think what's tripping you up is what parts of the mail "From:addr" and "From:name" refer to. In the example you give:

    From: blablabla <blabla...@gmail.com>

From:name will be "blablabla" and From:addr will be "blabla...@gmail.com".

Since there's no "@" in From:name, there's clearly not an email address there, so there's nothing to compare to the domain part of From:addr.

The "bounces.em.secureserver.net" you're referring to is part of the EnvelopeFrom (AKA ReturnPath). This particular check doesn't consider that domain name in any way whatsoever.

On Mon, 22 Jan 2018, Chip wrote:

I might be wrong here, understand I'm still learning, but the purpose of the filter, from what I've been able to grasp, is that it checks the From:addr and From:name values in SA to find their domain and triggers a rule hit if there is a domain in the From:name that doesn't match the domain in the From:addr.

In the example I sent, From: (as in From:name) contains the domain "gmail.com" - blabla...@gmail.com

From:addr contains "bounces.em.secureserver.net"

Thus there is a mismatch between the domain in the From:name and the domain in the From:addr. Thus it would identify this message as probably spam, which it is not.

Are people talking about a name like "bla@bla...@domain.com" in this thread, meaning the actual "@" character in the "name", or are we comparing domains from the From:addr to the domain in the From:name?

On 01/22/2018 05:56 PM, RW wrote:

On Mon, 22 Jan 2018 17:44:00 -0500 Chip wrote:

Following is the full header with identifiable information anonymized.

I don't see what you are getting at, in:

    From: blablabla <blabla...@gmail.com>

blablabla doesn't contain an "@".

-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
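The split between the display name and the address can be reproduced with Python's standard library, which may help when reasoning about what a From:name / From:addr comparison actually sees. This is an analogy only: SpamAssassin has its own header parsing, and the addresses below are invented stand-ins for the anonymized ones in the thread.

```python
from email.utils import parseaddr

# Hypothetical From: header values.
plain = "blablabla <blabla@gmail.com>"
spoofed = '"bob@usaa.com" <bgef@gmail.com>'

for header in (plain, spoofed):
    name, addr = parseaddr(header)
    # Only a display name containing "@" gives a domain to compare against addr.
    print(f"name={name!r} addr={addr!r} name_has_at={'@' in name}")
```

In the first header the name part has no "@", so there is nothing to compare and no mismatch can be reported. In the second, the name part carries its own domain, which is the case the check is meant to catch. The envelope sender (Return-Path) does not appear in either value.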
Re: From name containing a spoofed email address
This particular effort is looking at the From header, not the EnvFrom header (though there is a check From==EnvFrom as well). What we're looking for here are things like: From: "b...@usaa.com" <bgef...@gmail.com> Or look at the pastebin example at the start of the thread. Also, without seeing the full email, I can't say for sure, while your example may be legitimate email, the "dmarc=fail" suggests that the sender is, in fact, spoofing that gmail address (as in, it lacks a valid DKIM and/or doesn't come from a server approved by gmail's SPF record). It's just that spoofing isn't a sure-fire way to determine that something is spam (if only...). On Mon, 22 Jan 2018, Chip wrote: So it's my understanding that SA does the following with this rule, which is it is checking the From:addr and From:name values in SA to find their domain and triggering a rule hit if there is a domain in the From:name that doesn't match the domain in the From:addr. However, when I examine the headers from many legitimate non-spoofed emails from bulk senders such as constantcontact, madmimi, sendgrid, etc. it is very common to find a legitimate sender with a From:addr such as n...@gmail.com which clearly conflicts with the domain name in the From:addr, address being, for example, with madmini bulk sending as an example: smtp.mailfrom=sp_12x.55xx.1.d2b65521fe5d9342...@bounces.em.secureserver.net; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=gmail.com Return-Path: <sp_12x.55xx.1.d2b65521fe5d9342...@bounces.em.secureserver.net;> Received: from m205.em.secureserver.net (m205.em.secureserver.net. [1xx.xx.xxx.xx]) From: balblabla <blabla...@gmail.com> would this rule classify that email as probably spam when in fact it most certainly is not. So what am I not understand here? On 01/22/2018 10:20 AM, David Jones wrote: On 01/22/2018 09:05 AM, Rupert Gallagher wrote: This is my current solution for a problem that has been discussed many times in this list. 
I wrote it last year, and it serves me well. Feel free to use it, if you find it useful.

This part goes into your local.cf:

    header   __F_DM1 eval:from_domains_mismatch()
    header   __F_DM2 From:addr =~ /\@(pec|legalmail|telecompost)(\.[^\.]+)?\.it/
    meta     F_DM ( __F_DM1 && ! __F_DM2 )
    describe F_DM From:name domain mismatches From:addr domain
    priority F_DM -1
    score    F_DM 5.0

This part goes into the general HeaderEval.pm:

    $self->register_eval_rule("from_domains_mismatch");
    [...]
    sub from_domains_mismatch {
      my ($self, $pms) = @_;
      my $temp;
      $temp = $pms->get('From:addr');
      $temp =~ /@(.+)/;
      my $fromAddrDomain;
      $fromAddrDomain = "$1";
      $temp = $pms->get('From:name');
      $temp =~ /@([^\@\"\s]+)/;
      my $fromNameDomain;
      $fromNameDomain = "$1";
      dbg("from_domains_mismatch: fromNameDomain=$fromNameDomain, fromAddrDomain=$fromAddrDomain");
      if ( $fromNameDomain eq "" ) {
        return 0; # all well
      } else {
        if( $fromNameDomain eq $fromAddrDomain ) {
          return 0; # all well, they match
        } else {
          return 1; # mismatch, possibly spam
        }
      }
    }

R.G.

This looks like a simple and valuable approach that should be considered for inclusion into SA for everyone. Do you mind opening up a bug at https://bz.apache.org/SpamAssassin/ in the Plugins section? We could put this in for everyone with a low score and give it a trial run before increasing the score. I will run it locally as well and see how it goes.

Sent with ProtonMail <https://protonmail.com> Secure Email. Original Message On 17 January 2018 8:31 PM, David Jones <djo...@ena.com> wrote: Would a plugin need to be created (or an existing one enhanced) to be able to detect this type of spoofed From header? From: "h...@hulumail.com !" <lany...@hotmail.com> https://pastebin.com/vVhGjC8H Does anyone else think this would be a good idea to make a rule that at least checks both the From:name and From:addr to see if there is an email address in the From:name and if the domain is different add some points? 
We are seeing more and more of this now that SPF, DKIM, and DMARC are making it harder to spoof common/major brands that have properly implemented some or all of them. David Jones -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
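The comparison this thread keeps coming back to (a domain in From:name versus the domain in From:addr) can be sketched outside of SpamAssassin like this. It is a Python illustration only, assuming the standard library's email.utils.parseaddr as a stand-in for SA's From:name / From:addr parsing; the addresses are invented, and unlike the quoted Perl it lowercases both domains and checks explicitly for a failed match.

```python
import re
from email.utils import parseaddr

# Same character class as the quoted Perl: stop at "@", a double quote, or whitespace.
NAME_DOMAIN_RE = re.compile(r'@([^@"\s]+)')


def from_domains_mismatch(from_header):
    """True if the display name holds a domain that differs from the address domain."""
    name, addr = parseaddr(from_header)
    match = NAME_DOMAIN_RE.search(name)
    if match is None:
        return False  # no "@" in the display name, so nothing to compare
    name_domain = match.group(1).lower()
    addr_domain = addr.rpartition("@")[2].lower()
    return name_domain != addr_domain


if __name__ == "__main__":
    samples = (
        '"bob@usaa.example" <bob123@gmail.example>',    # spoof-style name
        'blablabla <someone@gmail.example>',             # no "@" in the name
        '"alice@gmail.example" <alice@gmail.example>',  # name and address agree
    )
    for header in samples:
        print(header, "->", from_domains_mismatch(header))
```

Note that the envelope sender never enters into it, which is the point made in reply to Chip: a bulk sender's bounce domain does not cause a hit here.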
Re: From name containing a spoofed email address
I've got a basic plugin written for this now, but I'd like to do a little more testing before I make it widely available. If you have mail samples (ham or spam) with an "@" character in the name part of the From field that you're willing to share, let me know. BTW, I've already run into some false-positive situations, the most common being things from yahoogroups, which apparently writes the "true" sender address in the name part of From (they also dkim sign, so not too hard to work around). I started trying to handle these in the plugin itself, but I'm beginning to think these would be better as separate rules and then combined as metas to mitigate the actual mismatch score. On Wed, 17 Jan 2018, David Jones wrote: Would a plugin need to be created (or an existing one enhanced) to be able to detect this type of spoofed From header? From: "h...@hulumail.com !" <lany...@hotmail.com> https://pastebin.com/vVhGjC8H Does anyone else think this would be a good idea to make a rule that at least checks both the From:name and From:addr to see if there is an email address in the From:name and if the domain is different add some points? We are seeing more and more of this now that SPF, DKIM, and DMARC are making it harder to spoof common/major brands that have properly implemented some or all of them. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Turn OFF SA spam filtering but keep ON header examination
I can't help but think that you'd be better off using something like procmail, maildrop (part of Courier), or sieve if what you want is sorting without all the overhead of checking for spam. But maybe I'm not understanding what you want to accomplish... On Thu, 18 Jan 2018, Chip wrote: Newbie excited to use the features of SpamAssassin for a new project that needs to flag inbound email for sorting into folders (this can be done via cpanel-level filtering) based on keywords in headers (header search by SA). This is a Centos 6.9 machine running cpanel/WHM 11.68.0.23 and SpamAssassin version 3.4.1 running on Perl version 5.10.1. I would like to TURN OFF any and all Spam Identification features and only leave behind SpamAssassin's examination of headers and subsequent Subject modification based on keywords in headers (such as keywords in DKIM or SPF, etc) 1) Can this be done, and; 2) What tweaks need to be made to SA in its configuration files to make it happen, and; 3) what else is recommended here. Thank you. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Mail flagged as spam on command line getting passed through as ham
Most likely you've forgotten to restart spamd or maybe whatever glue calls SpamAssassin (amavisd, for example). As a side note, if you want it to score 7 regardless of network/bayes tests (which is what your score line indicates), you can just use "score SHARK_TANK 7" On Thu, 18 Jan 2018, Andy Howell wrote: I've been getting annoying spams for "Shark Tank". I added a simple rule in local.cf to check the subject line: header SHARK_TANK Subject =~ /\bshark tank\b/i score SHARK_TANK 7 7 7 7 The mail still get through. In my inbox: X-Spam-Flag: NO X-Spam-Score: 4.148 X-Spam-Level: X-Spam-Status: No, score=4.148 required=6.2 tests=[BAYES_80=2, DIET_1=0.001, HTML_IMAGE_RATIO_02=0.437, HTML_MESSAGE=0.001, SPF_HELO_PASS=-0.001, T_REMOTE_IMAGE=0.01, T_RP_MATCHES_RCVD=-0.01, T_SPF_TEMPERROR=0.01, URIBL_BLACK=1.7] autolearn=no autolearn_force=no If I pass the mail through spamassasin on the command line, it gets flagged as spam: spamassassin -D < spam-mail-shark-tank.txt >out.txt 2>&1 In out.txt: X-Spam-Flag: YES X-Spam-Level: X-Spam-Status: Yes, score=20.5 required=5.0 tests=BAYES_60,DIET_1, HTML_IMAGE_RATIO_02,HTML_MESSAGE,RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK, RCVD_IN_SBL_CSS,SHARK_TANK,SPF_HELO_PASS,T_REMOTE_IMAGE,URIBL_ABUSE_SURBL, URIBL_BLACK,URIBL_DBL_SPAM autolearn=spam autolearn_force=no version=3.4.1 X-Spam-Report: * 7.0 SHARK_TANK No description available. 
* 1.2 URIBL_ABUSE_SURBL Contains an URL listed in the ABUSE SURBL * blocklist * [URIs: coloringkidsus.com] * 3.3 RCVD_IN_SBL_CSS RBL: Received via a relay in Spamhaus SBL-CSS * [107.175.23.4 listed in zen.spamhaus.org] * 2.5 URIBL_DBL_SPAM Contains a spam URL listed in the DBL blocklist * [URIs: coloringkidsus.com] * 1.7 URIBL_BLACK Contains an URL listed in the URIBL blacklist * [URIs: coloringkidsus.com] * -0.0 SPF_HELO_PASS SPF: HELO matches SPF record * 0.0 DIET_1 BODY: Lose Weight Spam * 0.4 HTML_IMAGE_RATIO_02 BODY: HTML has a low ratio of text to image area * 1.5 BAYES_60 BODY: Bayes spam probability is 60 to 80% * [score: 0.7650] * 0.0 HTML_MESSAGE BODY: HTML included in message * 1.9 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50% * [cf: 100] * 0.9 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/ * 0.0 T_REMOTE_IMAGE Message contains an external image X-Spam-Bayes: bayes=0.7650, N=176(88-0+3), ham=(), spam=(shark, Pill, craze) Any ideas what I'm doing wrong? Thanks, Andy -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: From name containing a spoofed email address
On Thu, 18 Jan 2018, RW wrote: I think the hard part is handling IDNs, e.g. "=?UTF-8?B?Zm9vQGLDvGNoZXIuY29t?=" <f...@xn--bcher-kva.com> the display name should decode to the UTF-8 byte sequence for foo@bücher.com, but I presume the address would be left as the ASCII IDN. In the short term it's probably best to avoid matching on IDNs, but that does allow the use of homographs in spoofing ASCII domains. Yeah, that occurred to me, and I decided to set that problem aside for now (probably someone more familiar with the issues should address it). BTW it's best to only match on the organizational domain, to avoid FPs on the likes of: Do you (or anyone, for that matter) have samples of emails like this that they could share for me to test against? -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: From name containing a spoofed email address
I started working on this, and quickly realized the hard part is determining/parsing the domain out of the From:name variable. Is there any existing code in SA that "recognizes" email addresses that can be called and/or re-used? On Wed, 17 Jan 2018, David Jones wrote: Would a plugin need to be created (or an existing one enhanced) to be able to detect this type of spoofed From header? From: "h...@hulumail.com !" <lany...@hotmail.com> https://pastebin.com/vVhGjC8H Does anyone else think this would be a good idea to make a rule that at least checks both the From:name and From:addr to see if there is an email address in the From:name and if the domain is different add some points? We are seeing more and more of this now that SPF, DKIM, and DMARC are making it harder to spoof common/major brands that have properly implemented some or all of them. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: From name containing a spoofed email address
I swear I came across a rule like this just the other day, but now I can't find it, which is probably a sign of faulty memory. In any case, the existing HeaderEval Plugin seems like a good place for this (it already does a check for EnvFrom and From domain mismatches). On Wed, 17 Jan 2018, David Jones wrote: Would a plugin need to be created (or an existing one enhanced) to be able to detect this type of spoofed From header? From: "h...@hulumail.com !" <lany...@hotmail.com> https://pastebin.com/vVhGjC8H Does anyone else think this would be a good idea to make a rule that at least checks both the From:name and From:addr to see if there is an email address in the From:name and if the domain is different add some points? We are seeing more and more of this now that SPF, DKIM, and DMARC are making it harder to spoof common/major brands that have properly implemented some or all of them. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Using fuzzy patterns
On Sat, 13 Jan 2018, Alex wrote: From: "F*e dE x" <fedexdispatchl...@speedpost.com> That address hardly resembles "Fed Ex", but how general of a rule can we create and still catch variations such as this? I thought something like this would work: header FUZZY_FEDEX From =~ /(?!f.?e.?d.{0,3}e.?x).?.?.{0,3}.?/i To fully debug this, I think we need to know the replace_tag definitions you've set for these characters. That said, the first thing I notice is that the negative lookahead pattern matches your From header (twice, I think). This means that no matter what follows, this rule will not trigger. I suspect you want the negative lookahead to be more strictly correct, like "(?!fed ex)". You may also want to use "From:name =~" to limit the search to the non-address portion of the header. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Mailsploit
Note that after enabling KAM.cf, you'll want to watch more closely for false positives and possibly adjust scores as necessary. I think it's a great addition to the default rules, but it's primarily tuned to Kevin's environment (though he's open to improvements) and some of the rules/scores may not be appropriate for your environment. On Wed, 13 Dec 2017, Groach wrote: On 13/12/2017 20:48, Antony Stone wrote: On Wednesday 13 December 2017 at 21:41:04, Groach wrote: Is there any suggestions on a rule or procedure to implement that will help defend against the MAILSPLOIT type of spoofing? See https://marc.info/?l=spamassassin-users&m=151265708616825&w=2 and follow-ups? Thanks for that. I followed the thread you mentioned: I see that 'Kevin' says he has a rule in his personal KAM.cf and that there isnt anything published in base spamassassin scores. (Or am I missing something)? So how does one: a, obtain KAM.cf or b, decipher the mechanism to which Kevin uses in order we can apply similar in our own local.cf (All help appreciated) -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Mailsploit and RFC1342 and spoofed From
I managed to run a test about an hour ago on my first try, so maybe AWS upped his limit or demand has slowed down. Or maybe I just got lucky... YMMV On Thu, 7 Dec 2017, Kevin A. McGrail wrote: The tests are not working because of aws send limits. Unlikely to work. Regards, KAM On December 7, 2017 1:57:41 PM EST, Pedro David Marco <pedrod_ma...@yahoo.com> wrote: You can get tests here... https://www.mailsploit.com/index#demo --- PedroD. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: spamd Will Not Create unix:socket
tmpfiles.d became a thing when /run became a temporary filesystem, so it is relatively new. And most of the time packages install the necessary files in /usr/lib/tmpfiles.d, so admins may have never run up against this issue since it became a thing. As John says, you can file a bug report with RedHat. Technically that directory is only necessary when you're running spamd on a socket, so they may not consider it a bug. For what it's worth, there's no tmpfiles.d entry on my Ubuntu or Gentoo systems (Gentoo does its thing in the init script). I wonder if it's worth adding a note to the wiki, or even the --socketpath section of the spamd man-page? On Mon, 27 Nov 2017, John Hardin wrote: On Mon, 27 Nov 2017, Colony.three wrote: > I suspect you need an entry in /etc/tmpfiles.d so that directory gets > created at boot time. Indeed there is no tmpfiles in the spamassassin package. (I've never heard of this in 22 years) How can this be, in the 21st Century? As I'd suspected, everyone is settling for the tcp:port. What should I do about this, if anything? Fix it just for myself, or let someone else know? Report it to the RedHat bugzilla. The SA team doesn't handle distro-specific packaging issues. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: spamd Will Not Create unix:socket
I suspect you need an entry in /etc/tmpfiles.d so that directory gets created at boot time. Google tmpfiles.d or see this redhat blog page: https://developers.redhat.com/blog/2016/09/20/managing-temporary-files-with-systemd-tmpfiles-on-rhel7/ On Mon, 27 Nov 2017, Colony.three wrote: I have fought with this for days, and finally had to hotwire it. But I'd like to understand what's going on. RHEL7 with spamassassin 3.4.0 and spamass-milter-postfix 0.4.0. /etc/sysconfig/spamassassin SPAMDOPTIONS="--daemonize --create-prefs --max-children=5 --username=spamd --groupname=spamd --socketpath=/run/spamassassin/spamd.sock --socketowner=spamd --socketgroup=spamd --socketmode=660 --ipv4-only" spamassassin.service: [Unit] Description=Spamassassin daemon After=syslog.target network.target PartOf=spamassassin-update.service [Service] Type=forking PIDFile=/run/spamd.pid EnvironmentFile=-/etc/sysconfig/spamassassin ExecStartPre=-/sbin/portrelease spamd ExecStart=/usr/bin/spamd --pidfile /run/spamd.pid $SPAMDOPTIONS StandardOutput=syslog StandardError=syslog Restart=always [Install] WantedBy=multi-user.target It simply would not create /run/spamassassin directory on boot. It is supposed to create it automatically like clamd does, since /run is wiped at each boot. To make it work I finally had to add: ExecStartPre=/usr/bin/mkdir /run/spamassassin ExecStartPre=/bin/chown -R spamd:spamd /run/spamassassin SELinux is set to Permissive, so that's not it. Any ideas? -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Ends with string
On Fri, 15 Sep 2017, Robert Boyl wrote: uri __KAM_SHORT /(\/|^|\b)(?:j\.mp|bit\.ly|goo\.gl|x\.co|t\.co|t\.cn|tinyurl\.com|hop\.kz|urla\.ru|fw\.to)(\/|$|\b)/i Seems a bit complicated. It would be to make this rule check that suffixes are at the end of URI. uri __TEST_URLS /\b(\.vn|\.pl|\.my|\.lu|\.vn|\.ar)\b/i I believe this does it, correct? uri __TEST_URLS /\b(\.vn$|\.pl$|\.my$|\.lu$|\.vn$|\.ar$)\b/i As Paul said, if you're just looking at uris, the enlist_uri might be the better way to go. And it has the advantage that you don't have to use (some might say abuse) regular expressions. I believe URIs as collected for the uri tests consist of more than just the server part of the URI, but maybe I'm wrong (or maybe the list includes the server part only as well as the full URI). If I'm correct, then using the "$" will not work where URIs have a local part and might not work where there's only a trailing "/". In the case where you're only looking at the TLD, you don't have to worry about the front word boundary because you're explicitly anchoring the front of the match with the "\." part. At the end, you need to make sure that you're not allowing characters that would indicate the server part of the URI continues past your intended match (to avoid things like matching "blah.com" when you're really trying to match ".co"). In my estimation, the characters that might indicate continuation of the URI are letters, numbers, underscores, hyphens, and the literal ".". So, my rule for just matching TLDs looks like: uri __TEST_URLS /\.(vn|pl|my|lu|vn|ar)\b[^\.-]/i The "\b" part excludes the letters, numbers and underscore because those wouldn't be a word boundary. The "[^\.-]" part excludes the hyphen and literal "." from being on the right side of that word boundary. And now that I'm looking at it, I'm wondering if it would match a URI like "https://legit.domain.com/great.beer/" ("beer" being one of the TLDs my rule contains). 
Like I said, the enlist_uri method might be worth it just to avoid regular expressions. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
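To make the behaviour of that trailing "\b[^\.-]" concrete, here is a small sketch that runs the pattern against a few made-up URIs. It uses Python's re module, whose \b and character classes behave like Perl's for these constructs; the lookahead variant and the helper name matches are my own additions for comparison, not something proposed in the thread, and I am not assuming anything about which forms of a URI SpamAssassin actually hands to uri rules.

```python
import re

TLDS = r"(?:vn|pl|my|lu|ar)"

# The rule as written in the message: word boundary, then one character
# that is neither "." nor "-".
CONSUMING = re.compile(r"\.%s\b[^.-]" % TLDS, re.I)

# Assumed alternative: a negative lookahead, which needs no following
# character and therefore also succeeds at the very end of the string.
LOOKAHEAD = re.compile(r"\.%s(?![\w.-])" % TLDS, re.I)


def matches(pattern, uri):
    """Return True if the compiled pattern is found anywhere in the URI."""
    return pattern.search(uri) is not None


if __name__ == "__main__":
    samples = (
        "http://example.vn/",                # TLD followed by a slash
        "http://example.vn",                 # TLD at the very end
        "http://example.vn.example.com/",    # host continues after ".vn"
        "http://example.vn-x.example.com/",  # hyphen continues the label
        "http://example.com/file.pl?x=1",    # ".pl" in the path, not the host
    )
    for uri in samples:
        print(uri, matches(CONSUMING, uri), matches(LOOKAHEAD, uri))
```

Two things stand out: the consuming form needs some character after the TLD, so a bare "http://example.vn" is missed, and neither form can tell a host from a path, which is the "great.beer" worry raised above.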
Re: Ends with string
On Fri, 15 Sep 2017, Paul Stead wrote: Something along the following still seems the easiest to read approach to me enlist_uri_host (BADTLDS) vn enlist_uri_host (BADTLDS) pl enlist_uri_host (BADTLDS) my enlist_uri_host (BADTLDS) lu enlist_uri_host (BADTLDS) ar header __TEST_URLS eval:check_uri_host_listed('BADTLDS') If you're only looking at uris, it probably is (though I wonder a little about processing time between a long list of such entries and a single (if also long) regular expression). I have rules for "bad" tlds that look in headers as well (Received, From, Env_From being the main ones), so these wouldn't help with that. If there's something similar for those cases, I'd love to know about it. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
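The appeal of a host-based list is that the path never gets a vote. As a rough illustration of that idea only (this is not how check_uri_host_listed is implemented, and the names BAD_TLDS and host_has_listed_tld are invented here), a host-only check could look like this in Python:

```python
from urllib.parse import urlsplit

# Hypothetical list, mirroring the TLDs used as examples in the thread.
BAD_TLDS = {"vn", "pl", "my", "lu", "ar"}


def host_has_listed_tld(uri, tlds=BAD_TLDS):
    """True if the last label of the URI's host is in the given set."""
    host = urlsplit(uri).hostname or ""  # hostname is None without "//"
    last_label = host.rstrip(".").rpartition(".")[2]
    return last_label.lower() in tlds


if __name__ == "__main__":
    for uri in (
        "http://example.vn",
        "http://example.vn:8080/index.html",
        "https://legit.domain.com/great.pl/",
    ):
        print(uri, "->", host_has_listed_tld(uri))
```

Because only the host is examined, "https://legit.domain.com/great.pl/" is not a hit, whereas a regex over the whole URI has to work to avoid it. The same approach does nothing for Received, From, or envelope headers, which is the gap mentioned in the message.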
Re: Ends with string
If I recall correctly (and it's been a while), I was seeing false positives where t.co was matching t.com (or something like that) so I was only paying attention to the need to not allow an alpha-num. Short-sighted, I know (and I might have forgotten that \b isn't a character match). The regex I use to anchor tlds these days (and please tell me if this doesn't work the way I intend) looks like: uri NEWTLD_URI /\.(accountant|beer|bid|..|win|work|xyz)\b[^\.-]/i I have slightly different regexes to match email addresses or server names in headers, but they all basically express the rule "I need to see a word boundary here, but certain non-word characters don't count because it implies the domain name may continue in the given context" On Fri, 8 Sep 2017, RW wrote: On Fri, 8 Sep 2017 13:03:57 -0400 Kevin A. McGrail wrote: On 9/8/2017 12:24 PM, Robert Boyl wrote: Hello, everyone! Is there a way to create a Spamassassin rule that checks for a certain URL suffix such as .ru but makes sure it has to be at the end of the URI? Ends with string. Thanks! Rob Yes, it's called an anchor and Shane Williams a long time ago gave me some advice on that I used in this rule: uri __KAM_SHORT /(\/|^|\b)(?:j\.mp|bit\.ly|goo\.gl|x\.co|t\.co|t\.cn|tinyurl\.com|hop\.kz|urla\.ru|fw\.to)(\/|$|\b)/i That doesn't look right, at least not in the context of the OP's question. In (\/|$|\b) the \b seems superfluous as it will match a boundary between a letter and a '.' so the rule will for example match goo.gl.example.com -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: tflags
Apologies, I should have used the phrase "score set" rather than ruleset. The "score" section of Mail::SpamAssassin::Conf talks about it briefly, as does this wiki page: https://wiki.apache.org/spamassassin/WritingRules On Thu, 3 Aug 2017, Ian Zimmerman wrote: On 2017-08-03 10:38, sha...@shanew.net wrote: The most common ones that I make use of are "multiple" and "maxhits" in order to allow a rule to be scored for each time it hits, but to stop counting after some threshold. I also use the "net" tflag so that RBL checks only run when a net-based ruleset is loaded. Where is the concept of "ruleset" in general documented, and in particular what makes it "net-based"? Not in Mail::SpamAssassin::Conf. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: tflags
The Mail::SpamAssassin::Conf man page includes a section on tflags and their various functions, but generally speaking tflags allow you to alter the way in which a rule is processed. The most common ones that I make use of are "multiple" and "maxhits" in order to allow a rule to be scored for each time it hits, but to stop counting after some threshold. I also use the "net" tflag so that RBL checks only run when a net-based ruleset is loaded. As an example, I have various uri rules to detect emails from questionable journals. Since it's possible that someone might be having a legitimate mail conversation about that journal and share the URL to their site, I want to count how many times the URL appears, so I add a "multiple" tflag for the rule. More appearances means the mail is more likely to be advertising the journal or soliciting articles. On the other hand, once it's been seen eight times (or 15 or whatever), there's a diminishing return on that rule's ability to tell me anything more about the email, so I use "maxhits=8" to keep it from continuing to look for the uri (and to stop scoring additional points). On Thu, 3 Aug 2017, John Schmerold wrote: I don't understand the purpose of tflags. Where is this parameter explained? -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
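The "count every hit, but stop at a threshold" idea behind "multiple" plus "maxhits" can be pictured with a few lines of Python. This is only a sketch of the counting behaviour described above, not SpamAssassin's internals; the function name capped_hits and the journal URL are made up for the example.

```python
import re


def capped_hits(pattern, text, maxhits):
    """Count matches of pattern in text, giving up once maxhits is reached."""
    count = 0
    for _ in re.finditer(pattern, text):
        count += 1
        if count >= maxhits:
            break  # further matches add no new information
    return count


if __name__ == "__main__":
    url = r"journal\.example"
    chatty = "See http://journal.example/cfp for details. " * 3
    spammy = "Submit now at http://journal.example/cfp ! " * 12
    print(capped_hits(url, chatty, 8))
    print(capped_hits(url, spammy, 8))
```

A conversation that mentions the URL a few times yields a small count, while a mailing that repeats it a dozen times is capped at the limit and the scan stops early.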
Re: Feature idea: Expiring rules
On Tue, 13 Jun 2017, Dianne Skoll wrote: Hi, Something I and possibly others might find useful would be rules that expire. Quite often, we might make some very specific rules to handle a particular spam run and they lose their effectiveness pretty quickly. I would love this for private rules, especially if it could be applied to blacklist (or whitelist, I suppose) entries. We regularly blacklist specific addresses when they've obviously fallen victim to some form of compromise. If I could set those to expire rather than add an annotation that I have to manually remember (or more likely forget) to remove later, it would be fantastic. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Weird Spamassassin startup behaviour on Ubuntu 16.10
only on a cold start of the system? Is it possible to configure a SA startup dependency on the network being up? -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Weird Spamassassin startup behaviour on Ubuntu 16.10
I recently set up an email server on Ubuntu 14.10 and kept being frustrated that on boot various filter software and related milters were regularly starting after sendmail, sometimes by as much as five minutes. We don't reboot that server very often, so it took a while to test various fixes, but in the end I added the following lines to the INIT INFO section of various milters (it's really only the first one that matters for startup):

    # X-Start-Before:sendmail
    # X-Stop-After: sendmail

If postfix uses an /etc/init.d script like sendmail does on 14.10, check to see what the "Provides:" part of the INIT INFO is (probably postfix), and add an X-Start-Before line with that value to the spamassassin init script. Or, if you just want to make sure that SA starts before monit, use whatever the "Provides:" is set to in the monit init script. If you have a mixture of SysV (regular) and upstart scripts, things get more complicated (unless 16.10 introduces functionality to make dependencies interoperable that doesn't exist in 14.10). On Tue, 6 Dec 2016, Michael Heuberger wrote: Hi David I dont know. Not sure how I can find this out whether it does some DNS/network stuff. In my other response to John you can see that it takes about 5.69 sec to start spamassassin. And no idea how to configure a SA startup dependency on the network being up. And shouldn't that come along with the package when installed via apt-get? - Michael On 6/12/16 11:47, David B Funk wrote: Could it be some kind of interaction with other system services startup? (in particular this feels like a network timeout issue). One of the things SA does during its startup process is check to see if DNS/network stuff is available. If the system hasn't yet brought up the network stack when SA starts, it may hang waiting for the network to stabilize. On a running system, if you stop/restart SA do you see the same delay or is it only on a cold start of the system? 
Is it possible to configure a SA startup dependency on the network being up? -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Assistance needed
On Tue, 18 Oct 2016, Kris Deugau wrote: I saved the message and dug up a copy of the FB_CIALIS_LEO3 rule RW mentioned; I note that as he said it's not part of the current live rules, and in fact checking further it looks like it's been commented out entirely in the rules development sandbox, so it's not even considered for testing. Running the saved message through SA with the rule pasted into a temporary rules definition file, I found: dbg: rules: ran body rule FB_CIALIS_LEO3 ==> got hit: "Calm All is" (from "NW1826 All is Calm All is Bright") which is probably a good example of why this rule is no longer present. Ideally, I'd say you should ask GetResponse to remove that rule entirely. If they won't do that, it should at least be scored _way_ lower (less than 1 for sure, but more like 0.2 or 0.1). If they won't (or can't) do that, then you may want to tell them that you'll be looking for a new provider, because that tells me they really don't know what they're doing (that they couldn't figure this out for you isn't impressive either). -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: RCVD_IN_SORBS_SPAM and google IPs
On Thu, 8 Sep 2016, RW wrote: On Thu, 8 Sep 2016 15:53:00 -0500 (CDT) Shane Williams wrote: Hey all, I'm seeing google IP ranges hit the RCVD_IN_SORBS_SPAM rule, and in digging deeper, I realize that there are zero hits on this rule for the two weeks prior to Aug. 31, and now I'm seeing it thousands of times per week (not just against google IPs). Was this rule added/changed/re-scored in a recent sa-update? It was commented out for a long time because it had a delisting fee, but was recently re-enabled. https://bz.apache.org/SpamAssassin/show_bug.cgi?id=2221#c16 Thanks for that link, as it clarifies why it just started scoring again. This is the first time (at least in a long time) that I've looked at ruleqa, but it seems like http://ruleqa.spamassassin.org/20160904-r1759058-n/RCVD_IN_SORBS_SPAM/detail would indicate that it should be scored at zero (since its S/O is nearly .5), but instead it's 2.399, which is a lot to add for a rule that's been napping for the last 13 years. Perhaps more to the root issue, I'm concerned that it looks like listing on SORBS is based on total volume rather than percentage. Their summary page for the IP I checked (209.85.218.48) seems to say that there have been 28 "recent" spam entries seen from this address, but I would imagine this is a minuscule percentage of all email sent from that address. If that's all it takes to get listed, I'm kind of surprised that all of google's IPs aren't listed. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: I have some bad news
I'm finding this discussion interesting, because I've been trying to wrap my head around the theoretical basis of this system. As such, I've noticed that several questions have been asked now that are explained in the document Marc initially pointed to (http://wiki.junkemailfilter.com/index.php/The_Evolution_Spam_Filter). Given Marc's situation, it seems reasonable to read that document before asking too many questions. As a way to (maybe) save Marc some time, test my own knowledge and perhaps help move the conversation forward, I'm going to summarize the questions I've seen so far and, as much as possible, the answers to those questions (and Marc, correct me if I'm getting anything wrong here): - How do you classify an email that has tokens from both the ham and spam set? Whichever set (out of "only found in ham" and "only found in spam") is larger (or "better") determines the final classification. - What length are the tokens? Marc's examples use multiple length tokens, capturing everything between 1 and 4 "words", but I suspect the exact maximum token length might be adjustable. - What happens when spammers use "hammy" text to avoid detection? I don't see this directly addressed, but I would guess there are several things that mitigate against this. Multi-word tokens prevent the truly random word salad attempts at poisoning, and probably help with "cuttings" from other texts because the transition from one cutting to the next probably doesn't appear in ham, leaving the "spam-only" aspects of the mail to push it towards a spam classification. The unlearning and expiration of fingerprints would mean that such cuttings would have to appear repeatedly over time in legitimate mail to tip an email toward a ham classification. - Will bad spellers (or typists) be seen as spammier? Again, I don't see this addressed specifically, but I don't think so, unless they are such tremendously bad spellers that nearly every word is misspelled. 
To take the "let's get some lunch" example, even if I accidentally mis-type "some" as "som", I still have other tokens to compare against, and the tokens "som", "get som", "som lunch", "let's get som", etc. would have to have appeared in spam (and only spam) to pull the classification toward spam. So I'd say the occasional typo or misspelling would come up neutral. - What happens to messages that have a lot of neutral tokens? Now I'm really speculating, but unless every token is neutral, there's still something to decide on, though it does seem that detection becomes less reliable as the number of non-neutral tokens approaches zero. A similar question that I thought of is what happens to messages where the final sets "only found in spam" and "only found in ham" are nearly (or exactly) the same size. If you're using this filter as part of SA scoring, the answer would seem to be that you have an appropriately small score for "undetermined" (like bogofilter does), but if it's acting as a separate filter, I don't know. On Wed, 17 Aug 2016, Antony Stone wrote: On Wednesday 17 August 2016 at 05:06:50, Marc Perkel wrote: What I'm doing is looking for fingerprints in email that intersect HAM and not in SPAM - which would be a HAM result. If it matches SPAM and does NOT match HAM - then it's SPAM. The magic is in the NOT matching on the other side. So if I say to you, "Let's get some lunch" that's ham because spammers never say that, but normal people do. So the way to test what "spammers never say" is to store what they do say and see if it's NOT in the list. (Thus the infinite set) What length are the tokens you store in the list? Single words (so the above lunch example would contain 4 tokens)? Entire phrases (so the above would be just 1 token)? Also how do you deal with spam which contains random cuttings from legitimate texts (generally along with a graphic attachment and/or a URL to get across the "real" message)?
Similarly, there's only so many ways to misspell viagra, and good email wouldn't have it spelled wrong. Does this mean that people with bad spelling will more likely get classified as spam, because they do not match the 'ham' group very well? Also, what happens to mail which contains lots of tokens which match neither set (for example, perfectly legitimate email which happens to be in a language the system hasn't been trained with)? Antony. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
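To make the summary above a bit more concrete, here is a small Python sketch of the idea as I understand it: build 1 to 4 word tokens, keep only the ones seen exclusively in ham or exclusively in spam, and let the larger set decide. This is my own toy reading of the description, not Marc's implementation; the function names, the tiny training sets, and the "undetermined" tie result are all assumptions.

```python
def tokens(text, max_words=4):
    """Return every run of 1..max_words consecutive words as a token."""
    words = text.lower().split()
    found = set()
    for size in range(1, max_words + 1):
        for start in range(len(words) - size + 1):
            found.add(" ".join(words[start:start + size]))
    return found

def classify(message, ham_seen, spam_seen):
    """Compare tokens seen only in ham against tokens seen only in spam."""
    msg_tokens = tokens(message)
    ham_only = (msg_tokens & ham_seen) - spam_seen
    spam_only = (msg_tokens & spam_seen) - ham_seen
    if len(ham_only) > len(spam_only):
        return "ham"
    if len(spam_only) > len(ham_only):
        return "spam"
    return "undetermined"  # equal sets, or every token neutral

# Toy "training" data, invented for this example.
ham_seen = tokens("let's get some lunch") | tokens("see you at lunch")
spam_seen = tokens("get some cheap pills")

# The typo "som" only produces tokens that appear in neither set, so they
# stay neutral and the remaining ham-only tokens decide.
print(classify("let's get som lunch", ham_seen, spam_seen))
```

Tokens that show up on both sides ("get" here) drop out of both sets, which is the "NOT matching on the other side" part of the description.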
Re: Is greylisting effective? (was Re: Using Postfix and Postgrey - not scanning after hold)
On Sun, 31 Jul 2016, Robert Schetterer wrote: Greylisting was invented as an idea against bots. It's based on the idea that bots "fire and forget" when they see a tmp error and don't get back. But that's historic, bots are recoded, better antibot tecs were invented. The only problem now is people still believe in historic stuff. This argument ignores two important facts. First, even if 98% of bots and viruses (and that number is pure conjecture on my part) are now smart enough to retry, that doesn't change that greylisting is just about the lowest "cost" way of preventing the ones that aren't smart enough (or aren't designed to retry because they want to push the most amount of junk at the lowest-hanging fruit). Second, the ability of a bot, virus, server or any other spam source to retry delivery after a temp failure is not the only "weakness" greylisting takes advantage of. A spam source might not get past my greylist for any number of reasons, including the classic case of poor coding/design, but also: - It is detected and blocked (or taken offline) by the source network before its greylist period is up - It makes use of a compromised account, and that account is disabled or secured before its greylist period is up - It is part of a distributed botnet, so subsequent attempts come from a different IP/network - It sends a high volume of spam, so it doesn't come back around to retry again until after its entry has been removed, requiring a whole new greylisting period Others could probably add to that list, but that's just off the top of my head.
But, even if a spam source retries and successfully makes it past the greylisting, the greylisting still provides potential benefits, like: - While it was waiting to retry, its IP has been added to BLs, which my other filters will score appropriately - While it was waiting to retry, the phishing URL in it has been reported and taken down (or the URL shortener link it used has been removed) - While it was waiting to retry, the virus it carries has been identified and pushed out to my virus definitions - While it was waiting to retry, its registered domain has been removed - While it was waiting to retry, others who received the spam have reported it to services like Razor and DCC, which other filters will act on - If it has to keep retrying addresses to my server, I'm consuming resources (however minimally) that could be used to send their junk to others Again, I'm sure others could add more based on their experiences. I'm not saying greylisting is without problems, that it just works out of the box (initial and ongoing configuration is critical), or that everyone should be using it, but there's a lot more going on here than just outwitting poorly written bots. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
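For anyone who hasn't looked at how the mechanism behaves, the cases above (retrying too soon, retrying from a different IP, retrying after the entry has expired) fall out of a very small amount of logic. The Python below is a bare-bones sketch of the usual (client IP, sender, recipient) triplet approach, not the code of postgrey or any other real greylister; the class name, the delay and expiry values, and the addresses are assumptions for illustration.

```python
class Greylist:
    """Toy greylist keyed on (client IP, envelope sender, recipient)."""

    def __init__(self, delay=300, expiry=4 * 3600):
        self.delay = delay      # seconds a new triplet must wait before acceptance
        self.expiry = expiry    # seconds after which an entry is forgotten
        self.first_seen = {}    # triplet -> time of first attempt

    def check(self, client_ip, sender, recipient, now):
        key = (client_ip, sender, recipient)
        seen = self.first_seen.get(key)
        if seen is None or now - seen > self.expiry:
            # Unknown or expired triplet: start a fresh greylisting period.
            self.first_seen[key] = now
            return "tempfail"
        if now - seen < self.delay:
            return "tempfail"   # retried too quickly
        # A real greylister would also remember successful passes for much
        # longer than the initial expiry; that is left out here.
        return "accept"

g = Greylist()
print(g.check("192.0.2.1", "a@example.com", "me@example.org", now=0))    # first attempt
print(g.check("192.0.2.1", "a@example.com", "me@example.org", now=60))   # too soon
print(g.check("192.0.2.1", "a@example.com", "me@example.org", now=600))  # proper retry
print(g.check("192.0.2.2", "a@example.com", "me@example.org", now=600))  # different IP
```

The botnet case is the last call: the retry arrives from another address, so it is a new triplet and starts over.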
Re: Using Postfix and Postgrey - not scanning after hold
On the off chance that your decision to turn off greylisting was related to Matus Uhlar's message that concludes with: "if you run SA, there's no point in running greylisting anymore." That could be interpreted to read "if you run SA at all, there's no need for greylisting at all", but I don't think that's what he meant. I think the correct interpretation (at least the one that makes sense to me) is "during processing of mail, it makes no sense to run greylisting after SA does its thing". I would generalize that even more to say that greylisting should come before any other content-based filtering (virus scanners, defanging, etc.). On the other hand, you may have disabled greylisting because you're tired of futzing with it and just want your mail to work right again, in which case, nevermind. On Thu, 28 Jul 2016, Ryan Coleman wrote: Doesn’t matter. I killed it. It’s gone. I have eliminated postgrey from the installation and things are back to “normal” On Jul 28, 2016, at 12:53 PM, Bill Cole <sausers-20150...@billmail.scconsult.com> wrote: On 19 Jul 2016, at 15:50, Ryan Coleman wrote: strange... how do you run spamassassin from postfix? In master.cf like everyone else… Um, not so much... smtp inet n - - - - smtpd -o content_filter=spamassassin [...] spamassassin unix - n n - - pipe user=spamd argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient} FWIW, that's probably roughly the 5th most common way to integrate Postfix and SpamAssassin. I'd guess that amavisd-new as a before-queue filter is 1st, followed by amavisd-new as an after-queue filter, spamass-milter, and MIMEDefang (also a milter). There are pros and cons for every approach but a 'pipe' content_filter using spamc's '-e' option probably has the fewest "pros" and has the problems described at https://wiki.apache.org/spamassassin/IntegratedSpamdInPostfix. 
Also, you probably want 'flags=Rq' in the pipe arguments and there is no '-f' argument documented for spamc, so that should probably go unless you know something the spamc man page doesn't... A possible cause of your trouble could be spamc not knowing the correct way to talk to spamd. In that case, the '-e' option causes spamc to bypass spamd and just pipe its input to the given command, exiting with a successful return code unless that command fails. This seems to match what you're describing. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Bayes filter marking everything as ham
On Wed, 1 Jun 2016, Reindl Harald wrote: Am 01.06.2016 um 02:32 schrieb sha...@shanew.net: Kind of a shot in the dark, but are you sure everyone is promptly moving their spam out of the inboxes? I worry about automated learning like this autolearning has nothing to do with inboxes http://www.maiamailguard.com/maia/wiki/sa-autolearn "autolearn=ham, autolearnscore=-0.001" "autolearnscore=-0.001" must be a bad joke in the config hence it's dangerous, unpredictable and will sooner or later ruin your bayes without having a corpus where you could kill bad samples, move them from ham to spam or the other direction and just rebuild the bayes-db from scratch based on the fixed corpus, so you will end in wipe it and start from scratch (and need to take care of the minimum amount of training messages until bayes get enabled at all again) I wasn't referring to SA's autolearning feature, which I agree can suffer from feedback loops if your thresholds are set wrong (I set my ham threshold to -2 for this reason). That's why I used the phrase "automated learning" to distinguish OP's "automated" cron job that calls sa-learn. In retrospect, I should have used words that more clearly distinguished it from the autolearning feature. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Bayes filter marking everything as ham
on
# shortcircuit SUBJECT_IN_WHITELIST on
# shortcircuit USER_IN_BLACKLIST on
# shortcircuit USER_IN_BLACKLIST_TO on
# shortcircuit SUBJECT_IN_BLACKLIST on
# shortcircuit ALL_TRUSTED on
# shortcircuit BAYES_99 spam
# shortcircuit BAYES_00 ham
-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: spamass-milter: orphaned?
I hope he means a mechanism by which spamass-milter will allow specified (in the config, not in the code) SA headers to actually get added when they pass through spamass-milter. The current behavior is that four(?) SA headers are kept, but everything else is discarded. I've wanted something like that for years, though not enough to actually ask for it ;-) On Thu, 26 May 2016, Andy Balholm wrote: ...some other headers to be pushed to mail SA generates What do you mean? Andy -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Problem with SPF plugin and MX2
On Wed, 25 May 2016, Dianne Skoll wrote: On Wed, 25 May 2016 13:05:57 +0200 Support SimpleRezo <simpler...@gmail.com> wrote: We are expecting a problem when emails are coming from our MX2 with the SPF plugin, because the SPF test is made on the last "Received" IP and not the first one (as we can expect for a SPF test). Does someone has already notice this? Can this be fixed by configuration? Yes. Don't run a backup MX machine that relays to a primary machine that does spam-scanning. It's more trouble than it's worth, particularly as spammers sometimes specifically pick the worst MX record rather than the best. It also seems problematic for your backup MX to accept an email only for your primary to potentially reject said email later on. At that point you can no longer reject the mail, leaving the problematic (some might say wrong) choices to either bounce it or drop it (or deliver it, I suppose, if you're only using SA to provide info to end users). Running the same SA setup on your backup would seem to minimize that risk, but not totally eliminate it, since network-based tests might return different results given sufficient time until your backup finally transfers to the primary. So, for those with more experience, what is the preferred way to run a backup MX (or two or three, etc.) without losing or breaking the benefit of spam filtering? -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Anyone else just blocking the ".top" TLD?
On Mon, 28 Mar 2016, Vincent Fox wrote: On 03/27/2016 06:58 PM, Thomas Cameron wrote: Has anyone actually gotten a single legit message from that domain? Never. WTF was ICANN thinking? I occasionally go through the lists of abused gTLD here: http://www.surbl.org/tld/ Thanks for that link. If there were a nice source for how many total domains were in each TLD you could calculate a useful signal to noise ratio. I was recently surprised when I had a user complain that a known correspondent with a .xyz TLD was being blocked by our filter. I added a whitelist entry in the user's settings, but also explained that the domain was _the_ primary reason it was blocked because all we ever see from it is spam. So apparently there are some legit (if clueless) users of some of these TLDs. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
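The "signal to noise" idea above is just a ratio of abused domains to total registered domains per TLD. A trivial Python sketch follows, with entirely made-up placeholder TLD names and counts (I don't have a source for real totals, which is the whole problem); the function name is also just for illustration.

```python
def abuse_ratio(abused_domains, total_domains):
    """Fraction of a TLD's registered domains that appear on an abuse list."""
    if total_domains <= 0:
        raise ValueError("total_domains must be positive")
    return abused_domains / total_domains

# Hypothetical counts: (abused, total). These are NOT real figures.
counts = {
    "tld-a": (9000, 10000),
    "tld-b": (50, 100000),
}

# Worst offenders first.
for tld, (abused, total) in sorted(
    counts.items(), key=lambda item: abuse_ratio(*item[1]), reverse=True
):
    print(tld, round(abuse_ratio(abused, total), 4))
```

A raw count of abused domains (like the volume-based listings) would rank a huge, mostly clean TLD above a tiny one that is nearly all spam; the ratio avoids that.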
Re: "Received" headers for rules?
On Mon, 26 Oct 2015, RW wrote: On Mon, 26 Oct 2015 11:37:58 -0500 (CDT) Shane Williams wrote: I've created a header rule with "Received =~ /blahblahblah/", and I just got a false positive on it when none of the Received headers in the mail actually match. I had a similar situation last week, and (I think) found in the SA code where it will treat ezmlm headers as if they were Received headers (which explained why it hit). I had a quick look at the code and the only mention of ezmlm was related to gated_through_received_hdr_remover() which looks for signs that the email passed through something that might have stripped headers. It tests the received headers, but doesn't modify them. In my sleuthing, I found the part of Received.pm that looks for "received" headers that don't actually start with "Received:" and adds them on to the @hdrs array. I thought I'd tracked down that one of those alternate "received" headers was the ezmlm, which is related to the email's path through various systems, so it made sense. Unfortunately, with a weekend between when I looked at it and now, I no longer see what led me to think that, nor can I remember which email started my search, so it seems likely that I came to the wrong conclusion. Instead, I think what was throwing me off is the fact that the envelope-from gets checked as part of the Received header it appears in, but then sendmail tears that out and puts it in the Return-Path: header. Add the fact that I'm running SA from a milter, and basically I had no way to know exactly what the email looked like at the point SA was analyzing it. John Hardin's __ALL_RECEIVED rule suggestion created the entries in the debug log that let me have a better idea what SA was actually seeing and running rules against. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: "Received" headers for rules?
On Mon, 26 Oct 2015, Reindl Harald wrote: Am 26.10.2015 um 17:37 schrieb Shane Williams: I've created a header rule with "Received =~ /blahblahblah/", and I just got a false positive on it when none of the Received headers in the mail actually match. I had a similar situation last week, and (I think) found in the SA code where it will treat ezmlm headers as if they were Received headers (which explained why it hit). Is there anywhere, other than the code, where I can see what all headers might be checked as part of a "Received =~" rule? what about posting details like the headers of said message and the whole rule instead hope for readers crystal balls? Because the question I asked is not specific to any one email or rule, but rather about how SpamAssassin processes mail (specifically headers) in general. Thanks to John Hardin for pointing out a way to determine (on a per email basis even) what headers count as Received. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: SpamAssassin Rules Regarding Abuse of New Top Level Domains
On Tue, 20 Oct 2015, Rob McEwen wrote: On 10/20/2015 12:13 PM, sha...@shanew.net wrote: Unlike Larry (and others) I DO want to block the vast majority of the new tlds, because we see nothing but spam from them (and my users tend toward the more false-positives than false-negatives side of the spectrum). Rather than maintain a list of all the problematic tlds, I'd rather have a blanket block rule with the ability to whitelist the handful that might be legit. Be careful about doing this for the long term. I think that spammers exploit new TLDs because they know that many anti-spam systems don't account for them correctly at first. (and/or maybe they are cheaper at first?). But in the longer term (years down the road).. they tend to move on to other ones, while the legit TLDs slowly increase. So this strategy can backfire in the long term. (but, of course, YMMV... and some smaller hosters don't have to be as concerned about a few extra FPs) I totally agree. In fact, I assume anything I'm doing right now to successfully block spam could change tomorrow, much less months or years from now. For now, though, I'm seeing almost no legitimate traffic from most of the new ones (I'm thinking of the longer ones especially; .work, .ninja, .site, .science, etc.). I already have rules that score for these tlds in received or envelope from, but I'm getting tired of making the regular expression longer and longer (in two different places), and I know there's a smarter way. Whether I'm smart enough to implement that smarter way is another matter entirely. Is there an existing (relatively simple) plugin that behaves similarly that I could crib from? -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
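To spell out the "blanket block with a whitelist" logic as opposed to an ever-growing regular expression: treat anything that isn't an established TLD as suspect, then carve out local exceptions. The Python below is only a sketch of that lookup, not an SA plugin (those are written in Perl) and not anything that ships with 3.4.1. The two sets are deliberately short, illustrative assumptions, and treating every two-letter TLD as an established country code is a simplification.

```python
# Illustrative and deliberately incomplete; a real list would need maintaining.
LEGACY_TLDS = {"com", "net", "org", "edu", "gov", "mil", "int"}
# Hypothetical local exceptions for new TLDs with known-good correspondents.
ALLOWED_NEW_TLDS = {"xyz"}

def tld_of(address_or_host):
    """Return the last DNS label of an address or hostname, lowercased."""
    host = address_or_host.strip("<> ").rsplit("@", 1)[-1]
    return host.rstrip(".").rsplit(".", 1)[-1].lower()

def is_suspect_tld(address_or_host):
    """True when the TLD is neither established nor locally allowed."""
    tld = tld_of(address_or_host)
    if len(tld) == 2 and tld.isalpha():
        return False  # simplification: treat two-letter country codes as established
    return tld not in LEGACY_TLDS and tld not in ALLOWED_NEW_TLDS

for sample in ("<user@mail.example.ninja>", "someone@example.com",
               "friend@example.xyz", "host.example.co.uk"):
    print(sample, is_suspect_tld(sample))
```

The point of the inversion is that the exception list stays short while the "new TLD" side needs no maintenance at all, which also makes it easier to relax later if legitimate use picks up.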
Re: SpamAssassin Rules Regarding Abuse of New Top Level Domains
I've got 3.4.1 installed and sa-update runs regularly. Unlike Larry (and others) I DO want to block the vast majority of the new tlds, because we see nothing but spam from them (and my users tend toward the more false-positives than false-negatives side of the spectrum). Rather than maintain a list of all the problematic tlds, I'd rather have a blanket block rule with the ability to whitelist the handful that might be legit. Is anyone doing anything like this (perhaps as a plugin)? On Tue, 20 Oct 2015, Kevin A. McGrail wrote: If you have 3.4.1 and use sa-update then we add new tlds to a rule file that is then parsed. This does not block those tlds. It lets the engine recognize the urls for further rules. If you have a tld that is missed and you are using 3.4.1 with sa-update, let us know. Regards, KAM On October 14, 2015 3:37:58 PM PDT, sha...@shanew.net wrote: On Tue, 13 Oct 2015, Kevin A. McGrail wrote: At the end of the day, if you are having problems with new TLDs, ONE solution is to use something that uses SA 3.4.1 and has sa-update configured so you get updates with said new TLDs. I think maybe people are confused about how exactly this change helps them get rid of all the spam that's coming from the "new" TLDs. So, in other words, having just updated to 3.4.1, how does one go from having a list of all the new TLDs that can now be nicely maintained with sa-update to getting rules which actually score against the vast majority of the new TLDs (since most of them seem to be 99.99% spam)? I had created a local rule before moving to 3.4.1 that looks for new TLDs in the Received, From and EnvelopeFrom headers, but it was obvious that this wasn't going to scale well. Did the new system in 3.4.1 make this easier for me to do, or did it just make it possible for new TLDs to be handed off to RBLs and the like (not that that's not a major win)? Any elaboration (or a pointer to documentation (not the man page)) would be greatly appreciated.
-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: SpamAssassin Rules Regarding Abuse of New Top Level Domains
On Tue, 13 Oct 2015, Kevin A. McGrail wrote: At the end of the day, if you are having problems with new TLDs, ONE solution is to use something that uses SA 3.4.1 and has sa-update configured so you get updates with said new TLDs. I think maybe people are confused about how exactly this change helps them get rid of all the spam that's coming from the "new" TLDs. So, in other words, having just updated to 3.4.1, how does one go from having a list of all the new TLDs that can now be nicely maintained with sa-update to getting rules which actually score against the vast majority of the new TLDs (since most of them seem to be 99.99% spam)? I had created a local rule before moving to 3.4.1 that looks for new TLDs in the Received, From and EnvelopeFrom headers, but it was obvious that this wasn't going to scale well. Did the new system in 3.4.1 make this easier for me to do, or did it just make it possible for new TLDs to be handed off to RBLs and the like (not that that's not a major win)? Any elaboration (or a pointer to documentation (not the man page)) would be greatly appreciated. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: DCC whitelisting
On Wed, 10 Jun 2015, John Hardin wrote: On Wed, 10 Jun 2015, Shane Williams wrote: Two examples that I know are legitimate senders, but get caught by DCC (and pyzor in some cases) and other rules that push them over the threshold are the SourceForge.net Project of the Month list and various Netflix emails to customers (New Arrivals or we just added a show you might like). In both those cases, the user part of the env_from changes, and as I understand it, the DCC Whitelist doesn't allow wildcards, so I can't have an entry that matches the server part. Maybe I could be using the substitute List-ID: syntax, but neither of those has List-ID as a specific header. Can you reliably identify those at the MTA level and tell the SA glue to skip them entirely? I probably could, but that also seems kludgy. DCC has a whitelisting capability, so why not use it? Am I misunderstanding what DCC's whitelist is intended for? -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: FPs on RCVD_ILLEGAL_IP
On Tue, 21 Apr 2015, Dianne Skoll wrote: On Tue, 21 Apr 2015 16:56:48 +0200 Matus UHLAR - fantomas uh...@fantomas.sk wrote: what if Microsoft starts using other IP range tested by RCVD_ILLEGAL_IP? Then it deserves what it gets. Market forces are intended to penalize companies that do stupid things and if we interfere in those market forces, it will only encourage more stupid things. Or you could look at it this way: RCVD_ILLEGAL_IP was a really good spam indicator until Microsoft messed up, so by using those IPs Microsoft is helping spammers by forcing spam-fighters to reduce or abandon a pretty good rule. Should that sort of behavior be rewarded? I presume detecting forged Received headers was the point of this rule all along, so if we all toss this rule out the window (or adjust to exclude this edge case), aren't we potentially encouraging spammers to hide their true networks in the same way? It occurs to me that if MS are the only people who are doing this, a meta-rule could counteract the score in that specific case. If it gets used much beyond that by legitimate actors though, that's a whole other story. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: FPs on RCVD_ILLEGAL_IP
On Mon, 20 Apr 2015, Axb wrote: On 04/20/2015 08:04 PM, Dianne Skoll wrote: Hi, Not sure if this is still an issue in 3.4, but I'm seeing tons of FPs on RCVD_ILLEGAL_IP. Why? Because Microsoft (damn it to hell) has started using RESERVED IP ranges internally! Have a look: Received: from BLUPR10MB0835.namprd10.prod.outlook.com (0.163.216.13) by BLUPR10MB0835.namprd10.prod.outlook.com (0.163.216.13) with Microsoft SMTP Server (TLS) id 15.1.136.25; Mon, 20 Apr 2015 17:43:48 + Is anyone else seeing a sudden uptick in RCVD_ILLEGAL_IP FPs? There is an ongoing discussion about this with MS, thru backchannels. They're intentionally using the 0/8 to mask internal IPs. A very VERY bad choice and they have been advised that not only SA thinks it's a bad idea. Axb I'm so glad to finally see this mentioned on here, because I was starting to doubt my own gut reaction that putting invalid IP addresses in Received is all sorts of broken. We noticed it last week after someone from Microsoft mentioned getting a rejection from our server, and looking back the first examples I was able to find of this was from Apr. 6. Before that emails following similar paths through Microsoft servers weren't doing this. I'm also happy to know there's some discussion going on with MS. When I mentioned it to an MS friend of mine last week he didn't seem particularly shocked that the internal headers wouldn't comply with expectations, but he also seemed surprised that anyone was looking at such headers as a way of determining spam. Hopefully MS will take this seriously, but I'm not holding my breath. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Scoring and SPF questions
On Mon, 13 Apr 2015, John Hardin wrote: On Mon, 13 Apr 2015, Shane Williams wrote: Somewhat related questions: 1. If I alter a rule's score to 0 locally, my understanding is that the rule won't even be tested for. Does that also mean it won't count toward meta-rules? That depends on how it's used in the meta rule. If it's used as an exclusion, setting it to always false won't suppress the meta. Also: setting the score of a meta to zero won't suppress evaluation of its component rules. The specific case I'm wondering about is as part of an arithmetic expression, like (__RULE1__ + __RULE2__ + __RULE3__) > 2. If I set __RULE2__ to a score of 0, is it now impossible for the meta rule to trigger (since it can never get more than two points)? 2. Is there a way to create a local rule that uses the DKIM/SPF information such that I could match to other headers. In particular, I'm looking to either prevent (or at least counteract) the HEADER_FROM_DIFFERENT_DOMAINS rule when a mailing list is involved. So what I'm looking for is a way to test SPF/DKIM against the mailing list origination point rather than the sender's. Or perhaps I'm missing some smarter way to deal with these situations. Simple subrules combined in a meta having a negative score. There are already subrules for detecting mailing list headers and for detecting an invalid DKIM signature. Write a meta that combines those, and give it enough negative points to offset the positive score. Note, however, that mailing list headers are easy for spammers to forge. What I was getting at (but perhaps not describing well) was finding a way to compare the mailing list domain with DKIM or SPF in order to ensure that the mailing list at least arrives from the source we would expect. It doesn't exactly detect mailing list header forgery, but could take away a few points for the ones that can be verified. That said, there may be some reason this totally won't work, so feel free to tell me so.
-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
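To make the last idea in this message more concrete, here is a rough sketch in Python of comparing the mailing list identifier with the DKIM signing domain. It is not a SpamAssassin rule or plugin, and it does not verify any signature: it assumes something upstream has already validated the DKIM-Signature header, and it only compares the d= tag with the identifier inside the List-Id header. The function names and the suffix-matching policy are assumptions for illustration, and the suffix match is deliberately loose.

```python
import re
from email import message_from_string

# The d= tag of a DKIM-Signature header names the signing domain.
D_TAG = re.compile(r"(?:^|;)\s*d\s*=\s*([^;\s]+)")
# The list identifier is the part of List-Id enclosed in angle brackets.
LIST_ID = re.compile(r"<([^<>\s]+)>")


def dkim_signing_domains(msg):
    """Collect the lowercase d= domain of every DKIM-Signature header."""
    domains = set()
    for value in msg.get_all("DKIM-Signature", []):
        match = D_TAG.search(str(value))
        if match:
            domains.add(match.group(1).lower())
    return domains


def list_identifier(msg):
    """Return the lowercase identifier from List-Id, or None if absent."""
    value = msg.get("List-Id")
    if value is None:
        return None
    match = LIST_ID.search(str(value))
    return match.group(1).lower() if match else None


def list_matches_signer(msg):
    """True if the list identifier equals or sits under a signing domain.

    Assumption: signature validity was checked elsewhere; this only
    compares names, so on its own it proves nothing about forgery.
    """
    identifier = list_identifier(msg)
    if identifier is None:
        return False
    return any(
        identifier == domain or identifier.endswith("." + domain)
        for domain in dkim_signing_domains(msg)
    )


# Hypothetical message headers, not taken from real mail.
SAMPLE = (
    "List-Id: Example users <users.lists.example.org>\n"
    "DKIM-Signature: v=1; a=rsa-sha256; d=example.org; s=sel;\n"
    " bh=abc; b=def\n"
    "From: someone@elsewhere.example\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    print(list_matches_signer(message_from_string(SAMPLE)))
```

A message signed by the list operator's domain passes the comparison, while one carrying only the original sender's signature does not, which is roughly the distinction asked about above.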
Re: Uptick in spam
Apologies if this is an overly obvious answer, but are you using any greylisting? This would (potentially) move your user away from the wavefront of a spam's distribution, and give it a better chance of triggering the network-based tests. On Fri, 27 Mar 2015, Amir Caspi wrote: This is my whole issue -- since my user appears to be very high up on the recipient list for all these spammers, and is therefore getting spams before the network checks are effective, how can we combat these new spams _before_ the network checks become effective? Thanks. --- Amir -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
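For readers unfamiliar with greylisting, the idea suggested above can be sketched in a few lines of Python. This is a toy model, not the code of any particular greylisting daemon: the class name, the in-memory dictionary, and the 300-second delay are all assumptions. The first time a (client IP, envelope sender, envelope recipient) triplet is seen the message is temporarily refused, and a retry after the delay is accepted, by which time the network-based tests have had longer to catch up.

```python
class Greylist:
    """Toy greylist keyed on (client IP, envelope sender, envelope recipient)."""

    def __init__(self, min_delay_seconds=300):
        # Assumed default; real deployments tune this value.
        self.min_delay_seconds = min_delay_seconds
        # Maps each triplet to the time it was first seen.
        self.first_seen = {}

    def check(self, client_ip, sender, recipient, now):
        """Return "tempfail" until the triplet has aged past the delay."""
        key = (client_ip, sender.lower(), recipient.lower())
        # Record the first sighting; later calls keep the original timestamp.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay_seconds:
            return "accept"
        return "tempfail"


if __name__ == "__main__":
    greylist = Greylist()
    triplet = ("192.0.2.10", "sender@example.com", "user@example.org")
    print(greylist.check(*triplet, now=0))    # first attempt
    print(greylist.check(*triplet, now=60))   # retried too soon
    print(greylist.check(*triplet, now=600))  # retried after the delay
```

The clock is passed in as an argument so the behaviour can be exercised without waiting; a real implementation would also expire old entries and persist them across restarts.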
Re: Which milter do you prefer?
I just wanted to report that, despite what the spamass-milter mailing list has to say, you can in fact hand spamass-milter an inet socket in the config and it will happily listen on the network. That'll teach me to not just try stuff. Also, thanks to everyone who had suggestions on specific milters as well as glue for multiple filters. I knew about many, but not all of them, so it's given me lots to investigate (and in some cases rediscover). On Fri, 13 Mar 2015, David B Funk wrote: On Fri, 13 Mar 2015, Shane Williams wrote: I've been reviewing the current landscape of anti-spam tools since I haven't set up a new system in a while, and one place I'm wondering what people are using is milters for spamassassin/spamc. It seems like spamass-milter is the default go-to for most people, but I'd really like one that can listen on an INET socket (and spamass-milter doesn't as far as I can tell, but please correct me if I'm wrong). Milter-spamc from SnertSoft looks promising, but it's not free, and a bit more complicated. smtp-vilter also looks interesting, but it does more than just SpamAssassin stuff, so might be overkill. And I suspect there are a bunch more out there (though a lot of these projects seem to have stalled or died over time). What are your favorite (not spamass-milter) options for plugging spamassassin into a milter? Looking at the source for spamass-milter it looks like they're taking the -p socket argument and passing it directly to smfi_setconn so you should be able to give an INET socket address if you use the correct syntax (see docs for smfi_setconn). 13 years ago I was doing a hunt similar to yours and came across miltrassassin from digitalanswers.org. It was not quite what I was looking for but closer than any of the others I found, so I took it and started developing. 
-- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
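Since the "correct syntax" for smfi_setconn comes up above, here is a small Python sketch of the connection-string forms as I understand them from the libmilter documentation: unix:/path or local:/path for a local socket, and inet:port@host or inet6:port@host for a network socket. The parser is only an illustration of that syntax, not part of spamass-milter or libmilter, and port 8895 in the example is an arbitrary choice. It requires an explicit host, which may be stricter than libmilter itself.

```python
def parse_milter_socket(spec):
    """Split a milter connection string into its parts.

    Accepted forms (per my reading of the smfi_setconn docs):
      unix:/path/to/socket or local:/path/to/socket
      inet:port@host or inet6:port@host
    Raises ValueError for anything else.
    """
    family, sep, rest = spec.partition(":")
    if not sep:
        raise ValueError("missing 'family:' prefix in %r" % spec)

    if family in ("unix", "local"):
        if not rest:
            raise ValueError("missing socket path in %r" % spec)
        return {"family": "unix", "path": rest}

    if family in ("inet", "inet6"):
        port_text, at, host = rest.partition("@")
        if not at or not host:
            raise ValueError("expected port@host in %r" % spec)
        if not (port_text.isascii() and port_text.isdigit()):
            raise ValueError("port must be numeric in %r" % spec)
        port = int(port_text)
        if not 0 < port < 65536:
            raise ValueError("port out of range in %r" % spec)
        return {"family": family, "host": host, "port": port}

    raise ValueError("unknown socket family %r" % family)


if __name__ == "__main__":
    # Hypothetical values; substitute your own port and address.
    print(parse_milter_socket("inet:8895@127.0.0.1"))
    print(parse_milter_socket("unix:/var/run/milter.sock"))
```

With a string of the inet form passed to the -p option mentioned above, the milter listens on the network instead of a local socket, which is what the report at the top of this message confirms.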
Re: Which milter do you prefer?
On Fri, 13 Mar 2015, David B Funk wrote: Looking at the source for spamass-milter it looks like they're taking the -p socket argument and passing it directly to smfi_setconn so you should be able to give an INET socket address if you use the correct syntax (see docs for smfi_setconn). The spamass-milter mailing list says you can't do this (and I don't think the post about it was _that_ old), but I should probably give it a try anyway. Worst thing that happens is that it doesn't work. -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Which milter do you prefer?
I just came across that in my searching yesterday, but hadn't had a chance to dig deeper. I had seen roundhouse, and a few other things here and there, but they all seemed lacking. After all, as others have mentioned, cloning your mail stream is not to be done lightly. On Fri, 13 Mar 2015, Ted Mittelstaedt wrote: All this, of course, after searching high and low for a milter, proxy, or some other contraption that would allow me to clone a mail stream to a totally separate server without disrupting the original stream (like port spanning or a network tap, but for SMTP), Need a better Search Engine, what you want is here: http://www.dv8.ro/Synonym/synonym.html Throw that Bing crap in the trash. ;-) Ted On 3/13/2015 3:35 PM, sha...@shanew.net wrote: Well, you don't have to try very hard to start a holy war around here ;-) Seriously, though, I wasn't thinking at the level of amavisd, mimedefang, or mailscanner. Those may come later, but the situation I'm in right this moment is that I'm taking over a very idiosyncratic mail environment, and I need to tune and monitor its performance before switching over. Thus, the least disruptive option I see is to insert a milter for spamassassin in front of anything else, and then score / log messages without tampering with them in any way, so they can continue through the milter chain as if it weren't even there (except for some slight delay). Spamass-milter does this, but it means running the milter on the existing system rather than just pointing sendmail to a remote milter. I may end up there anyhow, but I thought I'd ask first. All this, of course, after searching high and low for a milter, proxy, or some other contraption that would allow me to clone a mail stream to a totally separate server without disrupting the original stream (like port spanning or a network tap, but for SMTP), and finding nothing outside of alpha or beta to do that. If anyone knows of something like that, I'd be interested to hear about it as well.
On Fri, 13 Mar 2015, Kevin A. McGrail wrote: On 3/13/2015 5:41 PM, Shane Williams wrote: What are your favorite (not spamass-milter) options for plugging spamassassin into a milter? Trying to start a holy-war on the list? ;-) +1 for MIMEDefang. Regards, KAM -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Which milter do you prefer?
Well, you don't have to try very hard to start a holy war around here ;-) Seriously, though, I wasn't thinking at the level of amavisd, mimedefang, or mailscanner. Those may come later, but the situation I'm in right this moment is that I'm taking over a very idiosyncratic mail environment, and I need to tune and monitor its performance before switching over. Thus, the least disruptive option I see is to insert a milter for spamassassin in front of anything else, and then score / log messages without tampering with them in any way, so they can continue through the milter chain as if it weren't even there (except for some slight delay). Spamass-milter does this, but it means running the milter on the existing system rather than just pointing sendmail to a remote milter. I may end up there anyhow, but I thought I'd ask first. All this, of course, after searching high and low for a milter, proxy, or some other contraption that would allow me to clone a mail stream to a totally separate server without disrupting the original stream (like port spanning or a network tap, but for SMTP), and finding nothing outside of alpha or beta to do that. If anyone knows of something like that, I'd be interested to hear about it as well. On Fri, 13 Mar 2015, Kevin A. McGrail wrote: On 3/13/2015 5:41 PM, Shane Williams wrote: What are your favorite (not spamass-milter) options for plugging spamassassin into a milter? Trying to start a holy-war on the list? ;-) +1 for MIMEDefang. Regards, KAM -- Public key #7BBC68D9 at| Shane Williams http://pgp.mit.edu/| System Admin - UT CompSci =--+--- All syllogisms contain three lines | sha...@shanew.net Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew