Re: Moving Spam to Junk Folder

2020-09-03 Thread shanew

On Thu, 3 Sep 2020, David B Funk wrote:


On Thu, 3 Sep 2020, bobby wrote:


 I am following this tutorial:
 https://www.linuxbabe.com/redhat/spamassassin-centos-rhel-block-email-spam
 I followed the steps in "Move Spam into the Junk Folder".  When I send an
 email from a blacklisted e-mail address, I get a bounce e-mail from my
 e-mail server.  Here is what is in my spamass-milter file:
 EXTRA_FLAGS="-m -r 8 -R NO_SPAM -i 127.0.0.1 -g sa-milt -- --max-size=512"
 I would prefer it to go into my Junk folder.  How can I make this happen?


Bobby,

You need to read the spamass-milter documentation to understand what those 
options are doing.


That "-r 8" tells spamass-milter to return an 'SMFIS_REJECT' status to Postfix
if the spam score is over 8. This causes Postfix to refuse to accept the
message at all (sort of like when somebody tries to send a message to a bogus
recipient).


So if Postfix never lets spam in the front door, it cannot be delivered to
any kind of "Junk Folder".


You probably want either the -b or -B option, which lets you specify
an address that tagged mail gets sent to (-b redirects tagged mail to
that address; -B delivers it normally and sends a copy there).  It's
particularly useful in combination with the -r option so that you can
get a sense of what's being rejected outright.
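To make the consequence of -r concrete, here is a toy Python model of where a message ends up. It is hypothetical, not spamass-milter's code: it assumes -r N means "reject at a score of N or more" and uses SpamAssassin's default required score of 5.0 as the tagging threshold, so check both against your own configuration and the spamass-milter man page.

```python
from typing import Optional

def milter_outcome(score: float, required: float = 5.0,
                   reject_at: Optional[float] = None) -> str:
    """Toy model of where a message ends up with and without -r (hypothetical)."""
    # -r N (assumed: score at or above N): the milter rejects during the SMTP
    # transaction, so the sending side gets the rejection and no mailbox
    # (Junk or otherwise) ever sees the message.
    if reject_at is not None and score >= reject_at:
        return "rejected at SMTP time"
    # Tagged spam is accepted; a delivery-side filter keyed on the spam
    # headers can then file it into a Junk folder.
    if score >= required:
        return "accepted, tagged as spam"
    return "accepted, clean"
```

With the flags quoted above (-r 8), mail from a blacklisted sender, which scores far above 8, takes the first branch and bounces. Without -r, or with a much higher value, it would be accepted and tagged instead, where a delivery-side rule could move it into Junk.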

--
Public key #7BBC68D9 at| Shane Williams
http://pgp.mit.edu/|  System Admin - UT CompSci
=--+---
All syllogisms contain three lines |  sha...@shanew.net
Therefore this is not a syllogism  | www.ischool.utexas.edu/~shanew

Re: Why the new changes need to be "depricated" forever

2020-07-22 Thread shanew

On Tue, 21 Jul 2020, Loren Wilton wrote:

You note that "gay" has a different meaning today. As far as I know, the 
words "black" and "white" were not systematically used to refer to skin 
colors before about 1963, when a movement was set afoot in the USA to replace 
"negro" with "black" and "caucasian" with "white".


As I mentioned in a post on July 14, black and white to refer to races
and skin color (and also red and yellow) gained traction at least as
far back as the European Enlightenment, when it was all the rage to
classify things, and most Enlightenment writers are explicitly racist
in their descriptions and classification.  But these terms are used
going back thousands of years as well.  My post from the 14th includes
several links you might find informative.



Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave

2020-07-14 Thread shanew
Immanuel Kant, while
disagreeing with Buffon that someone could return to "normal" just by
moving to a different climate, agreed that "the Negroes, and in
general all the other species of men [are] naturally inferior to the
whites" (Hume) or that "the Negroes of Africa have by nature no
feeling that rises above the trifling"
https://books.google.com/books?id=eem1AQAAQBAJ=PA9=PA9#v=onepage=false



Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave

2020-07-14 Thread shanew

I would argue that welcome is better than allow in many contexts,
including SpamAssassin.  After all, w.*list isn't just used to indicate
something is allowed, but to indicate that we actively want to receive
the email in question (by lowering its score).

You allow a maintenance worker into your apartment, but you welcome
a friend.


On Tue, 14 Jul 2020, Kevin A. McGrail wrote:


Yeah, allow/deny is more logical but using them requires all acronyms to
change.  After some trial and error, we dialed in the changes to welcome and
block which also keeps other terminology like RBL, DNSBL, WLBL, etc.
consistent so there is less upheaval.
Regards,
KAM

--
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171


On Tue, Jul 14, 2020 at 10:08 AM Marc Roos  wrote:
   

  > I like the change from whitelist/blacklist to
  allowlist/blocklist
  because it is more descriptive.

  Allow/deny list sounds more logical.






Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave

2020-07-10 Thread shanew

On Fri, 10 Jul 2020, Axb wrote:


On 7/10/20 8:31 PM, Bill Cole wrote:

 The SpamAssassin Project has a particular self-interest in attracting
 contributors from a diversity of cultures, because we are always at risk
 of mislabelling a pattern of letters or words as 'spammy' when in fact it
 is entirely normal in a cultural context other than those of the existing
 contributors to the project. C


From what I see, until now, only two people in the SpamAssassin project have
supported this motion and intend to impose this nonsense on the rest of the
world.

Voices against these changes have been politely ignored.


The danger of judging the world only by what is within your sight is
that your field of vision is limited, and there are any number of
explanations for why what you see is not representative of the whole.

Maybe those who agree feel no need to comment.  Maybe a lot of people
on either side of the issue want to avoid adding more noise to a list
that's about SpamAssassin.  Maybe a lot of people recognized this
wasn't a "motion" or a request for comment at all, but rather notice
of a change to code.  Or, as you yourself mention, maybe a lot of
people are just politely ignoring the negative voices.


Re: Rule for detecting two email addresses in From: field.

2019-10-04 Thread shanew

I use a plugin that detects mismatches, but tries to be a little smart
about what counts as a mismatch (like making sure the mismatch isn't
really just that one address is from a subdomain of the other's
domain, or someone carelessly using the "@" character in the name part
of the From header).

https://github.com/enkidushane/sa-frommismatch
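The subdomain allowance can be sketched in a few lines of Python. This is a hypothetical illustration of the idea, not code taken from sa-frommismatch; it assumes the domain found in the display name has already been extracted (or is None when the name contains no address).

```python
from typing import Optional

def domains_related(name_domain: str, addr_domain: str) -> bool:
    """True if the domains are equal or one is a subdomain of the other."""
    a = name_domain.lower().rstrip(".")
    b = addr_domain.lower().rstrip(".")
    # The leading "." keeps "notexample.com" from counting as part of "example.com".
    return a == b or a.endswith("." + b) or b.endswith("." + a)

def from_mismatch(name_domain: Optional[str], addr_domain: str) -> bool:
    """Hypothetical check: flag only when the display name carries an unrelated domain."""
    if not name_domain:
        return False  # no address in the display name: nothing to compare
    return not domains_related(name_domain, addr_domain)
```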



On Fri, 4 Oct 2019, Philip wrote:


Morning List,

Lately I'm getting a bunch of emails that are showing up with two email 
addresses in the From: field.


From: "Persons Name " 

When you look in your mail client (Outlook, Thunderbird) it's showing only 
"Persons Name "


Is there a way I can mark From: that has 2 email addresses in it as spam? 
Pro's Cons?


Phil






Re: Whitelist rcvd IP

2019-06-12 Thread shanew

I believe the "whitelist_from_rcvd" option, which is now in
SpamAssassin core, functions the same as the old
Mail::SpamAssassin::Plugin::WhitelistRcvdIP module, though with a
slightly different syntax.  If you really want to use it as a blanket
whitelist for a certain IP address or range, the first parameter can
be specified as *@*.  Whether that's advisable, I'll leave to others
to comment.
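A toy Python model of why *@* turns whitelist_from_rcvd into a blanket whitelist: the sender pattern then matches every address, so only the relay check decides. The function name, the shell-style globbing and the CIDR test are assumptions made for illustration; see the whitelist_from_rcvd entry in the Mail::SpamAssassin::Conf documentation for how SpamAssassin actually matches the relay and how an IP address or range is written as the second parameter.

```python
import fnmatch
import ipaddress

def rcvd_welcomed(sender: str, relay_ip: str, addr_pattern: str, network: str) -> bool:
    """Toy model: both the sender pattern and the relay's network must match."""
    # Shell-style globbing stands in for SpamAssassin's address patterns here.
    addr_ok = fnmatch.fnmatchcase(sender.lower(), addr_pattern.lower())
    ip_ok = ipaddress.ip_address(relay_ip) in ipaddress.ip_network(network, strict=False)
    return addr_ok and ip_ok
```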

Also, the old WhitelistRcvdIP plugin is about 12 years old, and I see
no development since then, so I'd be reluctant to use it.



On Wed, 12 Jun 2019, Emanuel Gonzalez wrote:


Hello,

I need to mark certain IP addresses as secure, only for receiving
mail, but I cannot find information about it.

In one publication they advise using the module called
Mail::SpamAssassin::Plugin::WhitelistRcvdIP, but I cannot find it.

Any ideas?

Regards,






Re: New URL shortener

2019-06-07 Thread shanew

I knew that URL looked familiar.  I added it and a few others last
year, and was going to add the one mentioned earlier in the week but
got distracted by how to get my fork in sync with Steve's.

That said, I think it's tough for even a handful of people to keep up
with all the new shorteners.
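Listing a shortener mostly comes down to matching a URL's host against a set of domains, including their subdomains. A hypothetical Python sketch of that matching (not how the __URL_SHORTENER rule is implemented; the four domains are examples only, two of them taken from these threads):

```python
from urllib.parse import urlsplit

# Illustrative, incomplete list; real lists are much longer and change often.
SHORTENERS = {"bit.ly", "goo.gl", "smarturl.it", "tinyurl.com"}

def is_shortener(url: str, shorteners=SHORTENERS) -> bool:
    """True if the URL's host is a listed shortener domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in shorteners)
```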


On Thu, 6 Jun 2019, Amir Caspi wrote:


On Jun 6, 2019, at 9:03 PM, Kenneth Porter  wrote:
  I'm seeing a lot of fake DHL delivery notices using the
  shortener smarturl.it. I suggest adding it to __URL_SHORTENER.


FWIW there is a long list of url shorteners as part of the DecodeShortURLs
plugin (sadly, no longer maintained), here:
https://github.com/smfreegard/DecodeShortURLs/blob/master/DecodeShortURLs.cf

It includes the one you just mentioned as well as a whole bunch of others.

Kevin, perhaps DecodeShortURLs should become part of the default SA
distribution?  It works really well in general, with the exception of a few
outstanding bugs that are fairly minor and likely easily fixable by someone
who knows what they're doing.  The original owner no longer maintains this,
having moved to rspamd.

Cheers.

--- Amir







Re: Having trouble getting Spamassassin to work on Ubuntu Server 18.10

2019-02-11 Thread shanew

I'd suggest running spamassassin directly from the command line with
the -D and --lint options to see if that provides more detail about
what exactly is going wrong.  This is going to give you a lot of
output so you'll probably want to run it like:

spamassassin -D --lint 2>&1 | less
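The spamd log in the report quoted below asks you to check that the necessary '.pre' files are in the config directory, and notes that v320.pre loads the required Check plugin. A minimal Python sketch of that check; the function name is hypothetical, and the directory you pass in is an assumption (the -D output should show which directories are actually being read).

```python
import re
from pathlib import Path
from typing import List

CHECK_PLUGIN = "Mail::SpamAssassin::Plugin::Check"

def pre_files_loading(conf_dir: str, plugin: str = CHECK_PLUGIN) -> List[str]:
    """Names of the *.pre files in conf_dir with an active loadplugin line for plugin."""
    # Commented-out lines start with "#", so they don't match this pattern.
    pattern = re.compile(r"^\s*loadplugin\s+" + re.escape(plugin) + r"\b")
    hits = []
    for pre in sorted(Path(conf_dir).glob("*.pre")):
        for line in pre.read_text(errors="replace").splitlines():
            if pattern.match(line):
                hits.append(pre.name)
                break
    return hits
```

For example, pre_files_loading("/etc/spamassassin") returning an empty list would mean no .pre file in that directory loads the Check plugin.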


On Sun, 10 Feb 2019, Ken Wright wrote:


I've been trying to set up an email server and I want to use
Spamassassin to prevent it from becoming Spam Central.  I've installed
SA and spamass-milter, but when I try to restart it after customizing
the config files, I get this:

Job for spamassassin.service failed because the control process exited with error code.
See "systemctl status spamassassin.service" and "journalctl -xe" for details.

So I checked journalctl and got this:

-- Unit spamassassin.service has begun starting up.
Feb 08 02:19:31 grace spamd[6289]: logger: removing stderr method
Feb 08 02:19:32 grace spamd[6314]: Timeout::_run: check: no loaded plugin implements 'check_main': cannot scan!
Feb 08 02:19:32 grace spamd[6314]: Check that the necessary '.pre' files are in the config directory.
Feb 08 02:19:32 grace spamd[6314]: At a minimum, v320.pre loads the Check plugin which is required.
Feb 08 02:19:32 grace spamd[6289]: child process [6314] exited or timed out without signaling production of a PID file: exit 255 at /usr/sbin/spamd line 3034.
Feb 08 02:19:32 grace systemd[1]: spamassassin.service: Control process exited, code=exited status=255
Feb 08 02:19:32 grace systemd[1]: spamassassin.service: Failed with result 'exit-code'.
Feb 08 02:19:32 grace systemd[1]: Failed to start Perl-based spam filter using text analysis.
-- Subject: Unit spamassassin.service has failed

At a friend's suggestion I also checked the mail.log and got this:

Feb  8 02:19:25 grace spamd[6144]: logger: removing stderr method
Feb  8 02:19:26 grace spamd[6172]: Timeout::_run: check: no loaded plugin implements 'check_main': cannot scan!
Feb  8 02:19:26 grace spamd[6172]: Check that the necessary '.pre' files are in the config directory.
Feb  8 02:19:26 grace spamd[6172]: At a minimum, v320.pre loads the Check plugin which is required.
Feb  8 02:19:26 grace spamd[6144]: child process [6172] exited or timed out without signaling production of a PID file: exit 255 at /usr/sbin/spamd line 3034.

Yes, v320.pre loads the Mail::SpamAssassin::Plugin::Check module, which
is installed and up to date.  I've just about run out of ideas.  Anyone
have any?

Sorry this is so long, but I didn't want to omit any pertinent information.

Ken Wright,
pulling his hair out.




Re: SpamSender with 2 @-signs in the address

2018-12-03 Thread shanew

On Mon, 3 Dec 2018, Alan Hodgson wrote:


On Mon, 2018-12-03 at 13:17 -0600, sha...@shanew.net wrote:


Yeah, I see all these same things.  Better to test against From:addr
rather than the full From:  Perhaps something like:

From:addr =~ /\@[^\s]+\@/

Of course, there might still be legit cases of that kind of usage.



The problem though for phishes is that some user agents (e.g. Outlook) only
display the quoted user-friendly part of the address, not the rest of the
From: header. So phishers specifically put a fake @domainbeingphished.com in
quotes so your users will see that.


There were several different plugins started about a year ago to
detect that sort of thing.  I know of:

https://github.com/enkidushane/sa-frommismatch
https://github.com/fmbla/spamassassin-fromnamespoof

and I think someone has implemented some of this in a regex rule, but
I don't recall off the top of my head who that was.


Re: SpamSender with 2 @-signs in the address

2018-12-03 Thread shanew

Yeah, I see all these same things.  Better to test against From:addr
rather than the full From:  Perhaps something like:

From:addr =~ /\@[^\s]+\@/

Of course, there might still be legit cases of that kind of usage.
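The From:addr test above, written as the equivalent Python regular expression so it's easy to see what it does and doesn't catch. It only looks at the address SpamAssassin extracts as From:addr, so a second address hidden in the quoted display name won't trigger it. The helper name is hypothetical.

```python
import re

# The suggested From:addr test: "@", one or more non-whitespace characters, "@".
DOUBLE_AT = re.compile(r"@[^\s]+@")

def has_double_at(from_addr: str) -> bool:
    """True if the sender address itself contains two "@" signs."""
    return DOUBLE_AT.search(from_addr) is not None
```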


On Mon, 3 Dec 2018, Alan Hodgson wrote:


On Mon, 2018-12-03 at 11:15 -0700, Grant Taylor wrote:

I don't think the multiple @ signs have worked in a very long time.  So 
I see no reason not to add score based on multiple @ signs.  Or if there 
is a legitimate use for it, it should be extremely rare and the false 
positive rate should be acceptable.




I've been watching these for a while, and unfortunately there are a lot of
customer-service type systems that send From: addresses with quoted @domain
addresses in them. Many of them do "user@address via"
, but not all.

And then there are the messages with 2 different From: addresses within <>'s
in them. I see those from Gmail sometimes.

And I see quite a few messages where the actual sender address is given in
quotes and then followed by the same address in <>'s.

So you will definitely get false positives just looking at @'s.

I've excluded the ones with " via" in them and add a bunch of extra points
if they come from phishy countries or have .doc or .pdf attachments, and
that hits fewer fps. And I'm only scoring if the domain parts don't match.






Re: Could not retrieve sendmail macro "auth_type"!.

2018-09-03 Thread shanew

I would double-check that the macro appears in sendmail.cf.  Maybe the
apt-get update ignores your sendmail.mc and just replaces the
sendmail.cf directly?
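One way to double-check is to look for the compiled option in sendmail.cf, where confMILTER_MACROS_ENVRCPT typically ends up on a line beginning "O Milter.macros.envrcpt=". Treat that exact form as an assumption and confirm it against your own file. A small hypothetical Python helper that pulls out the macro list:

```python
from typing import List, Optional

def envrcpt_macros(cf_text: str) -> Optional[List[str]]:
    """Macros on the 'O Milter.macros.envrcpt=' line of a sendmail.cf, or None."""
    prefix = "O Milter.macros.envrcpt="
    for line in cf_text.splitlines():
        if line.startswith(prefix):
            value = line[len(prefix):]
            return [m.strip() for m in value.split(",") if m.strip()]
    return None  # option not present: the .mc setting never made it into the .cf
```

For example, run it on the contents of /etc/mail/sendmail.cf and check that "{auth_type}" is in the returned list.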


On Sun, 2 Sep 2018, Michael Grant wrote:


I'm running spamassassin on several debian systems using sendmail and using
spamass-milter.
I'm seeing this error in my mail logs on one I updated yesterday:

Sep  1 08:21:01 debian spamass-milter[536]: Could not retrieve sendmail
macro "auth_type"!.  Please add it to confMILTER_MACROS_ENVRCPT for better
spamassassin results

I definitely have this macro in my sendmail.mc file:

define(`confMILTER_MACROS_ENVRCPT',`r, v, Z, {auth_type}, {greylist}, {auth_ssf}')dnl

Furthermore on 2 other nearly identical systems I don't have this warning
message.  I only started seeing this warning message when I ran updates
yesterday.  I only get it on inbound mail.

The main packages are all the same version from one system to the other:

dpkg -l | g 'sendmail|spamass|milter'
ii  libmilter1.0.1:amd64  8.15.2-11   amd64  Sendmail Mail Filter API (Milter)
ii  sa-compile            3.4.1-8     all    Tools for compiling SpamAssassin rules into C
ii  sendmail              8.15.2-11   all    powerful, efficient, and scalable Mail Transport Agent (metapackage)
ii  sendmail-base         8.15.2-11   all    powerful, efficient, and scalable Mail Transport Agent (arch independent files)
ii  sendmail-bin          8.15.2-11   amd64  powerful, efficient, and scalable Mail Transport Agent
ii  sendmail-cf           8.15.2-11   all    powerful, efficient, and scalable Mail Transport Agent (config macros)
ii  spamass-milter        0.4.0-1+b1  amd64  milter for filtering mail through spamassassin
ii  spamassassin          3.4.1-8     all    Perl-based spam filter using text analysis
ii  spamc                 3.4.1-8     amd64  Client for SpamAssassin spam filtering daemon

The sendmail.mc is also the same (with differences being things like
hostnames).  

The only difference I know of is one system was updated via apt yesterday,
other a couple months old.

Anyone else seeing this?  What other change might have caused this?

Michael Grant






Re: Anti Phish Rules

2018-04-26 Thread shanew

On Thu, 26 Apr 2018, David Jones wrote:


header          __BAD_FROM_NAME     From:name =~ /(^chase$|chase\.com|Internal Revenue Service|banking|Bank of America|American Express|Wells Fargo|NavyFederal|Geico|E-fax|Share.oint|UPS Delivery|FedEx|PayPal|Apple Support|USAA|.ropbox|Dro.box)/i
meta            BAD_FROM_NAME       __BAD_FROM_NAME && !ALL_TRUSTED
describe        BAD_FROM_NAME       Displayed From contains bad information to trick the recipients
score           BAD_FROM_NAME       4.0


People named Chase may not care for that first item in the grouping
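A quick Python check of the pattern (copied from the rule above with the wrapped lines rejoined) shows the concern: because of ^chase$, a display name that is exactly "Chase" hits, while "Chase Williams" does not. Python's regex syntax differs from Perl's in places, but not for anything this pattern uses; the helper name is hypothetical.

```python
import re

# __BAD_FROM_NAME from the rule above; the rule's /i flag becomes re.IGNORECASE.
BAD_FROM_NAME = re.compile(
    r"(^chase$|chase\.com|Internal Revenue Service|banking|Bank of America"
    r"|American Express|Wells Fargo|NavyFederal|Geico|E-fax|Share.oint"
    r"|UPS Delivery|FedEx|PayPal|Apple Support|USAA|.ropbox|Dro.box)",
    re.IGNORECASE,
)

def hits_bad_from_name(display_name: str) -> bool:
    """True if the display name matches any alternative in the pattern."""
    return BAD_FROM_NAME.search(display_name) is not None
```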


Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)

2018-03-15 Thread shanew

You might take a look at
https://developers.google.com/url-shortener/v1/getting_started

1 million requests per day is the default limit.

On Wed, 14 Mar 2018, Rob McEwen wrote:


On 2/20/2018 9:42 PM, Rob McEwen wrote:
  Google might easily start putting captchas in the way or
  otherwise consider such lookups to be abusive and/or mistake
  them for malicious bots...

This prediction turned out to be 100% true. Even though others have
mentioned that they have been able to do high-volume lookups with no
problems... And granted I wasn't implementing a multi-server or multi-ip
lookup strategy... But I don't think I was doing nearly as many lookups as
others have claimed that they were able to do. I took a batch of 55,000
spams that I had collected from the past 4 weeks where those spams were
maliciously using the Google shortener as a way to get their spam delivered
via hiding their spammy domain names from spam filters. I started checking
those by looking up the redirect from Google's redirector, but without
actually visiting the site that the redirector was pointing to. Please note
that I was doing the lookups one-at-a-time, not starting the next lookup
until the last lookup had completed. After about ONLY 1,400 lookups, ALL of
my following lookups started hitting captchas. See attached screenshot.
Also, other than not sending from multiple IPs, I was otherwise doing
everything correctly to make my script look/act like a regular browser.

I'll try spreading it out between multiple IPs in order to try to avoid rate
limits... However... This is still cause for concern about high-volume
lookups in high production systems... those may have to be implemented a
little more carefully if they're going to do these kind of lookups!

Just because small or medium production systems are able to do this... Or
just because somebody went out of their way to get more sophisticated with
it to get it to work out... doesn't mean that it's going to work in high
production systems that are trying to use "canned" software or plugins. This
is a particular challenge for anti-spam blacklists because they typically
process a very high volume of spams. Hopefully, the randomness of the ones I
process as they come in... will be sufficiently spread out enough to avoid
rate limiting?

It was my hope to start processing these live with my own DNSBL engine, so
that I could start blacklisting the domains that they redirect to... In
those cases where they were not already blacklisted... Now I'm going to have
to deal with constantly trying to make sure that I'm not hitting this
captcha, along with implementing some other strategies to hopefully prevent
that.

But this brings up a whole other issue... That is more of a policy or legal
issue... is Google basically making a statement that automated lookups are
not welcome? Or are considered abusive?

(btw, I could have collected orders of magnitude more than 55,000 of THESE
types of spams, but this was merely what was left over in an after-the-fact
search of my archives, after a lot of otherwise redundant spams had already
been purged from my system.)

PS - Once I gather this information, I will submit more details about the
results of this testing. But what is shocking right now is that less than
four tenths of 1% of these redirect URLs has been terminated, even though
they average two weeks old, with some almost a month old.
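For scale, the API quota mentioned at the top of this post (1 million requests per day) averages out to one lookup every 0.0864 seconds (86,400 s / 1,000,000), about 11.6 per second. A minimal Python throttle that spaces calls to stay under a daily quota; the class is hypothetical, and it assumes the quota is the binding limit, which may not hold for the captchas described above, since those appear to have come from the web redirector rather than the API.

```python
import time
from typing import Callable, Optional

class Throttle:
    """Hypothetical helper: spaces calls evenly so at most per_day happen per day."""

    def __init__(self, per_day: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 86_400 / per_day   # seconds between calls
        self.clock = clock
        self.sleep = sleep
        self.next_ok: Optional[float] = None

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_ok is not None and now < self.next_ok:
            self.sleep(self.next_ok - now)
            now = self.next_ok
        self.next_ok = now + self.interval
```

Call throttle.wait() before each lookup; the clock and sleep parameters exist so the timing can be tested without real waiting.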






Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)

2018-03-07 Thread shanew

Just FYI, it does add 3.0 points as soon as it sees any chaining at
all.  The other 5.0 points get added at 10 redirections.  That said,
I think your guess is right that redirections start to look really
suspicious after just 3 or 4.
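The scoring just described, restated as a hypothetical Python function. The point values (3.0 for any chaining, 5.0 more at 10 redirections, 8.0 in total) come from this thread; what counts as a "hop" is an assumption here, and the plugin's own rules and code may differ.

```python
def chain_score(hops: int, max_hops: int = 10) -> float:
    """Hypothetical restatement of the redirect-chain scoring.

    hops: number of redirect lookups needed to resolve the URL (assumed
    definition: 1 means a single shortener pointing straight at the target).
    """
    score = 0.0
    if hops > 1:            # any chaining at all: one shortener pointing at another
        score += 3.0
    if hops >= max_hops:    # stopped at the lookup limit
        score += 5.0
    return score
```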


On Sat, 3 Mar 2018, @lbutlr wrote:


On Feb 26, 2018, at 09:55, sha...@shanew.net wrote:


This is why the DecodeShortURLs plugin has an explicit limit of 10
lookups (and penalizes such with a total of 8 points).


I’d guess more than one redirect is highly suspicious and more than two is 
probably a waste of time, just score 5.0 and be done with it.

Has anyone done any analysis on multi-redirects?





Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)

2018-02-26 Thread shanew

On Mon, 26 Feb 2018, David B Funk wrote:

Just be careful how you do that "expand redirections until no more 
redirections" or you may get caught in a spammer trap.


This is why the DecodeShortURLs plugin has an explicit limit of 10
lookups (and penalizes such with a total of 8 points).

DecodeShortURLs has been on my list of must-have plugins for years, so
I was a little surprised it took so long for someone to mention it
in this thread.
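A bounded lookup loop is what keeps a redirect trap from running forever. A hypothetical Python sketch: lookup stands in for whatever resolves a single redirect (for example, reading a Location header without fetching the target page), and the walk stops when the chain ends, revisits a URL, or hits the lookup limit.

```python
from typing import Callable, List, Optional, Tuple

def follow_redirects(url: str,
                     lookup: Callable[[str], Optional[str]],
                     max_lookups: int = 10) -> Tuple[List[str], str]:
    """Walk a redirect chain one hop at a time, never fetching the final page.

    lookup(u) returns the redirect target of u, or None if u doesn't redirect.
    Returns (chain of URLs seen, status), where status is:
      "done"  - the chain ended normally
      "loop"  - a URL came back around (a redirect trap)
      "limit" - gave up after max_lookups lookups
    """
    chain = [url]
    seen = {url}
    for _ in range(max_lookups):
        target = lookup(chain[-1])
        if target is None:
            return chain, "done"
        chain.append(target)
        if target in seen:
            return chain, "loop"
        seen.add(target)
    return chain, "limit"
```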




Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-15 Thread shanew

On Thu, 15 Feb 2018, RW wrote:


On Thu, 15 Feb 2018 11:56:55 -0600 (CST)
sha...@shanew.net wrote:

So, the sample size doesn't matter when calculating the probability of
a message being spam based on individual tokens, but it can matter
when we bring them all together to make a final calculation.


It's not a matter of how they combine, smaller counts just lead to
less accurate token probabilities.

I'm not saying that it doesn't matter how much you train, I'm saying
that if you have enough spam and enough ham Bayes is insensitive to
the ratio.


I agree that past a certain minimum threshold, the ratio doesn't
matter much.  But as I understand it, larger sample size makes a
difference.

I haven't checked the math in the Bayes plugin, but it explicitly
mentions using the "chi-square probability combiner" which is
described at http://www.linuxjournal.com/print.php?sid=6467

Maybe I'm misunderstanding what that article describes, but I'm pretty
sure what it boils down to is that when the occurrence of a token is
too small (he uses the phrase "rare words") it can lead to
probabilities at the extremes (like a token that occurs only once and
is in spam, so its probability is 1).  The way to address these
extremely low or extremely high probabilities is to use the Fisher
calculation (which is described in the second page of the article).

Maybe this is where I'm making a logical leap that I shouldn't, but I
think that "non-rare words" increasingly outnumber "rare words" as the
sample size of messages (and thus tokens) increases.
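For reference, the article's two pieces can be sketched separately in Python: an adjusted per-token probability f(w) = (s*x + n*p(w)) / (s + n) that pulls rare tokens toward a neutral prior x, and Fisher's chi-square method for combining the per-token values. Parameter names follow the article's notation (s, x, n); this is a sketch under those assumptions, not SpamAssassin's Bayes implementation.

```python
import math
from typing import Sequence

def robinson_f(p_w: float, n: int, s: float = 1.0, x: float = 0.5) -> float:
    """Adjusted token probability: (s*x + n*p(w)) / (s + n).

    p_w: raw spam probability computed for the token
    n:   number of messages the token has been seen in
    s:   strength given to the prior x (the assumed value for an unseen token)
    With few sightings the result stays near x; with many it approaches p_w.
    """
    return (s * x + n * p_w) / (s + n)

def chi2q(chi: float, df: int) -> float:
    """Probability that a chi-square variable with even df is >= chi."""
    m = chi / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_combine(probs: Sequence[float]) -> float:
    """Combine per-token probabilities (each strictly between 0 and 1).

    Returns a value near 1 for spammy evidence, near 0 for hammy, 0.5 if unsure.
    """
    n = len(probs)
    # Strong spam evidence (many values near 1) makes sum(log(1 - p)) very negative.
    spam = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Strong ham evidence (many values near 0) makes sum(log(p)) very negative.
    ham = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + spam - ham) / 2.0
```

With the defaults, a token seen once and only in spam gets robinson_f(1.0, 1) = 0.75 rather than 1, which is the rare-word effect described above; the adjusted values are what get fed into the combining step.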




Re: Train SA with e-mails 100% proven spams and next time it should be marked as spam

2018-02-15 Thread shanew

On Thu, 15 Feb 2018, RW wrote:


On Thu, 15 Feb 2018 00:01:18 +0100
Reindl Harald wrote:


Am 14.02.2018 um 23:07 schrieb RW:



My point is that an imbalance doesn't create a bias



wrong - what you tried to say was "doesn't necessarily create a bias"
- but in fact when the imbalance is too big *it does*

simply thinking about how bayes works makes that clear: each word is a
token with a ham/spam counter - when you have 1 million of one type and
1 of the other type, guess how that counter starts to get biased


As I said, Bayes is based on frequencies.

If a token occurs in 10% of ham and 0.5% of spam based on 10,000 hams
and 10,000 spams, what do you think is likely to happen to those
percentages with 10,000 hams and 1,000,000 spams?


Perhaps it would help to state Bayes' formula explicitly.
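In standard notation, with t a token, this is the textbook form of Bayes' theorem, with the denominator expanded over the two classes (spam and ham):

```latex
P(\mathrm{spam} \mid t) = \frac{P(t \mid \mathrm{spam})\,P(\mathrm{spam})}{P(t)},
\qquad
P(t) = P(t \mid \mathrm{spam})\,P(\mathrm{spam}) + P(t \mid \mathrm{ham})\,P(\mathrm{ham})
```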

The probability that a message is spam given a specific token is equal
to:

(the probability of a token occurring in spam) times (the probability
that a message is spam) divided by (the probability of that token
occurring in all messages)

The important feature in this formula is that every value being
operated on is a probability, so if a given token occurs in .5% of
10,000 spams, we would expect it to occur in .5% of 100,000 or
1,000,000.  If that assumption is true, and the .5% probability
doesn't change, the resulting calculated probability also doesn't
change.
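RW's example in numbers: a token seen in 10% of ham and 0.5% of spam. The hypothetical helper below computes the per-token probability from per-class frequencies, the form Robinson's approach uses; it leaves out the P(spam) term (equivalent to assuming equal priors), which is what makes the training ratio drop out. Going from 10,000 to 1,000,000 spams leaves the result unchanged as long as the 0.5% frequency holds.

```python
def token_spam_prob(spam_hits: int, n_spam: int, ham_hits: int, n_ham: int) -> float:
    """Per-token spam probability from per-class frequencies (no P(spam) term)."""
    spam_freq = spam_hits / n_spam   # fraction of spam containing the token
    ham_freq = ham_hits / n_ham      # fraction of ham containing the token
    return spam_freq / (spam_freq + ham_freq)

balanced = token_spam_prob(50, 10_000, 1_000, 10_000)        # 0.5% of spam, 10% of ham
lopsided = token_spam_prob(5_000, 1_000_000, 1_000, 10_000)  # same rates, 100x the spam
```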

For actual spam detection, this is complicated by the fact that we end
up with a whole stack of calculated probabilities for each token
(including the probabilities that a message is non-spam given specific
tokens), and we have to take all of them into account to calculate a
final probability.  In this process, it's not unusual that some
individual calculated probabilities "matter" more than others, and one
basis for how much weight a particular probability gets is how much we
can trust that probability.  Here's where the 10,000 vs. 1,000,000
comes into play, because we can rely on the .5% probability out of
1,000,000 samples more than we can the .5% probability out of 10,000
samples, and both of those are better than a .5% probability out of
100 samples (that said, the difference in trust increases more between
100 samples and 10,000 samples than from 10,000 samples to 1,000,000
samples due to diminishing return).
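The diminishing return can be illustrated with the standard error of an estimated proportion, sqrt(p(1-p)/n). This is generic statistics, not a description of how SpamAssassin weights tokens: each 100-fold increase in samples shrinks the uncertainty only 10-fold, so going from 10,000 to 1,000,000 samples buys a tenth of the absolute improvement that going from 100 to 10,000 did.

```python
import math

def std_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n samples."""
    return math.sqrt(p * (1.0 - p) / n)

for n in (100, 10_000, 1_000_000):
    print(f"{n:>9,} samples: 0.5% +/- {std_error(0.005, n):.4%}")
```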

So, the sample size doesn't matter when calculating the probability of
a message being spam based on individual tokens, but it can matter
when we bring them all together to make a final calculation.



Re: From name containing a spoofed email address

2018-01-26 Thread shanew

Just a hunch, but did you make sure to add the "$self->register..."
line inside the "sub new {" block with all the others in HeaderEval.pm?


On Fri, 26 Jan 2018, Chris wrote:


On Mon, 2018-01-22 at 10:05 -0500, Rupert Gallagher wrote:

This is my current solution for a problem that has been discussed
many times in this list. 
I wrote it last year, and it serves me well. Feel free to use it, if
you find it useful. 

This part goes into your local.cf:

header   __F_DM1 eval:from_domains_mismatch()
header   __F_DM2 From:addr =~ /\@(pec|legalmail|telecompost)(\.[^\.]+)?\.it/
meta     F_DM ( __F_DM1 && ! __F_DM2 )
describe F_DM From:name domain mismatches From:addr domain
priority F_DM -1
score    F_DM 5.0

This part goes into the general HeaderEval.pm:

# Register the eval in the plugin constructor (sub new), next to the others:
$self->register_eval_rule("from_domains_mismatch");
[...]
sub from_domains_mismatch {
  my ($self, $pms) = @_;

  # Domain part of the real sender address (From:addr).
  my ($fromAddrDomain) = ($pms->get('From:addr') =~ /\@(.+)/);
  # Domain-like text after an "@" in the display name (From:name), if any.
  my ($fromNameDomain) = ($pms->get('From:name') =~ /\@([^\@\"\s]+)/);

  # Lexical captures avoid reusing a stale $1 when a match fails.
  return 0 unless defined $fromNameDomain;   # no address in From:name: all well
  $fromAddrDomain = '' unless defined $fromAddrDomain;

  dbg("from_domains_mismatch: fromNameDomain=$fromNameDomain, fromAddrDomain=$fromAddrDomain");

  # Domain names are case-insensitive: 0 = they match, 1 = mismatch, possibly spam.
  return lc($fromNameDomain) eq lc($fromAddrDomain) ? 0 : 1;
}

R.G.


Just for the heck of it I added the above to my SpamAssassin setup at
home. However my syslog shows:

rules: failed to run __F_DM1 test, skipping:
(Can't locate object method "from_domains_mismatch" via package "Mail:
[...]:SpamAssassin::PerMsgStatus" at (eval 1816) line 19.)

I did restart SA after adding this. SA version 3.4.1





Re: From name containing a spoofed email address

2018-01-23 Thread shanew

Just to add to the confusion, uh, I mean options.  Here's what I've
got so far.  I'm using it in production currently, but it's still very
young code, so use it at your own risk.

https://github.com/enkidushane/sa-frommismatch/

I purposely avoided using uri_to_domain because it's in flux right
now, but I might go back and add a version check to make use of it.

As I mentioned to Paul privately, seeing others' code strengthens my
opinion that the hard part here is recognizing when an email address /
domain actually needs to be checked.  For instance, I require "@" to
be immediately followed by a valid domain character.  This avoids
false positives on things like "Events @ GA" (example from my email
stream); on the other hand, it would miss something like "Bob @
usaa.com".
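
Below is a rough Python sketch of that "@ must be immediately followed by a domain character" check. It is not the plugin's code. It assumes that a valid domain character means an ASCII letter or digit, and `NAME_DOMAIN_RE` and `name_domain` are hypothetical names. Python's `re` handles these constructs the same way Perl does.

```python
import re

# Hypothetical pattern: "@" must be immediately followed by a letter or
# digit, then any run of letters, digits, dots or hyphens.
NAME_DOMAIN_RE = re.compile(r"@([A-Za-z0-9][A-Za-z0-9.-]*)")


def name_domain(display_name):
    """Return the lowercased domain found in a From display name, or None."""
    match = NAME_DOMAIN_RE.search(display_name)
    if match is None:
        return None  # no "@" directly followed by a domain character
    # Trim trailing dots/hyphens picked up from surrounding punctuation.
    return match.group(1).rstrip(".-").lower()
```

With this pattern, "Events @ GA" and "Bob @ usaa.com" both return None. That is the trade-off described above: the first false positive is avoided, but the second spoof slips through. A quoted "bob@usaa.com" yields usaa.com.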

If you try out my plugin, be warned that you will likely get
false-positives on yahoogroups.com.  I have yet to decide whether
detection of exceptions like this should be happening in the plugin,
or via some meta combination of rules.  If you hit other false
positives, I'd be interested to hear about them.


On Mon, 22 Jan 2018, Alex wrote:


Hi,


This part goes into the general HeaderEval.pm:

$self->register_eval_rule("from_domains_mismatch");
[...]


I'd like to try this, but this is not in the current 3.4.2 svn.

Re: From name containing a spoofed email address

2018-01-22 Thread shanew

I think what's tripping you up is what parts of the mail "From:addr"
and "From:name" refer to.  In the example you give:

From: blablabla <blabla...@gmail.com>

From:name will be "blablabla"
and
From:addr will be "blabla...@gmail.com"

Since there's no "@" in From:name, there's clearly not an email
address there, so there's nothing to compare to the domain part of
From:addr.

The "bounces.em.secureserver.net" you're referring to is part of the
EnvelopeFrom (AKA ReturnPath).  This particular check doesn't consider
that domain name in any way whatsoever.
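
To make the name/address split concrete, here is a Python sketch that approximates it with the standard library's `email.utils.parseaddr`. This is an assumption: SpamAssassin's own From:name/From:addr parsing is not identical. The sketch then applies the same comparison as the eval rule quoted in this thread, but case-insensitively. The addresses are hypothetical, since the archive obscures the real ones. The Perl version assigns `$1` without checking whether the match succeeded (in Perl, a failed match leaves `$1` holding the previous successful capture). The sketch instead tests the match result explicitly.

```python
import re
from email.utils import parseaddr

# Same idea as the quoted eval rule: a domain-looking token after "@" in the name.
NAME_DOMAIN_RE = re.compile(r'@([^@"\s]+)')


def from_domains_mismatch(from_header):
    """True if From:name contains an @domain that differs from From:addr's domain."""
    name, addr = parseaddr(from_header)  # display name, bare address
    name_match = NAME_DOMAIN_RE.search(name)
    if name_match is None:
        return False  # no "@" in the display name: nothing to compare
    # Domain part of the address (whole string if it has no "@").
    addr_domain = addr.rpartition("@")[2].lower()
    return name_match.group(1).lower() != addr_domain
```

A header shaped like Chip's example returns False because the name contains no "@". The usaa.com-style spoof returns True.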

On Mon, 22 Jan 2018, Chip wrote:


I might be wrong here understand I'm still learning, but the purpose of
the filter, from what I've been able to grasp, is that it checks  the
From:addr and From:name values in SA to find
their domain and triggering a rule hit if there is a domain in the
From:name that doesn't match the domain in the From:addr.

In the example I sent From: (as in From:name) contains the domain
"gmail.com" - blabla...@gmail.com

From:addr contains "bounces.em.secureserver.net"

Thus mismatch between From:name that doesn't match the domain in the
From:addr.

Thus it would identify this message as probably spam, which it is not.

Are people talking about a name like "bla@bla...@domain.com"? in this
thread meaning the actual "@" character in the "name" or are we
comparing domains from the From:add to the domain in the From:name?



On 01/22/2018 05:56 PM, RW wrote:

On Mon, 22 Jan 2018 17:44:00 -0500
Chip wrote:


Following is the full header with identifiable information
anonymized.

I don't see   what you are getting at, in:


  From: blablabla <blabla...@gmail.com>

blablabla doesn't  contain an "@".

Re: From name containing a spoofed email address

2018-01-22 Thread shanew

This particular effort is looking at the From header, not the EnvFrom
header (though there is a check From==EnvFrom as well).  What we're
looking for here are things like:

From: "b...@usaa.com" <bgef...@gmail.com>

Or look at the pastebin example at the start of the thread.

Also, without seeing the full email I can't say for sure, but while your
example may be legitimate email, the "dmarc=fail" suggests that the
sender is, in fact, spoofing that gmail address (as in, it lacks a
valid DKIM signature and/or doesn't come from a server approved by
gmail's SPF record).  It's just that spoofing isn't a sure-fire way to
determine that something is spam (if only...).



On Mon, 22 Jan 2018, Chip wrote:


So it's my understanding that SA does the following with this rule,
which is it is checking the From:addr and From:name values in SA to find
their domain and triggering a rule hit if there is a domain in the
From:name that doesn't match the domain in the From:addr.

However, when I examine the headers from many legitimate non-spoofed
emails from bulk senders such as constantcontact, madmimi, sendgrid,
etc. it is very common to find a legitimate sender with a From:addr such
as n...@gmail.com which clearly conflicts with the domain name in the
From:addr, address being, for example, with madmini bulk sending as an
example:

smtp.mailfrom=sp_12x.55xx.1.d2b65521fe5d9342...@bounces.em.secureserver.net;
   dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=gmail.com
Return-Path:
<sp_12x.55xx.1.d2b65521fe5d9342...@bounces.em.secureserver.net;>
Received: from m205.em.secureserver.net (m205.em.secureserver.net.
[1xx.xx.xxx.xx])

From: balblabla <blabla...@gmail.com>

would this rule classify that email as probably spam when in fact it
most certainly is not.

So what am I not understand here?

On 01/22/2018 10:20 AM, David Jones wrote:

On 01/22/2018 09:05 AM, Rupert Gallagher wrote:

This is my current solution for a problem that has been discussed
many times in this list.
I wrote it last year, and it serves me well. Feel free to use it, if
you find it useful.

This part goes into your local.cf:

header   __F_DM1 eval:from_domains_mismatch()
header   __F_DM2 From:addr =~ /\@(pec|legalmail|telecompost)(\.[^\.]+)?\.it/
meta   F_DM ( __F_DM1 && ! __F_DM2 )
describe   F_DM From:name domain mismatches From:addr domain
priority   F_DM -1
score  F_DM 5.0

This part goes into the general HeaderEval.pm:

$self->register_eval_rule("from_domains_mismatch");
[...]
sub from_domains_mismatch {
   my ($self, $pms) = @_;
   my $temp;
   $temp = $pms->get('From:addr');
   $temp =~ /@(.+)/; my $fromAddrDomain; $fromAddrDomain = "$1";
   $temp = $pms->get('From:name');
   $temp =~ /@([^\@\"\s]+)/; my $fromNameDomain; $fromNameDomain = "$1";
   dbg("from_domains_mismatch: fromNameDomain=$fromNameDomain, fromAddrDomain=$fromAddrDomain");
   if ( $fromNameDomain eq "" ) {
  return 0; # all well
   } else {
  if( $fromNameDomain eq $fromAddrDomain ) {
 return 0; # all well, they match
  } else {
 return 1; # mismatch, possibly spam
  }
   }
}

R.G.




This looks like a simple and valuable approach that should be
considered for inclusion into SA for everyone.  Do you mind opening up
a bug at https://bz.apache.org/SpamAssassin/ in the Plugins section?

We could put this in for everyone with a low score and give it a trial
run before increasing the score.  I will run it locally as well and
see how it goes.




Sent with ProtonMail <https://protonmail.com> Secure Email.

 Original Message 
On 17 January 2018 8:31 PM, David Jones <djo...@ena.com> wrote:


Would a plugin need to be created (or an existing one enhanced) to be
able to detect this type of spoofed From header?

From: "h...@hulumail.com !" <lany...@hotmail.com>



    https://pastebin.com/vVhGjC8H

    Does anyone else think this would be a good idea to make a rule
    that at least checks both the From:name and From:addr to see if
    there is an email address in the From:name and if the domain is
    different add some points?

    We are seeing more and more of this now that SPF, DKIM, and DMARC
    are making it harder to spoof common/major brands that have
    properly implemented some or all of them.

David Jones

Re: From name containing a spoofed email address

2018-01-19 Thread shanew

I've got a basic plugin written for this now, but I'd like to do a
little more testing before I make it widely available.  If you have
mail samples (ham or spam) with an "@" character in the name part of
the From field that you're willing to share, let me know.

BTW, I've already run into some false-positive situations, the most
common being things from yahoogroups, which apparently writes the
"true" sender address in the name part of From (they also dkim sign,
so not too hard to work around).  I started trying to handle these in
the plugin itself, but I'm beginning to think these would be better as
separate rules and then combined as metas to mitigate the actual
mismatch score.


On Wed, 17 Jan 2018, David Jones wrote:

Would a plugin need to be created (or an existing one enhanced) to be able to 
detect this type of spoofed From header?


From:  "h...@hulumail.com !" <lany...@hotmail.com>

https://pastebin.com/vVhGjC8H

Does anyone else think this would be a good idea to make a rule that at least 
checks both the From:name and From:addr to see if there is an email address 
in the From:name and if the domain is different add some points?


We are seeing more and more of this now that SPF, DKIM, and DMARC are making 
it harder to spoof common/major brands that have properly implemented some or 
all of them.

Re: Turn OFF SA spam filtering but keep ON header examination

2018-01-18 Thread shanew

I can't help but think that you'd be better off using something like
procmail, maildrop (part of Courier), or sieve if what you want is
sorting without all the overhead of checking for spam.

But maybe I'm not understanding what you want to accomplish...

On Thu, 18 Jan 2018, Chip wrote:


Newbie excited to use the features of SpamAssassin for a new project
that needs to flag inbound email for sorting into folders  (this can be
done via cpanel-level filtering) based on keywords in headers (header
search by SA).

This is a Centos 6.9 machine running cpanel/WHM 11.68.0.23 and
SpamAssassin version 3.4.1 running on Perl version 5.10.1.

I would like to TURN OFF any and all Spam Identification features and
only leave behind SpamAssassin's examination of headers and subsequent
Subject modification based on keywords in headers (such as keywords in
DKIM or SPF, etc)

1) Can this be done, and;

2) What tweaks need to be made to SA in its configuration files to make
it happen, and;

3) what else is recommended here.

Thank you.

Re: Mail flagged as spam on command line getting passed through as ham

2018-01-18 Thread shanew

Most likely you've forgotten to restart spamd or maybe whatever glue
calls SpamAssassin (amavisd, for example).

As a side note, if you want it to score 7 regardless of network/bayes
tests (which is what your score line indicates), you can just use
"score SHARK_TANK 7"

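A small Python model of how the one-value and four-value forms map onto score sets may make this clearer. It is hypothetical, not SpamAssassin code, but the ordering follows Mail::SpamAssassin::Conf: set 0 has Bayes and network tests off, set 1 has network tests only, set 2 has Bayes only, and set 3 has both. For simplicity, this sketch accepts only one or four values.

```python
def parse_score_line(line):
    """Parse 'score RULE s0 [s1 s2 s3]' into (rule, [four scores])."""
    parts = line.split()
    if parts[0] != "score" or len(parts) not in (3, 6):
        raise ValueError("expected 'score RULE' followed by one or four values")
    rule = parts[1]
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        values = values * 4  # a single value applies to every score set
    return rule, values


def score_set(bayes_enabled, net_enabled):
    """Score set index: 0 = neither, 1 = net only, 2 = Bayes only, 3 = both."""
    return (2 if bayes_enabled else 0) + (1 if net_enabled else 0)


def effective_score(line, bayes_enabled, net_enabled):
    """Score the rule would get under the given Bayes/network settings."""
    _rule, values = parse_score_line(line)
    return values[score_set(bayes_enabled, net_enabled)]
```

"score SHARK_TANK 7" and "score SHARK_TANK 7 7 7 7" both yield 7.0 in every set, so the shorter line is equivalent.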

On Thu, 18 Jan 2018, Andy Howell wrote:


I've been getting annoying spams for "Shark Tank". I added a simple rule in 
local.cf to check the subject line:

header SHARK_TANK   Subject =~ /\bshark tank\b/i
score SHARK_TANK 7 7 7 7

The mail still get through. In my inbox:

X-Spam-Flag: NO
X-Spam-Score: 4.148
X-Spam-Level: 
X-Spam-Status: No, score=4.148 required=6.2 tests=[BAYES_80=2, DIET_1=0.001,
HTML_IMAGE_RATIO_02=0.437, HTML_MESSAGE=0.001, SPF_HELO_PASS=-0.001,
T_REMOTE_IMAGE=0.01, T_RP_MATCHES_RCVD=-0.01, T_SPF_TEMPERROR=0.01,
URIBL_BLACK=1.7] autolearn=no autolearn_force=no

If I pass the mail through spamassasin on the command line, it gets flagged as 
spam:

spamassassin -D < spam-mail-shark-tank.txt >out.txt 2>&1

In  out.txt:

X-Spam-Flag: YES
X-Spam-Level: 
X-Spam-Status: Yes, score=20.5 required=5.0 tests=BAYES_60,DIET_1,
    HTML_IMAGE_RATIO_02,HTML_MESSAGE,RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK,
    RCVD_IN_SBL_CSS,SHARK_TANK,SPF_HELO_PASS,T_REMOTE_IMAGE,URIBL_ABUSE_SURBL,
    URIBL_BLACK,URIBL_DBL_SPAM autolearn=spam autolearn_force=no
    version=3.4.1
X-Spam-Report:
    *  7.0 SHARK_TANK No description available.
    *  1.2 URIBL_ABUSE_SURBL Contains an URL listed in the ABUSE SURBL
    *  blocklist
    *  [URIs: coloringkidsus.com]
    *  3.3 RCVD_IN_SBL_CSS RBL: Received via a relay in Spamhaus SBL-CSS
    *  [107.175.23.4 listed in zen.spamhaus.org]
    *  2.5 URIBL_DBL_SPAM Contains a spam URL listed in the DBL blocklist
    *  [URIs: coloringkidsus.com]
    *  1.7 URIBL_BLACK Contains an URL listed in the URIBL blacklist
    *  [URIs: coloringkidsus.com]
    * -0.0 SPF_HELO_PASS SPF: HELO matches SPF record
    *  0.0 DIET_1 BODY: Lose Weight Spam
    *  0.4 HTML_IMAGE_RATIO_02 BODY: HTML has a low ratio of text to image
    *  area
    *  1.5 BAYES_60 BODY: Bayes spam probability is 60 to 80%
    *  [score: 0.7650]
    *  0.0 HTML_MESSAGE BODY: HTML included in message
    *  1.9 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
    *  [cf: 100]
    *  0.9 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/
   *  0.0 T_REMOTE_IMAGE Message contains an external image
X-Spam-Bayes: bayes=0.7650, N=176(88-0+3), ham=(), spam=(shark, Pill, craze)

Any ideas what I'm doing wrong?

Thanks,

Andy

Re: From name containing a spoofed email address

2018-01-18 Thread shanew

On Thu, 18 Jan 2018, RW wrote:

I think the hard part is handling IDNs, e.g.

"=?UTF-8?B?Zm9vQGLDvGNoZXIuY29t?=" <f...@xn--bcher-kva.com>

the display name should decode to the UTF-8 byte sequence for
foo@bücher.com, but I presume the address would be left as the ASCII
IDN.

In the short term it's probably best to avoid matching on IDNs, but that
does allow the use of homographs in spoofing ASCII domains.


Yeah, that occurred to me, and I decided to set that problem aside for
now (probably someone more familiar with the issues should address
it).
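
Here is a sketch of the comparison RW describes, using Python's standard library. `email.header.decode_header`/`make_header` decode the RFC 2047 display name, and the built-in `idna` codec converts the Unicode domain to its ASCII form before comparing. That codec implements IDNA 2003, and whether it is adequate here is an assumption. The local part `foo` is a hypothetical stand-in for the address obscured in the archive.

```python
import re
from email.header import decode_header, make_header

NAME_DOMAIN_RE = re.compile(r'@([^@"\s]+)')


def decoded_display_name(raw_name):
    """Decode an RFC 2047 encoded-word display name to a Unicode string."""
    return str(make_header(decode_header(raw_name)))


def ascii_domain(domain):
    """Convert a (possibly Unicode) domain to its ASCII/punycode form."""
    return domain.encode("idna").decode("ascii").lower()


def idn_aware_mismatch(raw_name, addr):
    """True if the decoded name holds an @domain differing from addr's domain."""
    name_match = NAME_DOMAIN_RE.search(decoded_display_name(raw_name))
    if name_match is None:
        return False  # no address-like text in the display name
    addr_domain = addr.rpartition("@")[2]
    return ascii_domain(name_match.group(1)) != ascii_domain(addr_domain)
```

The example from RW's message decodes to foo@bücher.com, whose ASCII form xn--bcher-kva.com matches the address, so no mismatch is reported. This does nothing about homographs, which is the harder problem RW raises.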



BTW it's best to only match on the organizational domain, to avoid
FPs on the likes of:


Do you (or anyone, for that matter) have samples of emails like this
that they could share for me to test against?
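
Here is a crude illustration of RW's organizational-domain point (the FP example itself was cut off in the archive). This hypothetical Python helper treats the last two labels as the organizational domain. That heuristic is an assumption, and it is wrong for suffixes such as co.uk. A real implementation would consult the Public Suffix List.

```python
def naive_org_domain(domain):
    """Hypothetical heuristic: keep only the last two DNS labels."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def same_organization(domain_a, domain_b):
    """Compare two domains by their (naively computed) organizational domain."""
    return naive_org_domain(domain_a) == naive_org_domain(domain_b)
```

With this helper, mail.example.com and example.com count as the same organization. However, naive_org_domain("news.bbc.co.uk") returns "co.uk", which shows why a suffix list is needed.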

Re: From name containing a spoofed email address

2018-01-17 Thread shanew

I started working on this, and quickly realized the hard part is
determining/parsing the domain out of the From:name variable.

Is there any existing code in SA that "recognizes" email addresses
that can be called and/or re-used?

On Wed, 17 Jan 2018, David Jones wrote:

Would a plugin need to be created (or an existing one enhanced) to be able to 
detect this type of spoofed From header?


From:  "h...@hulumail.com !" <lany...@hotmail.com>

https://pastebin.com/vVhGjC8H

Does anyone else think this would be a good idea to make a rule that at least 
checks both the From:name and From:addr to see if there is an email address 
in the From:name and if the domain is different add some points?


We are seeing more and more of this now that SPF, DKIM, and DMARC are making 
it harder to spoof common/major brands that have properly implemented some or 
all of them.


Re: From name containing a spoofed email address

2018-01-17 Thread shanew

I swear I came across a rule like this just the other day, but now I
can't find it, which is probably a sign of faulty memory.  In any
case, the existing HeaderEval Plugin seems like a good place for this
(it already does a check for EnvFrom and From domain mismatches).


On Wed, 17 Jan 2018, David Jones wrote:

Would a plugin need to be created (or an existing one enhanced) to be able to 
detect this type of spoofed From header?


From:  "h...@hulumail.com !" <lany...@hotmail.com>

https://pastebin.com/vVhGjC8H

Does anyone else think this would be a good idea to make a rule that at least 
checks both the From:name and From:addr to see if there is an email address 
in the From:name and if the domain is different add some points?


We are seeing more and more of this now that SPF, DKIM, and DMARC are making 
it harder to spoof common/major brands that have properly implemented some or 
all of them.


Re: Using fuzzy patterns

2018-01-17 Thread shanew

On Sat, 13 Jan 2018, Alex wrote:


From: "F*e dE x" <fedexdispatchl...@speedpost.com>

That address hardly resembles "Fed Ex", but how general of a rule can
we create and still catch variations such as this?

I thought something like this would work:

header FUZZY_FEDEX   From =~ /(?!f.?e.?d.{0,3}e.?x).?.?.{0,3}.?/i


To fully debug this, I think we need to know the replace_tag
definitions you've set for these characters.  That said, the first
thing I notice is that the negative lookahead pattern matches your
From header (twice, I think).  This means that no matter what follows,
this rule will not trigger.  I suspect you want the negative lookahead
to be more strictly correct, like "(?!fed ex)".

You may also want to use "From:name =~" to limit the search to the
non-address portion of the header.
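
Python's `re` handles lookaheads the same way Perl does, so the point about the lookahead can be checked directly at the position where the name starts. The sketch uses only the lookahead part of Alex's pattern, because the replace_tag characters in the rest appear to have been stripped by the archive. `FUZZY_FEDEX_RE` is a hypothetical alternative, not a tested rule.

```python
import re

name = "F*e dE x"  # From:name in Alex's example

# Lookahead from the original rule: it matches this very name, so the
# negative lookahead fails at the start of the string.
loose = re.compile(r"(?!f.?e.?d.{0,3}e.?x)", re.I)

# Stricter lookahead: only the genuine spelling is excluded.
strict = re.compile(r"(?!fed ex)", re.I)

# Hypothetical full pattern: reject a name starting with "FedEx"/"Fed Ex",
# but allow non-word junk between the letters otherwise.
FUZZY_FEDEX_RE = re.compile(r"^(?!fed ?ex\b)f\W?e\W?d\W{0,3}e\W?x", re.I)
```

Here `loose.match(name)` is None, while the stricter lookahead succeeds. The hypothetical pattern hits "F*e dE x" but not "FedEx" or "Fed Ex".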

Re: Mailsploit

2017-12-13 Thread shanew

Note that after enabling KAM.cf, you'll want to watch more closely for
false positives and possibly adjust scores as necessary.  I think it's
a great addition to the default rules, but it's primarily tuned to
Kevin's environment (though he's open to improvements) and some of the
rules/scores may not be appropriate for your environment.

On Wed, 13 Dec 2017, Groach wrote:



On 13/12/2017 20:48, Antony Stone wrote:

On Wednesday 13 December 2017 at 21:41:04, Groach wrote:

Is there any suggestions on a rule or procedure to implement that will
help defend against the MAILSPLOIT type of spoofing?

See https://marc.info/?l=spamassassin-users=151265708616825=2 and follow-ups?


Thanks for that.

I followed the thread you mentioned:  I see that 'Kevin' says he has a rule
in his personal KAM.cf and that there isnt anything published in base
spamassassin scores.  (Or am I missing something)?

So how does one:

a,  obtain KAM.cf  or
b,  decipher the mechanism to which Kevin uses in order we can apply similar
in our own local.cf

(All help appreciated)

Re: Mailsploit and RFC1342 and spoofed From

2017-12-07 Thread shanew

I managed to run a test about an hour ago on my first try, so maybe
AWS upped his limit or demand has slowed down.  Or maybe I just got
lucky...

YMMV

On Thu, 7 Dec 2017, Kevin A. McGrail wrote:


The tests are not working because of aws send limits. Unlikely to work.
Regards,
KAM

On December 7, 2017 1:57:41 PM EST, Pedro David Marco
<pedrod_ma...@yahoo.com> wrote:
  You can get tests here...

https://www.mailsploit.com/index#demo

---
PedroD.

Re: spamd Will Not Create unix:socket

2017-11-27 Thread shanew

tmpfiles.d became a thing when /run became a temporary filesystem, so
it is relatively new.  And most of the time packages install the
necessary files in /usr/lib/tmpfiles.d, so admins may have never run up
against this issue since it became a thing.

As John says, you can file a bug report with RedHat.  Technically that
directory is only necessary when you're running spamd on a socket, so
they may not consider it a bug.  For what it's worth, there's no
tmpfiles.d entry on my Ubuntu or Gentoo systems (Gentoo does its
thing in the init script).

I wonder if it's worth adding a note to the wiki, or even the
--socketpath section of the spamd man-page?


On Mon, 27 Nov 2017, John Hardin wrote:


On Mon, 27 Nov 2017, Colony.three wrote:


>  I suspect you need an entry in /etc/tmpfiles.d so that directory gets
>  created at boot time.

 Indeed there is no tmpfiles in the spamassassin package. (I've never heard
 of this in 22 years)  How can this be, in the 21st Century?  As I'd
 suspected, everyone is settling for the tcp:port.

 What should I do about this, if anything?  Fix it just for myself, or let
 someone else know?


Report it to the RedHat bugzilla. The SA team doesn't handle distro-specific 
packaging issues.

Re: spamd Will Not Create unix:socket

2017-11-27 Thread shanew

I suspect you need an entry in /etc/tmpfiles.d so that directory gets
created at boot time.

Google tmpfiles.d or see this redhat blog page:
https://developers.redhat.com/blog/2016/09/20/managing-temporary-files-with-systemd-tmpfiles-on-rhel7/


On Mon, 27 Nov 2017, Colony.three wrote:


I have fought with this for days, and finally had to hotwire it.  But I'd
like to understand what's going on.

RHEL7 with spamassassin 3.4.0 and  spamass-milter-postfix 0.4.0.

/etc/sysconfig/spamassassin
SPAMDOPTIONS="--daemonize --create-prefs --max-children=5 --username=spamd --groupname=spamd --socketpath=/run/spamassassin/spamd.sock --socketowner=spamd --socketgroup=spamd --socketmode=660 --ipv4-only"


spamassassin.service:
[Unit]
Description=Spamassassin daemon
After=syslog.target network.target
PartOf=spamassassin-update.service
[Service]
Type=forking
PIDFile=/run/spamd.pid
EnvironmentFile=-/etc/sysconfig/spamassassin
ExecStartPre=-/sbin/portrelease spamd
ExecStart=/usr/bin/spamd --pidfile /run/spamd.pid $SPAMDOPTIONS
StandardOutput=syslog
StandardError=syslog
Restart=always
[Install]
WantedBy=multi-user.target

It simply would not create /run/spamassassin directory on boot.  It is
supposed to create it automatically like clamd does, since /run is wiped at
each boot.  To make it work I finally had to add:
ExecStartPre=/usr/bin/mkdir /run/spamassassin
ExecStartPre=/bin/chown -R spamd:spamd /run/spamassassin

SELinux is set to Permissive, so that's not it.  Any ideas?

Re: Ends with string

2017-09-15 Thread shanew

On Fri, 15 Sep 2017, Robert Boyl wrote:


uri __KAM_SHORT /(\/|^|\b)(?:j\.mp|bit\.ly|goo\.gl|x\.co|t\.co|t\.cn|tinyurl\.com|hop\.kz|urla\.ru|fw\.to)(\/|$|\b)/i

Seems a bit complicated.

It would be to make this rule check that suffixes are at the end of URI.

uri __TEST_URLS /\b(\.vn|\.pl|\.my|\.lu|\.vn|\.ar)\b/i

I believe this does it, correct?

uri __TEST_URLS /\b(\.vn$|\.pl$|\.my$|\.lu$|\.vn$|\.ar$)\b/i


As Paul said, if you're just looking at uris, the enlist_uri might be
the better way to go.  And it has the advantage that you don't have to
use (some might say abuse) regular expressions.

I believe URIs as collected for the uri tests consist of more than
just the server part of the URI, but maybe I'm wrong (or maybe the
list includes the server part only as well as the full URI).  If I'm
correct, then using the "$" will not work where URIs have a local part
and might not work where there's only a trailing "/".

In the case where you're only looking at the TLD, you don't have to
worry about the front word boundary because you're explicitly
anchoring the front of the match with the "\." part.  At the end, you
need to make sure that you're not allowing characters that would
indicate the server part of the URI continues past your intended match
(to avoid things like matching "blah.com" when you're really trying to
match ".co").  In my estimation, the characters that might indicate
continuation of the URI are letters, numbers, underscores, hyphens,
and the literal ".".

So, my rule for just matching TLDs looks like:

uri __TEST_URLS  /\.(vn|pl|my|lu|vn|ar)\b[^\.-]/i

The "\b" part excludes the letters, numbers and underscore because
those wouldn't be a word boundary.  The "[^\.-]" part excludes the
hyphen and literal "." from being on the right side of that word
boundary.
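
The behaviour described above can be checked with a quick Python sketch, since Python's `re` treats `\b`, character classes and lookaheads the way Perl does. The URIs are hypothetical test inputs. One thing the sketch shows is that `[^\.-]` must consume a character, so the rule does not fire when the TLD is the very last thing in the string. Whether SpamAssassin ever presents a uri that way is not verified here. `TLD_LOOKAHEAD_RE` is a hypothetical variant that uses a negative lookahead instead.

```python
import re

# The TLD rule above, unchanged apart from being written as a Python string.
TLD_RE = re.compile(r"\.(vn|pl|my|lu|vn|ar)\b[^\.-]", re.I)

# Hypothetical variant: a negative lookahead forbids "." or "-" after the
# boundary without consuming a character, so end-of-string also matches.
TLD_LOOKAHEAD_RE = re.compile(r"\.(vn|pl|my|lu|ar)\b(?![.-])", re.I)
```

Both patterns reject example.vn.example.com and example.vn-shop.com. Only the lookahead variant matches a string that ends in .vn.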

And now that I'm looking at it, I'm wondering if it would match a
URI like "https://legit.domain.com/great.beer/" ("beer" being one of
the TLDs my rule contains).  Like I said, the enlist_uri method might
be worth it just to avoid regular expressions.
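
That suspicion is easy to confirm in Python: a regex of this shape, with beer added, does match the path. One way around it is to check only the host part of the URI. This sketch parses the host with `urllib.parse.urlsplit` and compares its last label against a set. That is roughly the idea behind the enlist_uri approach, but it is not how SpamAssassin implements it, and `BAD_TLDS` is a hypothetical list.

```python
import re
from urllib.parse import urlsplit

# Regex in the style above, with "beer" added for this test.
TLD_RE = re.compile(r"\.(vn|pl|my|lu|ar|beer)\b[^\.-]", re.I)
BAD_TLDS = {"vn", "pl", "my", "lu", "ar", "beer"}  # hypothetical list


def host_has_bad_tld(uri):
    """Check only the host part of the URI against a set of TLDs."""
    host = urlsplit(uri).hostname  # lowercased; port, path and query removed
    if not host:
        return False
    return host.rstrip(".").rsplit(".", 1)[-1] in BAD_TLDS
```

For "https://legit.domain.com/great.beer/", the regex matches but the host-only check does not. For "https://shop.example.beer/", both match.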

Re: Ends with string

2017-09-15 Thread shanew

On Fri, 15 Sep 2017, Paul Stead wrote:


Something along the following still seems the easiest to read approach to me

enlist_uri_host (BADTLDS) vn

enlist_uri_host (BADTLDS) pl

enlist_uri_host (BADTLDS) my

enlist_uri_host (BADTLDS) lu

enlist_uri_host (BADTLDS) ar

header __TEST_URLS eval:check_uri_host_listed('BADTLDS')


If you're only looking at uris, it probably is (though I wonder a
little about processing time between a long list of such entries and a
single (if also long) regular expression).  I have rules for "bad"
tlds that look in headers as well (Received, From, Env_From being the
main ones), so these wouldn't help with that.  If there's something
similar for those cases, I'd love to know about it.

Re: Ends with string

2017-09-08 Thread shanew

If I recall correctly (and it's been a while), I was seeing false
positives where t.co was matching t.com (or something like that) so I
was only paying attention to the need to not allow an alpha-num.
Short-sighted, I know (and I might have forgotten that \b isn't a
character match).

The regex I use to anchor tlds these days (and please tell me if this
doesn't work the way I intend) looks like:

uri  NEWTLD_URI  /\.(accountant|beer|bid|..|win|work|xyz)\b[^\.-]/i

I have slightly different regexes to match email addresses or server
names in headers, but they all basically express the rule "I need to
see a word boundary here, but certain non-word characters don't count
because it implies the domain name may continue in the given context"
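
Python's `re` is close enough to Perl here to check RW's objection quoted below, along with the kind of anchor described above. `KAM_SHORT` reproduces the quoted rule. `SHORT_TIGHT` is a hypothetical rewrite of its right-hand anchor, not a rule from KAM.cf.

```python
import re

SHORTENERS = (
    r"(?:j\.mp|bit\.ly|goo\.gl|x\.co|t\.co|t\.cn|tinyurl\.com"
    r"|hop\.kz|urla\.ru|fw\.to)"
)

# __KAM_SHORT as quoted below: the trailing \b also accepts a following "."
KAM_SHORT = re.compile(r"(\/|^|\b)" + SHORTENERS + r"(\/|$|\b)", re.I)

# Hypothetical tightened version: no word character, "." or "-" may follow
# the domain, so the host name cannot continue past it.
SHORT_TIGHT = re.compile(r"(\/|^|\b)" + SHORTENERS + r"(?![\w.-])", re.I)
```

The original matches "http://goo.gl.example.com/", as RW points out. The tightened version does not, while still matching "http://goo.gl/abc123" and a bare "http://bit.ly".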

On Fri, 8 Sep 2017, RW wrote:


On Fri, 8 Sep 2017 13:03:57 -0400
Kevin A. McGrail wrote:


On 9/8/2017 12:24 PM, Robert Boyl wrote:

Hello, everyone!

Is there a way to create a Spamassassin rule that checks for a
certain URL suffix such as .ru but makes sure it has to be at the
end of the URI? Ends with string.

Thanks!
Rob


Yes, it's called an anchor and Shane Williams a long time ago gave me
some advice on that I used in this rule:

uri __KAM_SHORT /(\/|^|\b)(?:j\.mp|bit\.ly|goo\.gl|x\.co|t\.co|t\.cn|tinyurl\.com|hop\.kz|urla\.ru|fw\.to)(\/|$|\b)/i


That doesn't look right, at least not in the context of the OP's
question.

In  (\/|$|\b)  the \b seems superfluous as it will match a boundary
between a letter and a '.' so the rule will for example match

goo.gl.example.com

Re: tflags

2017-08-03 Thread shanew

Apologies, I should have used the phrase "score set" rather than
ruleset.  The "score" section of Mail::SpamAssassin::Conf talks about
it briefly, as does this wiki page:

https://wiki.apache.org/spamassassin/WritingRules

On Thu, 3 Aug 2017, Ian Zimmerman wrote:


On 2017-08-03 10:38, sha...@shanew.net wrote:


The most common ones that I make use of are "multiple" and "maxhits"
in order to allow a rule to be scored for each time it hits, but to
stop counting after some threshold.  I also use the "net" tflag so
that RBL checks only run when a net-based ruleset is loaded.


Where is the concept of "ruleset" in general documented, and in
particular what makes it "net-based"?  Not in Mail::SpamAssassin::Conf.

Re: tflags

2017-08-03 Thread shanew

The Mail::SpamAssassin::Conf man page includes a section on tflags and
their various functions, but generally speaking tflags allow you to
alter the way in which a rule is processed.

The most common ones that I make use of are "multiple" and "maxhits"
in order to allow a rule to be scored for each time it hits, but to
stop counting after some threshold.  I also use the "net" tflag so
that RBL checks only run when a net-based ruleset is loaded.

As an example, I have various uri rules to detect emails from
questionable journals.  Since it's possible that someone might be
having a legitimate mail conversation about that journal and share the
URL to their site, I want to count how many times the URL appears, so
I add a "multiple" tflag for the rule.  More appearances mean the
mail is more likely to be advertising the journal or soliciting
articles.  On the other hand, once it's been seen eight times (or 15 or
whatever), there's a diminishing return on that rule's ability to tell
me anything more about the email, so I use "maxhits=8" to keep it from
continuing to look for the uri (and to stop scoring additional points).
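
Below is a simplified Python model of the behaviour described above: each hit adds the rule's score, and counting stops at maxhits. It illustrates the arithmetic only and is not SpamAssassin's implementation. The journal domain and the 0.5 per-hit score are hypothetical.

```python
import re


def multiple_maxhits_score(text, pattern, score_per_hit, maxhits):
    """Model of 'tflags multiple maxhits=N': every hit scores, up to N hits."""
    hits = sum(1 for _ in re.finditer(pattern, text))
    counted = min(hits, maxhits)  # hits beyond maxhits are ignored
    return counted, counted * score_per_hit
```

Two mentions of the URL give two counted hits. Twenty mentions with maxhits=8 stop at eight hits and 4.0 points.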


On Thu, 3 Aug 2017, John Schmerold wrote:


I don't understand the purpose of tflags. Where is this parameter explained?

Re: Feature idea: Expiring rules

2017-06-13 Thread shanew

On Tue, 13 Jun 2017, Dianne Skoll wrote:


Hi,

Something I and possibly others might find useful would be rules that
expire.  Quite often, we might make some very specific rules to handle
a particular spam run and they lose their effectiveness pretty quickly.


I would love this for private rules, especially if it could be applied
to blacklist (or whitelist, I suppose) entries.  We regularly
blacklist specific addresses when they've obviously fallen victim to
some form of compromise.  If I could set those to expire rather than
add an annotation that I have to manually remember (or more likely
forget) to remove later, it would be fantastic.
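
Until something like this exists, one workaround is to generate the local config from a source list and drop expired entries before SpamAssassin reads it. Here is a Python sketch of that idea. The "# expires YYYY-MM-DD" comment convention and the function name are hypothetical, and the addresses are examples.

```python
import re
from datetime import date

# Hypothetical convention: an entry may end with "# expires YYYY-MM-DD".
EXPIRES_RE = re.compile(r"#\s*expires\s+(\d{4}-\d{2}-\d{2})\s*$")


def drop_expired(lines, today):
    """Return config lines whose expiry date (if any) has not yet passed."""
    kept = []
    for line in lines:
        match = EXPIRES_RE.search(line)
        if match and date.fromisoformat(match.group(1)) < today:
            continue  # expired: leave it out of the generated config
        kept.append(line)
    return kept
```

Lines without an expiry comment are always kept, so permanent entries can sit in the same file.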

Re: Weird Spamassassin startup behaviour on Ubuntu 16.10

2016-12-07 Thread shanew
only on a cold start of the system?

Is it possible to configure a SA startup dependency on the network
being up?











Re: Weird Spamassassin startup behaviour on Ubuntu 16.10

2016-12-06 Thread shanew

I recently set up an email server on Ubuntu 14.10 and kept being
frustrated that on boot various filter software and related milters
were regularly starting after sendmail, sometimes by as much as five
minutes.  We don't reboot that server very often, so it took a while
to test various fixes, but in the end I added the following lines to
the INIT INFO section of the various milters' init scripts (only the
first line really matters for startup order):

# X-Start-Before:sendmail
# X-Stop-After:  sendmail

If postfix uses an /etc/init.d script like sendmail does on 14.10,
check to see what the "Provides:" part of the INIT INFO is (probably
postfix), and add an X-Start-Before line with that value to the
spamassassin init script.  Or, if you just want to make sure that SA
starts before monit, use whatever the "Provides:" is set to in the
monit init script.
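The lookup described above can be sketched in Python: read the `### BEGIN INIT INFO` block of the MTA's init script, take its `Provides:` value, and build the `X-Start-Before:` line for the SpamAssassin script. The script excerpt is a hypothetical example and the helper names are mine; the sketch only builds the line, it doesn't edit any files.

```python
def lsb_fields(script_text: str) -> dict[str, str]:
    """Collect 'Key: value' pairs from an LSB INIT INFO comment block."""
    fields: dict[str, str] = {}
    inside = False
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped == "### BEGIN INIT INFO":
            inside = True
        elif stripped == "### END INIT INFO":
            break
        elif inside and stripped.startswith("#") and ":" in stripped:
            key, _, value = stripped.lstrip("#").partition(":")
            fields[key.strip()] = value.strip()
    return fields


def start_before_line(target_script_text: str) -> str:
    """Build an X-Start-Before line naming the target's first Provides value."""
    provides = lsb_fields(target_script_text)["Provides"].split()[0]
    return f"# X-Start-Before: {provides}"


# Hypothetical excerpt of /etc/init.d/postfix.
postfix_init = """#!/bin/sh
### BEGIN INIT INFO
# Provides:          postfix mail-transport-agent
# Required-Start:    $local_fs $remote_fs $syslog $named $network $time
### END INIT INFO
"""
print(start_before_line(postfix_init))
```

For that excerpt it prints `# X-Start-Before: postfix`, which is the line to add to the spamassassin init script.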

If you have a mixture of SysV (regular) and upstart scripts, things get
more complicated (unless 16.10 introduces functionality to make
dependencies interoperable that doesn't exist in 14.10).

On Tue, 6 Dec 2016, Michael Heuberger wrote:


Hi David

I don't know. Not sure how I can find out whether it does some
DNS/network stuff.


In my other response to John you can see that it takes about 5.69 sec to 
start spamassassin.


And no idea how to configure a SA startup dependency on the network being up. 
And shouldn't that come along with the package when installed via apt-get?


- Michael


On 6/12/16 11:47, David B Funk wrote:


 Could it be some kind of interaction with other system services startup?
 (in particular this feels like a network timeout issue).

 One of the things SA does during its startup process is check to see if
 DNS/network stuff is available.
 If the system hasn't yet brought up the network stack when SA starts, it
 may hang waiting for the network to stabilize.

 On a running system, if you stop/restart SA do you see the same delay or
 is it only on a cold start of the system?

 Is it possible to configure a SA startup dependency on the network being
 up?








Re: Assistance needed

2016-10-18 Thread shanew

On Tue, 18 Oct 2016, Kris Deugau wrote:


I saved the message and dug up a copy of the FB_CIALIS_LEO3 rule RW
mentioned;  I note that as he said it's not part of the current live
rules, and in fact checking further it looks like it's been commented
out entirely in the rules development sandbox, so it's not even
considered for testing.

Running the saved message through SA with the rule pasted into a
temporary rules definition file, I found:

dbg: rules: ran body rule FB_CIALIS_LEO3 ==> got hit: "Calm All is"

(from "NW1826 All is Calm All is Bright")

which is probably a good example of why this rule is no longer present.


Ideally, I'd say you should ask GetResponse to remove that rule
entirely.  If they won't do that, it should at least be scored _way_
lower (less than 1 for sure, but more like 0.2 or 0.1).

If they won't (or can't) do that, then you may want to tell them that
you'll be looking for a new provider, because that tells me they
really don't know what they're doing (that they couldn't figure
this out for you isn't impressive either).



Re: RCVD_IN_SORBS_SPAM and google IPs

2016-09-09 Thread shanew

On Thu, 8 Sep 2016, RW wrote:


On Thu, 8 Sep 2016 15:53:00 -0500 (CDT)
Shane Williams wrote:


Hey all,

I'm seeing google IP ranges hit the RCVD_IN_SORBS_SPAM rule, and in
digging deeper, I realize that there are zero hits on this rule for
the two weeks prior to Aug. 31, and now I'm seeing it thousands of
times per week (not just against google IPs).

Was this rule added/changed/re-scored in a recent sa-update?


It was commented out for a long time because it had a delisting fee,
but was recently re-enabled.

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=2221#c16



Thanks for that link, as it clarifies why it just started scoring
again.

This is the first time (at least in a long time) that I've looked at
ruleqa, but it seems like
http://ruleqa.spamassassin.org/20160904-r1759058-n/RCVD_IN_SORBS_SPAM/detail
would indicate that it should be scored at zero (since its S/O is
nearly .5), but instead it's 2.399, which is a lot to add for a rule
that's been napping for the last 13 years.
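For anyone else new to ruleqa: as I understand the hit-frequencies output, S/O is the spam hit rate divided by the sum of the spam and ham hit rates, so a value near 0.5 means the rule fires about equally on both. A minimal sketch under that assumption, with made-up percentages rather than the actual figures on that page:

```python
def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Fraction of a rule's (corpus-normalized) hits that are spam.

    1.0 means the rule only hits spam, 0.0 only ham, and 0.5 means it
    hits both at the same rate and so carries no useful signal.
    """
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule never hit")
    return spam_pct / total


# Hypothetical: a rule hitting 2.0% of spam and 1.9% of ham.
print(round(s_over_o(2.0, 1.9), 3))
```

A result that close to 0.5 is why I'd expect the rule to be scored near zero.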

Perhaps more to the root issue, I'm concerned that it looks like
listing on SORBS is based on total volume rather than percentage.
Their summary page for the IP I checked (209.85.218.48) seems to say
that there have been 28 "recent" spam entries seen from this address,
but I would imagine this is a minuscule percentage of all email sent
from that address.  If that's all it takes to get listed, I'm kind of
surprised that all of google's IPs aren't listed.
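The difference between the two listing policies is easy to see in a small Python comparison. Only the 28 reports come from the SORBS page; the 25-report and 1% thresholds and the million-message volume are made up, and I don't know what criteria SORBS actually uses.

```python
def listed_by_volume(spam_reports: int, threshold: int = 25) -> bool:
    """List an IP once it has accumulated a fixed number of spam reports."""
    return spam_reports >= threshold


def listed_by_ratio(spam_reports: int, total_messages: int,
                    max_ratio: float = 0.01) -> bool:
    """List an IP only if spam is a meaningful share of what it sends."""
    return total_messages > 0 and spam_reports / total_messages > max_ratio


reports = 28           # the "recent" spam entries on the SORBS page
volume = 1_000_000     # hypothetical total messages from that address
print(listed_by_volume(reports))         # True: a large sender crosses a fixed count quickly
print(listed_by_ratio(reports, volume))  # False: 0.0028% is far below 1%
```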




Re: I have some bad news

2016-08-17 Thread shanew

I'm finding this discussion interesting, because I've been trying to
wrap my head around the theoretical basis of this system.  As such,
I've noticed that several questions have been asked now that are
explained in the document Marc initially pointed to
(http://wiki.junkemailfilter.com/index.php/The_Evolution_Spam_Filter).
Given Marc's situation, it seems reasonable to read that document
before asking too many questions.

As a way to (maybe) save Marc some time, test my own knowledge and
perhaps help move the conversation forward, I'm going to summarize
the questions I've seen so far and, as much as possible, the answers
to those questions (and Marc, correct me if I'm getting anything wrong
here):

- How do you classify an email that has tokens from both the ham and
spam set?
Whichever set (out of "only found in ham" and "only found in spam") is
larger (or "better") determines the final classification.

- What length are the tokens?
Marc's examples use multiple length tokens, capturing everything
between 1 and 4 "words", but I suspect the exact maximum token
length might be adjustable.

- What happens when spammers use "hammy" text to avoid detection?
I don't see this directly addressed, but I would guess there are
several things that mitigate against this.  Multi-word tokens
prevent the truly random word salad attempts at poisoning, and
probably help with "cuttings" from other texts because the transition
from one cutting to the next probably doesn't appear in ham, leaving
the "spam-only" aspects of the mail to push it towards a spam
classification.  The unlearning and expiration of fingerprints would
mean that such cuttings would have to appear repeatedly over time in
legitimate mail to tip an email toward a ham classification.

- Will bad spellers (or typists) be seen as spammier?
Again, I don't see this addressed specifically, but I don't think so,
unless they are such tremendously bad spellers that nearly every word
is misspelled.  To take the "let's get some lunch" example, even if I
accidentally mis-type "some" as "som", I still have other tokens to
compare against, and the tokens "som", "get som", "som lunch", "let's
get som", etc. would have to have appeared in spam (and only spam) to
pull the classification toward spam.  So I'd say the occasional typo
or misspelling would come up neutral.

- What happens to messages that have a lot of neutral tokens?
Now I'm really speculating, but unless every token is neutral, there's
still something to decide on, though it does seem that detection
becomes less reliable as the number of non-neutral tokens approaches
zero.  A similar question that I thought of is what happens to
messages where the final sets "only found in spam" and "only found
in ham" are nearly (or exactly) the same size.  If you're using this
filter as part of SA scoring, the answer would seem to be that you
have an appropriately small score for "undetermined" (like bogofilter
does), but if it's acting as a separate filter, I don't know.
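A minimal Python sketch of the classification as I've summarized it: tokens are runs of 1 to 4 words, each token is looked up in hypothetical "seen only in ham" and "seen only in spam" sets, and the larger match set wins, with a tie reported as undetermined. Counting matches is my reading of "larger"; Marc's actual system may weigh them differently, and the tokenizer here (a lowercased whitespace split) is an assumption.

```python
def tokens(text: str, max_len: int = 4) -> set[str]:
    """All runs of 1..max_len consecutive words, lowercased."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def classify(text: str, ham_only: set[str], spam_only: set[str]) -> str:
    """Compare how many of the message's tokens fall in each exclusive set."""
    toks = tokens(text)
    ham_hits = len(toks & ham_only)
    spam_hits = len(toks & spam_only)
    if ham_hits > spam_hits:
        return "ham"
    if spam_hits > ham_hits:
        return "spam"
    return "undetermined"  # includes the all-neutral case


ham_only = {"get some lunch", "let's get some"}
spam_only = {"cheap viagra", "viagra now"}
print(classify("Let's get some lunch", ham_only, spam_only))  # ham
print(classify("Let's get som lunch", ham_only, spam_only))   # undetermined
print(classify("Cheap viagra now", ham_only, spam_only))      # spam
```

The typo example shows a misspelling simply failing to match rather than pushing toward spam.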

On Wed, 17 Aug 2016, Antony Stone wrote:


On Wednesday 17 August 2016 at 05:06:50, Marc Perkel wrote:


What I'm doing is looking for fingerprints in email that intersect HAM
and not in SPAM - which would be a HAM result.
If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.

So if I say to you, "Let's get some lunch" that's ham because spammers
never say that, but normal people do. So the way to test what "spammers
never say" is to store what they do say and see if it's NOT in the list.
(Thus the infinite set)


What length are the tokens you store in the list?  Single words (so the above
lunch example would contain 4 tokens)?  Entire phrases (so the above would be
just 1 token)?  Also how do you deal with spam which contains random cuttings
from legitimate texts (generally along with a graphic attachment and/or a URL
to get across the "real" message)?


Similarly, there's only so many ways to misspell viagra, and good email
wouldn't have it spelled wrong.


Does this mean that people with bad spelling will more likely get classified as
spam, because they do not match the 'ham' group very well?

Also, what happens to mail that contains lots of tokens which match neither set
(for example, perfectly legitimate email which happens to be in a language the
system hasn't been trained with)?


Antony.






Re: Is greylisting effective? (was Re: Using Postfix and Postgrey - not scanning after hold)

2016-08-01 Thread shanew

On Sun, 31 Jul 2016, Robert Schetterer wrote:


Greylisting was invented as an idea against bots. It's based on the idea
that bots "fire and forget" when they see a temporary error and don't come back.

But that's historic: bots have been recoded, and better anti-bot techniques were invented.
The only problem now is that people still believe in historic stuff.


This argument ignores two important facts.

First, even if 98% of bots and viruses (and that number is pure
conjecture on my part) are now smart enough to retry, that doesn't
change that greylisting is just about the lowest "cost" way of
preventing the ones that aren't smart enough (or aren't designed to
retry because they want to push the most amount of junk at the
lowest-hanging fruit).

Second, the ability of a bot, virus, server or any other spam source
to retry delivery after a temp failure is not the only "weakness"
greylisting takes advantage of.  A spam source might not get past my
greylist for any number of reasons, including the classic case of poor
coding/design, but also:

- It is detected and blocked (or taken offline) by the source network
  before its greylist period is up
- It makes use of a compromised account, and that account is disabled
  or secured before its greylist period is up
- It is part of a distributed botnet, so subsequent attempts come from
  a different IP/network
- It sends a high volume of spam, so it doesn't come back around to
  retry again until after its entry has been removed, requiring
  a whole new greylisting period

Others could probably add to that list, but that's just off the top of
my head.  But, even if a spam source retries and successfully makes it
past the greylisting, the greylisting still provides potential
benefits, like:

- While it was waiting to retry, its IP has been added to BLs, which
  my other filters will score appropriately
- While it was waiting to retry, the phishing URL in it has been
  reported and taken down (or the URL shortener link it used has been
  removed)
- While it was waiting to retry, the virus it carries has been
  identified and pushed out to my virus definitions
- While it was waiting to retry, its registered domain has been
  removed
- While it was waiting to retry, others who received the spam have
  reported it to services like Razor and DCC, which other filters will
  act on
- If it has to keep retrying deliveries to my server, I'm tying up its
  resources (however minimally) that could otherwise be used to send
  junk to others

Again, I'm sure others could add more based on their experiences.

I'm not saying greylisting is without problems, that it just works out
of the box (initial and ongoing configuration is critical), or that
everyone should be using it, but there's a lot more going on here than
just outwitting poorly written bots.
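To show why a slow or IP-hopping sender keeps getting deferred, here is a minimal in-memory Python sketch of the usual greylisting scheme: a triplet of client IP, envelope sender and recipient, a minimum retry delay, and expiry of entries that never retried in time. The 300-second delay and 4-hour expiry are hypothetical values, not postgrey's defaults, and a real implementation would persist its data.

```python
import time
from typing import Dict, Optional, Set, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, recipient)


class Greylist:
    """Defer first attempts from unknown triplets; accept a timely retry."""

    def __init__(self, delay: float = 300, expiry: float = 4 * 3600):
        self.delay = delay    # seconds a new triplet must wait before a retry is accepted
        self.expiry = expiry  # seconds before an unretried entry is forgotten
        self.first_seen: Dict[Triplet, float] = {}
        self.passed: Set[Triplet] = set()  # triplets that already got through

    def check(self, ip: str, sender: str, rcpt: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        key = (ip, sender, rcpt)
        if key in self.passed:
            return "accept"
        seen = self.first_seen.get(key)
        if seen is None or now - seen > self.expiry:
            self.first_seen[key] = now  # new or expired: start a fresh period
            return "defer"
        if now - seen < self.delay:
            return "defer"  # retried too soon
        self.passed.add(key)
        return "accept"


gl = Greylist()
rcpt = "me@example.org"
print(gl.check("192.0.2.1", "a@example.com", rcpt, now=0))           # defer
print(gl.check("192.0.2.1", "a@example.com", rcpt, now=600))         # accept
print(gl.check("198.51.100.7", "a@example.com", rcpt, now=700))      # defer: new IP
print(gl.check("203.0.113.9", "b@example.com", rcpt, now=0))         # defer
print(gl.check("203.0.113.9", "b@example.com", rcpt, now=5 * 3600))  # defer: entry expired
```

The last two calls correspond to the high-volume case in the list above: a sender that doesn't come back until after its entry expires starts a whole new greylisting period.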



Re: Using Postfix and Postgrey - not scanning after hold

2016-07-29 Thread shanew

On the off chance that your decision to turn off greylisting was
related to Matus Uhlar's message that concludes with:
"if you run SA, there's no point in running greylisting anymore."

That could be interpreted to read "if you run SA at all, there's no
need for greylisting at all", but I don't think that's what he meant.
I think the correct interpretation (at least the one that makes sense
to me) is "during processing of mail, it makes no sense to run
greylisting after SA does its thing".

I would generalize that even more to say that greylisting should come
before any other content-based filtering (virus scanners, defanging,
etc.).

On the other hand, you may have disabled greylisting because you're
tired of futzing with it and just want your mail to work right again,
in which case, never mind.



On Thu, 28 Jul 2016, Ryan Coleman wrote:


Doesn’t matter. I killed it. It’s gone.

I have eliminated postgrey from the installation and things are back to “normal”


On Jul 28, 2016, at 12:53 PM, Bill Cole 
<sausers-20150...@billmail.scconsult.com> wrote:

On 19 Jul 2016, at 15:50, Ryan Coleman wrote:


strange... how do you run spamassassin from postfix?


In master.cf like everyone else…


Um, not so much...


smtp  inet  n   -   -   -   -   smtpd
 -o content_filter=spamassassin

[...]

spamassassin unix - n   n   -   -   pipe
 user=spamd argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}


FWIW, that's probably roughly the 5th most common way to integrate Postfix and 
SpamAssassin. I'd guess that amavisd-new as a before-queue filter is 1st, followed by 
amavisd-new as an after-queue filter, spamass-milter, and MIMEDefang (also a milter). 
There are pros and cons for every approach but a 'pipe' content_filter using spamc's '-e' 
option probably has the fewest "pros" and has the problems described at 
https://wiki.apache.org/spamassassin/IntegratedSpamdInPostfix. Also, you probably want 
'flags=Rq' in the pipe arguments and there is no '-f' argument documented for spamc, so 
that should probably go unless you know something the spamc man page doesn't...

A possible cause of your trouble could be spamc not knowing the correct way to 
talk to spamd. In that case, the '-e' option causes spamc to bypass spamd and 
just pipe its input to the given command, exiting with a successful return code 
unless that command fails. This seems to match what you're describing.






Re: Bayes filter marking everything as ham

2016-06-01 Thread shanew

On Wed, 1 Jun 2016, Reindl Harald wrote:



Am 01.06.2016 um 02:32 schrieb sha...@shanew.net:

 Kind of a shot in the dark, but are you sure everyone is promptly
 moving their spam out of the inboxes?  I worry about automated
 learning like this


autolearning has nothing to do with inboxes

http://www.maiamailguard.com/maia/wiki/sa-autolearn

"autolearn=ham, autolearnscore=-0.001"
"autolearnscore=-0.001" must be a bad joke in the config

hence it's dangerous, unpredictable and will sooner or later ruin your bayes 
without having a corpus where you could kill bad samples, move them from ham 
to spam or the other direction and just rebuild the bayes-db from scratch 
based on the fixed corpus, so you will end up wiping it and starting from scratch
(and need to take care of the minimum number of training messages until bayes
gets enabled at all again)


I wasn't referring to SA's autolearning feature, which I agree can
suffer from feedback loops if your thresholds are set wrong (I set
my ham threshold to -2 for this reason).

That's why I used the phrase "automated learning" to distinguish OP's
"automated" cron jobs that call sa-learn.  In retrospect, I should
have used words that more clearly distinguished it from the
autolearning feature.
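A simplified Python model of why the ham threshold matters: autolearning only trains on messages scoring below the ham threshold or at or above the spam threshold, so moving the ham threshold (`bayes_auto_learn_threshold_nonspam`, default 0.1) down to -2 keeps borderline messages like the -0.001 one quoted above out of ham training. Real SpamAssassin computes the score it compares differently (it leaves out some rules, including the Bayes rules, and adds further conditions before learning spam), and the exact boundary comparisons here are my assumption.

```python
def autolearn_decision(score: float,
                       ham_threshold: float = 0.1,
                       spam_threshold: float = 12.0) -> str:
    """Return 'ham', 'spam' or 'no' for a message's autolearn score."""
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "no"


# -0.001 is the autolearn score quoted above.
for score in (-0.001, -3.5, 5.0, 14.2):
    print(score, autolearn_decision(score), autolearn_decision(score, ham_threshold=-2.0))
```

With the default threshold the -0.001 message is learned as ham; with -2 it is left alone, while clearly hammy mail (-3.5) is still learned.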



Re: Bayes filter marking everything as ham

2016-05-31 Thread shanew
 on
  # shortcircuit SUBJECT_IN_WHITELIST    on
  # shortcircuit USER_IN_BLACKLIST   on
  # shortcircuit USER_IN_BLACKLIST_TO    on
  # shortcircuit SUBJECT_IN_BLACKLIST    on
  # shortcircuit ALL_TRUSTED on
  # shortcircuit BAYES_99    spam
  # shortcircuit BAYES_00    ham







Re: spamass-milter: orphaned?

2016-05-26 Thread shanew

I hope he means a mechanism by which spamass-milter will allow
specified (in the config, not in the code) SA headers to actually get
added when they pass through spamass-milter.  The current behavior is
that four(?) SA headers are kept, but everything else is discarded.

I've wanted something like that for years, though not enough to
actually ask for it ;-)


On Thu, 26 May 2016, Andy Balholm wrote:


...some other headers to be pushed to mail SA generates


What do you mean?

Andy





Re: Problem with SPF plugin and MX2

2016-05-25 Thread shanew

On Wed, 25 May 2016, Dianne Skoll wrote:


On Wed, 25 May 2016 13:05:57 +0200
Support SimpleRezo <simpler...@gmail.com> wrote:


We are experiencing a problem when emails are coming from our MX2 with
the SPF plugin, because the SPF test is made on the last "Received"
IP and not the first one (as we would expect for an SPF test).



Has anyone else already noticed this? Can this be fixed by
configuration?


Yes.  Don't run a backup MX machine that relays to a primary machine
that does spam-scanning.  It's more trouble than it's worth, particularly
as spammers sometimes specifically pick the worst MX record rather than
the best.


It also seems problematic for your backup MX to accept an email only
for your primary to potentially reject said email later on.  At that
point you can no longer reject the mail, leaving the problematic (some
might say wrong) choices to either bounce it or drop it (or deliver
it, I suppose, if you're only using SA to provide info to end users).

Running the same SA setup on your backup would seem to minimize that
risk, but not totally eliminate it, since network-based tests might
return different results if enough time passes before your backup
finally relays the mail to the primary.

So, for those with more experience, what is the preferred way to run a
backup MX (or two or three, etc.) without losing or breaking the
benefit of spam filtering?



Re: Anyone else just blocking the ".top" TLD?

2016-03-28 Thread shanew

On Mon, 28 Mar 2016, Vincent Fox wrote:




On 03/27/2016 06:58 PM, Thomas Cameron wrote:

 Has anyone actually gotten a single legit message from that domain?


Never. WTF was ICANN thinking?

I occasionally go through the lists of abused gTLD here:
http://www.surbl.org/tld/


Thanks for that link.  If there were a nice source for how many total
domains were in each TLD you could calculate a useful signal to noise
ratio.
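Given per-TLD totals, the ratio is simple to compute. A minimal Python sketch; every number in it is invented, since I don't have a source for the totals, and the abused counts would have to come from something like the SURBL list above.

```python
def abuse_ratio(abused: dict[str, int], registered: dict[str, int]) -> dict[str, float]:
    """Share of each TLD's registered domains that appear on an abuse list."""
    return {
        tld: abused[tld] / registered[tld]
        for tld in abused
        if registered.get(tld)  # skip TLDs with no (or zero) total
    }


# Made-up counts for illustration only.
abused = {"top": 9_000, "xyz": 4_000, "com": 50_000}
registered = {"top": 20_000, "xyz": 100_000, "com": 120_000_000}
for tld, ratio in sorted(abuse_ratio(abused, registered).items(), key=lambda kv: -kv[1]):
    print(f".{tld}: {ratio:.4%}")
```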

I was recently surprised when I had a user complain that a known
correspondent with a .xyz TLD was being blocked by our filter.  I
added a whitelist entry in the user's settings, but also explained
that the domain was _the_ primary reason it was blocked because all we
ever see from it is spam.

So apparently there are some legit (if clueless) users of some of
these TLDs.




Re: "Received" headers for rules?

2015-10-26 Thread shanew

On Mon, 26 Oct 2015, RW wrote:


On Mon, 26 Oct 2015 11:37:58 -0500 (CDT)
Shane Williams wrote:


I've created a header rule with "Received =~ /blahblahblah/", and I
just got a false positive on it when none of the Received headers in
the mail actually match.  I had a similar situation last week, and
(I think) found in the SA code where it will treat ezmlm headers as
if they were Received headers (which explained why it hit).


I had a quick look at the code and the only mention of ezmlm was
related to gated_through_received_hdr_remover() which looks for signs
that the email passed through something that might have stripped
headers. It tests the received headers, but doesn't modify them.


In my sleuthing, I found the part of Received.pm that looks for
"received" headers that don't actually start with "Received:" and adds
them on to the @hdrs array.  I thought I'd tracked down that one of
those alternate "received" headers was the ezmlm header, which is related to
the email's path through various systems, so it made sense.

Unfortunately, with a weekend between when I looked at it and now, I
no longer see what led me to think that, nor can I remember which
email started my search, so it seems likely that I came to the 
wrong conclusion.


Instead, I think what was throwing me off is the fact that the
envelope-from gets checked as part of the Received header it appears
in, but then sendmail tears that out and puts it in the Return-Path:
header.  Add the fact that I'm running SA from a milter, and basically
I had no way to know exactly what the email looked like at the point
SA was analyzing it.  John Hardin's __ALL_RECEIVED rule suggestion
created the entries in the debug log that let me have a better idea
what SA was actually seeing and running rules against.




Re: "Received" headers for rules?

2015-10-26 Thread shanew

On Mon, 26 Oct 2015, Reindl Harald wrote:


Am 26.10.2015 um 17:37 schrieb Shane Williams:

 I've created a header rule with "Received =~ /blahblahblah/", and I
 just got a false positive on it when none of the Received headers in
 the mail actually match.  I had a similar situation last week, and
 (I think) found in the SA code where it will treat ezmlm headers as
 if they were Received headers (which explained why it hit).

 Is there anywhere, other than the code, where I can see what all
 headers might be checked as part of a "Received =~" rule?


what about posting details like the headers of said message and the whole 
rule instead of hoping for readers' crystal balls?




Because the question I asked is not specific to any one email or rule,
but rather about how SpamAssassin processes mail (specifically
headers) in general.

Thanks to John Hardin for pointing out a way to determine (on a per
email basis even) what headers count as Received.



Re: SpamAssassin Rules Regarding Abuse of New Top Level Domains

2015-10-20 Thread shanew

On Tue, 20 Oct 2015, Rob McEwen wrote:


On 10/20/2015 12:13 PM, sha...@shanew.net wrote:

 Unlike Larry (and others) I DO want to block the vast majority of the
 new tlds, because we see nothing but spam from them (and my users tend
 toward the more false-positives than false-negatives side of the
 spectrum).  Rather than maintain a list of all the problematic tlds,
 I'd rather have a blanket block rule with the ability to whitelist the
 handful that might be legit. 


Be careful about doing this for the long term. I think that spammers exploit
new TLDs because they know that many anti-spam systems don't account for them
correctly at first (and/or maybe they are cheaper at first?). But in the
longer term (years down the road), they tend to move on to other ones, while
legitimate use of those TLDs slowly increases. So this strategy can backfire in the long
term. (But, of course, YMMV... and some smaller hosters don't have to be as
concerned about a few extra FPs.)


I totally agree.  In fact, I assume anything I'm doing right now to
successfully block spam could change tomorrow, much less months or
years from now.  For now, though, I'm seeing almost no legitimate
traffic from most of the new ones (I'm thinking of the longer ones
especially; .work, .ninja, .site, .science, etc.).

I already have rules that score for these tlds in received or envelope
from, but I'm getting tired of making the regular expression longer
and longer (in two different places), and I know there's a smarter
way.  Whether I'm smart enough to implement that smarter way is
another matter entirely.

Is there an existing (relatively simple) plugin that behaves similarly
that I could crib from?
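The "one list, many places" idea can be sketched in Python: keep a single set of suspect TLDs plus an allowlist of known-good domains, and check every domain pulled from the Received headers and the envelope-from against it. This only illustrates the logic and is not a SpamAssassin plugin; the TLD set and the allowlisted domain are examples, and parsing the domains out of a real message is left out.

```python
# Example lists only; a real deployment would keep these in one config file.
SUSPECT_TLDS = {"work", "ninja", "site", "science", "top", "xyz"}
ALLOWED_DOMAINS = {"legit-partner.xyz"}  # hypothetical whitelisted correspondent


def suspect_domain(domain: str) -> bool:
    """True if the domain is under a suspect TLD and not allowlisted."""
    domain = domain.lower().rstrip(".")
    if any(domain == a or domain.endswith("." + a) for a in ALLOWED_DOMAINS):
        return False
    return domain.rsplit(".", 1)[-1] in SUSPECT_TLDS


def suspect_hits(received_domains: list[str], envelope_from: str) -> list[str]:
    """Apply the one TLD list to the relay domains and the envelope-from domain."""
    sender_domain = envelope_from.rpartition("@")[2]
    return [d for d in [*received_domains, sender_domain] if suspect_domain(d)]


print(suspect_hits(["mail.example.com", "relay.cheap.ninja"], "promo@offers.site"))
print(suspect_hits(["mx.legit-partner.xyz"], "news@legit-partner.xyz"))
```

Adding a TLD to the set, or a legitimate domain to the allowlist, then changes both checks at once.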




Re: SpamAssassin Rules Regarding Abuse of New Top Level Domains

2015-10-20 Thread shanew

I've got 3.4.1 installed and sa-update runs regularly.

Unlike Larry (and others) I DO want to block the vast majority of the
new tlds, because we see nothing but spam from them (and my users tend
toward the more false-positives than false-negatives side of the
spectrum).  Rather than maintain a list of all the problematic tlds,
I'd rather have a blanket block rule with the ability to whitelist the
handful that might be legit.

Is anyone doing anything like this (perhaps as a plugin)?


On Tue, 20 Oct 2015, Kevin A. McGrail wrote:


If you have 3.4.1 and use sa-update then we add new tlds to a rule file that
is then parsed.

This does not block those tlds. It lets the engine recognize the urls for
further rules.

If you have a tld that is missed and you are using 3.4.1 with sa-update, let
us know.
Regards,
KAM

On October 14, 2015 3:37:58 PM PDT, sha...@shanew.net wrote:

On Tue, 13 Oct 2015, Kevin A. McGrail wrote:
 At the end of the day, if you are having problems with new TLDs, ONE solution
 is to use something that uses SA 3.4.1 and has sa-update configured so you
 get updates with said new TLDs.
I think maybe people are confused about how exactly this change helps
them get rid of all the spam that's coming from the "new" TLDs.
So, in other words, having just updated to 3.4.1, how does one go from
having a list of all the new TLDs that can now be nicely maintained
with sa-update to getting rules which actually score against the vast
majority of the new TLDs (since most of them seem to be 99.99% spam)?
I had created a local rule before moving to 3.4.1 that looks for new
TLDs in the Received, From and EnvelopeFrom headers, but it was
obvious that this wasn't going to scale well.  Did the new system in
3.4.1 make this easier for me to do, or did it just make it possible
for new TLDs to be handed off to RBLs and the like (not that that's
not a major win)?
Any elaboration (or a pointer to documentation (not the man page))
would be greatly appreciated.







Re: SpamAssassin Rules Regarding Abuse of New Top Level Domains

2015-10-14 Thread shanew

On Tue, 13 Oct 2015, Kevin A. McGrail wrote:


At the end of the day, if you are having problems with new TLDs, ONE solution
is to use something that uses SA 3.4.1 and has sa-update configured so you 
get updates with said new TLDs.


I think maybe people are confused about how exactly this change helps
them get rid of all the spam that's coming from the "new" TLDs.

So, in other words, having just updated to 3.4.1, how does one go from
having a list of all the new TLDs that can now be nicely maintained
with sa-update to getting rules which actually score against the vast
majority of the new TLDs (since most of them seem to be 99.99% spam)?

I had created a local rule before moving to 3.4.1 that looks for new
TLDs in the Received, From and EnvelopeFrom headers, but it was
obvious that this wasn't going to scale well.  Did the new system in
3.4.1 make this easier for me to do, or did it just make it possible
for new TLDs to be handed off to RBLs and the like (not that that's
not a major win)?

Any elaboration (or a pointer to documentation (not the man page))
would be greatly appreciated.



Re: DCC whitelisting

2015-06-11 Thread shanew

On Wed, 10 Jun 2015, John Hardin wrote:


On Wed, 10 Jun 2015, Shane Williams wrote:


 Two examples that I know are legitimate senders, but get caught by DCC
 (and pyzor in some cases) and other rules that push them over the
 threshold are the SourceForge.net Project of the Month list and
 various Netflix emails to customers (New Arrivals or we just added a
 show you might like).  In both those cases, the user part of the
 env_from changes, and as I understand it, the DCC Whitelist doesn't
 allow wildcards, so I can't have an entry that matches the server
 part.  Maybe I could be using the substitute List-ID: syntax, but
 neither of those has List-ID as a specific header.


Can you reliably identify those at the MTA level and tell the SA glue to skip 
them entirely?


I probably could, but that also seems kludgy.  DCC has a whitelisting
capability, so why not use it?

Am I misunderstanding what DCC's whitelist is intended for?
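This Python sketch illustrates the gap. An exact-address allowlist misses these senders because the local part of the envelope sender changes from message to message. A match on the domain part, which is what a wildcard would give you, still catches them. The addresses are made up, and this is not DCC's whitelist syntax. A domain match is also only as trustworthy as the envelope sender, which is easy to forge unless something like SPF has validated it.

```python
from email.utils import parseaddr
from typing import Iterable


def _address(env_from: str) -> str:
    """Bare lowercased address from an envelope sender like <x@y>."""
    return parseaddr(env_from)[1].lower()


def exact_match(env_from: str, allowed: Iterable[str]) -> bool:
    """What a wildcard-free allowlist can do: compare whole addresses."""
    return _address(env_from) in {a.lower() for a in allowed}


def domain_match(env_from: str, allowed_domains: Iterable[str]) -> bool:
    """Ignore the changing local part and compare only the domain."""
    addr = _address(env_from)
    if "@" not in addr:
        return False
    domain = addr.rsplit("@", 1)[1]
    return any(domain == d.lower() or domain.endswith("." + d.lower())
               for d in allowed_domains)
```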




Re: FPs on RCVD_ILLEGAL_IP

2015-04-21 Thread shanew

On Tue, 21 Apr 2015, Dianne Skoll wrote:


On Tue, 21 Apr 2015 16:56:48 +0200
Matus UHLAR - fantomas uh...@fantomas.sk wrote:


what if Microsoft starts using other IP range tested by
RCVD_ILLEGAL_IP?


Then it deserves what it gets.  Market forces are intended to penalize
companies that do stupid things and if we interfere in those market
forces, it will only encourage more stupid things.

Or you could look at it this way: RCVD_ILLEGAL_IP was a really good
spam indicator until Microsoft messed up, so by using those IPs Microsoft
is helping spammers by forcing spam-fighters to reduce or abandon a
pretty good rule.  Should that sort of behavior be rewarded?


I presume detecting forged Received headers was the point of this rule
all along, so if we all toss this rule out the window (or adjust to
exclude this edge case), aren't we potentially encouraging spammers to
hide their true networks in the same way?

It occurs to me that if MS are the only people who are doing this, a
meta-rule could counteract the score in that specific case.  If it
gets used much beyond that by legitimate actors though, that's a whole
other story.
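Here is a toy model of that counteracting meta-rule, written in Python rather than SpamAssassin config. When RCVD_ILLEGAL_IP hits together with a subrule that recognizes Microsoft's internal hops, a meta with an equal negative score cancels it out. The subrule name __RCVD_OUTLOOK_INTERNAL, the meta name MS_INTERNAL_ZERO_NET, and both scores are hypothetical.

```python
from typing import Set

# Hypothetical rule names and scores, for illustration only.
SCORES = {
    "RCVD_ILLEGAL_IP": 2.0,
    "MS_INTERNAL_ZERO_NET": -2.0,  # the counteracting meta
}


def with_metas(hits: Set[str]) -> Set[str]:
    """Add the counteracting meta when both of its conditions hit."""
    result = set(hits)
    if {"RCVD_ILLEGAL_IP", "__RCVD_OUTLOOK_INTERNAL"} <= hits:
        result.add("MS_INTERNAL_ZERO_NET")
    return result


def total_score(hits: Set[str]) -> float:
    """Sum scores; subrules such as __RCVD_OUTLOOK_INTERNAL score nothing."""
    return sum(SCORES.get(rule, 0.0) for rule in with_metas(hits))
```

The weak spot is the subrule itself: anyone who can make it hit also gets the offset.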



Re: FPs on RCVD_ILLEGAL_IP

2015-04-20 Thread shanew

On Mon, 20 Apr 2015, Axb wrote:


On 04/20/2015 08:04 PM, Dianne Skoll wrote:

 Hi,

 Not sure if this is still an issue in 3.4, but I'm seeing tons of
 FPs on RCVD_ILLEGAL_IP.  Why?  Because Microsoft (damn it to hell)
 has started using RESERVED IP ranges internally!  Have a look:

 Received: from BLUPR10MB0835.namprd10.prod.outlook.com (0.163.216.13)
by BLUPR10MB0835.namprd10.prod.outlook.com (0.163.216.13)
with Microsoft SMTP Server (TLS) id 15.1.136.25;
Mon, 20 Apr 2015 17:43:48 +

 Is anyone else seeing a sudden uptick in RCVD_ILLEGAL_IP FPs?


There is an ongoing discussion about this with MS, thru backchannels.

They're intentionally using the 0/8 to mask internal IPs.
A very VERY bad choice and they have been advised that not only SA thinks 
it's a bad idea.


Axb


I'm so glad to finally see this mentioned on here, because I was
starting to doubt my own gut reaction that putting invalid IP
addresses in Received is all sorts of broken.  We noticed it last week
after someone from Microsoft mentioned getting a rejection from our
server, and looking back, the earliest examples of this I was able to find
were from Apr. 6.  Before that, emails following similar paths through
Microsoft servers weren't doing this.

I'm also happy to know there's some discussion going on with MS.
When I mentioned it to an MS friend of mine last week he didn't seem
particularly shocked that the internal headers wouldn't comply with
expectations, but he also seemed surprised that anyone was looking at
such headers as a way of determining spam.  Hopefully MS will take
this seriously, but I'm not holding my breath.
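For reference, the addresses in the quoted header fall in 0.0.0.0/8, the reserved "this network" block. Those addresses should never show up as a real relay address. The small Python sketch below uses the standard ipaddress module to pick such addresses out of a Received line. The parenthesized-IP pattern is an assumption based on the layout of the quoted header. This is not how RCVD_ILLEGAL_IP itself is implemented.

```python
import ipaddress
import re
from typing import List

# Dotted quads in parentheses, as in the quoted Received header.
PAREN_IP_RE = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)")
THIS_NETWORK = ipaddress.ip_network("0.0.0.0/8")


def zero_net_ips(received: str) -> List[str]:
    """Return the parenthesized IPv4 addresses that fall inside 0.0.0.0/8."""
    found = []
    for text in PAREN_IP_RE.findall(received):
        try:
            ip = ipaddress.ip_address(text)
        except ValueError:
            continue  # not a valid address, e.g. an octet above 255
        if ip in THIS_NETWORK:
            found.append(str(ip))
    return found
```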



Re: Scoring and SPF questions

2015-04-13 Thread shanew

On Mon, 13 Apr 2015, John Hardin wrote:


On Mon, 13 Apr 2015, Shane Williams wrote:


 Somewhat related questions:

 1. If I alter a rule's score to 0 locally, my understanding is that
 the rule won't even be tested for.  Does that also mean it won't count
 toward meta-rules?


That depends on how it's used in the meta rule. If it's used as an exclusion, 
setting it to always false won't suppress the meta.


Also: setting the score of a meta to zero won't suppress evaluation of its 
component rules.


The specific case I'm wondering about is as part of an arithmetic
expression, like (__RULE1 + __RULE2 + __RULE3) > 2.
If I set __RULE2 to a score of 0, is it now impossible for the meta
rule to trigger (since it can never get more than two points)?
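Here is a toy model of the question. It assumes, as John's reply implies, that a zero-scored subrule behaves as if it never hits. With three subrules and a threshold of > 2, all three must hit. Forcing one to 0 therefore caps the sum at 2, and the meta can never fire. The function below is illustrative Python, not SpamAssassin's evaluator.

```python
from typing import Dict, FrozenSet

SUBRULES = ("__RULE1", "__RULE2", "__RULE3")


def meta_fires(hits: Dict[str, bool],
               disabled: FrozenSet[str] = frozenset()) -> bool:
    """Toy evaluation of (__RULE1 + __RULE2 + __RULE3) > 2.

    Each subrule contributes 1 if it hit and 0 otherwise; a subrule in
    `disabled` is treated as never hitting (the zero-score assumption).
    """
    total = sum(0 if name in disabled else int(hits.get(name, False))
                for name in SUBRULES)
    return total > 2
```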



 2. Is there a way to create a local rule that uses the DKIM/SPF
 information such that I could match to other headers.  In particular,
 I'm looking to either prevent (or at least counteract) the
 HEADER_FROM_DIFFERENT_DOMAINS rule when a mailing list is
 involved.  So what I'm looking for is a way to test SPF/DKIM against
 the mailing list origination point rather than the sender's.  Or
 perhaps I'm missing some smarter way to deal with these situations.


Simple subrules combined in a meta having a negative score. There are already
subrules for detecting mailing list headers and for detecting an invalid DKIM 
signature. Write a meta that combines those, and give it enough negative 
points to offset the positive score.


Note, however, that mailing list headers are easy for spammers to forge.


What I was getting at (but perhaps not describing well) was finding a
way to compare the mailing list domain with DKIM or SPF in order to
ensure that the mailing list at least arrives from the source we would
expect.  It doesn't exactly detect mailing list header forgery, but
could take away a few points for the ones that can be verified.  That
said, there may be some reason this totally won't work, so feel free to
tell me so.
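A sketch of the comparison described above, in Python. It takes the identifier from the List-Id header and the d= signing domain from a DKIM-Signature header. It treats the list as "verified" if the List-Id sits at or below the signing domain. It only compares names: it assumes the DKIM signature has already been validated elsewhere. The example domains are made up.

```python
import re
from typing import Optional

LIST_ID_RE = re.compile(r"<([^<>]+)>")
DKIM_D_RE = re.compile(r"(?:^|;)\s*d\s*=\s*([^;\s]+)", re.IGNORECASE)


def list_id_domain(list_id: str) -> Optional[str]:
    """Domain-like identifier inside the angle brackets of a List-Id header."""
    m = LIST_ID_RE.search(list_id)
    return m.group(1).strip().lower() if m else None


def dkim_signing_domain(dkim_signature: str) -> Optional[str]:
    """The d= tag of a DKIM-Signature header value (no verification done)."""
    m = DKIM_D_RE.search(dkim_signature)
    return m.group(1).lower().rstrip(".") if m else None


def list_matches_signer(list_id: str, dkim_signature: str) -> bool:
    """True if the List-Id sits at or below the DKIM signing domain."""
    lid = list_id_domain(list_id)
    signer = dkim_signing_domain(dkim_signature)
    if not lid or not signer or "." not in signer:
        return False  # missing data, or a bare TLD as signer
    return lid == signer or lid.endswith("." + signer)
```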



Re: Uptick in spam

2015-03-27 Thread shanew

Apologies if this is an overly obvious answer, but are you using any
greylisting?  This would (potentially) move your user away from the
wavefront of a spam's distribution, and give it a better chance of
triggering the network-based tests.
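For anyone unfamiliar with the mechanism, here is a minimal in-memory greylisting sketch in Python. The first delivery attempt for a (client IP, sender, recipient) triplet gets a temporary failure. Retries are accepted only after a delay, and that delay is what moves the message away from the front of the spam run. The 300-second default is a hypothetical value. Real implementations also expire old entries and persist state.

```python
import time
from typing import Dict, Optional, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, recipient)


class Greylist:
    """Minimal in-memory greylist keyed on the classic triplet."""

    def __init__(self, delay_seconds: float = 300.0) -> None:
        self.delay = delay_seconds  # hypothetical default
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, ip: str, sender: str, rcpt: str,
              now: Optional[float] = None) -> str:
        """Return "tempfail" (an SMTP 4xx) until the delay has passed."""
        now = time.time() if now is None else now
        key = (ip, sender.lower(), rcpt.lower())
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            return "accept"
        return "tempfail"  # a well-behaved MTA queues and retries later
```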

On Fri, 27 Mar 2015, Amir Caspi wrote:

This is my whole issue -- since my user appears to be very high up on the recipient list 
for all these spammers, and is therefore getting spams before the network checks are 
effective, how can we combat these new spams _before_ the network checks 
become effective?

Thanks.

--- Amir






Re: Which milter do you prefer?

2015-03-23 Thread shanew

I just wanted to report that, despite what the spamass-milter mailing
list has to say, you can in fact hand spamass-milter an inet socket in
the config and it will happily listen on the network.  That'll teach
me to not just try stuff.

Also, thanks to everyone who had suggestions on specific milters as
well as glue for multiple filters.  I knew about many, but not all
of them, so it's given me lots to investigate (and in some cases
rediscover).

On Fri, 13 Mar 2015, David B Funk wrote:


On Fri, 13 Mar 2015, Shane Williams wrote:


 I've been reviewing the current landscape of anti-spam tools since I
 haven't set up a new system in a while, and one place I'm wondering
 what people are using is milters for spamassassin/spamc.

 It seems like spamass-milter is the default go-to for most people, but
 I'd really like one that can listen on an INET socket (and
 spamass-milter doesn't as far as I can tell, but please correct me if
 I'm wrong).  Milter-spamc from SnertSoft looks promising, but it's not
 free, and a bit more complicated.  smtp-vilter also looks interesting,
 but it does more than just SpamAssassin stuff, so might be overkill.

 And I suspect there are a bunch more out there (though a lot of these
 projects seem to have stalled or died over time).

 What are your favorite (not spamass-milter) options for plugging
 spamassassin into a milter?


Looking at the source for spamass-milter it looks like they're taking
the -p socket argument and passing it directly to smfi_setconn so
you should be able to give an INET socket address if you use the
correct syntax (see docs for smfi_setconn).

13 years ago I was doing a hunt similar to yours and came across
miltrassassin from digitalanswers.org. It was not quite what I
was looking for but closer than any of the others I found, so I took
it and started developing.









Re: Which milter do you prefer?

2015-03-14 Thread shanew

On Fri, 13 Mar 2015, David B Funk wrote:


Looking at the source for spamass-milter it looks like they're taking
the -p socket argument and passing it directly to smfi_setconn so
you should be able to give an INET socket address if you use the
correct syntax (see docs for smfi_setconn).


The spamass-milter mailing list says you can't do this (and I don't
think the post about it was _that_ old), but I should probably give it
a try anyway.  Worst thing that happens is that it doesn't work.
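For reference, smfi_setconn takes connection strings of the form unix:/path (or local:/path), inet:port@host, and inet6:port@host. Sendmail's INPUT_MAIL_FILTER S= equate uses the same notation. Below is a small Python sketch that parses those forms, which could be handy for sanity-checking a -p value. It assumes a numeric port and rejects unknown families.

```python
from typing import NamedTuple, Optional


class MilterSocket(NamedTuple):
    family: str            # "unix", "inet" or "inet6"
    path: Optional[str]    # set for unix/local sockets
    port: Optional[int]    # set for inet/inet6 sockets
    host: Optional[str]    # set for inet/inet6 sockets when given


def parse_milter_socket(spec: str) -> MilterSocket:
    """Parse unix:/path, local:/path, inet:port@host or inet6:port@host."""
    family, sep, rest = spec.partition(":")
    if not sep:
        raise ValueError(f"no socket family in {spec!r}")
    family = family.lower()
    if family in ("unix", "local"):
        return MilterSocket("unix", rest, None, None)
    if family in ("inet", "inet6"):
        # Split on the first "@", so IPv6 hosts like ::1 stay intact.
        port, _, host = rest.partition("@")
        # Numeric port assumed; a missing host is simply left as None here.
        return MilterSocket(family, None, int(port), host or None)
    raise ValueError(f"unknown socket family in {spec!r}")
```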




Re: Which milter do you prefer?

2015-03-14 Thread shanew

I just came across that in my searching yesterday, but hadn't had a
chance to dig deeper.  I had seen roundhouse, and a few other things
here and there, but they all seemed lacking.  After all, as others
have mentioned, cloning your mail stream is not to be done lightly.

On Fri, 13 Mar 2015, Ted Mittelstaedt wrote:


 All this, of course, after searching high and low for a milter, proxy,
 or some other contraption that would allow me to clone a mail stream
 to a totally separate server without disrupting the original stream
 (like port spanning or a network tap, but for SMTP),



Need a better Search Engine, what you want is here:

http://www.dv8.ro/Synonym/synonym.html

Throw that Bing crap in the trash. ;-)

Ted

On 3/13/2015 3:35 PM, sha...@shanew.net wrote:

 Well, you don't have to try very hard to start a holy war around here
 ;-) Seriously, though, I wasn't thinking at the level of amavisd,
 mimedefang, or mailscanner.

 Those may come later, but the situation I'm in right this moment is
 that I'm taking over a very idiosyncratic mail environment, and I need
 to tune and monitor its performance before switching over. Thus, the
 least disruptive option I see is to insert a milter for spamassassin
 in front of anything else, and then score / log messages without
 tampering with them in any way, so they can continue through the milter
 chain as if it weren't even there (except for some slight delay).
 Spamass-milter does this, but it means running the milter on the
 existing system rather than just pointing sendmail to a remote
 milter. I may end up there anyhow, but I thought I'd ask first.

 All this, of course, after searching high and low for a milter, proxy,
 or some other contraption that would allow me to clone a mail stream
 to a totally separate server without disrupting the original stream
 (like port spanning or a network tap, but for SMTP), and finding
 nothing outside of alpha or beta to do that. If anyone knows of
 something like that, I'd be interested to hear about it as well.

 On Fri, 13 Mar 2015, Kevin A. McGrail wrote:

  On 3/13/2015 5:41 PM, Shane Williams wrote:
   What are your favorite (not spamass-milter) options for plugging
   spamassassin into a milter?
 
  Trying to start a holy-war on the list? ;-)
 
  +1 for MIMEDefang.
 
  Regards,

  KAM
 
 








Re: Which milter do you prefer?

2015-03-13 Thread shanew

Well, you don't have to try very hard to start a holy war around here
;-)  Seriously, though, I wasn't thinking at the level of amavisd,
mimedefang, or mailscanner.

Those may come later, but the situation I'm in right this moment is
that I'm taking over a very idiosyncratic mail environment, and I need
to tune and monitor its performance before switching over.  Thus, the
least disruptive option I see is to insert a milter for spamassassin
in front of anything else, and then score / log messages without
tampering with them in any way, so they can continue through the milter
chain as if it weren't even there (except for some slight delay).
Spamass-milter does this, but it means running the milter on the
existing system rather than just pointing sendmail to a remote
milter.  I may end up there anyhow, but I thought I'd ask first.

All this, of course, after searching high and low for a milter, proxy,
or some other contraption that would allow me to clone a mail stream
to a totally separate server without disrupting the original stream
(like port spanning or a network tap, but for SMTP), and finding
nothing outside of alpha or beta to do that.  If anyone knows of
something like that, I'd be interested to hear about it as well.
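The monitoring-only behaviour described above boils down to three steps: score the message, log the result, and pass the original bytes on untouched, even if scoring fails. Here is a minimal Python sketch of that contract. The scorer callable stands in for a call to spamd and is hypothetical. The 5.0 threshold mirrors SpamAssassin's default required_score.

```python
import hashlib
import logging
from typing import Callable

log = logging.getLogger("sa-observe")


def observe(raw_message: bytes, scorer: Callable[[bytes], float],
            threshold: float = 5.0) -> bytes:
    """Score and log a message, then hand back the exact bytes received."""
    try:
        score = scorer(raw_message)
    except Exception:
        # A scoring failure must never block or alter the mail flow.
        log.exception("scoring failed; passing message through")
        return raw_message
    digest = hashlib.sha256(raw_message).hexdigest()[:12]
    log.info("msg=%s score=%.1f would_flag=%s",
             digest, score, score >= threshold)
    return raw_message  # never modified, whatever the score
```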

On Fri, 13 Mar 2015, Kevin A. McGrail wrote:


On 3/13/2015 5:41 PM, Shane Williams wrote:

 What are your favorite (not spamass-milter) options for plugging
 spamassassin into a milter?


Trying to start a holy-war on the list? ;-)

+1 for MIMEDefang.

Regards,
KAM



