I would like to share my 45 days of experience running spamd, my observations, and how I allow mail from SMTP clusters to bypass spamd. Feedback and discussion would be greatly appreciated.

I have two domains that I have been using for my businesses: one is 13 years old and the other is 8 years old. I never had a spam problem until about six months ago. In October I was getting about 100-200 spams per day per domain, and the rate was increasing from month to month. All mail was going directly to my OpenSMTPD. I was not using filtering of any kind, so the signal-to-noise ratio was very low, which was frustrating.

So I read the spamd and related man pages and enabled spamd on my firewall on November 1. I was astonished! I literally got 6 spam emails that first week for both domains!

However, the big problem was that I also wasn't getting legitimate business emails sent from SMTP clusters/pools. After studying my logs, tweaking spamd(8) flags, and looking at external solutions (DNSBL, SPF, reverse IP verification), I made some observations and discovered some patterns. Here's the solution I'd like to share:

I wrote two very small scripts: spamd-dnsbl and spamclusterd. These scripts work together to keep spam to a minimum while passing all legitimate email (in my case so far).

1) spamd-dnsbl: Queries a DNSBL using the IPs in spamdb(8). If an IP is on a blacklist, it is added as a TRAPPED entry in spamdb. The script only checks IPs that have been added since the last run. Currently, only the zen.spamhaus.org DNSBL is queried because I found it to be the most accurate of those listed at http://en.wikipedia.org/wiki/Comparison_of_DNS_blacklists. Alternatively, multiple DNSBLs could be queried and the results used in aggregate to decide whether an IP is a spammer and should be promoted to TRAPPED.
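As a rough illustration of what a script like spamd-dnsbl has to do (this is a sketch, not my actual script), the Python below builds the DNSBL query name, checks the answer, and produces the spamdb(8) command that adds a TRAPPED entry. The function names are made up for the example. It assumes that only GREY entries are checked, that spamdb prints pipe-separated lines with the source IP in the second field, and that only answers in 127.0.0.0/24 count as listings (Spamhaus uses other 127.x answers to signal query errors).

```python
import ipaddress
import socket

DNSBL_ZONE = "zen.spamhaus.org"  # the only DNSBL queried in the setup above


def dnsbl_query_name(ip, zone=DNSBL_ZONE):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip, zone=DNSBL_ZONE):
    """Return True if the DNSBL answers with a 127.0.0.x address for this IP.

    Assumption: only 127.0.0.x answers are listings; anything else
    (including NXDOMAIN) is treated as "not listed".
    """
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return answer.startswith("127.0.0.")


def grey_ips(spamdb_output):
    """Collect the source IPs of GREY entries from spamdb(8) output."""
    ips = set()
    for line in spamdb_output.splitlines():
        fields = line.split("|")
        if len(fields) > 1 and fields[0] == "GREY":
            ips.add(fields[1])
    return ips


def trap_command(ip):
    """Command line that adds the IP to spamdb as a TRAPPED entry."""
    return ["spamdb", "-t", "-a", ip]
```

A driver would feed the output of `spamdb` to `grey_ips()`, skip IPs already seen on a previous run, and run `trap_command(ip)` (for example with `subprocess.run`) for every IP where `is_listed(ip)` is true.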

2) spamclusterd: Queries spamdb(8) for networks to whitelist, which it adds to a pf table that bypasses spamd. So before this script gets carried away allowing IP blocks to bypass spamd, the spamdb(8) is first pruned of spammers using the spamd-dnsbl script.

I've only been running this setup for about 30 days, but I haven't missed an email yet, and spam is still about 1 per day across both domains. I receive emails from all the common SMTP clusters, such as Gmail, Microsoft (hotmail.com, outlook.com, msn.com, etc.), and Yahoo, but also from US government agencies such as mail.mil, usmc.mil, uscg.mil, irs.gov, etc.

I noticed that these legitimate sending clusters have several things in common:

1. The envelope's from and to addresses are identical across tuples.

2. The HELOs are very similar, with the TLD from each tuple almost certainly the same.

3. They make multiple attempts from different IP addresses, but the IPs differ only by a few bits. (Caveat: I'm only using IPv4.)

These 3 points are the basis of spamclusterd. It works like this: if two or more GREY tuples have matching "to" and "from" addresses, HELOs with matching TLDs, and IPs with matching network bits (/24), then the /24 network is added to the spamd-cluster table in pf, which bypasses spamd.

I was going to get fancy and do an SPF lookup to try to determine the exact network to whitelist, but simply whitelisting a block of 256 IPs seems good enough. Once in a while the subsequent client IP will be outside this block, but the /24 seems to work more than 90% of the time.

Currently, just two client IPs from the same /24 network is enough to get that network whitelisted, which seems like a low bar. However, with the prior DNSBL pruning, this seems sufficient for now.
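The matching rule described above can be sketched in Python roughly as follows. This is an illustration of the idea, not my actual script: the function names are invented, the spamdb(8) field order (type, IP, HELO, from, to, ...) is assumed, and "TLD" is taken literally as the last label of the HELO name.

```python
import ipaddress
from collections import defaultdict

MIN_IPS = 2  # two client IPs in the same /24 are enough, as described above


def helo_tld(helo):
    """Last DNS label of the HELO name, lower-cased."""
    return helo.rstrip(".").rsplit(".", 1)[-1].lower()


def cluster_networks(spamdb_output, min_ips=MIN_IPS):
    """Return the /24 networks that look like legitimate sending clusters.

    GREY tuples are grouped by (from, to, HELO TLD, /24 network); a network
    qualifies once a group contains at least `min_ips` distinct client IPs.
    """
    groups = defaultdict(set)
    for line in spamdb_output.splitlines():
        fields = line.split("|")
        if len(fields) < 5 or fields[0] != "GREY":
            continue
        ip, helo, mail_from, rcpt_to = fields[1:5]
        try:
            # strict=False masks the host bits; non-IPv4 input raises ValueError
            net = ipaddress.IPv4Network(ip + "/24", strict=False)
        except ValueError:
            continue
        key = (mail_from.lower(), rcpt_to.lower(), helo_tld(helo), net)
        groups[key].add(ip)
    return {str(key[3]) for key, ips in groups.items() if len(ips) >= min_ips}


def pf_add_command(network, table="spamd-cluster"):
    """pfctl command line that adds the network to the bypass table."""
    return ["pfctl", "-t", table, "-T", "add", network]
```

Raising `MIN_IPS` would be the obvious knob if two IPs turns out to be too low a bar.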

## Some other observations ##

Spammers, even if sending from the same IP or IP network and regardless of the TO address, tend to randomize the FROM and/or HELO. Therefore, in the case of my spamclusterd script, whitelisting a spammer is less likely when ensuring both HELO and FROM match for multiple tuples. These IPs will then continue to deal with spamd, and it's business as usual.

I initially tried setting a 1 minute passtime and a 12 hour greyexp for spamd (i.e. -G 1:12:864) in the hope of eventually whitelisting a client IP, originating from a cluster, that reattempted within that large window. However, in my first week, I missed a couple of Gmails which were resent for 5+ days and ultimately failed to deliver. What was interesting was that one of the Google server IPs retried after 12 hours and 3 minutes, just missing the grey window, while others retried after 24 hours. I now set -G 1:10:1080.
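To make the timing concrete, here is a small Python sketch of the grey window arithmetic. It assumes the -G units from spamd(8) (passtime in minutes, greyexp and whiteexp in hours) and treats both ends of the window as inclusive, which is a simplification. The function names are invented for the example.

```python
from datetime import timedelta


def parse_grey_option(value):
    """Split a spamd -G value 'passtime:greyexp:whiteexp' into timedeltas."""
    passtime, greyexp, whiteexp = (int(part) for part in value.split(":"))
    return (
        timedelta(minutes=passtime),  # passtime is given in minutes
        timedelta(hours=greyexp),     # greyexp is given in hours
        timedelta(hours=whiteexp),    # whiteexp is given in hours
    )


def retry_in_grey_window(retry_after, g_value):
    """True if a retry this long after first contact would be whitelisted."""
    passtime, greyexp, _ = parse_grey_option(g_value)
    return passtime <= retry_after <= greyexp


# The Google retry mentioned above, against the -G 1:12:864 setting:
late_retry = timedelta(hours=12, minutes=3)
print(retry_in_grey_window(late_retry, "1:12:864"))  # False: 3 minutes too late
```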

It seems safe to assume a client is a spammer if the reverse IP lookup returns NXDOMAIN and the IP is on at least 1 reputable DNSBL, or if the lookup returns SERVFAIL after two attempts.

Using SPF seems unreliable as of 11/22/16. I tested SPF on hundreds of IPs in spamdb using the Ruby spf gem. More than half the IPs did not specify SPF, or the check failed in some way.

If the envelope's "from" is our domain (i.e., the to and from addresses are in the same domain), it is definitely a spammer, because we only send our mail to the submission port and never to the smtp port. For example, there are currently 217 grey entries and 31 of them meet this criterion. However, these spammers almost never resend, so it is not worth blacklisting them after the first connection attempt. What would be best is if we could blacklist these spammers upon first connection (for example, by adding a flag to spamd(8) that rejects email claiming to be from ourselves, since we authenticate and submit mail on submission port 587; it could use the domains from spamd.alloweddomains).
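Counting these entries is straightforward. The Python sketch below (hypothetical function names, assumed spamdb(8) field order of type, IP, HELO, from, to) picks out the GREY entries whose envelope "from" claims one of our own domains.

```python
def envelope_domain(address):
    """Domain part of an envelope address such as '<user@example.com>'."""
    return address.strip().strip("<>").rpartition("@")[2].lower()


def forged_self_senders(spamdb_output, own_domains):
    """IPs of GREY entries whose envelope 'from' uses one of our domains."""
    own = {domain.lower() for domain in own_domains}
    ips = set()
    for line in spamdb_output.splitlines():
        fields = line.split("|")
        if len(fields) < 5 or fields[0] != "GREY":
            continue
        ip, mail_from = fields[1], fields[3]
        if envelope_domain(mail_from) in own:
            ips.add(ip)
    return ips
```

The `own_domains` argument could be read from spamd.alloweddomains, assuming that file lists one domain per line.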

Thank you for reading this far. Please let me know if you would like clarification or have questions. If there is interest in my scripts, I can send those as well.

Thanks to all the developers who made spamd: an amazing, simple, clever tool.
