I would like to share my 45-day experience running spamd, some
observations, and how I'm allowing mail from SMTP clusters to bypass
spamd. Feedback and discussion would be greatly appreciated.
I have two domains that I have been using for my businesses: one is 13
years old and the other is 8 years old. I never had a spam problem
until about six months ago. In October I was getting about 100-200 spam
messages per day per domain, and the rate was increasing month to month.
All mail was going directly to my OpenSMTPD. I was not using filtering
of any kind, so the signal-to-noise ratio was very low and frustrating.
So I read the spamd and related man pages and enabled spamd on my
firewall on November 1. I was astonished! I got only 6 spam emails
that first week across both domains!
However, the big problem was that I also wasn't receiving legitimate
business email sent from SMTP clusters/pools. After studying my logs,
tweaking spamd(8) flags, and looking into external solutions (DNSBL,
SPF, reverse IP verification), I made some observations and discovered
some patterns. Here's the solution I'd like to share:
I wrote two very small scripts, spamd-dnsbl and spamclusterd, which
work together to keep spam to a minimum while passing all legitimate
email (so far, in my case).
1) spamd-dnsbl: Queries a DNSBL using the IPs in spamdb(8). If an IP is
on a blacklist, it is added to the spamdb as a TRAPPED entry. The script
only checks IPs that have been added since the last run. Currently, only
the zen.spamhaus.org DNSBL is queried, because I found it to be the most
accurate of all those listed at
http://en.wikipedia.org/wiki/Comparison_of_DNS_blacklists.
Alternatively, multiple DNSBLs could be queried and the results used in
aggregate to decide whether an IP is a spammer and should be promoted
to TRAPPED.
2) spamclusterd: Queries spamdb(8) for networks to whitelist and adds
them to a pf table that bypasses spamd. To keep this script from getting
carried away and letting spammers' IP blocks bypass spamd, spamdb is
first pruned of spammers by the spamd-dnsbl script.
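For anyone curious about the DNSBL step, here is a minimal Python
sketch of the idea (not my actual script). It assumes the GREY line
format from spamdb(8) (GREY|ip|helo|from|to|...), reverses the IPv4
octets, and looks up the name under each zone; any 127.0.0.x answer
counts as a listing. The function names, the `seen` set standing in for
"since last run", and the `threshold` for aggregating several DNSBLs
are all hypothetical.

```python
import socket
from typing import Callable, Iterable, List, Optional, Sequence, Set

Lookup = Callable[[str], Optional[str]]


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the DNSBL zone."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip}")
    return ".".join(reversed(octets)) + "." + zone


def dns_lookup(name: str) -> Optional[str]:
    """Resolve an A record; None means it did not resolve (not listed)."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def listed_on(ip: str, zone: str, lookup: Lookup = dns_lookup) -> bool:
    answer = lookup(dnsbl_query_name(ip, zone))
    # Listings come back as 127.0.0.x; other answers (such as the
    # 127.255.255.x error codes Spamhaus returns) are not counted.
    return answer is not None and answer.startswith("127.0.0.")


def grey_ips(spamdb_lines: Iterable[str]) -> List[str]:
    """Source IPs of GREY lines in spamdb output (GREY|ip|helo|from|to|...)."""
    ips = []
    for line in spamdb_lines:
        fields = line.strip().split("|")
        if len(fields) > 1 and fields[0] == "GREY":
            ips.append(fields[1])
    return ips


def ips_to_trap(spamdb_lines: Iterable[str], seen: Set[str],
                zones: Sequence[str] = ("zen.spamhaus.org",),
                threshold: int = 1, lookup: Lookup = dns_lookup) -> List[str]:
    """IPs not checked before that are listed on at least `threshold` zones."""
    trapped = []
    for ip in grey_ips(spamdb_lines):
        if ip in seen:
            continue  # checked on an earlier run (or earlier in this one)
        seen.add(ip)
        hits = sum(listed_on(ip, zone, lookup) for zone in zones)
        if hits >= threshold:
            trapped.append(ip)
    return trapped
```

On the firewall, the input lines could be the output of running
`spamdb` with no arguments, and each returned IP could be added with
`spamdb -t -a <ip>` (my reading of spamdb(8); double-check it). The
`seen` set would have to be saved to disk between runs.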
I've only been running this setup for about 30 days, but I haven't
missed an email yet, and spam is still at about 1 per day across both
domains. I receive email from all the common SMTP clusters, such as
Gmail, Microsoft (hotmail.com, outlook.com, msn.com, etc.), and Yahoo,
but also from US government agencies such as mail.mil, usmc.mil,
uscg.mil, irs.gov, etc.
I noticed that these legitimate sending clusters have some things in
common:
1. The envelope's from and to addresses are identical across tuples.
2. The HELOs are very similar, with the TLD from each tuple almost
certainly the same.
3. They make multiple attempts from different IP addresses, but the
IPs differ only by a few bits. (Caveat: I'm only using IPv4.)
These 3 points are the basis of spamclusterd. It works like this: if
two or more GREY tuples have matching "to" and "from" addresses, HELOs
with matching TLDs, and IPs with matching network bits (/24), the /24
network is added to the spamd-cluster table in pf, which bypasses spamd.
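The grouping rule above can be sketched in Python roughly like this
(again, not my actual script). It groups GREY tuples by (from, to, HELO
TLD, /24) and reports every /24 that has at least `min_ips` distinct
client IPs. Taking "TLD" literally as the last label of the HELO name
is an interpretation, and all names here are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple


def helo_tld(helo: str) -> str:
    """Last label of the HELO name, e.g. 'com' for 'mail-f41.mx.example.com'."""
    return helo.strip().rstrip(".").rsplit(".", 1)[-1].lower()


def network24(ip: str) -> str:
    """The /24 network an IPv4 address belongs to, as CIDR text."""
    a, b, c, _ = ip.split(".")
    return f"{a}.{b}.{c}.0/24"


def cluster_networks(spamdb_lines: Iterable[str], min_ips: int = 2) -> List[str]:
    """/24 networks where >= min_ips distinct IPs share from, to and HELO TLD."""
    groups: DefaultDict[Tuple[str, str, str, str], Set[str]] = defaultdict(set)
    for line in spamdb_lines:
        fields = line.strip().split("|")
        # GREY|ip|helo|from|to|first|pass|expire|block|pass
        if len(fields) < 5 or fields[0] != "GREY" or fields[1].count(".") != 3:
            continue  # skip non-GREY lines and non-IPv4 addresses
        ip, helo, from_addr, to_addr = fields[1:5]
        key = (from_addr.lower(), to_addr.lower(), helo_tld(helo), network24(ip))
        groups[key].add(ip)
    return sorted({key[3] for key, ips in groups.items() if len(ips) >= min_ips})
```

Each returned network could then be loaded with something like
`pfctl -t spamd-cluster -T add 192.0.2.0/24`, assuming pf.conf already
declares that table and passes it ahead of the spamd redirect. Raising
`min_ips` is the obvious knob if two IPs turns out to be too low a bar.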
I was going to get fancy and do an SPF lookup to determine the exact
network to whitelist, but simply whitelisting a 256-address block seems
good enough. Once in a while the subsequent client IP falls outside
this block, but the /24 seems to work more than 90% of the time.
Currently, just two client IPs from the same /24 network are enough to
get that network whitelisted, which seems like a low bar. However, with
the prior DNSBL pruning, this seems sufficient for now.
## Some other observations ##
Spammers, even when sending from the same IP or IP network and
regardless of the TO address, tend to randomize the FROM and/or HELO.
Therefore, because spamclusterd requires both HELO and FROM to match
across multiple tuples, it is less likely to whitelist a spammer. Those
IPs then continue to deal with spamd, and it's business as usual.
I initially tried setting a 1-minute passtime and a 12-hour greyexp for
spamd (i.e. -G 1:12:864), in hopes of eventually whitelisting a cluster
client IP that retried within that large window. However, in my first
week I missed a couple of Gmail messages that were retried for 5+ days
and ultimately failed to deliver. Interestingly, one of the Google
server IPs retried after 12 hours and 3 minutes, just missing the grey
window, while others retried after 24 hours. I now set -G 1:10:1080.
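To make the timing concrete, here is a small Python model of the -G
window. spamd's -G takes passtime (minutes), greyexp (hours) and
whiteexp (hours); the model assumes, as a simplification of what spamd
actually does, that a retry is whitelisted when it arrives at least
passtime minutes and at most greyexp hours after the first attempt.
The helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GreyTimes:
    passtime_min: int  # minutes before a retry may be whitelisted
    greyexp_h: int     # hours a GREY entry is kept
    whiteexp_h: int    # hours a WHITE entry is kept


def parse_g(arg: str) -> GreyTimes:
    """Parse a spamd -G passtime:greyexp:whiteexp argument."""
    passtime, greyexp, whiteexp = (int(part) for part in arg.split(":"))
    return GreyTimes(passtime, greyexp, whiteexp)


def retry_whitelists(minutes_after_first: float, t: GreyTimes) -> bool:
    """Simplified model: a retry passes if it lands inside the grey window."""
    return t.passtime_min <= minutes_after_first <= t.greyexp_h * 60


old = parse_g("1:12:864")
print(retry_whitelists(12 * 60 + 3, old))  # the 12h03m Google retry
```

With -G 1:12:864 the window is 720 minutes, so a retry at 723 minutes
misses it. For reference, whiteexp values of 864 and 1080 hours are 36
and 45 days.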
It seems safe to assume a client is a spammer if its reverse IP lookup
either returns NXDOMAIN and the IP is on at least one reputable DNSBL,
or returns SERVFAIL after two attempts.
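Written out as a Python function, the rule reads as below. Python's
standard socket module does not reliably distinguish NXDOMAIN from
SERVFAIL, so the lookup itself is left out: the sketch assumes some
resolver that reports the response code, and the `RDNS` enum and
function name are hypothetical.

```python
from enum import Enum


class RDNS(Enum):
    OK = "ok"
    NXDOMAIN = "nxdomain"
    SERVFAIL = "servfail"


def looks_like_spammer(rdns: RDNS, attempts: int, dnsbl_hits: int) -> bool:
    """NXDOMAIN plus at least one DNSBL listing, or SERVFAIL after two tries.

    `rdns` is the final outcome of the reverse lookup after `attempts` tries;
    `dnsbl_hits` is how many reputable DNSBLs list the IP.
    """
    if rdns is RDNS.NXDOMAIN:
        return dnsbl_hits >= 1
    if rdns is RDNS.SERVFAIL:
        return attempts >= 2
    return False
```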
Using SPF seems unreliable as of 11/22/16. I tested SPF on hundreds of
IPs in spamdb using the Ruby spf gem, and for more than half of them
either no SPF record was published or the check failed in some way.
If the envelope's "from" is our domain (i.e., the to and from addresses
are in the same domain), it is definitely a spammer, because we only
send our mail to the submission port and never to the smtp port. For
example, there are currently 217 grey entries and 31 meet this
criterion. However, these spammers almost never resend, so it's not
worth blacklisting them after the first connection attempt. What would
be best is if we could blacklist these spammers upon first connection
(for example, a spamd(8) flag that refuses mail claiming to be from
ourselves, since we authenticate and submit our mail to submission
port 587; the flag could take our domains from spamd.alloweddomains).
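Until spamd can do that on first connection, the after-the-fact check
is easy to sketch in Python: flag GREY tuples whose envelope "from" is
in one of our own domains. It assumes the spamdb(8) GREY line format
and a lowercase set of our domains (which could be read from
spamd.alloweddomains, though parsing that file is left out); the
function names are hypothetical.

```python
from typing import Iterable, List, Set


def addr_domain(addr: str) -> str:
    """Domain part of an envelope address such as '<user@example.org>'."""
    return addr.strip().strip("<>").rsplit("@", 1)[-1].lower()


def self_spoofed_ips(spamdb_lines: Iterable[str], our_domains: Set[str]) -> List[str]:
    """Source IPs of GREY tuples whose envelope "from" is one of our domains.

    `our_domains` must be lowercase. A null sender ("<>") never matches.
    """
    ips = []
    for line in spamdb_lines:
        fields = line.strip().split("|")
        # GREY|ip|helo|from|to|first|pass|expire|block|pass
        if len(fields) < 5 or fields[0] != "GREY":
            continue
        if addr_domain(fields[3]) in our_domains:
            ips.append(fields[1])
    return ips
```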
Thank you for reading this far. Please let me know if you would like
clarification or have questions. If there is interest in my scripts, I
can send those as well.
Thanks to all the developers who made spamd, an amazing, simple, clever
tool.