https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7953

Bill Cole <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |INVALID

--- Comment #4 from Bill Cole <[email protected]> ---
(In reply to Cian from comment #2)
[...]
> I understand that 5 is the
> default threshold and I am (according to mail-tester, which I recognize now
> is flawed) below it, but my mail is confirmed to be going to junk.

That is always a customized local choice. SA has no facility for deciding how
mail is delivered, it only provides a score, a list of matched rules, and a
spam/ham judgment.

> Is it
> possible that sys-admins at several large organizations are using SA with a
> stricter threshold? I understand that their choice to use SA in a
> non-recommended way isn't your fault, but it raises the stakes on broad
> rules and makes false positives more likely.

It is POSSIBLE and LIKELY. However, most people who do that also understand
that they need to have a lot of mitigating customizations. Just lowering the
threshold to 4 without rescoring many rules and adding

However, if you are talking about MS or Google or Yahoo or any other REALLY big
mail operations: no. They don't use SA. They all use their own bespoke
proprietary filtering tools. SA really does not fit their operational models.

> >See https://ruleqa.spamassassin.org for the details of how our rules score
> >against the manually classified corpora of ham and spam provided by some of
> >our users. This is an open system and we are always eager to add new
> >dependable sources to those corpora to get a wider sample. You can see in
> >that system that the rules you see as problematic match messages that are
> >97-100% spam
>
> Thank you for sharing that tool with me, I was not aware of it. Am I
> understanding correctly that the QA for PDS_OTHER_BAD_TLD is based on 17
> corpuses? And that those corpora come from the submissions of just 9
> testers?

Yes.
Roughly half a million messages per day. A very small sample, relative to the
actual size of "all email" but not so tiny in the scope of the mail SpamAssassin
actually sees. We have no good way to know how large a footprint SA actually has
in the world, so we also can't say anything about whether the sample is big
enough or diverse enough. There is absolutely a degree of selection bias because
it does take some effort to do the necessary analysis and reliably submit the
data.

> Is it possible that there are industries not represented in that
> QA?

It is CERTAIN, but we have no way to know where exactly those gaps are. More
submissions would be great.

> If none of those 9 testers happens to work within the space technology
> sector, it seems natural that they would not receive much Ham from .space
> domains, even though there is a whole industry where it would be expected to
> receive mail from those domains.

Correct.

There's a conceptual oddity here. SA is not designed (and cannot be) to be
equally effective and safe for all mail streams with just the default rules &
scores and no local adaptation. The best we can do is to adapt to the mail
streams that SA users actually have, to the extent that they provide feedback.
Submitting masscheck corpora is one form of feedback, bug reports and mail to
the [email protected] mailing list are others. We DO try to fix 'squeaky wheels'
in many circumstances, when given some evidence that SA (and not something else)
is causing specific mail to be misclassified. The aforementioned mailing list
(whose archives are public) is full of examples where someone presents a
problematic rule and something concrete to show that it's causing a problem, and
we work to fix it.

> I'm not writing this to hassle you, Bill, I'm here because my whole business
> depends on it.

I understand that, and I'm not trying to be dismissive.
There's just not any great way to address your (real) problem without
potentially causing direct failures of SA to get classifications *of spam*
correct for the people who actually use SA.

I have been trying for some time to think up ways to do better oversight of
what's in the 'bad TLD' lists so that we can say with more grounding that a
particular one still belongs there. I just have to devise test rules that will
provide better data to figure that out without wreaking havoc.

> I have done what research I could, followed the directions
> from NameCheap and Zoho when setting up my domain and email, I set up DKIM
> and DMARC and SPF, I have looked through the SpamAssassin wiki. I could
> take the advice from the SA wiki and get a deliverability consultant, but
> besides the fact that I can't afford it, it seems absurd to pay hundreds of
> dollars to be able to send a few handcrafted emails a day to individual
> recipients.

Agreed. If you're not sending thousands of messages at a time, a deliverability
consultant is not going to be terribly helpful and definitely won't be
cost-efficient.

One thing that can help is to keep your mail simple but not cryptically so.
Plain text delivers better. Most 1-1 mail uses a standard format that includes
both a plain text version and an HTML version, duplicating the text content. If
you don't actually need fancy formatting and inline graphics, the HTML part is
not really helpful. Most mail programs (Outlook, Apple Mail, etc) can also send
HTML-only mail, which is a bad idea for more reasons than spam filtering but can
get your mail into many spam folders. Most mail programs (other than webmail...)
can also send mail as just plain text, which is generally more reliably
delivered.

You also may find yourself using email as a way to share files via attachments,
which isn't bad per se.
However, since a lot of spam is only an attachment with little or no text,
sending a message with (for example) just a PDF or other image attached without
a real text body is risky.

Messages with empty subjects, overlong subjects, emojis or other non-ASCII
characters in the subject, or a large number of recipients can also have issues
of smelling too much like spam.

These basic principles apply to SA, but more importantly they apply to a broad
range of filtering tools. In other words: if GMail or MS365 are your problems,
you may be able to simplify your mail out of the spam folder.

> I could write to the sys-admin of each and every organization I
> want to contact and ask to be whitelisted, but I suspect I know how that
> will go.

You might be surprised. If this is just 1-1 messages and you can enlist eager
recipients of your mail to plead your case with their admins, you may have a
manageable number of sites to get fixed and not a lot of resistance. Mail admins
do not like having customers upset with spam-filtering run amok. It's a reason
people change providers and a reason mail admins get fired.

> If you have any idea what else I can do, any other lead I can follow, I'll
> go after it and be out of your hair, but I'm here because the only *hint* of
> a reason that might explain why my emails aren't going through is points
> deducted on SA for having a domain that ends in ".space"

The best place for this conversation is the [email protected] mailing list,
particularly if you have a concrete example that you can share of a message
(incl. headers) that someone had to rescue from a spam folder. That list has
mostly helpful people, all of whom have some knowledge of SA specifically and of
spam filtering and deliverability issues more generally. I can almost guarantee
that you'd get better (or at least more) helpful suggestions from that broader
audience than you can via Bugzilla.

--
You are receiving this mail because:
You are the assignee for the bug.
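
A postscript on the plain-text advice above: the sketch below uses Python's standard `email` library to build the two message shapes discussed, a plain-text-only message and the usual plain-plus-HTML "alternative" message. It is only an illustration of the difference in structure, not a statement about how SA or any other filter scores either one. The function names, addresses, subject, and body text are hypothetical placeholders.

```python
from email.message import EmailMessage


def build_plain_message(sender, recipient, subject, body):
    """Build a message with a single text/plain body and nothing else."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # set_content() with a str produces a text/plain part.
    msg.set_content(body)
    return msg


def build_alternative_message(sender, recipient, subject, body, html):
    """Build the common two-part form: text/plain plus a text/html duplicate."""
    msg = build_plain_message(sender, recipient, subject, body)
    # add_alternative() converts the message to multipart/alternative,
    # keeping the existing plain text as the first part.
    msg.add_alternative(html, subtype="html")
    return msg


# Hypothetical example values, for illustration only.
SENDER = "sender@example.org"
RECIPIENT = "recipient@example.com"
SUBJECT = "Quote for the antenna mount"
BODY = "Hello,\n\nThe quote you asked for is below.\n"
HTML = "<p>Hello,</p><p>The quote you asked for is below.</p>"
```

Printing `build_plain_message(SENDER, RECIPIENT, SUBJECT, BODY).as_string()` shows how little there is in the plain form for any filter to object to: a few headers and the text itself.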
