As many of you have noticed, the ASF Infra VM used by the SpamAssassin project has been clobbered hard by bots apparently inventing URLs that ultimately demand that the ruleqa.cgi script unpack compressed archived data, with the bots often giving up on a request before it can be fully answered.
For some months I've used expansive and aggressive IP blocking (with iptables), denying connections from the networks that hosted the most miscreants. This predictably led to a whack-a-mole arms race, with the bots spreading themselves across more and increasingly obscure networks. At first it was all Huawei Cloud and UCloud, but as of yesterday I was adding new /20 networks by the hundreds per hour, and even that was not keeping up with the new sources well enough to hold the load average below 10.

Today I have removed all 20k+ blocked networks and applied a different logic: all requests that match certain expensive URL patterns are refused (HTTP 408) unless they include a "Referer" header (the misspelling is canonical) pointing at the ruleqa site itself. I chose this because all of the legitimate requests for such URLs that I could find in the logs included a local Referer.

Once upon a time, privacy paranoids tried to get everyone to stop sending Referer headers, but AFAICT that movement died out because it breaks too much. That "too much" now includes the RuleQA site, thanks to the scraper bots.

I regret to some degree that we are no longer injecting the absolute garbage of RuleQA detailed stats into the sewage intake of LLM scams, but they were just too thirsty for it all.

-- Bill Cole
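For readers curious what a Referer gate like the one described above amounts to, here is a minimal Python sketch of the decision logic only. The post does not say how the rule is actually implemented (web server config, the CGI itself, or something else), and it does not list the "expensive URL patterns", so the `EXPENSIVE_PATTERNS` entry and the `RULEQA_HOST` value below are hypothetical placeholders, not the real configuration.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical: the real expensive ruleqa.cgi URL patterns are not given in the post.
EXPENSIVE_PATTERNS = [
    re.compile(r"/detail(?:/|$)"),
]

# Assumed hostname of the RuleQA site; adjust to the actual vhost.
RULEQA_HOST = "ruleqa.spamassassin.apache.org"


def should_reject(path: str, referer: Optional[str]) -> bool:
    """Return True if a request for an expensive URL lacks a local Referer."""
    # Cheap URLs are always served, Referer or not.
    if not any(p.search(path) for p in EXPENSIVE_PATTERNS):
        return False
    # Expensive URL with no Referer header at all: refuse.
    if not referer:
        return True
    # Expensive URL: allow only if the Referer's host is the RuleQA site itself.
    # urlsplit().hostname is lowercased, or None if the value is not a URL.
    return urlsplit(referer).hostname != RULEQA_HOST
```

Note that a Referer header is trivially forgeable, so this is a cheap filter against bots that don't bother sending one, not an access control mechanism.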
