As many of you have noticed, the ASF Infra VM used by the SpamAssassin project 
has been clobbered hard by bots apparently inventing URLs which ultimately 
demand that the ruleqa.cgi script unpack compressed archived data. The bots
often give up on the query before it can be fully answered.

For some months I've used expansive and aggressive IP blocking (using iptables) 
denying connections from the networks that had the most miscreants. This 
predictably led to a whack-a-mole arms race, with the bots spreading themselves
across a growing number of increasingly obscure networks. At first it was all
Huawei Cloud and UCloud, but as of yesterday I was adding new /20 networks by 
the hundreds per hour and that was not keeping up with the new sources enough 
to keep the load average below 10.

Today I have removed all 20k+ blocked networks and applied a different logic:
all requests that match certain expensive URL patterns are forbidden (HTTP 403)
unless they include a "Referer" (the misspelling is canonical) header from the
ruleqa site itself. All of the legitimate requests I could find in the logs
for such URLs included a local Referer.
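
For anyone wanting to picture the rule, here is a minimal Python sketch of the
decision described above. It is not the actual server configuration: the
hostname "ruleqa.example.org" and the "expensive" patterns are placeholders
invented for illustration, and the real check presumably lives in the web
server config rather than in application code.

```python
import re
from urllib.parse import urlsplit

# Placeholder values, NOT taken from the real setup.
SITE_HOST = "ruleqa.example.org"
EXPENSIVE_PATTERNS = [
    re.compile(r"/detail\b"),   # hypothetical "expensive" path
    re.compile(r"[?&]rule="),   # hypothetical "expensive" query parameter
]


def referer_is_local(referer):
    """True if the Referer header value points at SITE_HOST."""
    if not referer:
        return False
    try:
        host = urlsplit(referer).hostname
    except ValueError:
        # Malformed header value: treat it as not local.
        return False
    return host == SITE_HOST


def status_for(request_target, referer):
    """Return 403 for an expensive URL without a local Referer, else 200."""
    is_expensive = any(p.search(request_target) for p in EXPENSIVE_PATTERNS)
    if is_expensive and not referer_is_local(referer):
        return 403  # Forbidden
    return 200
```

In this sketch cheap URLs are always served, so a first visit to the front page
works without any Referer, and following links from there carries the local
Referer that the expensive URLs require.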

Once upon a time, privacy paranoids tried to get everyone to stop using Referer 
headers but AFAICT that pattern died out because it breaks too much. That "too
much" now includes the RuleQA site because of the scraper bots.

I regret to some degree that we are no longer injecting the absolute garbage of 
RuleQA detailed stats into the sewage intake of LLM scams, but they were just 
too thirsty for it all.

-- 
Bill Cole
