How to configure FOO=-1.0 in X-Spam-Status?
Hi, I'm seeing X-Spam-Status headers from some other installation that come with =$x appended to the individual matches, which evidently helps in figuring out why a mail is being classified the way it is. I've spent more than an hour googling and reading the manuals but couldn't figure it out. Also, grep does not turn up any occurrence of 'Spam-Status' in the source code, and I don't feel like reading all of the source code for this right now. Please tell me how I can set this up. Thanks! Christian.
Re: SPAM from our own domain
I've noticed Haraka (and I've used software by Matt Sergeant before, AxKit and qpsmtpd). I'm using qpsmtpd as the SMTP daemon already (that's how you do spam filtering in a Qmail setup, and qmail is the original delivery backend for qpsmtpd), which has the same architecture as Haraka (single process, event based, plugins). I haven't tried it, so I wouldn't know, but I'm still surprised that you suggest it is similar(?) to Qmail. It hasn't triggered my interest so far; I don't need raw speed for my purposes, and if I add a tool in another programming language then I might wait for a replacement in a language that allows for static verification (like Rust, Haskell, Ocaml) or has a process model built in (Erlang/Elixir); I know some MTAs already exist in those languages, although perhaps not stable or complete enough for me right now. Perhaps let's stop or move this discussion elsewhere now as it's probably OT. I wanted to offer Tom my help, I didn't ask for help about MTAs here. Christian.
Re: SPAM from our own domain
I don't have any problems with Qmail. I can code when needed for the missing parts. The unsolved problems I have with my mail setup would not be solved by going with another MTA. I like and know Qmail's configuration, modularity and security track record. This way my time is spent learning how things work, as opposed to how to use another MTA. Ch.
Re: SPAM from our own domain
Hi Tom,

You write that you're using Qmail. If you'd like to implement DKIM signing for your outgoing mail (for possibly better deliverability, and also so that you can tell that incoming mail isn't your own when its DKIM signature doesn't verify), then perhaps you're interested in what I've worked on: I didn't want to patch Qmail for DKIM support, so I took an existing, older script that works as a wrapper for qmail-remote and improved it.

It can now also add hashcash stamps (which may or may not actually be useful, but in any case doesn't hurt me since I've got plenty of CPU time for it), and it checks whether the mail to be sent is a bounce for a likely spam, in which case it diverts it locally to avoid sending backscatter.

This has only been tested by me so far, so YMMV. Tell me if you run into problems.

https://github.com/pflanze/better-qmail-remote

I've just made a release v0.1; you can verify its signature with `git tag -v v0_1` using the key:

http://christianjaeger.ch/pgpkey-A54A-1D7C-A1F9-4C86-6AC8--1A1F-0FA5-B211-04ED-B072.asc

Cheers, Christian.
Re: Bayes Filtering
On August 2, 2015 6:40:10 PM CEST, Reindl Harald h.rei...@thelounge.net wrote:
> no idea what you are talking about by saying
>> I can't find anything about this in the docs

I'm talking about the bundled docs: the man / perldoc pages of Mail::SpamAssassin::Plugin::Bayes / Mail::SpamAssassin::*Bayes* and the default config files. That's where I expected this info to be. It's something simple and basic, i.e. something for which the writer of the software can foresee the need for documentation, so it makes sense that it's in the same files that the programmers wrote. That's where I start looking. That's where qpsmtpd, which I'm configuring around the same time, has its basic docs.

Ch.
Re: Bayes Filtering
On August 2, 2015 7:36:36 PM CEST, RW rwmailli...@googlemail.com wrote:
> In future start with man spamassassin which will lead you to:
>
> CONFIGURATION
>   Mail::SpamAssassin::Conf  SpamAssassin configuration files

I think I've actually seen this page recently. I also remember having looked through the bayes_* options (about a week ago) to see whether there's one that might indicate the number of required messages learnt, but couldn't find any (now I've seen bayes_min_ham_num / bayes_min_spam_num). I don't know how that happened; perhaps I was looking at another page (perhaps online), perhaps I had too many things on my mind at the same time and was interrupted or distracted (when starting to configure these systems (DNS, qmail, ezmlm, qpsmtpd, dovecot, SA), there are just too many things to take care of to not make errors sometimes).

> Normally the main man page has the name of the project or its main executable. It's not normal to document how a feature is configured in the documentation for the library that implements that feature.

qpsmtpd is different here since it has a plugin architecture, and then it definitely makes more sense to document things in the plugins, which are just modules. If spamassassin does not have such an architecture then I agree it makes sense to document options where they are processed, i.e. in the module which parses them.

I know I'm sometimes confusing spamassassin and qpsmtpd. Both are in Perl and used together in my setup. I've got into the habit of thinking that docs are in the modules, and when I was checking the SA docs again before sending my post I followed this habit without realizing that it's not the configuration of a qpsmtpd plugin in this case.

Please don't judge me too harshly; I'm trying to get on with things as quickly as I can, like most everybody, and I've got other things on my plate, too.

So, I don't have a suggestion for improvement. Hopefully my post still helped the OP?

Cheers, Christian.
Re: Bayes Filtering
On August 2, 2015 5:15:08 PM CEST, Reindl Harald h.rei...@thelounge.net wrote:
> On 02.08.2015 at 14:57, Roman Gelfand wrote:
>> Could somebody post a successful bayes configuration?
>
> ?? you just need to *train* it for ham *and* spam

I think I remember from past use of SA that it only uses the bayes database once a certain number of messages have been learnt. It has confused me, too, now. I can't find anything about this in the docs, though, and neither have I found a test in the sources by way of searching for 'number', but that's not a thorough check.

If I remember this detail correctly, it would be a good idea to add it to the docs.

Ch.
Re: Hashcash not working
On July 31, 2015 4:37:14 PM CEST, RW rwmailli...@googlemail.com wrote:
> SA usually gets envelope information from headers. Since there are several headers that could contain the envelope recipient, it would need to be configured, so still wouldn't work by default.

That's why I mentioned RECIPIENT. The MTA knows where it's going to, the information just needs to be passed on to SA.

> It's probably for the best that it doesn't work by default. It would likely have been exploited by spammers if it were.

Well, it seems that right now hashcash is of no use. If we actually use it, then the worst that could happen (in the case that spammers can really generate hashcash as easily as legitimate senders) is that it's also of no use. But isn't there also a chance that it doesn't turn out that badly?

> Hashcash for email isn't a very good idea. Even if it were ubiquitous and email couldn't be sent without it, it wouldn't be a major impediment to spammers. If spammers don't have to add a hashcash header to everything, it doesn't even slow them down, it's just an opportunity to make some of their spam more deliverable.

I don't really see the logic in your statement. It doesn't need to be ubiquitous; my thinking is that it would be useful as an additional indication for *important* email that the email isn't spam (especially if end client applications (web or otherwise) would adopt it, so that it could use something like 20 seconds of CPU time). E.g. not for mailing list emails, but for personal email where you don't want the email to be lost (have a button that says "retry more forcefully" or something, that you could push when you suspect the receiver didn't get a mail, or when you're contacting someone for the first time and think it's important, and that then does the 20 second (or more) hashcash calculation). 20+ seconds would be rather hard to compete against, I'd think.

If it means that a spammer could only afford, say, 2 seconds, and even for the 2 seconds would have to reduce the sending rate to a tenth, that would already be good? If it means that they can only make *some* of their spam as deliverable as it is currently, that's also a success, no?

I expect the scores to adapt so that low-effort hashcash would have zero effect on the spam score, but high-effort hashcash would still point towards ham.

I think it boils down to the question of whether spammers really have enough CPU power for multi-second hashcash calculations per recipient (or, as much as legitimate senders). Others have argued that the heat/fan activity would make some people more suspicious and make them get rid of the abuser. (This by itself would already be a good thing.) I also wonder whether it wouldn't be more worthwhile for criminals to use the available CPU power for Bitcoin mining instead? Any sources for numbers?

Why not simply try it? Wouldn't the worst case be that the scores would be adapted to around zero once spammers really start using it? Is it fear of making the system more complex and then not understanding it anymore?

(BTW, is there a framework in SA to statistically analyze combinations of characteristics? So that by learning (sa-learn) client installations could adapt automatically? Or is that too CPU heavy? Or precalculate the data for everyone but let client installations adapt those (implicit) 'scores' through learning?)

Christian.
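The 20-second versus 2-second comparison above can be put into rough numbers. The following Python sketch only does the arithmetic: a stamp with n leading zero bits takes 2^n SHA-1 trials on average. The hash rate used in the usage note is a made-up assumption for illustration, not a measurement of any real sender or botnet.

```python
def expected_hashes(bits: int) -> int:
    """Average number of SHA-1 trials needed for `bits` leading zero bits."""
    return 2 ** bits


def expected_seconds(bits: int, hashes_per_second: float) -> float:
    """Average minting time at a given (assumed) hash rate."""
    return expected_hashes(bits) / hashes_per_second


def bits_for_budget(seconds: float, hashes_per_second: float) -> int:
    """Largest bit count whose average minting time fits into the budget."""
    bits = 0
    while expected_seconds(bits + 1, hashes_per_second) <= seconds:
        bits += 1
    return bits
```

With an assumed rate of 10 million hashes per second, `bits_for_budget(20, 1e7)` gives 27 and `bits_for_budget(2, 1e7)` gives 24, so a tenfold difference in CPU budget shows up as roughly three bits in the stamp. A receiver that scores by bit count would therefore need fairly fine-grained thresholds to tell the two apart.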
Re: Hashcash not working
On July 31, 2015 4:51:02 PM CEST, Bill Cole sausers-20150...@billmail.scconsult.com wrote:
> John Levine wrote a definitive debunking of e-postage schemes including hashcash over a decade ago (http://www.taugh.com/epostage.pdf) and published an update (substantively unchanged) via Virus Bulletin in 2009 (https://www.virusbtn.com/spambulletin/archive/2009/03/sb200903-epostage.dkb?mobile_on=no). All of his points against e-postage in general and hashcash specifically have held up over time.

I've read both links; they both bring the same two arguments:

    The technical problems are that some computers are a lot faster than others

I see a social problem with this: that in principle it penalizes poor people. But let me restate: as I already said in my other email, for me hashcash seems to make sense where you really need to deliver a particular important, personal email. I don't care for a fairy dust solution that would solve sending legitimate mass email (be it mailing lists or ). I'm fine with those being filtered the way they are now. I care about reducing the risk of losing *important* emails, especially in situations where the risk is currently high, i.e. where there's no whitelisting through previous communications. Those cases are few. It's easy to spend even minutes of CPU time on such cases.

Or, since the article argues that grandma has a 100 MHz computer, the ISPs could offer premium email, where the piece costs a few cents (hey, cheaper than SMS with many providers!), and then run hashcash on a few powerful servers in parallel for a minute with a total CPU budget of several minutes. Now I would expect that ISPs in 3rd world countries would offer hashcash generation for a lower margin, and hence even people there could easily afford sending important mails with hashcash.

(If grandma's ISP didn't offer premium email, she'd have to send the email without hashcash, and it would still have a decent chance of being delivered, or she would have to leave her computer running for an hour until it is sent. As I said, it would be rare to need it.)

Yes, that's when users' clients get the ability to compute hashcash, and ISPs adopt it, i.e. when it really catches on. Before that point, there's a phase where we're experimenting and hashcash doesn't play a big role in spam recognition (and grandma doesn't even come into play). The article argues in absolutes and ignores possible developments.

    and that currently spammers have a lot more computer power at their disposal than legitimate senders do

    Furthermore, spammers have vast arrays of hijacked `zombie' computers at their disposal. Blacklist maintainers report adding 10,000 newly hijacked computers to their blacklists per day. No legitimate mailer has anything like 10,000 computers dedicated to sending mail, much less 10,000 additional computers a day, meaning that it would be easier for spammers to satisfy hashcash than for legitimate senders.

It compares a daily differential in the number of hijacked computers worldwide with the number of computers available to a single mailer? (How many are *removed* from the blacklists per day, btw?)

Please give me actual real numbers and I can do actual calculations.

So where's the actual debunking?

Christian.
Re: Hashcash not working
On July 31, 2015 9:13:03 PM CEST, RW rwmailli...@googlemail.com wrote:
> On 31 Jul 2015 17:57:28 +0200 Christian Jaeger wrote:
>> On July 31, 2015 4:37:14 PM CEST, RW rwmailli...@googlemail.com wrote:
>>> SA usually gets envelope information from headers. Since there are several headers that could contain the envelope recipient, it would need to be configured, so still wouldn't work by default.
>>
>> That's why I mentioned RECIPIENT. The MTA knows where it's going to, the information just needs to be passed on to SA.
>
> You're making some assumptions about how SA is being used.

When does RECIPIENT break? man qmail-command says that RECIPIENT is the envelope recipient address. Shouldn't this be the unchanged To/Cc/BCC address that the mail is currently being delivered to, assuming no forwarding was done?

> I can see why they went with hashcash_accept, it always works - even if the recipient is rewritten.

I don't expect hashcash in forwarded email to be found without special configuration. If it finds the matching hashcash in a non-forwarded configuration, that sounds fine to me.

>> I don't really see the logic in your statement. It doesn't need to be ubiquitous,
>
> In the hashcash FAQ they argue that hashcash is useful against botnets because it slows them down. But this would only be correct if hashcash were essential to delivery. If it isn't then hashcash support in spamfilters would benefit spammers because they can send a mixture of spam with and without the header. They'd get extra deliverability without any slow down at all.

Hm, I see your point: they could use the CPU they have available but still saturate their network capacity, too. The effect will be complicated to calculate. Possibly, by sending spam without hashcash over the same network, their IPs will be blacklisted enough to prevent the spam with hashcash from being delivered either.

I guess their strategy will be to pregenerate as much hashcash as they can, then first send spam with hashcash, and once they've run out of hashcash send spam without it, thus staying more likely in the green while they have hashcash, then continuing without for as long as they can or as long as it makes sense. (I don't have deep insights into how spammers work, I'm just reckoning here. Hopefully at least as well as the writers of some articles.)

> One of the problems with hashcash is that its algorithm is well optimised for GPUs and other heavily parallel hardware. The 20 seconds on an ordinary core could be milliseconds on a machine made from just gaming hardware.

Normal CPUs have SIMD instructions, and one could use all cores; then the difference shouldn't be that vast (make that number of milliseconds something in the range of thousands, then?). But agreed, scrypt would make more sense here. This is an attack on the hashing algorithm, not on the concept as a whole.

(Calculating hashes in browsers will eagerly await widespread support for SIMD in JavaScript; but this is again a problem that could go away if hashcash really became successful, since browsers could include hashing functions implemented in C/C++/Rust/ASM.)

> Spammers also have the advantage that they don't have to work in real time - they can generate postdated stamps in advance of a spam run.

Ok, that means they can keep their moves quick (quick bursts until the IP is blocked etc.), but the total amount of hashcash they can produce stays the same. (Also see the above.)

Maybe the concept could be extended to use a challenge-response scheme (e.g. where the receiving SMTPd would present a challenge, then let the sender (optionally) disconnect, calculate the hashcash with the challenge as additional input, then reconnect; or provide the challenge over DNS with a short TTL).

Is there a(nother) good place already to discuss these concepts? (Wiki, etc.) I don't intend to 'spam' this list too much with this. But I think it's interesting to read and think more about this. (There seems to be a ML linked from hashcash.org, but the last message in the archives is from 2012.)

Christian.
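To make the challenge-response idea above a little more concrete, here is a minimal Python sketch of the receiver's side only. It is purely illustrative and not part of hashcash, SMTP, or SpamAssassin: the function names, the "timestamp.mac" challenge format, and the five-minute lifetime are all assumptions. The receiver derives the challenge from a secret, so it needs no per-challenge state and can recognise its own challenge when the sender reconnects.

```python
import hashlib
import hmac
import time
from typing import Optional


def _mac(secret: bytes, recipient: str, ts: str) -> str:
    # Truncated HMAC binding the challenge to one recipient and one timestamp.
    return hmac.new(secret, f"{recipient}|{ts}".encode(), hashlib.sha1).hexdigest()[:16]


def issue_challenge(secret: bytes, recipient: str, now: Optional[float] = None) -> str:
    """Return a 'timestamp.mac' string for the sender to mix into its stamp."""
    ts = str(int(time.time() if now is None else now))
    return f"{ts}.{_mac(secret, recipient, ts)}"


def challenge_is_fresh(secret: bytes, recipient: str, challenge: str,
                       max_age: int = 300, now: Optional[float] = None) -> bool:
    """Accept only challenges we issued for this recipient within max_age seconds."""
    try:
        ts, mac = challenge.split(".")
        issued = int(ts)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if not 0 <= current - issued <= max_age:
        return False
    return hmac.compare_digest(mac.encode(), _mac(secret, recipient, ts).encode())
```

The sender would have to place the challenge somewhere the proof of work covers, for instance the otherwise empty extension field of a version 1 stamp. Because the challenge expires, stamps minted against it cannot be stockpiled ahead of a spam run, which is the point of the proposal.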
Re: Hashcash not working
On July 30, 2015 2:40:35 AM CEST, RW rwmailli...@googlemail.com wrote:
> The plugin is on by default and use_hashcash defaults to 1, but you need to set hashcash_accept to an appropriate value

That's disappointing. For me that barely counts as on by default. I was thinking that implementing hashcash would help get my mail delivered to at least the spamassassin users, but this means that no, only to the subset that cares about configuring it.

Does SA not know which address(es) an email is being delivered to? If it knows (knew), it could just compare those addresses, no? (E.g. qmail sets various environment variables, e.g. RECIPIENT, when running filters; can't SA use this? I'm using qpsmtpd; I suppose spamc could be modified to pass recipients, too?)

If the answer is no, then I realize that there's also an accidental double-spend issue? My qmail-remote wrapper adds an X-Hashcash header for every recipient address that qmail-remote is being called with. I was thinking that the receiver could restrict itself to only look at (and mark in the database) the header for the delivery that's being made. Now I worry that if I send an email with To: f...@bar.com, b...@bar.com and two X-Hashcash headers, and SA is run separately for each sub-delivery, then it will mark both headers as spent in the first delivery and add a penalty for used hashcash to the second. Luckily, I'm running SA from qpsmtpd, which should only run it once when it receives the double delivery. I suppose SA could prevent this issue from happening in other cases by storing the message-id together with the spent token.

My decision to spend time on implementing this was based on reading in Wikipedia [1] that SA is checking them. I think this needs a mention that it only happens when configured. If you don't disagree, I'll change that.

[1] https://en.wikipedia.org/wiki/Hashcash

In any case, I've configured it now and it still doesn't work. Off again to work on debugging it.

Christian.
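The two ideas in the message above (only look at the stamp for the recipient currently being delivered to, and remember the Message-ID alongside each spent token) can be sketched in Python as follows. This is not how SpamAssassin's hashcash plugin works; `stamp_for_recipient` and `SpentTokens` are hypothetical names, and the dictionary stands in for whatever database a real filter would use.

```python
from typing import Dict, Iterable, Optional


def stamp_for_recipient(stamps: Iterable[str], recipient: str) -> Optional[str]:
    """Return the first version 1 stamp whose resource field is the recipient."""
    wanted = recipient.lower()
    for stamp in stamps:
        fields = stamp.split(":")
        # ver:bits:date:resource:ext:rand:counter
        if len(fields) == 7 and fields[3].lower() == wanted:
            return stamp
    return None


class SpentTokens:
    """In-memory stand-in for a double-spend database."""

    def __init__(self) -> None:
        # Maps each stamp to the Message-ID that first presented it.
        self._seen: Dict[str, str] = {}

    def spend(self, stamp: str, message_id: str) -> bool:
        """True if the stamp is new or is seen again for the same message."""
        first = self._seen.setdefault(stamp, message_id)
        return first == message_id
```

In a qmail setup the recipient could come from the RECIPIENT environment variable mentioned above. One caveat of keying on the Message-ID: a verbatim copy of the same message carrying the same stamp is accepted again, so this only guards against the accidental double spend, not against deliberate replays.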
Hashcash not working
Hi,

I've implemented (or at least so I thought) Hashcash for my outgoing mail (in a Perl wrapper around qmail-remote that I already had to do DKIM), using the `hashcash` tool as provided by Debian, with the `-X` command-line option. This tool returns multi-line headers if the email address the hashcash is minted for is long enough. This might be the reason that Mail-SpamAssassin-3.4.1 ignores those, I guessed, so I delved into the code.

Here's an example header it generates (you could also check the source of this email):

    X-Hashcash: 1:23:150729:c...@a.christianjaeger.ch::BIsU5nVO5XGrvOIr:00
     6t75

Now another thing I notice is that this format is longer than the examples shown in the code, e.g.

    X-hashcash: 1:20:040803:a...@cypherspace.org::a1cbc54bf0182ea8:5d6a0

Does anyone know whether that is already a problem?

Then I noticed that the following regex disallows \n in the header value; do decoded header values have a \n where they wrap, or not?

Then I noticed this in the following commit:

https://github.com/apache/spamassassin/commit/a95d2cfd2cc07deac9842cfaf10d6d9a85365b12

      # untaint the string for paranoia, making sure not to allow \n \0 \' \
    - $hc =~ /^([-A-Za-z0-9\xA0-\xFF:_\/\%\@\.\,\= \*\+\;]+)$/; $hc = $1;
    + if ($hc =~ /^[-A-Za-z0-9\xA0-\xFF:_\/\%\@\.\,\= \*\+\;]+$/) {
    +   $hc = Mail::SpamAssassin::Util::untaint_var($hc);
    + }
      if (!$hc) { return 0; }

This looks like it isn't correct: before the patch, it would assign undef to $hc if the regex fails (right?); now it leaves the tainted original $hc value in place. Surely not what was meant, right?

I'm planning to debug this further (hm, debugging a live daemon is always painful; I'm actually writing a tool for that now, so I will defer my work here), but would welcome feedback.

Christian.
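For anyone wanting to check such a header by hand, here is a small Python sketch, independent of SpamAssassin's Perl code. It assumes that whitespace introduced by header folding is not part of the stamp and can simply be dropped, and that a version 1 stamp is valid when the SHA-1 of the whole stamp string starts with at least the claimed number of zero bits. The `mint` helper exists only so the example can be tried with small bit counts; it is not a replacement for the `hashcash` tool.

```python
import hashlib
import re


def unfold_stamp(header_value: str) -> str:
    """Drop all whitespace, including fold artefacts, from an X-Hashcash value."""
    return "".join(header_value.split())


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def stamp_is_valid(stamp: str) -> bool:
    """Check field count, version, and the claimed number of zero bits."""
    fields = stamp.split(":")  # ver:bits:date:resource:ext:rand:counter
    if len(fields) != 7 or fields[0] != "1":
        return False
    if not re.fullmatch(r"[0-9]+", fields[1]):
        return False
    digest = hashlib.sha1(stamp.encode("utf-8")).digest()
    return leading_zero_bits(digest) >= int(fields[1])


def mint(resource: str, bits: int, date: str, rand: str) -> str:
    """Brute-force a counter; only practical for small bit counts."""
    counter = 0
    while True:
        stamp = f"1:{bits}:{date}:{resource}::{rand}:{counter:x}"
        if stamp_is_valid(stamp):
            return stamp
        counter += 1
```

Running `stamp_is_valid(unfold_stamp(value))` on the raw header value shows whether a folded stamp still verifies once the fold is removed. The sketch checks neither the date nor the resource, and a resource that itself contains colons would need a more careful parser.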