Re: CVE-2015-0235 exposure via qpsmtpd?
It looks like QP core uses gethostbyaddr() but not gethostbyname(). The fcrdns plugin uses Net::DNS, and as far as I can tell Net::DNS never calls gethostbyname() either. So I *think* we're good. -Jared

From: Charlie Brady charlieb-qpsm...@budge.apana.org.au
Sent: Wednesday, January 28, 2015 7:24 AM
To: qpsmtpd@perl.org
Subject: CVE-2015-0235 exposure via qpsmtpd?

As you can see in the advisory http://www.openwall.com/lists/oss-security/2015/01/27/9, exim allows remote exploit of a buffer overflow in glibc. Has anybody done an analysis of qpsmtpd to see whether there is a code path via qpsmtpd (and plugins) and perl which allows the same exploit?
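If anyone wants to repeat that audit mechanically, something like this throwaway sketch (run from a qpsmtpd checkout; the lib/ and plugins/ paths are assumptions) lists the call sites to review by hand:

    use strict;
    use warnings;
    use File::Find;

    # Purely illustrative: print every line under lib/ and plugins/ that
    # mentions the resolver calls implicated by CVE-2015-0235.
    find(sub {
        return unless -f $_;
        open my $fh, '<', $_ or return;
        while (my $line = <$fh>) {
            print "$File::Find::name:$.: $line"
                if $line =~ /gethostby(?:name|addr)/;
        }
    }, 'lib', 'plugins');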
Re: Is there any example of a Qmail::Deliverable plugin
Looks like this might do what you need: https://github.com/smtpd/qpsmtpd/blob/72f1a7a9620b5d732749a2c84e71998113f5d1fc/plugins/qmail_deliverable -Jared

From: Reinhard Seifert reinh...@currerius.com
Sent: Wednesday, January 21, 2015 5:18 AM
To: qpsmtpd@perl.org
Subject: Is there any example of a Qmail::Deliverable plugin

Hi, first of all, thank you guys for developing/maintaining qpsmtpd, really a great piece of work. I am a newbie qpsmtpd user and also new to Perl... and also quite new to qmail at that. It took me some time to dpkg-reconfigure the Debian package, but now it is working excellently together with SpamAssassin. I found the Qmail::Deliverable module for Perl and installed it using cpanminus. But what now? I looked at perldoc Qmail::Deliverable and found these code lines to use in a qpsmtpd plugin:

    use Qmail::Deliverable ':all';
    return DECLINED if not qmail_local $recip;
    return DECLINED if deliverable $recip;
    return DENY, "Who's that?";

I pasted those lines into /usr/share/qpsmtpd/plugins/check_deliverable and added check_deliverable to the /etc/qpsmtpd/plugins config file. But when restarting the service I get an error message complaining about the $recip variable. I tried to find an example plugin which utilizes Qmail::Deliverable, but did not succeed. Can anyone give me a hint or even provide a working plugin? Also welcome are hints regarding the startup of the qmail-deliverabled daemon. Do I need to start it, and will the plugin know about the port? Thanks, Reinhard
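For completeness: the $recip error happens because the perldoc snippet assumes it is running inside a hook that supplies the recipient. An untested sketch of a minimal wrapper (not the linked plugin, just an illustration):

    # Minimal check_deliverable sketch; $recip comes from the rcpt
    # hook's recipient argument, a Qpsmtpd::Address object.
    use Qmail::Deliverable ':all';

    sub hook_rcpt {
        my ($self, $transaction, $recipient) = @_;
        my $recip = lc $recipient->address;   # plain string form
        return DECLINED if not qmail_local $recip;
        return DECLINED if deliverable $recip;
        return DENY, "Who's that?";
    }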
Re: Issues in the current HEAD
I'm about 51% sure that this will fix it, but I can't arrange a quick test just now because apparently xinetd is gone from OS X Mavericks. If you feel like testing this I would be interested in your results:

    index 3675829..141214c 100755
    --- a/qpsmtpd
    +++ b/qpsmtpd
    @@ -17,8 +17,8 @@
     delete $ENV{ENV};
     $ENV{PATH} = '/bin:/usr/bin:/var/qmail/bin';
     my $qpsmtpd = Qpsmtpd::TcpServer->new();
    -$SIG{__WARN__} = sub { $qpsmtpd->warn_handler(@_) };
     $qpsmtpd->load_plugins();
    +$SIG{__WARN__} = sub { $qpsmtpd->warn_handler(@_) };
     $qpsmtpd->start_connection();
     $qpsmtpd->run(\*STDIN);    # pass the socket like -prefork/-forkserver
     $qpsmtpd->run_hooks('post-connection');

Master has #168 reverted so that will probably work for you now as well, but if the above change works I can re-submit this PR in a working state. -Jared

From: Jared Johnson jjohn...@efolder.net
Sent: Saturday, January 3, 2015 11:11 PM
To: salvi...@gmx.ch
Cc: qpsmtpd@perl.org
Subject: Re: Issues in the current HEAD

Regarding the warn_handler message, I probably managed to mess up my recently merged https://github.com/smtpd/qpsmtpd/pull/168 for xinetd mode. Unfortunately my availability will be spotty this week. As a temporary fix, you could use a different mode (I'd recommend prefork) or un-merge that PR. Of course this may be the least of your problems.

On Jan 3, 2015 5:12 PM, salvi...@gmx.ch wrote: I'm using xinetd according to http://wiki.qpsmtpd.org/doku.php?id=deploy:start, i.e. no daemon mode but just running exec qpsmtpd

Original Message
Sent: Saturday, 03 January 2015 at 23:31:06
From: Jared Johnson jjohn...@efolder.net
To: salvi...@gmx.ch
Subject: Re: Issues in the current HEAD

What daemon mode are you using? Prefork?
Seeking senior perl dev with QP experience
Hi all, My employer (eFolder, Inc - www.efolder.net) is seeking a senior Perl developer to join our team working on our email security software. I respect a lot of the regulars on this list, and our software is built with qpsmtpd, so I'm hoping some among you might be available and interested in the position. If so, you can find further details here: http://goo.gl/rgecQJ Just a quick plug, this is a 100% telecommute position with amazing benefits, and in my personal experience, eFolder is the best employer I've ever worked for. Feel free to contact me directly with informal questions about the position, I would be working alongside whoever fills this role. Thanks for reading! -Jared
Re: qpsmtpd-dev
At any rate, if a branch like this exists and someone else had the time and motivation to do the leg work of applying enhancements from the various forks to the core branch, I would most definitely ask my employers to put most of our codebase at your disposal to do this. Even if the result wasn't something we could use, and we had to remain forked... at least it would be useful to _someone_ :)

https://github.com/qpsmtpd-dev/qpsmtpd-dev

OK, as promised, I've asked my employers to put most of our codebase at your disposal to do this :) I'm waiting on a response -- my boss is a very busy guy, but he's also very friendly to FOSS. Hopefully I'll be able to make most or all of our code available via public read-only SVN soon. -Jared
Re: qpsmtpd-dev
I am to the point where I need to deploy. I was hoping that I could get my changes merged into the main branch, but that appears impossible. I hate the prospect of forking, but I see few alternatives. I hereby propose the creation of qpsmtpd-dev. The primary aim is merging the forks maintained by myself, Steve, Chris, Jared, and others. I am willing to do much of the work to accomplish this. I believe there is much to be gained by doing so, not the least of which is saving others the pain of learning they are wasting their time trying to improve QP in any way that doesn't fit with the unstated goals and vision of the project. A secondary goal is to provide a development branch of QP in which ideas can be rapidly tested, deployed, and when desired, backported to the main branch.

I like the idea, although I'm not absolutely sure that it would accomplish everything it's meant to. A primary example to me would be the divergent methods used by various forks for supporting per-recipient configuration. That thread demonstrates how each of our various methods made some sort of sense to the others, but was also completely unacceptable to the others for various reasons. The only solutions I can think of would be (1) maintain more than one 'aggressive development' fork, perhaps eventually letting the forks die that don't get used or hacked on much; (2) somehow support multiple options for how to accomplish the same thing, selected within the config; (3) pick a single option that is blessed by upstream -- forcing some to remain quite forked if the single option is not useful for them.

At any rate, if a branch like this exists and someone else had the time and motivation to do the leg work of applying enhancements from the various forks to the core branch, I would most definitely ask my employers to put most of our codebase at your disposal to do this. Even if the result wasn't something we could use, and we had to remain forked... at least it would be useful to _someone_ :) -Jared
Re: validating from
In addition to whatever value it might have for Bayesian filters, it may be useful to always add an X-From: header, so that diagnosing email problems like my client with the forged From: header would be easier. I had to grep through his server logs to see how the spammer bypassed the SPF and SA tests. (SA only sees From: and SPF only uses MAIL FROM). I wonder if X-Rcpt-To should be similarly added. Has this been done before? Should it be?

Our forked stuff adds X-Envelope-Recipient and X-Envelope-Sender headers to messages, and we've found them useful. But note that we inject a separate message into the queue for each recipient, which means that every message has only one X-Envelope-Recipient header which should already be known in some way to the real recipient of the message. If you use the normal method of injecting a single message into the queue for multiple recipients, and therefore include multiple X-Envelope-Recipient headers, then you're going to wind up violating the intentions of a sender who BCC'd multiple recipients to avoid letting all the recipients in on each others' email addresses. -Jared
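For concreteness, the header-adding half of that is tiny; a minimal sketch (not our actual forked code) would be:

    sub hook_data_post {
        my ($self, $transaction) = @_;
        $transaction->header->add('X-Envelope-Sender',
            $transaction->sender->address);
        # with one queued message per recipient, as described above, this
        # adds exactly one header; with multiple recipients it would leak
        # the BCC list, per the caveat above
        $transaction->header->add('X-Envelope-Recipient', $_->address)
            for $transaction->recipients;
        return DECLINED;
    }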
Re: validating from
What problems might I encounter if I were to do this? I ask because I have a client who is currently getting spammed viciously by spammers who use one address in MAIL FROM (to pass SPF tests) and use the sender's email address in the From: header so they can get whitelist scoring by SpamAssassin. It's pretty clever.

Having perused a lot of spam and non-spam, I would generally expect you to have problems if you were very strict about this. Our solution to self-whitelisting issues is that our whitelisting plugin ignores any whitelist with the recipient's own domain in it. You could perhaps do the same thing with some sanity checks after the fact (or before the fact, if you have a UI through which people enter SA whitelists). We got tired of tip-toeing around SA, though, so we simply changed our UI to save whitelists in our own native table rather than SA, and wrote our own plugins.

Another way to solve part of this problem is that if MAIL FROM contains a local domain, reject it unless relay_client is set and the local user exists. This might still be too strict, but it would probably go over better. At least there would be recourse if there are FPs.

If the To header exists, shouldn't that also be validated against RCPT TO? One would think... but again I'd expect FPs to happen because of legitimate senders doing strange, interesting, and foolish things. -Jared
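The MAIL FROM idea above is easy to prototype. An untested sketch (the 'localdomains' config file and the skipped user-existence check are assumptions, not code from our fork):

    sub hook_mail {
        my ($self, $transaction, $sender) = @_;
        return DECLINED if $sender->format eq '<>';   # let bounces through
        # 'localdomains' is an assumed config file, one domain per line
        my %local = map { lc $_ => 1 } $self->qp->config('localdomains');
        return DECLINED unless $local{ lc($sender->host || '') };
        return DECLINED if $self->qp->connection->relay_client;
        # a real version would verify the local user exists before rejecting
        return DENY, 'Forged local sender address';
    }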
Re: relay plugins
You could ask the core developers to give you direct access to the main github repo :-) It seems apparent that you're invested in moving QP forward, and have the time, talent, and motivation to do so while still playing by the rules. If I were on the core team I think I'd be fine with letting you use your own better judgment on when changes are ready, and I'd rather review them after the fact at my leisure than delay them till I get around to signing off. Just my $0.02

Matt Simerson m...@tnpi.net wrote: Is there anything else I can do to move this forward? Matt

On May 21, 2012, at 12:06 PM, Matt Simerson wrote: https://github.com/smtpd/qpsmtpd/pull/13 replaces 3 relay plugins (check_relay, check_norelay, relay_only) with a single plugin (relay). Same functionality. Bonus: includes tests. And a brief description of how the settings in the 3 config files interact. :)

Matt Simerson http://matt.simerson.net/
Systems Engineer http://www.tnpi.net/
Mail::Toaster - http://mail-toaster.org/
NicTool - http://www.nictool.com/
Re: new plugin: naughty nice
Been there, tried that. I've had mx2.freebsd.org soft 4xx blocked for 1 day at a time for over a week. Besides the increased number of connections to my server, using a 4xx error makes almost no difference. As soon as the penalty box expires, the queued ham and spam pour in and mx2 gets tossed back into the penalty box. For most messages, a 4xx error ends up being the same as a 5xx error, except with a delay.

I don't know if I'm typical, but I would personally be very hesitant to reject or even defer a server such as mx2.freebsd.org, which is known to send legitimate mail, even if I know that it is going to try to send some spam as well (assuming of course that it is not intentionally originating the spam or infected with a trojan/botnet, at which point it could no longer be called a legitimate server). I would be more inclined to use the 'penalty box' to filter out illegitimate servers, and rely on content filtering to do as much as can be done about spam on legitimate mailing lists, etc. A server operated or controlled by spammers or botnets is not going to retry messages indefinitely like this, so it should be much more susceptible to a 'penalty box' regardless of whether you reject or defer. -Jared
Re: new plugin: naughty nice
We do something not exactly similar to this, but which might be somewhat instructive. We modified the banner delay plugin so that the amount of delay we apply to a server depends on its previous behavior with us. The default delay is 15 seconds; if you send a message that we reject as spam or virus, we increase the delay by 5 seconds. If you send us a message that we deliver as clean (whether that's because of spam scanners, whitelists, or whatever), we reduce the delay by 5 seconds. We attached other values to some other rejections; for instance, if you try to send mail to an invalid user, the delay is increased by 1 or 2 seconds. We do not increase the delay to more than around 300 seconds, so that at least some very RFC-compliant MTAs have a chance to deliver, even if they manage to get themselves horribly delayed.

What we've found is that the main strength of the banner delay for us is not that it exposes early talkers (it exposes some), but that it encourages early disconnectors :) Some spambots give up after just 5 seconds of delay; the ones that wait out our 15-second delay will often try to send a few messages and get rejected, then once we make our way up to 20 or 30 seconds they start giving up on us altogether.

So after we implemented this we noticed that what we had on our hands was a rough score card for all connecting IPs indicating how well we know them and how naughty they have been. From this we've been able to further reduce the number of transactions for which scanning is required; for instance, if a client gets the default delay (e.g. for all intents and purposes we haven't heard from them), we only let them have a single concurrent connection with us. We can also be more lenient with clients that have won and maintained a banner delay of zero by sending us some good mail and not screwing it up with bad mail.

If your 'naughty' plugin is pretty successful, perhaps we could benefit from adding a 'penalty box' like this based on the delay -- clients that have earned, say, a 30 or 60 second delay by trying to send us too much spam and not enough ham get sent to the penalty box and are therefore no longer able to increase our concurrency. I had an idea similar to this a while ago (we haven't got around to implementing it), where we *randomly* disconnect at the outset a client that's earned an increased banner delay, and the frequency of disconnection depends on how high a banner delay they've earned. This would allow us to throw away connections while, again, still giving good clients a chance to eventually connect to us and send legit mail, thereby decreasing their delay.

Others may or may not find it worthwhile to implement this sort of mucking with banner delays (I'll list a couple of problems with it below); but simply scoring clients might be an answer to the problem of legit clients being rejected. Rather than putting a client in the penalty box at the first violation, give them a few chances, and let clean mail put them back into good graces.

For storage of these scores we have been using a Postgres table with three fields: IP, delay (aka score), and timestamp, with entries expired after a client has not talked to us at all for 24 hours. We've talked about using Cache::FastMMap for this because there is not really any benefit to keeping it in the database (we don't even try to synchronize it across clusters; if your score is effectively reset because you started talking to a new node then so be it).
But honestly, we haven't seen any big bottlenecks with the DB, so we haven't bothered doing the coding and testing work to move it to obviously more efficient mechanisms.

For anyone considering the delay method itself, one big weakness with this method is that since it does not get rid of naughty connections as quickly as possible, unless you're using the async daemon (we aren't), it can wind up taking up a lot of resources. Obviously it will still save plenty of CPU on account of spammers giving up instead of giving us something to scan; but without async, the sharp increase in concurrency will hurt memory usage.

One thing we recently added which I think will really help in this area is that we do our RBL lookup before starting the delay, and if a server is RBL'd we skip the delay and go right to disconnecting them. Since so many servers are RBL'd this should help concurrency a lot. If the server is not RBL'd, we subtract the amount of time it took for us to do the RBL lookup (and anything else we might have done first) from their banner delay -- after all, we've already been delaying our banner for that amount of time.

Aside from that, simply doing a banner delay has caused problems for a few legit servers which give up before they even get to our default delay. We've pretty much completely gotten around this by excluding all private IPs, and clients that are given relay permission by our IP-based rules, from any banner delay whatsoever. HTH!
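For anyone wanting to experiment with the scoring idea, the lookup side reduces to something like this (an untested sketch, not our production code; the 'delays' table and the cached $self->{dbh} DBI handle are assumptions standing in for whatever storage you pick):

    sub hook_connect {
        my ($self, $transaction) = @_;
        my $ip = $self->qp->connection->remote_ip;
        # hypothetical table (ip, delay, updated); $self->{dbh} is a DBI
        # handle assumed to have been cached in register()
        my ($delay) = $self->{dbh}->selectrow_array(
            'SELECT delay FROM delays WHERE ip = ?', undef, $ip);
        $delay = 15  unless defined $delay;  # default for unknown clients
        $delay = 300 if $delay > 300;        # cap so patient MTAs still deliver
        sleep $delay if $delay > 0;
        return DECLINED;
    }

The +5/-5 adjustments would then live in whatever hooks decide a message's fate (data_post, the deny hook, and so on).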
Re: [PATCH 3/7] basicheaders, add reject option, loglevel
There are several plugins that have something like what you've suggested: plugin action [ add-header | deny | denysoft ]. And here's a sampling of the arguments that various plugins use with action: add-header, log, continue, reject, deny, denysoft, accept, delete, add. Reject has a simple, explicit, and universal meaning in every plugin, and thus my preference for it.

This makes sense to me, but what I thought I had seen come down the line (and what I just noticed in another commit, at any rate) was that 'reject' is not exactly boolean; it's tied to the reject_type field. It seems like this field is just as easy to get crufty over time as the 'action' field, although it at least has fewer potential values. I would contend that reject=1,reject_type=defer is not really rejecting at all, which makes it confusing that you have to turn on rejection in order to defer. I have to admit I haven't been reading your patches closely, so I could be missing things that make this an obviously better idea than I'm perceiving. I might have just kept my mouth shut, but I thought I'd see what you think :) -Jared
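For readers following along, the pattern being debated boils down to something like this (a hedged sketch, with argument names from the thread and defaults that are purely illustrative, not any particular plugin's code):

    # 'reject' gates whether we reject at all; 'reject_type' picks the
    # SMTP response class when we do
    sub get_reject {
        my ($self, $reason) = @_;
        return DECLINED unless $self->{_args}{reject};   # log/header-only mode
        my $type = $self->{_args}{reject_type} || 'deny';
        return ($type eq 'defer' ? DENYSOFT : DENY), $reason;
    }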
Re: [PATCH 3/7] basicheaders, add reject option, loglevel
a little OT: 'reject N' seems a little clunky; why not something more like 'action [reject | add-header | ...]'? -Jared
Re: A basket of changes
I also changed the per-user feature to only take effect if the message has a single recipient.

Perhaps it would be useful to give people the option of only accepting one recipient per SMTP transaction? This seems a bit messy/inefficient, we don't do it ourselves, but I've heard from at least one QP user with a big installation that has had success with this; anything after the first (accepted) recipient is simply deferred, so the sending MTA is forced to send messages to multiple recipients with one transaction per recipient. This would allow people to always take advantage of per-user spam rules, if they wished. I should note that our installation does institute a higher recipient limit (since we queue once per recipient we were running into problems with client timeouts being reached while waiting for our response to the DATA command...) and sending MTAs have generally obeyed this, but it has generated some questions from sending organizations as their queues take a little longer to empty out, especially in the case where our clients are doing outbound filtering.

...Actually, if it was also possible to turn off SA scanning on relay-allowed transactions, then it would perhaps be reasonable to make it a one-inbound-recipient limit rather than simply a one-recipient limit. For inbound mail it's much less common to have a whole lot of recipients in a single transaction. -Jared
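The deferral mechanism itself is simple to sketch (untested; it leans on recipients() reflecting only the recipients already accepted earlier in the transaction):

    sub hook_rcpt {
        my ($self, $transaction, $recipient) = @_;
        # first recipient: nothing accepted yet, let other plugins decide
        return DECLINED unless scalar $transaction->recipients;
        # everyone after the first is deferred to a later transaction
        return DENYSOFT, 'One recipient per transaction; please retry the rest';
    }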
Re: A basket of changes
6. spamassassin plugin: added support for per-user SA config settings, rewrote header processing logic in spamassassin plugin so it adds all the SA generated X-Spam-* headers to the message. Refactored the code for better maintainability.

Adding all the X-Spam-* headers to all recipients' messages could technically be thought of as too transparent in some situations. In practical terms the worst thing I can think of would be that one recipient would be able to see that some other recipient whose preferences they are not allowed to look at has whitelisted or blacklisted some sender address. That's assuming you have added information to each header to indicate which recipient it is for -- but if you haven't, how would each recipient know which X-Spam-* header applies to them? Anyway this isn't exactly an objection, just pointing it out in case anyone can think of a more serious problem with such transparency. The only other option is to queue multiple messages, one for each recipient. This is quite a lot of trouble to go to just to avoid exposing X-Spam-* headers to other recipients... although our organization's fork does go to this trouble in order to deal with other problems, like which Subject header to use when different recipients have differing spam scores and subject tagging is on :) -Jared
Re: [PATCH] tweak QP's header handling for messages with no body
There's already a special case for something similar to this:

    # Reject messages that have either bare LF or CR. rjkaes noticed a
    # lot of spam that is malformed in the header.
    ($_ eq ".\n" or $_ eq ".\r")
        and $self->respond(421, "See http://smtpd.develooper.com/barelf.html")
        and return $self->disconnect;

you could just make it ( /\r\r\n$/ or $_ eq ".\n" or $_ eq ".\r" ) maybe? and update the link to point to a URL that explains both? -Jared

Yup there's a lot of this going around right now. Just to be explicit though, the header lines end in \r\r\n. Worth rejecting the bloody lot, frankly :)

Chris Lewis cle...@nortel.com August 15, 2011 4:21 PM: As a FYI, I've been seeing bot-emitted spam that appears to have an extra \r at the end of _all_ header lines, and the qpsmtpd parser seems to be treating all of it as part of the _body_. IOW: except for the received line inserted by qpsmtpd, qpsmtpd doesn't see _any_ headers. This implementation is backrev (0.80 I think), and as it's only spam from one particular bot, we don't care about that particular weirdness enough to investigate further. But it's worth being aware of.

Jared Johnson jjohn...@efolder.net August 15, 2011 3:39 PM: Hi, We got a bug report from someone using IBM's Lotus suite (I think for both their MUA and MTA). Their users would often send messages where all the content was in the subject and they didn't bother sending any message content. I'm not sure if it's due to an apparently uncommon behavior for their particular MUA or their MTA, but every one of these messages was coming through with data in a form that looked like this:

    Subject: howdy\r\n.\r\n

Rather than including the blank line that one might expect to follow headers, since it's required in the event that a message body is present:

    Subject: howdy\r\n\r\n.\r\n

The customer reported these messages were having their subjects stripped; additional testing indicated all existing headers were being stripped. It looks like this is because the loop that processes message data in Qpsmtpd::SMTP::data_respond(), and creates a Mail::Header object which is later used to write out the header in delivery, only works if a blank line exists after the header, e.g. the second form above. The following is all I could find in RFC 5322 that elaborated on this blank line, which obviously must exist if a message body is included:

"A message consists of header fields (collectively called the header section of the message) followed, optionally, by a body. The header section is a sequence of lines of characters with special syntax as defined in this specification. The body is simply a sequence of characters that follows the header section and is separated from the header section by an empty line (i.e., a line with nothing preceding the CRLF)."

I read this as implicitly allowing the exclusion of this blank line if there is no message body: the specification for the blank line is only mentioned in the description of the body, which is itself described as optional. Considering we haven't run into this bug in years of usage, I assume it's unconventional to exclude the blank line, but it looks like it is legitimate syntax. At any rate this was affecting multiple legitimate end users, so we put together the attached patch, which pulls the header building into its own sub which is then called inside the loop if we reach the blank line indicating the header section is complete; otherwise, it's called outside of the loop if we have no more message data, indicating the header section is complete.
Sorry I'm not putting this on a github fork, I still don't have my git stuff together, I may never get around to it but I thought you guys might find this useful. -Jared
Re: [PATCH] tweak QP's header handling for messages with no body
True, I was led astray by the comment that seemed to indicate it was all about headers. If the intention of the comment is correct, then it should probably be if ( $in_header and ... ), whether or not the new regex is added. That block would also need to be moved to after the block that sets $in_header = 0 if we've reached the end, to avoid catching the blank line that separates the headers from the body. At any rate, Chris has a point -- I was actually thinking the other day that code like that original deferral would be better off sitting in a plugin, so I could muck around with it in a more straightforward manner in order to, say, log what we did to our own database, or change it to a rejection, or whatever :) Anyway, this is all kind of minutia compared to the original patch, which I feel I should remind everyone has to do with accepting legitimate mail from Lotus software without inadvertently stripping the headers :) -Jared

That line of code doesn't look at the headers though, just at the final dot at the end-of-data.

Jared Johnson jjohn...@efolder.net August 16, 2011 3:00 PM: There's already a special case for something similar to this:

    # Reject messages that have either bare LF or CR. rjkaes noticed a
    # lot of spam that is malformed in the header.
    ($_ eq ".\n" or $_ eq ".\r")
        and $self->respond(421, "See http://smtpd.develooper.com/barelf.html")
        and return $self->disconnect;

you could just make it ( /\r\r\n$/ or $_ eq ".\n" or $_ eq ".\r" ) maybe? and update the link to point to a URL that explains both? -Jared

Matt Sergeant m...@sergeant.org August 16, 2011 11:28 AM: Yup there's a lot of this going around right now. Just to be explicit though, the header lines end in \r\r\n. Worth rejecting the bloody lot, frankly :)

Chris Lewis cle...@nortel.com August 15, 2011 4:21 PM: As a FYI, I've been seeing bot-emitted spam that appears to have an extra \r at the end of _all_ header lines, and the qpsmtpd parser seems to be treating all of it as part of the _body_. IOW: except for the received line inserted by qpsmtpd, qpsmtpd doesn't see _any_ headers. This implementation is backrev (0.80 I think), and as it's only spam from one particular bot, we don't care about that particular weirdness enough to investigate further. But it's worth being aware of.

Jared Johnson jjohn...@efolder.net August 15, 2011 3:39 PM: Hi, We got a bug report from someone using IBM's Lotus suite (I think for both their MUA and MTA). Their users would often send messages where all the content was in the subject and they didn't bother sending any message content. I'm not sure if it's due to an apparently uncommon behavior for their particular MUA or their MTA, but every one of these messages was coming through with data in a form that looked like this:

    Subject: howdy\r\n.\r\n

Rather than including the blank line that one might expect to follow headers, since it's required in the event that a message body is present:

    Subject: howdy\r\n\r\n.\r\n

The customer reported these messages were having their subjects stripped; additional testing indicated all existing headers were being stripped. It looks like this is because the loop that processes message data in Qpsmtpd::SMTP::data_respond(), and creates a Mail::Header object which is later used to write out the header in delivery, only works if a blank line exists after the header, e.g. the second form above.
The following is all I could find in RFC 5322 that elaborated on this blank line, which obviously must exist if a message body is included:

"A message consists of header fields (collectively called the header section of the message) followed, optionally, by a body. The header section is a sequence of lines of characters with special syntax as defined in this specification. The body is simply a sequence of characters that follows the header section and is separated from the header section by an empty line (i.e., a line with nothing preceding the CRLF)."

I read this as implicitly allowing the exclusion of this blank line if there is no message body: the specification for the blank line is only mentioned in the description of the body, which is itself described as optional. Considering we haven't run into this bug in years of usage, I assume it's unconventional to exclude the blank line, but it looks like it is legitimate syntax. At any rate this was affecting multiple legitimate end users, so we put together the attached patch, which pulls the header building into its own sub which is then called inside the loop if we reach the blank line indicating the header section
Re: [PATCH] tweak QP's header handling for messages with no body
Oh and FYI, this patch is well tested on our own variant of SMTP.pm, but that variant is slightly forked, and this version is itself not specifically tested. I'm fairly positive it'll work though :)

Hi, We got a bug report from someone using IBM's Lotus suite (I think for both their MUA and MTA). Their users would often send messages where all the content was in the subject and they didn't bother sending any message content. I'm not sure if it's due to an apparently uncommon behavior for their particular MUA or their MTA, but every one of these messages was coming through with data in a form that looked like this:

    Subject: howdy\r\n.\r\n

Rather than including the blank line that one might expect to follow headers, since it's required in the event that a message body is present:

    Subject: howdy\r\n\r\n.\r\n

The customer reported these messages were having their subjects stripped; additional testing indicated all existing headers were being stripped. It looks like this is because the loop that processes message data in Qpsmtpd::SMTP::data_respond(), and creates a Mail::Header object which is later used to write out the header in delivery, only works if a blank line exists after the header, e.g. the second form above. The following is all I could find in RFC 5322 that elaborated on this blank line, which obviously must exist if a message body is included:

"A message consists of header fields (collectively called the header section of the message) followed, optionally, by a body. The header section is a sequence of lines of characters with special syntax as defined in this specification. The body is simply a sequence of characters that follows the header section and is separated from the header section by an empty line (i.e., a line with nothing preceding the CRLF)."

I read this as implicitly allowing the exclusion of this blank line if there is no message body: the specification for the blank line is only mentioned in the description of the body, which is itself described as optional. Considering we haven't run into this bug in years of usage, I assume it's unconventional to exclude the blank line, but it looks like it is legitimate syntax. At any rate this was affecting multiple legitimate end users, so we put together the attached patch, which pulls the header building into its own sub which is then called inside the loop if we reach the blank line indicating the header section is complete; otherwise, it's called outside of the loop if we have no more message data, indicating the header section is complete. Sorry I'm not putting this on a github fork, I still don't have my git stuff together, I may never get around to it but I thought you guys might find this useful. -Jared
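Since the attached patch isn't reproduced in the archive, here is a rough illustration of the shape of the fix it describes (the read loop and the _finish_header helper are stand-ins, not data_respond's real internals):

    # Illustrative only -- not the attached patch. Factor header
    # construction out of the read loop so it also runs when DATA ends
    # without ever seeing a blank separator line.
    my $in_header = 1;
    my @header_lines;
    while (defined(my $line = $self->getline)) {
        last if $line =~ /^\.\r?\n?$/;              # end-of-data marker
        if ($in_header and $line =~ /^\r?\n$/) {    # blank line: header done
            $self->_finish_header(\@header_lines);  # hypothetical helper
            $in_header = 0;
            next;
        }
        push @header_lines, $line if $in_header;
        # ... body handling elided ...
    }
    # header-only message (the Lotus case): never saw the blank line
    $self->_finish_header(\@header_lines) if $in_header;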
Re: thoughts about a new module called check_spammer_connect
I thought about a module which learns from the plugin dnsbl. Maybe we call it check_known_dnsbl_spammer ;-) and use the module http://search.cpan.org/~robm/Cache-FastMmap-1.39/lib/Cache/FastMmap.pm

Hi Aleks, We discussed doing this once at my organization and someone astutely pointed out that considering that this is DNS, the better solution would really be to run a local caching DNS server, e.g. bind9, and point resolv.conf to 127.0.0.1 with forwarders to your ISP's nameservers. Well, what they actually pointed out was that since we already do this everywhere, we're already getting in-memory caching that's designed expressly for DNS results, for free :) Running a local caching DNS server is basically a good idea anyway, and obsoletes the need for caching DNS results in any particular software. -Jared
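For anyone who hasn't set that up before, the whole configuration amounts to something like this (illustrative sketch; 192.0.2.x are placeholder addresses for your ISP's resolvers):

    # /etc/resolv.conf -- point local software at the caching resolver
    nameserver 127.0.0.1

    # named.conf (bind9) -- forward cache misses to the ISP's resolvers
    options {
        forwarders { 192.0.2.1; 192.0.2.2; };
    };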
[Fwd: Re: [Fwd: STARTTLS vulnerabilty and qmail-spamcontrol ucspi-ssl qpsmtpd]]
I've been otherwise occupied but I forwarded this to the rest of our dev team and our resident security guru had this to say:

Original Message
Subject: Re: [Fwd: STARTTLS vulnerabilty and qmail-spamcontrol ucspi-ssl qpsmtpd]
From: Peter Samuelson psamuel...@efolder.net
Date: Thu, June 2, 2011 1:32 pm
To: dc...@efolder.net

Subject: [Fwd: STARTTLS vulnerabilty and qmail-spamcontrol ucspi-ssl qpsmtpd]
From: Matt Sergeant m...@sergeant.org

Heh, this class of vulnerability made the rounds some months ago - a (perhaps) surprising number of open source software out there that deals in SSL/TLS had this class of bug. I'm pretty sure this is enough to fix it for async:

    diff --git a/plugins/tls b/plugins/tls
    index 37fbc9a..f850d2c 100644
    --- a/plugins/tls
    +++ b/plugins/tls
    @@ -275,6 +275,7 @@ sub upgrade_socket {
         my UpgradeClientSSL $self = shift;

         unless ( $self->{_ssl_started} ) {
    +        $self->{_stashed_qp}->clear_data();
             IO::Socket::SSL->start_SSL(
                 $self->{_stashed_qp}->{sock}, {
                     SSL_use_cert => 1,

Yeah, looks right to me. But for the non-async scenario I think it's a lot more complex because the caching would be done at the C level, so a fix more like the fix (posted below) for postfix is required (switch to non-blocking, get whatever data is remaining, flush it, switch back off non-blocking).

I ... disagree. From my reading of plugins/tls, it looks like there is no problem at all in the non-async code path. It resets STDIN and STDOUT to a socket created from scratch by the IO::Socket::SSL module. I haven't looked at IO::Socket::SSL to see if it has this sort of issue, but it seems unlikely to me. Peter
Re: smtp proxy to external smtp server
We do the sort of signing that is a huge doozy, and Matt is right, it's a doozy :) There are a couple of ways we've accomplished re-writing the body from a MIME::Entity. Honestly it seems a bit non-standard to me but we have some special requirements, like leaving the original body around; I could simplify the examples but I might as well show you known-working code :P

We currently do:

    use File::Temp qw(tempfile);
    ( $txn->{_body_file}, my $filename ) = tempfile( DIR => $self->temp_dir() );
    $txn->body_fh->print("\n");
    $mime_entity->print_body( $txn->body_fh );
    $txn->{_body_size}  = (stat($txn->{_body_file}))[7];
    $txn->{_body_start} = 0;

An older method that we used to use (I can't remember whether we switched to the above because it's more efficient, or because of some of the other weird things we're doing -- at least the above doesn't require stringify_body(), which is probably slightly undesirable):

    $txn->body_resetpos;
    $txn->{_body_file}->truncate($txn->{_body_start});
    $txn->{_body_size} = $txn->{_body_start};
    $txn->body_write( $mime_entity->stringify_body() );

Hope this helps. -Jared

On 05/20/2011 04:19 AM, Matt Sergeant wrote: What do you mean by signed?

Signing with a GnuPG key. Actually by signing the original email I get a new MIME::Entity which then I need to pass on the queue to get delivered to the real smtp server. In other words, the qpsmtpd proxy signs the email and delivers it to the real smtp server. -- Thanks for any hint. Mike
Re: smtp proxy to external smtp server
On 05/20/2011 02:56 PM, Jared Johnson wrote: We do the sort of signing that is a huge doozy, and Matt is right, it's a doozy :) There are a couple of ways we've accomplished re-writing the body from a MIME::Entity. Honestly it seems a bit non-standard to me

Why do you think this is non-standard? From a conceptual perspective or just from a tooling view? If conceptually, which other approach would you recommend?

Just non-standard in that I don't use public APIs, so e.g. if they decided to store the body file in $txn->notes('body_file') instead of $txn->{_body_file}, for instance, you'd be SOL. Other than that, though, we haven't had any problems with our method :) I'm not sure if it can be done with a public API, but if you want to look for a method, I'd check out the POD for Qpsmtpd::Transaction, for the various body* methods.

I did some preliminary tests and it works. That also means there is no API or any other way to put an email into the queue, isn't it?

Well, the standard way to put a message into the queue is to have the message in a state you want it in before the queue hook gets ahold of it, and then let the queueing plugin do its work... I think smtp-forward is an example of a queueing plugin?

Well, can I use this approach in a productive environment or could there be any side effects?

YMMV, but we've been using that method in production a while with no side effects that I know of :) -Jared
Tumbleweed TLS compatibility patch
Hi, We've been chasing issues with Tumbleweed (now Axway) MailGate appliances sending messages using TLS through us. Finally Tumbleweed told one of our clients that this was probably due to a bug in their software when the recipient MTA makes multiple writes to the socket. On a guess, we patched QP to send multi-line EHLO responses in a single string rather than multiple strings. This seems to have cleared up the issue! I'm attaching the patch, in hopes that (1) it is useful to you guys; and (2) if there's anything stupid about it that we haven't been able to catch, someone will notice and help us improve it, since we're about to include it in our forthcoming point release :) -Jared

    --- Prefork.pm.orig	2011-02-22 17:35:39.809587002 -0600
    +++ Prefork.pm	2011-02-22 17:34:58.849587003 -0600
    @@ -59,10 +59,12 @@ sub respond {
             return(0);
         }
    -    while (my $msg = shift @messages) {
    -        my $line = $code . (@messages ? "-" : " ") . $msg;
    -        $self->log(LOGINFO, $line);
    -        print "$line\r\n" or ($self->log(LOGERROR, "Could not print [$line]: $!"), return 0);
    +    $messages[$_] = "$code-$messages[$_]" for 0..$#messages-1;
    +    $messages[-1] = "$code $messages[-1]";
    +    $self->log(LOGINFO, $_) for @messages;
    +    unless (print join '', map { "$_\r\n" } @messages) {
    +        $self->log(LOGERROR, "Could not print [" . join(' | ', @messages) . "]: $!");
    +        return 0;
         }
         return 1;
     }
Re: rolling uribl and rbl plugins into a single plugin
I agree that sharing code should be possible here (and that would be desirable); but as Matt suggested, using the plugin inheritance stuff would be better than (eventually) making the One True Spam Fighting plugin. :-)

OK, I agree. Hopefully the integration will happen for our next release (which we're going to test pretty soon), and then someday I'll get the async bit right and dnsbl lookups will live happily ever after :) -Jared
rolling uribl and rbl plugins into a single plugin
So our organization is planning on doing some big changes to the rbl plugin, and it dawned on us that it seems a lot easier to just add an earlier hook to the existing uribl plugin (and rename it to rbl? or bl? or something?). But of course I still have in mind that someday I'll get the plugin completely in shape and it will be in QP, and we'll get to share code and all. Would this be an undesirable direction to take the uribl plugin in, in the event that it was integrated into QP proper? It seems like it would be handy in terms of both code organization and features (the main feature I can think of is proper mask evaluation along with custom actions per mask, just like in the uribl plugin). Thoughts? -Jared
Re: lastest uribl plugin
Specifically regarding the quoted-printable stuff... my diff must have been off of an older version. This is probably a better representation of your changes:

    @@ -574,15 +574,22 @@
         my $line;
         my @qp_continuations;
         while ( $line = $txn->body_getline ) {
    +        chomp $line;
             if ( $line =~ /(.*)=$/ ) {
                 push @qp_continuations, $1;
                 next;
             }
             elsif ( @qp_continuations ) {
    -            $line = join '', @qp_continuations, $line;
    +            $line = join '', @qp_continuations, $1;
                 @qp_continuations = ();
             }
             $self->find_uris($line);
         }
    +    if ( @qp_continuations ) {
    +        $self->log(LOGINFO, "uribl: WARNING: scan_body exiting with line continuations left. Bad Email?");
    +        $line = join('', @qp_continuations, $line);
    +        @qp_continuations = ();
    +        $self->find_uris($line);
    +    }
    }

The first and last change should certainly go in, but what about this:

    if ( $line =~ /(.*)=$/ ) {
        push @qp_continuations, $1;
        next;
    }
    elsif ( @qp_continuations ) {
    -    $line = join '', @qp_continuations, $line;
    +    $line = join '', @qp_continuations, $1;
        @qp_continuations = ();
    }

Won't $1 only be defined when the condition in the 'if' is met, thus making your change equivalent to:

    $line = join '', @qp_continuations;

I'm not sure, but it seems like that change doesn't belong. -Jared
Re: How to get both sender and recipient address for mail-hook
Pretty sure he meant:

    my ( $self, $transaction, $recipient ) = @_;
    my $sender = $transaction->sender;

Then you can do:

    my $recipientdomain = $recipient->host;
    my $senderdomain    = $sender->host;

-Jared

Steve Freegard steve.freeg...@fsl.com wrote: Hi Tapio, On 11 Jan 2011, at 11:34, Tapio Salonsaari wrote:

Hello! I'm building a web-based GUI for qpsmtpd and currently I'm writing/modifying a plugin for white/blacklists per domain. My web GUI stores black/white lists in a psql database and the idea is to do an SQL search like:

    SELECT * FROM blacklist WHERE sender=$senderdomain AND recipient=$recipientdomain;

Then if the query returns any rows the plugin would return DENY. If there are no rows on the blacklist, search the same for the whitelist table and process the data accordingly. I'm using http://devin.com/qpsmtpd/whitelist as a base (don't know if there's something more suitable to start with) and currently I'm stuck with sub mail_handler, since I can't figure out how to pass both sender and recipient addresses to the function.

You can't do anything with the recipient addresses in the mail handler because they haven't been received yet! In an SMTP transaction the recipients are sent after the MAIL command (see RFC 5321). If you're going to do black/whitelisting using SQL like that, then you'll need to do it in the hook_rcpt handler. As per the wiki docs at http://wiki.qpsmtpd.org/api:plugin_hooks the hook_rcpt function is passed:

    my ( $self, $transaction, $sender ) = @_;
    # $sender: a Qpsmtpd::Address object for sender of the message

You can access the sender in the same format (as a Qpsmtpd::Address object) via $transaction->sender at this point. Hope that helps. Kind regards, Steve.
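Putting Steve's advice and the correction above together, the rcpt-time version of the lookup could look roughly like this (untested sketch; the cached DBI handle in $self->{dbh} and the blacklist table are assumptions taken from the description above):

    sub hook_rcpt {
        my ($self, $transaction, $recipient) = @_;
        my $senderdomain    = lc( $transaction->sender->host || '' );
        my $recipientdomain = lc( $recipient->host || '' );
        # assumed: $self->{dbh} is a DBI handle cached at register() time
        my ($hit) = $self->{dbh}->selectrow_array(
            'SELECT 1 FROM blacklist WHERE sender = ? AND recipient = ?',
            undef, $senderdomain, $recipientdomain);
        return DENY, 'Mail refused by recipient policy' if $hit;
        # an analogous whitelist lookup would go here
        return DECLINED;
    }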
Re: white-listing in hosts_allow plugin
IIRC, the connections-per-ip code isn't in a plugin right now -- it's part of the main binary -- because it requires some sort of shared state which we don't have for plugins.

The checks do exist in the binaries but they don't have to; you can use $arg{child_addrs} and $arg{remote_ip} in hook_pre_connection to get at it, at least in prefork, we've been doing it forever. Our code is heavily modified for our db-based prefs but it's a lot like this:

    my %per_ip_limits = (    # fake magic per-ip config hashref
        '1.2.3.4' => 10,
    );
    my %class_c_limits = (
        '1.2.3.4' => 20,
    );

    sub ip_to_int {
        my ($ip) = @_;
        $ip = inet_aton($ip) if length($ip) > 4;
        return unpack 'N', $ip;
    }

    sub hook_pre_connection {
        my ( $self, $txn, %arg ) = @_;
        return NEXT unless @{ $arg{child_addrs} };
        my $ip_limit = $per_ip_limits{ $arg{remote_ip} } || 5;
        return DENYSOFT_DISCONNECT, 'Too many connections from your IP'
          if ( grep { $_ eq $arg{remote_ip} } @{ $arg{child_addrs} } ) > $ip_limit;
        my $c_limit = $class_c_limits{ $arg{remote_ip} } || 10;
        my $ip   = ip_to_int( $arg{remote_ip} );
        my $mask = $ip & 0xffffff00;
        return DENYSOFT_DISCONNECT, 'Too many connections from your class C'
          if ( grep { ( ip_to_int($_) & 0xffffff00 ) == $mask } @{ $arg->{child_addrs} } ) > $c_limit;
        return NEXT;
    }

Another interesting thing we support is allowing people to set a custom max concurrency override for an arbitrary CIDR; when an IP matches the CIDR, we check how many other child_addrs match the same CIDR instead of checking the other limits. We use Net::CIDR::Lite::Span for this business though, so it's probably less efficient than the above. We also use hook_pre_connection to implement global connection limits, which allows us to defer and write an entry to our DB logs when the limit is exceeded, rather than just not responding. This is way less efficient though, and we've had to contend with it snowballing when the system becomes so busy that the db is slow. Oh well, someday we'll switch to async :) -Jared
Re: white-listing in hosts_allow plugin
    return DENYSOFT_DISCONNECT, 'Too many connections from your class C'
      if ( grep { ( ip_to_int($_) & 0xffffff00 ) == $mask } @{ $arg->{child_addrs} } )

Though this code is still untested in its heavily-modified-from-our-version form, I feel like I should point out that that should be @{ $arg{child_addrs} }, not @{ $arg->{child_addrs} } :)

      > $c_limit;
    return NEXT;
    }

-Jared
Re: Transaction.pm body_fh() badly broken [was: qpsmtpd-v0.84 seems to ignore size_threshold]
commit a52660a646012691f993cca821c00fe05cff08bb
Author: jaredj jar...@nmgi.com
Date: Wed Mar 25 07:38:05 2009 -0500

    Spool body when $transaction->body_fh() is called

    Qpsmtpd::Transaction::body_filename() calls $self->body_spool() if the
    message body has not already been spooled to disk. This adds the same
    check to Qpsmtpd::Transaction::body_fh()

but I don't remember the background behind it. Jared, do you remember why you changed this?

Looking at the history in our internal SVN, it looks like this was part of an effort to dup() body_fh. IIRC this was because we were passing it to an external library that close()'d it, but we didn't want the original FH closed. BUT... at some point internally we noticed that this caused bugs! I don't remember what sort, but our commit log says it confuses some methods of writing to the fh. So we reverted the changes to body_fh() and added a dup_body_fh() to do what we want:

    sub body_fh {
        return shift->{_body_file};
    }

    sub dup_body_fh {
        my $self = shift;
        open( my $fh, '<&=', $self->body_fh );
        return $fh;
    }

I guess by the time we discovered it was buggy I had forgotten I had submitted the buggy change upstream. Sorry! We only use dup_body_fh() in one place right now, so while adding such a function might be useful upstream, I don't think there's any need to preserve the old, apparently undocumented, behavior; it seems safe to just revert. -Jared
Re: Auth issues with turnpike
With this in mind, is it possible without having to patch qpsmtpd to influence the order that auth methods are presented in from 250 AUTH PLAIN LOGIN CRAM-MD5, such that LOGIN is the first available option? Apparently in turnpike you can't specify the method you use and it always picks the first one in the list. I appreciate this is a rather odd request and a much better option would be to not use turnpike, but can anyone help with fixing this at the server end without resorting to altering the core qpsmtpd code? Altering plugins is fine.

It looks like you would need to modify Qpsmtpd/SMTP.pm, at least with the way things are set up now; it creates the AUTH line outside of $txn->notes('capabilities'), and uses a hash so the sorting is random. This diff would probably work but is untested:

    --- dc-smtpd/lib/Qpsmtpd/SMTP.pm    (revision 13227)
    +++ dc-smtpd/lib/Qpsmtpd/SMTP.pm    (working copy)
    @@ -243,7 +243,8 @@
         # Check if we should only offer AUTH after TLS is completed
         my $tls_before_auth = ($self->config('tls_before_auth')
             ? ($self->config('tls_before_auth'))[0] && $self->transaction->notes('tls_enabled')
             : 0);
         if ( %auth_mechanisms && !$tls_before_auth ) {
    -        push @capabilities, 'AUTH ' . join(' ', keys(%auth_mechanisms));
    +        push @capabilities, 'AUTH ' . join(' ',
    +            grep { exists $auth_mechanisms{$_} } qw( PLAIN LOGIN CRAM-MD5 ));
             $self->{_commands}->{'auth'} = '';
         }

Seems like if at all possible it would be better to have the auth plugin figure this stuff out and then add to $txn->notes('capabilities') like everything else does. Then you could just do it in an extra plugin, like:

    sub hook_ehlo {
        my ( $self, $txn ) = @_;
        my ($auth) = grep { /^AUTH / } @{ $txn->notes('capabilities') }
            or return DECLINED;
        my %methods = map { $_ => undef } split / /, $auth;
        $auth = join ' ', grep { exists $methods{$_} } qw( AUTH PLAIN LOGIN CRAM-MD5 );
        $txn->notes( capabilities => [
            ( grep { !/^AUTH / } @{ $txn->notes('capabilities') } ),
            $auth,
        ] );
        return DECLINED;
    }

Even though it amounts to more work, it would be nice to be able to use plugins for such things and not patch SMTP.pm according to one's desires. And most things you'd want to hack the AUTH line for would be simpler. Indeed, if you aren't worried about dealing with unpredictable states of the AUTH line, the plugin would probably be reduced to about two lines. -Jared
URIBL update
I've made a bunch more changes to the uribl plugin locally; man, we _really_ need to get some kind of svn-to-git thing going. Or at least I need to re-educate myself on git and start putting things in my github again. If I don't manage to do this by the time (soon) that things are settled down a bit and we have some production testing, I'll submit a new one to the list again; but hopefully by then I'll have found time to get git going again and be able to point people to that.

I got permission from Dallas @ URIBL to use the datafeed data, but also got his opinion on the matter, which is that using tld_lists the way I am is not going to gain much, and introduces the risk that a new spammer 'haven' could be missed entirely. After talking with my team we're going to go with a full TLD list right now, and perhaps later we'll collect our own stats to verify Dallas is right about the tiny benefit (he probably is). tld_lists has been updated to reflect this, though if anyone feels more bold than me and wants updates to the 'pruned' list, let me know.

I modified the parse_mime plugin as discussed previously on the list; now the uribl plugin isa_plugin('mime_parser') and does lazy parsing.

I'll probably remove 'semicolon munging detection'; as Devin said, if real (current) data doesn't show it's being used, why bother. I'd like to go over a larger sampling of current data first though, which I plan to do soon.

I've re-arranged the code slightly to allow not only the async plugin but our own local plugin to easily take advantage of plugin inheritance to avoid code duplication. Our own plugin is now just 40 lines or so, thus it gets to inherit the other 600 lines of uribl without any forking :)

There are some additional changes in the works that I'm curious for input on if anyone cares:

- We're finally getting more URIBL datafeeds. I'd like to use this data to verify how static TXT results are for each service and, if applicable, generate templates for them in the same script that generates tld_lists.pl (and probably rename that to fit its more general purpose). So for services that do indeed have very static TXT templates, we could optionally skip TXT lookups and instead generate our own response (e.g. links) without the cost of one more DNS query. A couple of additional brainstorms on this topic:
  - Dynamically generate the TXT template by going ahead and doing TXT lookups until the first one we get back for each service, at which point we cache the template and don't do any more. (This would have to be re-done in every new child process.)
  - Dynamically verify the validity of the statically-set template by doing TXT lookups until we get one and then checking. Still has to be per-child-process.
  - If we do either of the above dynamic checks, or if we don't choose to do any TXT-avoiding magic at all, we ought to launch TXT requests in the callback that receives matches on A lookups; that way we do two queries per _hit_, rather than two queries per URI. I've already added the independent option to just turn off TXT queries, for anyone who wants to save on DNS traffic at the cost of not providing links on rejection.
- We're interested in optionally resolving URL shortening links (e.g. tinyurl, bit.ly, etc.) using HTTP::Async

My boss is still deliberating whether the URL shortening resolution thing would be contributed, or if we would consider it part of the 'special sauce'. I'm hopeful he'll be in favor of contributing it.
- We'd like to change the check_headers directive to take more args than just true/false. 0 would still mean 'don't check headers', 1 would still mean 'check all headers', but 'all' would also mean 'check all headers'; anything else would be interpreted as the header(s) to be checked (comma-delimited if in list form). So you could do check_headers = 'subject' or check_headers = 'subject,received'. The default should probably be off. This is mainly because I noticed check_headers automatically checks the Received header, which is even more interesting when combined with the SBL-XBL service; basically, this plugin is now a replacement for SA's RCVD_IN_SBL rule, etc. This is probably a good thing... as long as you actually wanted it :) But you should be able to avoid it if you like and just specify the headers you're interested in. Any comments on these? Do they sound worthwhile? What should the defaults be? -Jared
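Roughly, the argument handling for that proposal would look like this (a hypothetical sketch of the idea, not committed code):

    my $ch = $self->{_args}{check_headers};
    my @check_headers;                    # empty == check nothing (the default)
    if (defined $ch and $ch ne '0') {
        if ($ch eq '1' or lc $ch eq 'all') {
            @check_headers = ('*');       # marker meaning "check all headers"
        }
        else {
            @check_headers = split /\s*,\s*/, lc $ch;  # e.g. 'subject,received'
        }
    }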
Re: per-recipient configuration
I have stuff that might wind up looking like

    $rcpt->persistent->{statistics_receivedmsgs_total}++;
    $rcpt->persistent->{statistics_receivedstatsbysender}{$NormalizedSenderAddress}++;

Perhaps $rcpt->storage could be provided that could be tied... it just seems like this is the less conventional accessor in qp-land. notes('foo') can be accessed with ->{_notes}->{foo}, but this is not the general convention. When I want to increment a note I just do $foo->notes( blah => $foo->notes('blah') + 1 ). That said, some of my colleagues have also complained that notes() even exists as a method and think it would be fine for it to just be a reserved namespace accessed directly as a hash key.

in addition to reading configuration. Nothing's exactly stopping you from reading configuration directly with a persistence hook, it just doesn't seem that natural to me, and I think the needs of a hook_user_config are a subset of the needs of a persistence hook. I wouldn't really cry if I had to change every $rcpt->config() call in my codebase to $rcpt->get() or $rcpt->storage() etc., it just seems less intuitive and less in line with the existing convention of $qp->config() -Jared
Re: per-recipient configuration
colleagues have also complained that notes() even exists as a method and think it would be fine for it to just be a reserved namespace accessed directly as a hash key. or at least that notes() should return the {_notes} hashref if it isn't passed any arguments -Jared
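i.e., something like this sketch of the suggested accessor (mirroring the existing getter/setter convention, not actual qpsmtpd code):

    sub notes {
        my $self = shift;
        return $self->{_notes} unless @_;          # bare call: whole hashref
        my $key = shift;
        return $self->{_notes}{$key} unless @_;    # getter
        return $self->{_notes}{$key} = shift;      # setter
    }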
Re: per-recipient configuration
Another thing that nothing stops us from doing, in a late-bound programming language without private namespaces, is adding additional methods to base objects without changing the code that declares the base objects, as long as the implementation of the base objects promises to remain the same. I do think it would be best, though, to come to decisions upstream to provide methods that most people will find sufficient and intuitive, because once they find it insufficient or unintuitive and add their own methods, they'll start using them in their code and the code becomes that much more forked. I find the $qp->config() style method intuitive for this myself; adding a hashref accessor to at least the notes() method, and perhaps to this persistence-method-that-might-also-want-to-cover-config-needs, might be considered intuitive by many. -Jared
URIBL plugin 'action' defaults
Would anyone object to setting the default action to 'deny' on certain reliable low-fp URIBL lists? Probably Spamhaus SBL-XBL and DBL and URIBL Black. It seems like a new user turning on uribl checks would expect them to do something more than adding headers, as long as the services it rejects for are reliable. But this also means that anyone who was already using the plugin without setting any actions would see a change in behavior on upgrade. -Jared
Re: Rewritten URIBL plugin
Jared Johnson wrote:
> I think we should probably consider putting support for parsed messages into core, with the parsing done lazily if requested by the API.

I forgot, we did kinda think of a couple of reasons not to want an API. Depending on where you put it, you may find QP in general depending on MIME::Parser even if it never uses it. The benefit of the plugin is that /etc/qpsmtpd/plugins controls whether you wind up having to install, and take up your memory with, MIME::Parser.

One thing we've discussed in the past (at least in my imagination), although not quite figured out how to implement, is making plugins act a little more like normal modules, so that one plugin can use another. So if you're interested in the parsed mime functionality, your plugin can plugin_use util/parsed-mime and the right magic happens.

Oh yeah, that's right, someone *did* implement what you're talking about. You can do it with 'plugin inheritance' (which ironically I knew nothing about until looking at the async uribl plugin the other night):

# /usr/share/qpsmtpd/plugins/mime_parser
use MIME::Parser;

sub parse_mime {
    my ( $self, $txn ) = @_;
    return $txn->notes('mime_body') if $txn->notes('mime_body');
    # ... a bunch of code to create a $msg ...
    $txn->notes( mime_body => $msg );
    return $txn->notes('mime_body');
}

# /usr/share/qpsmtpd/plugins/some_plugin_that_might_want_mime_parsing
sub init {
    my ( $self, $qp, %args ) = @_;
    return unless $qp->config('parse_mime') or $args{parse_mime};
    $self->isa_plugin('mime_parser');
    $self->{can_parse_mime} = 1;
}

sub hook_data_post {
    my ( $self, $txn ) = @_;
    if ( $self->{can_parse_mime} ) {
        $self->some_recursive_mime_function( $self->parse_mime($txn) );
    }
    else {
        $self->regular_body_getline_function( $txn );
    }
}

Voila! A lazy 'plugin model' that is *also* an API, which doesn't 'use MIME::Parser' until you *want* to use it; furthermore, you don't have to just put the official mime_parser plugin in there. With a little modification (or alternatively some care to use the same filename within an alternate directory) you could use your home-rolled messier-but-more-efficient version that doesn't use MIME::Parser at all.

I could easily modify my own stuff to do this and test it. I'd even be interested in pulling a couple of our own home-grown MIME recursion methods into it. If this is indeed the will of the council ;)

p.s. I'm kind of stoked about plugin inheritance these days. I'm becoming convinced that after some code churn on our side it will allow us to finally switch to async without having to just rewrite all our plugins, switch to the async daemon, and see what happens. And with a little churn on the upstream side, if that manages to happen, I also think it could allow us to un-fork 90% of the QP plugin code we currently have re-written and instead submit patches to QP that have some guarantee to be tested and aren't surrounded by completely useless context :)

-Jared
Re: [Fwd: DoubleCheck DataFeed access]
> > Also: we recently contributed a new URIBL plugin to the Qpsmtpd project, which makes use of our pruned TLD lists that use your datafeed data. They had some question of whether this was something you would be comfortable with having distributed publicly. Note no actual URIBL data is distributed, just a list of TLDs that happens to *not* include top-level TLDs that would be extremely unlikely to generate hits against your service. This is used to limit the number of extraneous queries to your public mirrors, which I'm guessing you would consider beneficial. Could you verify whether we have your permission/blessing to distribute such a list gleaned from your data?

> Ya, fine. It doesn't sound like it would have significant impact on volume to me, as the top 25 TLDs (including ipv4 volume) that are queried represent 91% of the total query volume, and the top 100 TLDs represent 99% of the volume. If you tell me which TLDs are suppressed, I can give you an idea of query volume savings according to mirror traffic. For example, suppression of .mil and .int would save 4/100ths of a percent (0.00039). Now, if there was a hacked webserver in .mil and spammers used it as a drop page or redirector, our temporary listing of it would never hit for you if you suppress them. I guess you have to weigh the savings versus the potential for abuse. You wouldn't want to suppress a TLD that becomes the next spammer haven and have to scramble to release an update. Thinking about recent history such as .tk (Tokelau), .st (Sao Tome), .im (Isle of Man), and others.

There are 135 excluded TLDs:

ac ad af ag ai al an ao aq arpa asia aw ax bb bf bi bj bm bn bo bt bw cd cf cg ci ck coop cr cu cv dj dm do dz edu er et fj fk fm fo ga gf gh gi gl gm gn gov gp gq gt gu gw gy ht int iq jm jo jobs kh ki km kn kp kw ky lb lc lk lr ls lu ly mc mg mh mil ml mm mo mp mq mr mt museum mv mw mz na nc nf ng ni nr om pa pf pg pn ps pw qa re rw sb sd sh sl sm sn sr sv sy sz td tel tf tg tj tm tn travel ug va vg vi vu wf ye yu zm zw

... I also used the feed data to prune two- and three-level TLDs from the SURBL list, which is pretty obviously not based on any data:

coop.br coop.tt gov.ae gov.am gov.ar gov.as gov.au gov.az gov.ba gov.bd gov.bh gov.bs gov.by gov.bz gov.ch gov.cl gov.cm gov.co gov.cx gov.cy gov.ec gov.ee gov.eg gov.ge gov.gg gov.gr gov.hk gov.hu gov.ie gov.il gov.im gov.in gov.io gov.ir gov.is gov.it gov.je gov.jp gov.kg gov.kz gov.la gov.li gov.lt gov.lv gov.ma gov.me gov.mk gov.mn gov.mu gov.my gov.np gov.ph gov.pk gov.pl gov.pr gov.pt gov.py gov.rs gov.ru gov.sa gov.sc gov.sg gov.sk gov.st gov.tl gov.to gov.tp gov.tt gov.tv gov.tw gov.ua gov.uk gov.vc gov.ve gov.vn gov.ws gov.za jobs.tt tel.no tel.tr act.gov.au nsw.gov.au nt.gov.au pa.gov.pl po.gov.pl qld.gov.au sa.gov.au so.gov.pl sr.gov.pl starostwo.gov.pl tas.gov.au ug.gov.pl um.gov.pl upow.gov.pl uw.gov.pl vic.gov.au wa.gov.au

There were also about _1400_ reserved .us TLDs pruned from the list, IIRC.

You make some pretty good points. It may well not be worth the trouble, at least for one-level TLDs.

Thanks,
Jared Johnson
Software Developer
DoubleCheck Email Manager
Re: per-recipient configuration
> so inside QP, the qpsmtpd::address object would have a known parameter that brings up the per-address persistence hash, which would be a flat hash. Something like ..

See, in my mind, per-recipient config and persistent data storage are more separate. Maybe part of the reason I look at it this way is that in my own implementation, I never really write config for a recipient, I only read it (from my persistent storage, the db, of course). I don't see it necessary to be able to say $rcpt->config( spam_threshold => 10 ) from QP; I'd do it from the UI. Stored things are always written from QP (logging) and sometimes read (auto whitelist). A hook_user_config plugin would even be likely to make use of the persistent storage itself, but I still see the concepts as split, and you could implement and benefit from either one without the other.

When I went to write a reference plugin for hook_user_config, I just thought of one where an admin can drop config for users into directories on the fs in the event that he wants to override what's already set at the global level; I think the hook structure even fell through to $qp->config(), so it could really just begin with an extension of /etc/qpsmtpd and go from there, just like $qp->config() does.

That said, for persistent storage I would like to see a more straightforward API and skeleton; something like: $txn->get, $txn->store, $rcpt->get, $rcpt->store, and corresponding hook_get and hook_store that are passed ( $self, $txn, [ $rcpt ] ) so the plugin knows what to key on, though it can even ignore that if it wants. Extra points if QP provides a safe-but-reusing-connections-appropriately-depending-on-the-forking-model DBI handle via $self->qp->dbh, and maybe a $self->qp->cache() for a 'persistent' Cache::FastMmap or Memcached cache, but the plugin could be required to establish the actual data store for itself. Then the plugin goes to town. It can store everything in a generic way, key => value or whatever, or it could map the key names to your own business logic.

The reference persistent storage plugin could still implement the type of store you're talking about, and that would work out of the box for people who just want to have greylisting etc.; but if I'm reading it correctly, even if it's really awesome, it's likely a lot of developers would just have to scrap it in favor of what they're already doing, at which point the more flexibility hook_store plugins have, the better.

# in stock upstream plugins/greylist
$rcpt->store( greylist => $self->qp->connection->remote_ip );

# in our internal storage plugin that overrides the generic one
# (I don't think anyone would actually want this in particular :)
sub init { shift->{dbh} = DBI->connect(...) }
sub dbh  { shift->{dbh} }

sub hook_store {
    my ( $self, $txn, $rcpt, $conf, $value ) = @_;
    return DECLINED unless $conf eq 'greylist';
    return OK unless $rcpt->notes('user_uid');
    my $sth = $self->dbh->prepare_cached(
        'INSERT INTO bobsgreylist (ip,helo,rcpt) VALUES (?,?,?)'
    );
    $sth->execute(
        $value,
        $self->qp->connection->hello_host,
        $rcpt->notes('user_uid'),
    );
    return OK;
}

Unlike per-recip config, we don't have any API etc. written in-house to support generic persistent storage writing; for now we just stick our DBI directly in our plugins, making them even more forked. So this is all purely theoretical, and David has the advantage of speaking from some experience :)

-Jared
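P.S. And here's what the generic out-of-the-box reference plugin might look like under the proposed API, so greylisting and friends work with zero setup. The hook signature follows the proposal above; DBM::Deep and the file path are just placeholders for whatever backing store a reference plugin would settle on, and the (OK, $value) return from hook_get is my own invention:

use DBM::Deep;

sub db {
    my $self = shift;
    return $self->{db} ||= DBM::Deep->new('/var/lib/qpsmtpd/store.db');
}

sub hook_store {
    my ( $self, $txn, $rcpt, $key, $value ) = @_;
    # key on the recipient address when we have one, the transaction otherwise
    $self->db->{ $rcpt ? $rcpt->address : '_txn' }{$key} = $value;
    return OK;
}

sub hook_get {
    my ( $self, $txn, $rcpt, $key ) = @_;
    return OK, $self->db->{ $rcpt ? $rcpt->address : '_txn' }{$key};
}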
Re: per-recipient configuration
> My experience porting the Advenge SMTPD to be a qpsmtpd plug-in indicated that the current interface is entirely adequate for per-recipient things, as long as the plug-in manages its own persistence. I don't think any new hooks are required.

On principle I'd agree that it's possible to do it all in the plugins, but if we're talking about making per-recip generally supported officially, it seems best to make that as easy as possible. I really like having the user_config and a $rcpt->config() method to go along with the hook_config and corresponding $self->qp->config(). And the more API support we have, the less extra code is added to plugins to support per-recipient stuff, e.g. the less forked they are from their non-per-recip counterparts. IMO.

Could you show some code examples? What do you do with the queue plugin(s) if recipients end up having different headers or bodies due to tagging etc.?

> Per-recipient features can be provided as a layer consisting of coordinating handlers for existing hooks; the most that needs to be done in my opinion is perhaps to establish a registry of namespaces in the existing per-message notes data, to prevent different plugins from clobbering each other's data.

Is this still necessary if Qpsmtpd::Address::notes() exists?

> One of the problems is that SMTP doesn't support per-recipient results after DATA.

Is QPSMTPD used widely enough that we could introduce new ESMTP extensions with a chance that they could become widely adopted?

> I believe I have seen a proposed per-recipient results after DATA ESMTP specification, but ... okay I'll search for it ... it was in a thread started by me, in 2006. http://www.ietf.org/mail-archive/web/asrg/current/msg12816.html

This would be flippin awesome. Although even if QP has more pull than I think it does, it would take a bazillion years for all of the interwebs to adopt any protocol change. Nevertheless, the sooner it starts to change, the better :)

-Jared
Re: per-recipient configuration
I assume you meant to CC the list originally, I've added it :)

> > Could you show some code examples?
> not immediately, but sure

> > What do you do with the queue plugin(s) if recipients end up having different headers or bodies due to tagging etc.?
> Advenge only deals with deliver/defer with some fanciness on the defer side because the bounce message asks for some grease. That makes the problem simpler but may be an incomplete solution for some.

If we wanted to punt on the problem in mainline QP for a while, I suppose plugins could just only ever modify the global body and avoid or generalize any modifications that wanted to be different per-recipient. But again, this is an extra burden on the plugins.

> > > Per-recipient features can be provided as a layer consisting of coordinating handlers for existing hooks; the most that needs to be done in my opinion is perhaps to establish a registry of namespaces in the existing per-message notes data, to prevent different plugins from clobbering each other's data.
> > Is this still necessary if Qpsmtpd::Address::notes() exists?
> Is that a persistent data structure keyed on address?

Well, it's persistent data stored in the Qpsmtpd::Address object itself:

sub notes {    # in Qpsmtpd::Address
    my ( $self, $key ) = ( shift, shift );
    # Check for any additional arguments passed by the caller -- including undef
    return $self->{_notes}->{$key} unless @_;
    return $self->{_notes}->{$key} = shift;
}

I submitted that patch a while ago and I think it was accepted.

> What's efolder.net?

My new employer (it used to be @nmgi.com). They bought DoubleCheck and all its IP, infrastructure, and employees from Network Management Group. Their own product is a cloud backup solution; at some point we'll probably be working on email archiving integrated with that. It was really a nice deal: since the switch I get to work at home and I have a new awesome boss :)

-Jared
Re: Rewritten URIBL plugin
> I think we should probably consider putting support for parsed messages into core, with the parsing done lazily if requested by the API.

It's a good idea; it's only that I haven't yet had the time and justification to write and test a move from the plugin model (see the attached plugin in my newer post) to an internal API model. The plugin I attached represents a trivially-edited version of a plugin we've been testing in production for quite some time. It probably wouldn't be *that* difficult or risky, though, to take the very same code and put it into the lazy API of your choice. And I think it'd be great, even though we wouldn't actually benefit from the laziness ourselves ;) If you don't hate storing the resulting parsed mime in $txn->notes('mime_body'), though, that would rock, since then it wouldn't break our existing plugin-based forked code ;)

-Jared
Re: Rewritten URIBL plugin
> I think we should probably consider putting support for parsed messages into core, with the parsing done lazily if requested by the API.

I forgot, we did kinda think of a couple of reasons not to want an API. Depending on where you put it, you may find QP in general depending on MIME::Parser even if it never uses it. The benefit of the plugin is that /etc/qpsmtpd/plugins controls whether you wind up having to install, and take up your memory with, MIME::Parser. You could conditionally 'require', although that does do different things to memory management that I don't understand.

Another thought: there are different reasons people might want mime to be parsed that benefit from different implementations. If you only want the text data, maybe you want to optimize for that somehow and throw away the binary attachment data, or vice versa (unfortunately there doesn't seem to be a way to do this with MIME::Parser that I've found, though...). If you only want the text data *and* you don't mind having it not perfectly reconstructed (like the URIBL plugin), then perhaps you could come up with a way to parse without MIME::Parser that is more efficient and more lazy (not in the sense you were suggesting earlier :P).

-Jared
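P.S. The conditional 'require' version is short enough to sketch. Assuming a parse_mime config directive as in my earlier example, MIME::Parser is only compiled into the process when the feature is actually switched on:

sub init {
    my ( $self, $qp, %args ) = @_;
    return unless $qp->config('parse_mime') or $args{parse_mime};
    require MIME::Parser;    # runtime load instead of compile-time 'use'
    $self->{can_parse_mime} = 1;
}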
Re: Rewritten URIBL plugin
> > - Introduces support for URIBL services that may not have worked right, at least out of the box, before. Defines the subtle differences between various known URIBL services in order to maximize compatibility.
> Is it worth pulling some of this config out of the code and putting it into some sort of config file?

Probably would be worth it, and not that difficult to do. Any new config scheme should probably be both backward compatible and non-spaghetti. The current scheme looks like:

multi.uribl.com 2,4 deny

The new plugin does add one additional feature; you can now do:

multi.uribl.com 2 deny
multi.uribl.com 4 log

but $zones defines a bunch of other things -- labels for the zone in general and for specific masks, check_ip, check_host, host_wildcards_supported, lookup_a_record, lookup_ns_record. Ignoring the label issue, all the other directives are related only to the zone; perhaps we could add a new scheme on top of the mask definition scheme (which we would continue to read in the old way), getting even further away from the one-line-per-zone style:

multi.uribl.com action deny
multi.uribl.com check_ip 1
multi.uribl.com check_host 1
multi.uribl.com lookup_a_record 0
multi.uribl.com lookup_ns_record 0
multi.uribl.com host_wildcards_supported 0

If nobody hates this scheme, it does seem like something I could add pretty easily and send an updated copy.

> > - Uses Net::DNS::Async to simplify the code, and also to ensure the afore-mentioned A and NS lookups will prompt new URIBL lookups in an efficient and simple manner via callbacks
> Does the code still work with the async qpsmtpd cores?

Well... I didn't change the existing plugin a whole lot; I did change from data_handler() registered via register() to plain old hook_data_post(); this may need to be changed, considering that the comments above each sub (which I also removed) suggested that hook_data_post() and register() were 'not used' for async. These things could easily be changed if needed. However, we don't run async anywhere, not even in test environments, so I'm not very qualified to know whether they need changing, or to give any guarantees.

And I'm confused by that bit. From what I can tell, the async/uribl plugin ignores plugins/uribl entirely and uses Qpsmtpd::Plugins::Async::DNSBLBase, which in turn uses ParaDNS. Now, I noticed the async rhsbl plugin used Q::P::A::D (and that this used ParaDNS) before, and looked at it briefly to see if I could use it for my purposes, but at the top of the docs it's mentioned that only A lookups are supported, so I skipped it and went to Net::DNS::Async. I don't really know the strong points of N::D::A vs. ParaDNS, but it did note that the latter more expressly claims to be interested in performance and scalability.

So in conclusion: (1) I think I'll look into using ParaDNS and maybe even Q::P::Async::DNSBLBase if appropriate, even though my plugin is not in async; it might still have advantages, and the interesting logic is in the callbacks and not the module use anyway. (2) I don't know if it matters whether this particular plugin works with async; I guess it depends on whether the old one really ever did. At least Net::DNS::Async should not make it any more blocking than the old plugin's collect_results(). At any rate, somebody who plays in async territory should probably comment before any action is taken on the new plugin :)

> > Attached also is tld_lists.pl, a companion file that needs to be dropped in lib/Qpsmtpd/ which provides the list of first, second, and third level TLDs that we care about.
> > It's derived from our URIBL datafeed as well as

> Do the owners of that data care about it being used this way? You may be violating any agreement with them. Would they be OK if this was released as an independent CPAN module? Either way, can we structure this as an API instead of just an include file?

(1) I suspect that anyone who runs a URIBL service would be thrilled with this information being available for this particular use, because it minimizes unnecessary queries to their services. Consider the old method that sent 2 or 3 queries instead of one because it didn't know which one was right. The fact that SURBL and URIBL both publish the two- and three-level lists and encourage their use is evidence of this :)

(2) First-level TLDs are public information, of course, and with the two-level and three-level lists being public, the interesting business is that the first-level list and the SURBL second- and third-level lists are pruned based on URIBL's data. So it comes down to whether anyone can be considered to have a claim on the privilege of knowing how to *exclude* results. I guess maybe they could if it came down to it...

(3) I'm pretty sure I could get URIBL's explicit permission, but it would take a long time. The founder and sole owner of URIBL was also the original author of DoubleCheck, so I feel we would have a bit of an in :) However, even so, he always takes forever to get back to us in
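P.S. Back on the config-file scheme from earlier in this message: a rough sketch of reading the proposed per-zone directives alongside the legacy "zone masks action" lines. The config file name 'uribl_zones' and the 'add-header' default are assumptions, not the plugin's actual names:

my %zones;
for my $line ( $self->qp->config('uribl_zones') ) {
    my ( $zone, @rest ) = split /\s+/, $line;
    if ( @rest == 2 and $rest[0] =~ /^(?: action | check_ip | check_host
            | lookup_a_record | lookup_ns_record | host_wildcards_supported )$/x ) {
        $zones{$zone}{ $rest[0] } = $rest[1];    # new style: zone key value
    }
    else {
        my ( $masks, $action ) = @rest;          # old style: zone masks action
        $zones{$zone}{masks}{$_} = $action || 'add-header'    # assumed default
            for split /,/, $masks;
    }
}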
Re: Rewritten URIBL plugin
> Do the owners of that data care about it being used this way? You may be violating any agreement with them. Would they be OK if this was released as an independent CPAN module? Either way, can we structure this as an API instead of just an include file?

Forgot to mention: yes, I'd be happy to add the few lines it would take to make this an API, even if it's just in lib/Qpsmtpd. It might be nice for us to get datafeeds with other services (we're looking to at least get one from DBL, possibly SURBL and imvURI) before publishing to CPAN, mainly so that we know for sure that we have not over-pruned the first-level or SURBL second- and third-level lists. As they are, though, I consider the lists good enough to warrant production use of this method over the current method in the existing QP uribl plugin. After all, any TLD that manages to make it onto other services but has garnered *zero* hits on URIBL probably has very, very few entries on those other services.

Anyway, if it's a bit of a licensing/copyright problem to use the data from URIBL, it'll be more of one to use the data from other services as well, but we can probably work that question in from the get-go.

-Jared
Re: Rewritten URIBL plugin
> On Sun, Jul 25, 2010 at 04:43:36AM -0500, Jared Johnson wrote:
> > The plugin has the following advantages over the original:
>
> Based on a single read-through of the code, I like most of it. Some assorted observations, though:
>
> - Parsing out the MIME sections is worthwhile, though it's really something that ought to be done in the core libs when plugins indicate they need it, or in a utility plugin if that's practical. This was something I meant to bring up a while back -- we have quite a few different content-scanning plugins, and if they're all obliged to invoke MIME::Parser repeatedly it'll be even more costly than that module already is. Not that I have time to do it myself, though. :P

Our internal plugin actually just uses $txn->notes('mime_body') instead of creating a new $msg, because we do indeed have such a utility plugin. A bit short of being transparently handled by internals, but I could at least include code in the plugin to use $txn->notes('mime_body') instead of parsing when that exists. This would also allow our internal plugin to be slightly less forked from what winds up in upstream :) Your idea of doing this internally sounds cool to me, but we're unlikely to get around to implementing it ourselves any time soon, given that the system that already exists is working OK. And unfortunately our utility plugin is really just one aspect of a bigger plugin that wouldn't really be useful to QP, and we have no method of 'asking' for MIME parsing from said plugin, since for our purposes it's *always* necessary to parse mime.

> - For messages below a few hundred kb or so, consider MIME::Parser->output_to_core(), so we can avoid the fs churn. Annoyingly, MIME::Parser doesn't provide a mechanism to only consider particular mimetypes, or we could skip decoding non-text bodies.

The MIME::Parser docs are pretty insistent that using output_to_core is detrimental to performance in general, and should only be bothered with when there are actual disk constraints. I've never tested this theory for really small messages, but it's kept me away from trying it :) If it's not very difficult, perhaps it could be optional via a plugin argument, and later via the handy-dandy general mime-parsing-needed feature, if that's ever written.

> - It looks like you're scanning inside all parts of multipart MIME, where doing so only on text parts might be preferable.

No, in scan_body() we do:

return unless $part->effective_type ~~ [qw( text/plain text/html )];

I think that's what we want anyway...

> - The config really needs to live in a config file -- changing or expanding the format is fine if needed, but config that lives in code can't really be edited. I'm not bothered by having per-list defaults in the plugin, but one should be able to fully add a new list or alter an existing one without changing Perl code.

I agree. If you don't object to the spec I proposed in my reply to Robert, I can get that changed soon.

> - In the unwinding of HTML entity escaping -- I suggest doing &amp; last. &amp;gt; is &gt;, not >.

Good idea, thanks :)

> > the SURBL two-level and three-level lists -- but the SURBL lists are pruned to exclude stuff that we're pretty sure is never going to actually get listed -- when was the last time you saw a spammer use .mil? uribl.com never has, according to our datafeed :)
>
> We'll probably see compromised .gov and .mil sites hosting malware, much as we see in other domains. They're not that useful for classical spammer purposes since they can't be registered in the usual fashion.
Ideally, we should only bother querying services for URIs that have some chance of being listed on those services, regardless of their chance of actually being used in spam. Thus, if the datafeeds don't have any hosts with .mil, there's no point checking them. However, (1) we're only deriving our list from URIBL right now, and even as we expand we can never say that we derived our list from every service that someone might use; (2) even if I ran my tld_list generating script every day against every service and sent in an updated list as soon as something got listed on .mil, there would still be the lag time to get a release out with the updated list and then have people upgrade, during which time there would be some misses. I think that using 'pruned' lists is eventually going to cause some misses somewhere. For our own use, since we have control of update timelines, it's worth the risk, and we also don't have to worry about legal issues with using the data. If either of these is a big enough problem for QP in general, though, you could work around it by using an un-pruned tld_lists.pl. This would give %firstlevel_tlds 263 entries instead of 128; %surbl_secondlevel_tlds would have 3,801 instead of 2,426; and %surbl_thirdlevel_tlds would have 155 instead of 135. If you're worried about the legal issue but *not* the timeliness issue, you could still exclude a few
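P.S. Circling back to the output_to_core() suggestion: a possible compromise, sketched under the assumption that the plugin checks the transaction size before parsing. output_to_core() and output_under() are real MIME::Parser methods; the size cutoff and scratch directory here are invented:

my $parser = MIME::Parser->new;
if ( $txn->data_size < 200 * 1024 ) {
    $parser->output_to_core(1);                    # keep small messages off the fs
}
else {
    $parser->output_under('/var/spool/qpsmtpd');   # spill big ones to disk
}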
Re: Rewritten URIBL plugin
And I'm confused by that bit. From what I can tell, the async/uribl plugin ignores plugins/uribl entirely and uses Qpsmtpd::Plugins::Async::DNSBLBase, which in turn uses ParaDNS. In that case, it's possible (and likely) that it may never have worked with async. When time is short, it's easy to just ask questions :) OK, I looked closer and I guess the async plugin actually 'inherits' from the non-async one, so it uses its methods. That's a new topic for me but I think I can come up with the necessary changes to the non-async plugin and maybe a patch to the async one to use it correctly. But I won't be able to test the theory :) -Jared
Re: per-recipient configuration
> Having a common API for per-recipient things has long been on the todo list. We've talked about it a couple times before, but the requirements/needs/wishes never sunk in deep enough in my head to do anything about it. So: How do you use this? My thoughts all along have been to just have some way to make a "find user data" plugin that'll fetch a per-recipient blob of information that other plugins can, uh, plug into. The big stumbling block is what the common infrastructure for dealing with the diverging needs of the recipients should be. What would we need in terms of per-recipient header rewriting, post-DATA processing etc?

It is kind of a big scary can of worms, in large part because, AFAICT, a number of shops including my own have grown their own solutions, often leaving things just as forked as we are. It seems that everybody's doing it in different, incompatible ways which each shop is fully pleased with and would find it difficult to move away from, and really each method seems valid enough. It may be workable to try to support multiple different treatments, or it may be necessary to choose one and go with it. The latter would mean that some existing users might just remain forked, but if you made the right choices then probably new users would never bother to move away.

There was a time when this was more at the forefront of my mind and I was interested enough to 'put together a proposal'. I sent some ideas to the list at the time, but now I'm not so sure we even follow those ideas in our own fork; we've moved beyond, and yet what we have now is still a work in progress. So to 'do it right the first time' some new thinking would have to take place. But FWIW, here's a bit about what we do now:

- Qpsmtpd::Address has a config() method which passes through to plugins that implement hook_user_config. Note that I used to call this 'hook_rcpt_config', but really it's quite valid to call $txn->sender->config(...) when you've determined that this is an outbound message and you want to know something about the sender's preferences instead. The way I implemented this is a bit of a mess. At any rate, I think I submitted a patch to do this sort of thing, but I might not have managed to stick with the review process long enough to answer everyone's concerns/questions. Or maybe it was accepted! I can't really remember ;)

- Qpsmtpd::DSN is heavily modified so that it can be used to store information in Qpsmtpd::Address (which itself has methods to manipulate such things) about what we have decided we want to do with each recipient and why we have chosen to do it; e.g. ( Rejected => 'Spam' ).

- Qpsmtpd::Transaction has been enhanced so that recipients() returns only the recipients that we have (thus far) decided we're going to be delivering to, and all_recipients() and rejected_recipients() have been added to give information about recipients that we have decided not to accept. Keeping these lists around is useful for logging, but may also be necessary depending on how one decides to deal with the issue of "we've reached the end of DATA and want to reject this recipient and accept that one based on their different preferences, but we can only give one response". Obviously we respond with '250' in this case; as for the 'rejected' recipients, some choose to drop them silently; some have a 'quarantine' method in place and choose to quarantine for these recipients; some choose to bounce (this last method is the most reviled, but what can I say? I don't want to do it, but my lead developer does.
And the situation is pretty rare anyhow). The content of these methods in our forked code might shed a little light on how they work with the DSN data stored in each recipient:

sub recipients {
    my $self = shift;
    @_ and $self->{_recipients} = [ @_ ];
    return () unless $self->{_recipients};
    return grep {
        ! $_->dsn
            or $_->dsn->action ~~ [qw( Accepted Delivered Queued Quarantined )]
    } @{ $self->{_recipients} };
}

sub rejected_recipients {
    my $self = shift;
    @_ and $self->{_rejected_recipients} = [ @_ ];
    return () unless $self->{_recipients};
    return grep {
        $_->dsn and ! $_->dsn->action ~~ [qw( Delivered Queued Quarantined )]
    } @{ $self->{_recipients} };
}

sub all_recipients {
    my $self = shift;
    return () unless $self->{_recipients};
    return @{ $self->{_recipients} };
}

- All of our per-recip-pref-aware post-data scanning plugins loop through each to-be-accepted recipient and determine what we want to do for each recipient. Then a single plugin afterward handles the results and responds to the client. So basically, an empty $txn->recipients() becomes a short-circuit for whatever post-data plugins are left in the mix. We actually don't do this with the DSN objects; we use a separate 'class' note to denote that something is 'spam', 'clean', 'whitelisted', etc. Then the last plugin sets the DSNs for each recipient (a rough sketch of the looping pattern follows this message). Maybe we should have figured out a way to do that with the DSNs directly, I dunno.

- Each recipient object can
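P.S. For flavor, the per-recipient post-DATA pattern described above boils down to loops like this in our plugins. Everything here ($rcpt->config, the 'class' note, the spam_score note name and threshold) is from our fork or invented for illustration, not stock QP:

sub hook_data_post {
    my ( $self, $txn ) = @_;
    # earlier plugins may have already rejected everyone
    return DECLINED unless $txn->recipients;
    for my $rcpt ( $txn->recipients ) {
        my $threshold = $rcpt->config('spam_threshold') || 5;
        next if ( $txn->notes('spam_score') || 0 ) < $threshold;
        $rcpt->notes( class => 'spam' );    # a later plugin turns this into a DSN
    }
    return DECLINED;
}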
Re: per-recipient configuration
> sub rejected_recipients {
>     my $self = shift;
>     @_ and $self->{_rejected_recipients} = [ @_ ];

I noticed after posting this that the line above is a no-op. Ha :)

>     return () unless $self->{_recipients};
>     return grep {
>         $_->dsn and ! $_->dsn->action ~~ [qw( Delivered Queued Quarantined )]
>     } @{ $self->{_recipients} };
> }

-Jared
Re: Rewritten URIBL plugin
> > Attached also is tld_lists.pl, a companion file that needs to be dropped in lib/Qpsmtpd/ which provides the list of first, second, and third level TLDs that we care about. It's derived from our URIBL datafeed as well as
> Would they be OK if this was released as an independent CPAN module? Either way, can we structure this as an API instead of just an include file?

I've already pretty much done the work to access tld_lists as an API, but I'm now doubting the use unless it is released to CPAN: it really makes the most sense to have Qpsmtpd::TLDLists itself still pull in the contents of tld_lists.pl as I posted them, in a 'require', because those contents are created by an automated script. It seems silly to require the modification of said automated script if we want it to generate different logic. But when Qpsmtpd::TLDLists requires tld_lists.pl, then comes the question: "if Qpsmtpd::TLDLists is the only thing that accesses this data, and plugins/uribl is the only thing that uses Qpsmtpd::TLDLists, then why doesn't plugins/uribl just read the data itself?" You guys can have it whichever way you want it, but I say: don't be afraid to treat a .pl file as if it were basically a static, shipped config file in a format easily read by perl plugins :)

-Jared
Re: URIBL plugin and 'semicolon insertion munging'
> This seems to be a big improvement at least on the 3 million lines of random traffic I tested with, and it's a smaller patch: [snip]

Well, it may have been an improvement over my own data, but a colleague pointed out the following case:

check out spamsite.com;it's awesome!

And this didn't deal with Devin's example of http://spamsite.com;.net either. At this point it's a bit beyond testing on the original plugin for me, but here's what I came up with, given my own modified hostname RE loop (the minimal entity decoding + stripping is still needed):

while ( m{
    (
        (?: [a-zA-Z0-9:./-]+ @ )?
        [a-zA-Z0-9][a-zA-Z0-9.;-]+\.$tld
    )
    (?! \.?\w )
}gxo ) {
    my $host = lc $1;
    # Deal with inserted-semicolon munging, e.g. 'http://foo;.com'
    if ( my @split = $host =~ /(.*?);(.*)/ ) {
        my @h = split /\./, $split[0];
        if ( $h[-1] =~ /^$tld$/ ) {
            # 'foo.com;.net', 'foo.com;.net;.foo', "foo.com;it's great"
            $host = $split[0];
        }
        else {
            $split[1] =~ /([^.;]+)/;
            $split[1] = $1 if $1;    # 'foo;.com;.net', 'foo;.com'
            $host = $1 ? $split[0] . $1 : $split[0];
        }
    }

This works at least for the test cases I've come up with, and doesn't otherwise change processing from what _I_ had before for the other few million lines I'm checking against; but it is not yet in production, and the change alone is not tested against the vanilla code. If anyone's interested, the loop continues, dealing with excluding email addresses while including user:p...@site:

    if ( $host =~ s/.*\.\.// ) {
        next unless $host =~ /\./;
    }
    if ( $host =~ s/^\w{3,16}:\/+// ) {
        # 'http://realsite.com/cgi/em...@email.host.com'
        # 'http://user:p...@realsite.com'
        $host =~ s/\/.*//;
        $host =~ s/.*@//;
    }
    elsif ( $host =~ /(.*):/ ) {
        # 'user:p...@realsite.com'
        # 'mailto:em...@emailhost.com'
        next if $1 =~ /mailto/;
        $host =~ s/.*@//;
    }
    else {
        # 'realsite.com/cgi/em...@email.host.com'
        $host =~ s/\/.*//;
        next if $host =~ /@/;
    }

This is the main reason for the different RE above. It also obsoletes the need for two RE loops.

-Jared
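P.S. The test cases, roughly as I check them. find_host() is a hypothetical wrapper around the matching loop above, just for illustration:

use Test::More;
is( find_host(q{check out spamsite.com;it's awesome!}), 'spamsite.com' );
is( find_host('http://spamsite.com;.net'),              'spamsite.com' );
is( find_host('http://foo.com;.net;.foo'),              'foo.com'      );
done_testing();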
Re: URIBL plugin and 'semicolon insertion munging'
> while ( m{
>     (
>         (?: [a-zA-Z0-9:./-]+ @ )?
>         [a-zA-Z0-9][a-zA-Z0-9.;-]+\.$tld
>     )
>     (?! \.?\w )
> }gxo ) {
>     my $host = lc $1;
>     # Deal with inserted-semicolon munging, e.g. 'http://foo;.com'
>     if ( my @split = $host =~ /(.*?);(.*)/ ) {
>         my @h = split /\./, $split[0];
>         if ( $h[-1] =~ /^$tld$/ ) {

and I forgot to mention how $tld is defined:

my %firstlevel_tlds = map { $_ => undef }    # 128 entries
    qw( ae aero am ar as at au az ba bd be bg bh biz br bs by bz ca cat cc
        ch cl cm cn co com cx cy cz de dk ec ee eg es eu fi fr gd ge gg gr
        gs hk hm hn hr hu id ie il im in info io ir is it je jp ke kg kr
        kz la li lt lv ma md me mk mn mobi ms mu mx my name ne net nl no
        np nu nz org pe ph pk pl pr pro pt py ro rs ru sa sc se sg si sk
        st su tc th tk tl to tp tr tt tv tw tz ua uk us uy uz vc ve vn ws
        za );

my $tld = my $tld_lc = join '|', keys %firstlevel_tlds;
$tld =~ s/(\w)/[$1\u$1]/g;
$tld = "(?:$tld)";

The list is derived from our URIBL datafeed, e.g. we're only interested in TLDs that actually hit some lists in the first place, therefore skipping things like .mil which spammers don't seem to have the guts to use. This admittedly makes it not a *perfect* solution for munging, but close enough IMO. We don't (yet) have DBL or SURBL datafeeds, unfortunately, but the SURBL two-level and three-level lists are derived from the interwebs and not from spam traffic AFAICT, and DBL doesn't have a list because they support wildcards. If anyone has a SURBL and/or DBL datafeed and is interested, I could provide a script to generate a list from those to see if anything should be added.

The substitution business is to make the RE into a big fat ugly mess with things like '[cC][oO][mM]', so that we can take out the /i, which makes things about twice as fast. But it's not strictly necessary :)

-Jared
Re: URIBL plugin and 'semicolon insertion munging'
Ah, I tried it in my own FF and the trick still works :) It seems like all you have to do to get around the &nbsp; etc. problem is to wait a little longer before applying the fixup -- allow the semicolon to match in the hostname search and then strip it out. Attached is a patch, untested (though the same change is tested on my now-greatly-forked uribl plugin)... I'm sad to say I haven't been on github for a while, so it may be an age before I get around to submitting a patch according to the standards, but if anybody else is interested in picking it up, here it is :)

Also, was it intended that the URI-finding sub should find RHS hostnames in email addresses as well? I'm pretty sure it does... although again, the testing I'm doing is no longer on the official code. It seems undesirable to me; I'm currently working on avoiding such hits while still getting things like http://citibank.com:f...@phishingsite.com

I got approval from my boss yesterday to submit my updates to the plugin to the ML, btw, which I'll do when it's a bit closer to finished, probably the beginning of next week. It's a huge change though, and notably adds dependencies on MIME::Parser and Net::DNS::Async and completely changes the config file format. I don't really have approval to spend hours creating digestible individual patches... but posting the whole plugin is better than not posting any code, right? :)

-Jared

On Wed, Jul 21, 2010 at 07:54:30AM -0500, Jared Johnson wrote:
> Unlike the other bits of "dodge this sort of munging" operations, examining my test results and asking uncle google has not made it clear to me what inserted-semicolon munging really is. Can anyone shed light on

My memory of this is fuzzy, but my SVN log indicates it was an attempt to deal with a style of munging that exploited different browser behavior on encountering semicolons in the hostname component of URLs. At the time, the form I was considering was http://domain;.com/;. Under firefox, the semicolon is taken to mean the end of the hostname and the start of the path, thus triggering the implied-.com behavior and ending up at http://domain.com/;.com/;. I forget what IE does/did, but given the IE/FF market share balance in 2005 when I added that feature, it was probably similar.

In fact the problem is a lot more complex, because the five major browsers out there all deal differently with deviations from the norm in URLs, and spammers exploit those deviations to mislead parsers. For the case I had in mind, stripping semicolons out would be the right thing, but it will be misled by the munge pattern http://domain.com;.com/foo;. The uribl plugin deals with a lot of the munging tricks that were common back when it was written, but it's probably not comprehensive today, and it's definitely susceptible to picking up bogus hostnames (e.g. the &nbsp;.net behavior you note) based on what's left after the known munging tricks are unwound.

http://code.google.com/p/browsersec/wiki/Part1#Uniform_Resource_Locators has a summary of some of the deviations, although it doesn't address this specific one.

-- Devin \ aqua(at)devin.com, IRC: Requiem; http://www.devin.com
   Carraway \ 1024D/E9ABFCD2: 13E7 199E DD1E 65F0 8905 2E43 5395 CA0D E9AB FCD2

--- uribl.orig	2010-07-23 17:06:10.894320796 -0500
+++ uribl	2010-07-23 17:08:34.314345909 -0500
@@ -290,8 +290,6 @@
     $l =~ s/[=%]([0-9A-Fa-f]{2,2})/chr(hex($1))/ge;
     # Undo HTML entity munging (e.g. in parameterized redirects)
     $l =~ s/&#(\d{2,3});?/chr($1)/ge;
-    # Dodge inserted-semicolon munging
-    $l =~ tr/;//d;
 
     while ($l =~ m{
         \w{3,16}:/+                            # protocol
@@ -339,7 +337,7 @@
     }
     while ($l =~ m{
         ((?:www\.)?                            # www?
-        [a-zA-Z0-9][a-zA-Z0-9\-.]+\.           # hostname
+        [a-zA-Z0-9][a-zA-Z0-9;\-.]+\.          # hostname
         (?:aero|arpa|asia|biz|cat|com|coop|    # tld
            edu|gov|info|int|jobs|mil|mobi|
            museum|name|net|org|pro|tel|travel|
@@ -347,6 +345,8 @@
         )(?!\w)
     }gix) {
         my $host = lc $1;
+        # Dodge inserted-semicolon munging
+        $host =~ tr/;//d;
         my @host_domains = split /\./, $host;
         $self->log(LOGDEBUG, "uribl: matched 'www.' hostname $host");
@@ -372,7 +372,7 @@
         \w{3,16}:/+                            # protocol
         (?:\S+@)?                              # user/pass
         (
-        [a-zA-Z0-9][a-zA-Z0-9\-.]+\.           # hostname
+        [a-zA-Z0-9][a-zA-Z0-9;\-.]+\.          # hostname
         (?:aero|arpa|asia|biz|cat|com|coop|    # tld
            edu|gov|info|int|jobs|mil|mobi|
            museum|name|net|org|pro|tel|travel|
@@ -380,6 +380,8 @@
         )
     }gix) {
         my $host = lc $1;
+        # Dodge inserted-semicolon munging
+        $host =~ tr/;//d
Re: URIBL plugin and 'semicolon insertion munging'
> It seems like all you have to do to get around the &nbsp; etc. problem is to wait a little longer before applying the fixup -- allow the semicolon to match in the hostname search and then strip it out.

My bad.. I guess the plugin currently only fixes up '&#\d\d\d' encoding, not &nbsp; etc. Maybe I'll work on that...

-Jared
Re: URIBL plugin and 'semicolon insertion munging'
> > It seems like all you have to do to get around the &nbsp; etc. problem is to wait a little longer before applying the fixup -- allow the semicolon to match in the hostname search and then strip it out.
> My bad.. I guess the plugin currently only fixes up '&#\d\d\d' encoding, not &nbsp; etc. Maybe I'll work on that...

This seems to be a big improvement, at least on the 3 million lines of random traffic I tested with, and it's a smaller patch:

--- uribl.orig	2010-07-23 17:06:10.894320796 -0500
+++ uribl	2010-07-23 19:57:39.304321519 -0500
@@ -289,7 +289,13 @@
 # Undo URI escape munging
 $l =~ s/[=%]([0-9A-Fa-f]{2,2})/chr(hex($1))/ge;
 # Undo HTML entity munging (e.g. in parameterized redirects)
-$l =~ s/&#(\d{2,3});?/chr($1)/ge;
+$l =~ s/&#(\d{2,4});?/chr($1)/ge;
+# Un-encode a few common important named entities and discard the rest
+$l =~ s/&nbsp;/ /go;
+$l =~ s/&amp;/&/go;
+$l =~ s/&gt;/>/go;
+$l =~ s/&lt;/</go;
+$l =~ s/&\w{2,6};//go;
 # Dodge inserted-semicolon munging
 $l =~ tr/;//d;
URIBL plugin and 'semicolon insertion munging'
Hi, I'm working with Devin Carraway's URIBL plugin and have been testing its effectiveness in finding URIs using 6 million lines or so of email traffic from a day in the life of our mail servers. My testing has shown that the following line in the while ( $l = $transaction->body_getline ) loop within lookup_start is problematic:

# Dodge inserted-semicolon munging
$l =~ tr/;//d;

Unlike the other bits of "dodge this sort of munging" operations, examining my test results and asking uncle google has not made it clear to me what inserted-semicolon munging really is. Can anyone shed light on how semicolons could be used to obfuscate URIs so the URIBL plugin can't detect them? If I have an understanding of this, perhaps I can come up with a safer alternative.

I'll paste some of the output of my test script to demonstrate the effect of tr/;//d. The 'Original result' is what we find if we're using tr/;//d; the 'New result' is what we find without it.

<TD>&nbsp;<B>Required:</B><BR><BR><FONT size=2 face=Tahoma>&nbsp;.NET</FONT><BR><FONT size=2 face=Tahoma>&nbsp;Blah</FONT><BR><FONT size=2 face=Tahoma>&nbsp;Blah</FONT><BR></TD>
Results differ! Original result: nbsp.net New result: no match

Wichita, KS 67204, USA &nbsp;&nbsp;www.somesite.com&nbsp;&nbsp; =
Results differ! Original result: nbspwww.somesite.com New result: www.somesite.com

... you get the picture.

-Jared
[PATCH (maybe)] Don't reset_transaction between queueing and responding
So, our fork has a few logging and bayes training mechanisms that run in hook_reset_transaction. They are all very fast, most of the time; but recently one of them broke and started running very slowly. The result: QP would inject an incoming message into the queue, enter hook_reset_transaction, and 20 minutes later try to respond '250 Queued!'. Of course, by then clients had given up and assumed their messages were not delivered, resulting in duplicate messages and bounces and such.

Here's the patch that I applied to guard against this. On principle, we should do as little as possible between queueing a message and telling the client we queued it; if a plugin really needs to do something at this juncture, it should probably be using hook_queue_post. Am I right? Or is this going to break something and bite me later? :)

I haven't pushed this to my git yet, but I will if nobody points out any fatal flaws or other reasons to change the patch first.

-Jared

diff --git a/lib/Qpsmtpd/SMTP.pm b/lib/Qpsmtpd/SMTP.pm
index 258282c..7d75d4e 100644
--- a/lib/Qpsmtpd/SMTP.pm
+++ b/lib/Qpsmtpd/SMTP.pm
@@ -818,10 +818,9 @@ sub queue_pre_respond {
 sub queue_respond {
     my ($self, $rc, $msg, $args) = @_;
 
-    # reset transaction if we queued the mail
-    $self->reset_transaction;
-
     if ($rc == DONE) {
+        # reset transaction if we queued the mail
+        $self->reset_transaction;
         return 1;
     }
     elsif ($rc == OK) {
@@ -840,6 +839,7 @@ sub queue_respond {
         $msg->[0] ||= 'Queuing declined or disabled; try again later';
         $self->respond(451, @$msg);
     }
+    $self->reset_transaction;
 
     # And finally run any queue_post hooks
     $self->run_hooks("queue_post");
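P.S. For the record, moving slow work to hook_queue_post looks about like this. Note that by queue_post time the transaction has been reset, so anything the plugin needs has to be stashed earlier -- per-connection notes work for that. log_delivery and the note name here are hypothetical:

sub hook_data_post {
    my ( $self, $txn ) = @_;
    # capture what the post-queue step needs before reset_transaction runs
    $self->qp->connection->notes( last_sender => $txn->sender->address );
    return DECLINED;
}

sub hook_queue_post {
    my ($self) = @_;
    my $sender = $self->qp->connection->notes('last_sender') or return DECLINED;
    $self->log_delivery($sender);    # hypothetical slow work, now after the 250
    return DECLINED;
}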
RE: Released 0.83
From: Ask Bjørn Hansen [...@develooper.com] Sent: Tuesday, September 15, 2009 5:22 PM To: qpsmtpd@perl.org ML Subject: Released 0.83 prefork: More robust child spawning (Jared Johnson) You might as well credit this one to Peter Samuelson while you're at it. That's my co-worker who gave me the patch. He wanted it submitted upstream but didn't feel like getting on github :) -Jared
Re: Regarding config plugins...
Patches submitted where?

-Jared

On 09/09/2009 04:43 PM, Jason Mills wrote:
> Okay, patches submitted. Sorry for the time delay, but I had a bit of a hard drive scare last night after I finished the code but before I could spin the patch. Any input, suggestions and/or scorn is welcome :) - J

On Tue, 2009-09-08 at 21:22 -0700, Jason Mills wrote:
> Working on it right now :)

On Tue, 2009-09-08 at 21:17 -0700, Ask Bjørn Hansen wrote:
> On Sep 8, 2009, at 21:05, Jason Mills wrote:
> > So to the point of the question: Would anyone mind if I patched the configuration stuff within Qpsmtpd to allow configuration-minded plugins to alter/hook/stack into how Qpsmtpd discovers and utilizes cached configuration values?
> Show us a patch! Changes to make it easier or more powerful to write plugins are generally welcome. (patches with unit tests are even better...) - ask
Re: [PATCH] Log even when we aren't in a transaction
On 08/23/2009 04:03 PM, Matt Sergeant wrote:
> On Thu, 20 Aug 2009 10:18:39 -0500, Jared Johnson wrote:
> > It looks like logging/file doesn't like the empty hashref returned by Qpsmtpd::transaction().
> I never understood why it did that. Any reason it can't return either undef or (preferably) a new Transaction object?

I don't really understand it either, and I fear that which I don't understand, so I worry about taking it out and breaking some hackage that depends on it :)

BTW, a bit of a cleaner workaround to this problem that we've tested successfully is to add 'sub transaction { return shift }' to Transaction.pm.

-Jared
Re: [PATCH] Log even when we aren't in a transaction
> > This should allow the logging/file plugin to log even if it isn't called from within a transaction
> Thanks, applied and pushed. The conditionals around where you made the first change are a bit of a mess (indicating that there's a better way to do whatever it does), but I didn't look closer ... :-)

Well, this is in production now and hasn't actually fixed the issue. Now I realize that we're getting "Can't call method 'notes' on unblessed reference", not "Can't call method 'notes' on undef". I guess I'll have to look closer :)

-Jared
Re: [PATCH] Log even when we aren't in a transaction
On 08/14/2009 04:31 PM, Ask Bjørn Hansen wrote:
> On Aug 14, 2009, at 14:14, Jared Johnson wrote:
> > This should allow the logging/file plugin to log even if it isn't called from within a transaction
> Thanks, applied and pushed. The conditionals around where you made the first change are a bit of a mess (indicating that there's a better way to do whatever it does), but I didn't look closer ... :-)

Well, I didn't have a closer look at logging/file itself, but I found that it was Qpsmtpd::Transaction::DESTROY() that was trying to log with $self->log() and failing. It looks like logging/file doesn't like the empty hashref returned by Qpsmtpd::transaction(). The following patch does clear up the issue for us, although a more general solution might be in order. Thoughts?

index 125c9a5..6053259 100644
--- a/lib/Qpsmtpd.pm
+++ b/lib/Qpsmtpd.pm
@@ -408,8 +408,10 @@ sub run_hooks_no_respond {
     my ($self, $hook) = (shift, shift);
     if ($hooks->{$hook}) {
         my @r;
+        my $transaction = $self if ( ref $self ) =~ /Transaction$/;
+        $transaction ||= $self->transaction;
         for my $code (@{$hooks->{$hook}}) {
-            eval { (@r) = $code->{code}->($self, $self->transaction, @_); };
+            eval { (@r) = $code->{code}->($self, $transaction, @_); };
             $@ and warn("FATAL PLUGIN ERROR [" . $code->{name} . "]: ", $@) and next;
             if ($r[0] == YIELD) {
                 die "YIELD not valid from $hook hook";

-Jared
patch for minor logging plugin crash
We noticed that logging/file sometimes emitted this error: XX: Can't call method notes on unblessed reference at /usr/share/qpsmtpd/plugins/logging/file line 272. It appears this is because QP internal sometimes legitimately calls -log before a transaction object exists. The patch I'm sending should prevent logging/file from crashing in that event. I'm not really sure how to get git send-email to attach this description to my actual patch email, so stay tuned for the patch itself :) -Jared
[PATCH] Log even when we aren't in a transaction
From: Jared Johnson jar...@jello.nmgi.com

This should allow the logging/file plugin to log even if it isn't called from within a transaction
---
 plugins/logging/file |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/plugins/logging/file b/plugins/logging/file
index 31292ad..5717fe5 100644
--- a/plugins/logging/file
+++ b/plugins/logging/file
@@ -263,13 +263,15 @@ sub hook_logging {
     # - It's not already open
     # - We're allowed to split sessions across logfiles
     # - We haven't logged anything yet this session
+    # - We aren't in a session
     if (!$self->{_f} ||
         !$self->{_nosplit} ||
+        !$transaction ||
         !$transaction->notes('file-logged-this-session')) {
         unless (defined $self->maybe_reopen($transaction)) {
             return DECLINED;
         }
-        $transaction->notes('file-logged-this-session', 1);
+        $transaction->notes('file-logged-this-session', 1) if $transaction;
     }
     my $f = $self->{_f};
-- 
1.6.0.4
Re: invalid sender creates havoc
On 07/30/2009 07:11 AM, Peter J. Holzer wrote:
> On 2009-07-29 14:07:53 -0500, Jared Johnson wrote:
> > We recently noticed a message in our postfix queue that thought it was addressed to foo.com. After examining it, it turned out that Qpsmtpd accepted a MAIL FROM command formatted like so:
> > MAIL FROM:u...@d.com,foo.com
> Weird. Ah, I see the problem. In line 187 of Qpsmtpd::Address the regexp for the domain is initialized like this:
>
> my $domain = "(?:$address_literal|$subdomain(?:\.$subdomain)*)";
>
> (looks like I didn't know about qr// when I wrote that code) That's a double-quoted string, so \. needs to be written as \\.

Ah, that fixes it. Incidentally, I think that might have been introduced by a patch that I submitted. I think I recall there was a bit of discussion over using qr// but there was some good reason not to, but the reason itself completely escapes me :)

-Jared
Re: invalid sender creates havoc
On 07/29/2009 09:39 PM, Ask Bjørn Hansen wrote:
> On Jul 29, 2009, at 12:07, Jared Johnson wrote:
> > We recently noticed a message in our postfix queue that thought it was addressed to foo.com. After examining it, it turned out that Qpsmtpd accepted a MAIL FROM command formatted like so:
> > MAIL FROM:u...@d.com,foo.com
> Weird that postfix would munge it. I thought "don't make stuff up" was one of the reasons why everyone upgraded from sendmail. Anyway... :-)

To be fair, I realized later that postfix doesn't munge it; our postfix queue parsing stuff munged it :)

-Jared
invalid sender creates havoc
We recently noticed a message in our postfix queue that thought it was addressed to foo.com. After examining it, it turned out that Qpsmtpd accepted a MAIL FROM command formatted like so:

MAIL FROM:u...@d.com,foo.com

When it passed it to postfix, it transmogrified it first into the sender 'u...@domain.com, something.com' and then into the sender 'u...@domain.com' and the recipient ' something.com'. That's a bit silly, but it seems that QP never should have accepted the message in the first place. I tested using vanilla git Qpsmtpd::Address as follows:

perl -MQpsmtpd::Address -le \
    "print Qpsmtpd::Address->new('u...@d.com,foo.com')"
u...@d.com,foo.com

Taking a look at Qpsmtpd::Address, I noticed some comments and code that seemed to be looking for similar syntax:

# A-d-l = At-domain *( "," A-d-l )
#       ; Note that this form, the so-called "source route",
#       ; MUST BE accepted, SHOULD NOT be generated, and SHOULD be
#       ; ignored.
...
# strip source route
$path =~ s/^...@$domain(?:,\...@$domain)*://;

It looks to me like it's trying to find syntax like:

MAIL FROM:u...@d.com,@foo.com

and make it equivalent to just:

MAIL FROM:u...@d.com

That doesn't seem to be what it actually does, though:

perl -MQpsmtpd::Address -le \
    "print Qpsmtpd::Address->new('u...@d.com,@foo.com')"
u...@d.com\,@foo.com

Does anyone have a good enough grasp on rfc2821 and Qpsmtpd::Address::canonify() to know just what canonify() is actually supposed to be doing with source routes, and why it's managing to let this obviously invalid syntax through instead?

-Jared
Re: problems with authentication and qpsmtpd-prefork
This thread may or may not be applicable: http://www.nntp.perl.org/group/perl.qpsmtpd/2008/07/msg8134.html

-Jared

On 07/09/2009 07:15 AM, Stefan Priebe wrote:
> Hi! I've massive problems with authentication and qpsmtpd. After some time I always get a "you're already authorized". This is caused by not deleting the key $qpsmtpd->{_auth} after a client closes his connection. Then later, when the next client connects to the same child, it is already authorized by $qpsmtpd->{_auth}. I could solve this by adding a delete($qpsmtpd->{_auth}); to qpsmtpd-prefork after
>
> # continue to accept connections until old age is reached
> for (my $i = 0 ; $i < $child_lifetime ; $i++) {
>
> But I don't think this is the correct way. Or do I miss something? Regards, Stefan
Re: PATCH [1/5] Disconnect badhelo
>> One could however offer an alternate standard set of plugins implementing the discussed ideas, I think.
>
> One could. I've been considering it, but mine's not as configurable (in terms of handling mailer behaviour on a hit) as I'd like. Probably wouldn't be hard to rework that bit.

The project already maintains multiple different daemon architectures, and a separate set of plugins for -async... IMO it would be an unnecessary burden to maintain an alternate set of plugins. I agree with Chris that methodology changes done right would make plugins simpler, though, eliminating the need for this. Of course, the semi-related issue of per-recipient configuration directives does seem like something that could complicate plugins, unless that complexity were moved to core somehow (e.g. the plugin says DENY and core possibly decides to do something else).

-Jared
Re: PATCH [1/5] Disconnect badhelo
> I like this approach, but it's had its complications for us because we don't want to make the compromise of doing the rest of the checks at a given stage needlessly; e.g. if our RBL plugin hits, we don't want to bother doing RDNS and SPF lookups only to say 'sorry, you were on an RBL' afterward. Have you come up with an intuitive way to deal with this sort of thing?

We have come up with some ways, but I don't know that I'd call it intuitive, yet :) One thing we've done that dealt with the problem somewhat was to expand the idea of the DSN object to be more customizable and to specify all the information we need for logging, and for deciding what to do when a particular filter hits. It hasn't really addressed the issue completely; we've got three or four different methods in place dealing with variations on the same problem right now...

-Jared

On 06/19/2009 12:12 PM, Chris Lewis wrote:
> Charlie Brady wrote:
>> It's not as though there are any SMTP clients out there which will try again with a different helo host. Unless there is a spambot tuned for this qpsmtpd plugin.
>
> There are plenty of bots that do that.
>
>> I don't understand your statement here. In what way would any client be better off with a connection held open following the DENY, than having DENY_DISCONNECT?
>
> It would be better for qpsmtpd in the long run for ALL of the filter decision plugins to note what they found in ->notes(), and always issue DECLINED. Leave it up to a single meta-plugin, run last in every SMTP phase, to decide how to change the SMTP state (deny, deny_disconnect, score, etc). The reaction to a plugin filter hit should be orthogonal to setting a filter hit. Meaning: a filter hit doesn't change SMTP protocol state; that's another plugin's job. In this way, you retain the flexibility to set the engine's behaviour - whether it be deny immediately, deny at RCPT TO, deny after DATA, score, whitelist, etc. - _independently_ of the plugin rule hit. By doing this you don't have to add "what to do when you get a hit" configuration to every plugin - they don't have to know about whitelisting mechanisms or what state to go to. Simplifying the filtering plugins greatly. I had to rewrite (usually from scratch) every one of the qpsmtpd plugins to make that happen in our implementation.
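A minimal sketch of the split Chris describes -- filter plugins record hits and always DECLINE, and one policy meta-plugin, registered last, turns notes into SMTP state. The note name and the _listed_on_rbl() helper here are hypothetical:

    # In the filter plugin: record the hit, never touch protocol state.
    sub hook_connect {
        my ($self, $txn) = @_;
        $self->qp->connection->notes(rbl_hit => 1) if $self->_listed_on_rbl;
        return DECLINED;
    }

    # In the policy meta-plugin, run last in the phase of your choosing:
    sub hook_rcpt {
        my ($self, $txn, $rcpt) = @_;
        return DENY, "sorry, you were on an RBL"
          if $self->qp->connection->notes('rbl_hit');
        return DECLINED;
    }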
Spaces in MAIL FROM/RCPT TO commands
We've been considering blocking messages that break RFC compliance by including a space before or after the colon in MAIL FROM: and RCPT TO: commands. From RFC 5321, Section 3.3:

    Since it has been a common source of errors, it is worth noting that
    spaces are not permitted on either side of the colon following FROM
    in the MAIL command or TO in the RCPT command. The syntax is exactly
    as given above.

I wrote up the following statistics in an attempt to see just what would happen if we started enforcing this syntax. As it turns out, we would be able to skip a whole lot of RBL, SPF, and address lookups as well as content scans, and also be able to block a number of spams that would have otherwise gotten through our filter... but we would also block *some* legit mail. Most of the mail that would have been blocked was sent by email marketers, though, and wouldn't necessarily have been missed by our customers; and if the senders knew what they were doing, they could always do something about it. Anyway, I thought I'd post it here and see if anyone has any opinions on whether it's acceptable to enforce this. It's certainly very tempting.

Within a period of about 24 hours, the node that I examined processed 22,523 transactions that would have violated strict syntax checks WRT spaces.

    --- Rejections:
    invalid recip       15,881
    rbl                  4,856
    spamassassin           683
    spf                    387
    syntax                 168  (missing angle brackets, etc.)
    relay denied            38
    blacklisted sender      23
    no rdns                 14
    rdns mismatch            2
    phishing                 2
    --- Deferrals:
    spf defer               24
    --- Quarantined:       211
    Delivered:             192
    ---
    Total Blocked:      22,265  (99.1%)
    Total Delivered:       192  (0.9%)

This node processed a total of 33,553 transactions overall, 3,796 of which resulted in delivery. 66.9% of these transactions would have been blocked outright by blocking on improper spaces in MAIL FROM/RCPT commands. Doing so would also have decreased the number of delivered messages by 5%, and decreased the number of quarantined messages by 32% -- but a few of these new blocks would have been false positives.

I examined 71 of the delivered messages that would have been blocked due to RFC-ignorance. There were some definite false positives. Most of these false positives were addressed with 'bounce@' and similar addresses and were obviously sent by direct marketing companies. Three messages, however, were sent by normal end-users who would have received bounces if we were blocking based on RFC ignorance.

Of the remaining delivered messages with invalid syntax, nearly all *appeared* to be sent by direct marketers. Upon investigation, however, most of these appeared very likely to be illegitimate. I suspect that a small handful of popular direct marketing email software applications use this flawed syntax, and are used by some legitimate marketers, as well as by a large number of spammers, and perhaps a few small marketing companies that don't realize (and probably don't care) that their customers are sending spam through their servers.
StrongMail, the apparent MTA for multiview.com, is one candidate: http://www.strongmail.com/what-we-offer/email-deliverability-solutions/

At any rate, some obviously legitimate mass email campaigns would have been blocked due to RFC ignorance from the following organizations:

    bounce.rd.com (Reader's Digest):  3
    cmpgnr.com (Campaigner):          5
    email.joann.com (Joann Fabric):   2
    service.govdelivery.com:          2
    medtech.cfmvmail.com:             1
    mktg.artfulhome.com:              1
    strongmail.multiview.com:         1

    Marketing Senders: 15
    Human Senders:      3
    Total:             18

So, the samples that I examined had a 23.4% false positive rate, although FWIW if you don't count mass-marketers, it would be a 3.9% FP rate. The overall FP rate, without taking collisions with other rejection methods into account, would be around 0.2%. Considering that this syntax is clearly prohibited by RFC, and the majority of senders are professional organizations who can do something about the issue if they want their messages to get through our systems, it may be appropriate to reject outright in spite of the possibility of blocking some legitimate mail.

-Jared
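If we did enforce this, the check itself is small. A hedged sketch, written as a standalone helper rather than actual plugin wiring (hook signatures for the command-parsing phases vary by qpsmtpd version, so treat the integration as an exercise):

    # True if the raw command line puts a space on either side of the colon,
    # which RFC 5321 section 3.3 forbids for MAIL FROM and RCPT TO.
    sub violates_space_rule {
        my ($line) = @_;    # e.g. 'MAIL FROM: <user@example.com>'
        return 1 if $line =~ /^(?:MAIL\s+FROM|RCPT\s+TO)\s+:/i;    # space before ':'
        return 1 if $line =~ /^(?:MAIL\s+FROM|RCPT\s+TO):\s/i;     # space after ':'
        return 0;
    }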
Re: prefork 'orphaned child' messages
One of my associates suggested this patch. The idea is to reduce loop_sleep if we had to spawn a lot of children last time through the loop. This way, on restart (since $created_children is initialized to $idle_children) and on sudden concurrency spikes, we loop quickly enough to spawn the children we need very quickly; whereas under normal conditions, we don't bother. I still decreased loop_sleep by half in our own version, so that the longest wait clients will experience during a drastic concurrency increase would be just slightly above 15 seconds. Would there be any interest in accepting this if I tested it and put it in my git?

-Jared

On 06/02/2009 08:09 AM, Jared Johnson wrote:
>> Why not set --idle-children at start-up to something higher (or just 0 to disable)?
>
> Setting it higher is a bit bothersome because our QP children use too much memory right now (mainly from using DBIx::Class), so it would be a bit unfortunate to have more memory-consuming kids around that aren't needed. As for disabling idle-children... what exactly would the behavior be with disabled idle children? In a situation where concurrency is saturated, it seems that QP spawns $idle_children processes every $loop_wait seconds. Would QP instead spawn a new child _immediately_ when a new connection attempt came in? Or, if it still waits $loop_wait seconds, how many children does it spawn when it sees some are needed?
>
> -Jared

--- a/qpsmtpd-prefork
+++ b/qpsmtpd-prefork
@@ -75,7 +75,7 @@
 my $max_children   = 15;   # max number of child processes to spawn
 my $idle_children  = 5;    # number of idle child processes to spawn
 my $maxconnip      = 10;
 my $child_lifetime = 100;  # number of times a child may be reused
-my $loop_sleep     = 30;   # seconds main_loop sleeps before checking children
+my $loop_sleep     = 15;   # seconds main_loop sleeps before checking children
 my $re_nice        = 5;    # substracted from parents current nice level
 my $d_start        = 0;
 my $quiet          = 0;
@@ -339,10 +339,11 @@ sub reaper {
 #arg0: void
 #ret0: void
 sub main_loop {
+    my $created_children = $idle_children;
     while (1) {
         # if there is no child death to process, then sleep EXPR seconds
         # or until signal (i.e. child death) is received
-        sleep $loop_sleep unless @children_term;
+        sleep $loop_sleep / ($created_children * 2 + 1) unless @children_term;
 
         # block CHLD signals to avoid race
         my $sigset = block_signal(SIGCHLD);
@@ -379,9 +380,9 @@ sub main_loop {
         }
 
         # spawn children
-        for (my $i = scalar(keys %children) ; $i < $chld_pool ; $i++) {
-            new_child();    # add to the child pool
-        }
+        $created_children = $chld_pool - keys %children;
+        $created_children = 0 if $created_children < 0;
+        new_child() for 1..$created_children;
 
         # unblock signals
         unblock_signal($sigset);
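For a sense of scale, here is what the patched sleep works out to with the stock $loop_sleep of 30 (the divisor grows with the number of children spawned on the previous pass):

    # sleep $loop_sleep / ($created_children * 2 + 1):
    #   $created_children = 0  ->  30 / 1  = 30s    (steady state, unchanged)
    #   $created_children = 5  ->  30 / 11 = ~2.7s  (restart: initialized to $idle_children)
    #   $created_children = 10 ->  30 / 21 = ~1.4s  (sudden concurrency spike)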
Re: prefork 'orphaned child' messages
> Why not set --idle-children at start-up to something higher (or just 0 to disable)?

Setting it higher is a bit bothersome because our QP children use too much memory right now (mainly from using DBIx::Class), so it would be a bit unfortunate to have more memory-consuming kids around that aren't needed. As for disabling idle-children... what exactly would the behavior be with disabled idle children? In a situation where concurrency is saturated, it seems that QP spawns $idle_children processes every $loop_wait seconds. Would QP instead spawn a new child _immediately_ when a new connection attempt came in? Or, if it still waits $loop_wait seconds, how many children does it spawn when it sees some are needed?

-Jared
Re: prefork 'orphaned child' messages
> Even if you're not near max children the parent will only spawn max idle children, then it sleeps until an event, wakes up and sees if more children are needed. Debug log should give some clue, if this is the reason.

Bingo. Looking further into things, it was apparent that a freshly restarted node was grossly underprovisioned WRT child processes. On initial restart, it had 5... 30 seconds later, it had 10. I manually set $loop_sleep = 5 (originally hard-coded to 30) and this addressed the issue. Is this drastically smaller loop likely to have any side effects? If not, it might be a good idea to set this upstream.

-Jared
Re: prefork 'orphaned child' messages
> No, I don't think it's normal. What are you doing in your plugins? Wasn't there some issue we uncovered a while ago to do with MySQL?

I don't recall hearing of issues with MySQL... we use Postgres here and do, well, lots of stuff: lookups for global and IP-based max concurrency rules in hook_pre_connect, lookups for various preferences in hook_rcpt... if it's a plugin that's breaking things then it's almost certainly our own code, as we've re-written most of QP's plugins for our uses :)

Any clue what's going on in general? Do children wind up 'orphaned' when they go completely unresponsive, e.g. they're hanging out in some infinite loop somewhere or something similarly evil? The basic problem we've been encountering is that, very rarely, all of our dozen QP nodes inexplicably introduce long delays before answering with a banner (no banner delay involved); watching the logs, it doesn't look like any child is spawned during this wait, and we aren't anywhere near max-children. Eventually, a child is spawned and then handles the connection normally, assuming the client hasn't timed out. And eventually, every slave recovers and starts behaving normally.

It seems like this issue would have to be network-related in some way, as every node is affected around the same time; DNS was the first obvious choice, but we don't seem to be having any problems with that. Once we find said network problem this issue might go away, but it seems that our smtpd should not be so fragile to whatever issue is hanging it up...

-Jared
Re: prefork 'orphaned child' messages
> What's orphaned is not a child process, but a shared mem hash record for a process which no longer exists. I suspect that code is racy.

Hrm, then if we're getting a whole lot of these, does this mean child processes are going away at a high rate? I wouldn't expect such a condition using prefork with relatively low concurrency; maybe this reflects a problem during the end of the connection...

-Jared
Re: prefork 'orphaned child' messages
>> Inbound and outbound email scanned for spam and viruses by the DoubleCheck Email Manager v5: http://www.doublecheckemail.com
>
> Do we have to be exposed to this spam?

*blush* Since I administer our QP installation that adds those, I suppose I could exempt myself without anybody noticing :)

-Jared
Re: [qpsmtpd] 0.81 Can prefork listen on more than one port?
On 05/21/2009 02:56 PM, Ask Bjørn Hansen wrote:
> On May 21, 2009, at 12:30, J wrote:
>> The prefork in 0.81 can only listen on one. The next release will support listening on multiple addresses/ports in prefork, just as forkserver does. Any idea when that might be?
>
> Probably in about 4-6 weeks. My plan is to do a release at least every 3 months, or when there are significant changes/bugfixes - whichever comes first.

You might want to consider the multiple-addresses/ports thing to be significant enough to warrant an early release; I noticed a bit ago that the stock Debian package had switched to prefork, but had to revert back to forkserver because of the regression on that issue. Food for thought. Of course, someone could also just advise the Debian maintainer to apply that patch to the Debian package :)

-Jared
Re: [qpsmtpd] 0.81 Can prefork listen on more than one port?
> I'm the Debian maintainer. :)

Fancy meeting you here :) OT, you might want to patch the Debian init script now that QP supports SIGHUP. Here's my local patch, but I haven't bothered scrutinizing and un-preforkifying and therefore officially sending it in:

--- dc-smtpd/debian/dc-smtpd.init (revision 8819)
+++ dc-smtpd/debian/dc-smtpd.init (revision 8820)
@@ -112,6 +112,12 @@
 fi
 }
 
+qpsmtpd_reload()
+{
+    start-stop-daemon --stop --signal 1 --quiet --pidfile $PIDFILE
+    return 0
+}
+
 case "$1" in
   start)
     echo -n "Starting qpsmtpd: "
@@ -123,7 +129,12 @@
     qpsmtpd_stop
     echo "qpsmtpd-prefork."
     ;;
-  restart|reload|force-reload)
+  reload|force-reload)
+    echo -n "Reloading qpsmtpd: "
+    qpsmtpd_reload
+    echo "qpsmtpd-prefork."
+    ;;
+  restart)
     qpsmtpd_stop
     qpsmtpd_start
     ;;

-Jared
Re: [qpsmtpd] from 0.32 to 0.81, forkserver to pre-fork
> First issue: If the plugins (most are highly customized) run under forkserver, will they be expected to run under prefork without modification?

I'm trying to recall some of the discussions, and I think the issues were with pollserver - something about async code(?). If I recall our own similar migration correctly, things went off nearly without a hitch from forkserver to prefork. You might want to think about whether your customized stuff makes any assumptions that every new connection gets a new process, e.g. if you have some sort of ugly 'our' stuff that's supposed to be per-transaction; if so, your code is kinda broken anyway :P

IIRC, our hook_pre_connect stuff did assume it was running before the fork, and (IIRC again) on prefork it runs after the fork; at any rate, we moved some things to register() that belonged there in the first place to alleviate the problems we had with hook_pre_connect.

-Jared
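The kind of assumption to grep for, sketched below (the plugin state is hypothetical; the notes() call is the per-transaction-safe replacement):

    our %rcpt_count;    # package-level: survives across connections in a reused child

    sub hook_rcpt {
        my ($self, $txn, $rcpt) = @_;

        # Broken under prefork: never reset, so counts bleed between clients
        $rcpt_count{$rcpt->address}++;

        # Safe: transaction notes are discarded with the transaction
        $txn->notes(rcpt_count => ($txn->notes('rcpt_count') || 0) + 1);

        return DECLINED;
    }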
Re: [qpsmtpd] port while in pre_connection hook?
Hi,

In current QP it would be about like so:

    sub hook_pre_connection {
        my ($self, $transaction, %arg) = @_;
        my $port = $arg{local_port};
    }

-Jared

b-sub-bdf0...@rope.net wrote:
> On Tue, 5 May 2009, Hanno Hecker wrote:
>> On Tue, 5 May 2009 17:20:13 +0000 (UTC), b-sub-bdf0...@rope.net wrote:
>>> qpsmtpd 0.31. We are using the following to determine the local port when in the 'connect' hook:
>>>
>>>     my $localport = $self->qp->connection->local_port;
>>>
>>> We would like to do the same thing in the 'pre_connection' hook. The above line doesn't work, and shows 'uninitialized value' in the logs. Is there some other way to get the local port at this stage, or do we have to move what we want to do to the connect hook, instead?
>>
>> The pre-connection hook was not working until 0.32... (or SVN rev 599 / 601). This is more than 3 years now... and lots of bugs were fixed :)
>
> My mistake - version 0.32. We will be upgrading shortly. However, I still need to get the local port number in the pre_connection hook, or I have to move the functionality to later than what would be ideal :-/
Re: Qpsmtpd::Transaction::body_fh
So, after some more extensive testing, we've found that dup'ing body fh's all over the place and then writing to different ones can cause some problems. So I think I will submit my dup() enhancement as a separate function, dup_body_fh(). When I updated the documentation I noticed that body_fh() is actually documented to return undef in the event that the body is not yet spooled to disk. I assumed that it was an oversight that body_fh() didn't call body_spool() similarly to body_filename(), but it seems there was some thought given to it. Is this the preferred behavior?

-Jared

Jared Johnson wrote:
> Based on my testing, it looks like body_resetpos() behavior would be unaffected by the dup(). I didn't realize this was the case, so my mention of 'if we happen to seek() to the wrong place bad things won't happen' is not exactly accurate. But this is probably for the best; although I would normally expect someone to do '$txn->body_resetpos; my $fh = $txn->body_fh;' -- it is probably best not to break when they do 'my $fh = $txn->body_fh; $txn->body_resetpos;'. So the main accomplishment of calling dup() is that we can pass $fh to a module that calls $fh->close() without clobbering things :) I'll send in a patch soonish.
>
> -Jared
>
> Robert Spier wrote:
>> Most plugins call body_resetpos before using the FH.
>>
>> -R
>>
>> Jared Johnson wrote:
>>> I'm considering submitting a patch that enhances Qpsmtpd::Transaction::body_fh() to (1) call body_spool() if the message was not already spooled to disk; and (2) dup the filehandle before returning it, so that if a plugin that calls body_fh, or an external module that the plugin passes the fh to, happens to seek() to some weird place or close() the fh, bad things won't happen to plugins that use the fh later (body_fh, body_resetpos, body_getline). My question is: are there times when a plugin might be expecting such modifications to body_fh, justifying splitting the second functionality into a separate function, say, body_fh_dup() or so? I'd lean toward making the changes directly to body_fh() but wanted to see if there were any objections to this.
>>>
>>> -Jared

--
Jared Johnson
Software Developer and Support Engineer
Network Management Group, Inc.
620-664-6000 x118
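The mechanics are roughly this -- a sketch only, assuming the transaction keeps its spooled handle in {_body_file} (adjust to the real field name). Note that close() on the dup won't invalidate the original handle, although the two descriptors still share an OS-level file offset:

    sub dup_body_fh {
        my ($self) = @_;
        $self->body_spool unless $self->{_body_file};    # spool on demand
        open my $dup, '<&', $self->{_body_file}
          or die "unable to dup body filehandle: $!";
        return $dup;
    }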
perltidy
First of all, kudos on the frequent releasing :)

I've attached a suggested patch to .perltidyrc. I've been playing around with perltidy'ing all QP code, and there are some results I don't like. This doesn't fix all the things that I don't like, but it's been sitting on my laptop for a month so I thought I'd submit it to the ML for discussion. I haven't committed it to my git since I'd just revert it if it wasn't accepted :) Here's the relevant documentation to go along with the additions:

    -ce, --cuddled-else
        Enable the "cuddled else" style, in which else and elsif follow
        immediately after the curly brace closing the previous block.
        The default is not to use cuddled elses, and is indicated with
        the flag -nce or --nocuddled-else. Here is a comparison of the
        alternatives:

            if ($task) {
                yyy();
            } else {    # -ce
                zzz();
            }

            if ($task) {
                yyy();
            }
            else {    # -nce (default)
                zzz();
            }

    [-nbcc], -bbc, --blanks-before-comments
        A blank line will be introduced before a full-line comment. This
        is the default. Use -nbbc or --noblanks-before-comments to
        prevent such blank lines from being introduced.

    -msc=n, --minimum-space-to-comment=n
        Side comments look best when lined up several spaces to the
        right of code. Perltidy will try to keep comments at least n
        spaces to the right. The default is n=4 spaces.

-Jared

diff --git a/.perltidyrc b/.perltidyrc
index 65b29f2..9ffed84 100644
--- a/.perltidyrc
+++ b/.perltidyrc
@@ -10,6 +10,10 @@
 -lp    # line up with parentheses
 -cti=1 # align closing parens with opening parens (closing token placement)
 
+-ce    # cuddled else
+-nbbc  # don't insert blank lines before comments
+-msc=1 # minimum one space before side comments (default 4)
+
 # -nolq  # don't outdent long quotes (not sure if we should enable this)
Re: perltidy
> [-nbcc], -bbc, --blanks-before-comments

Typo'd my interposition there; it's -nbbc.

-Jared
[PATCH] Close spamd socket when we're finished with it
From: jaredj jar...@nmgi.com

QP's connection to spamd unnecessarily persists beyond the run of the spamassassin plugin itself. This closes the socket as soon as we're finished using it.
---
 plugins/spamassassin |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/plugins/spamassassin b/plugins/spamassassin
index 1c4aa85..6865efe 100644
--- a/plugins/spamassassin
+++ b/plugins/spamassassin
@@ -185,6 +185,7 @@ sub hook_data_post { # check_spam
     }
 
     my $tests = <SPAMD>;
+    close SPAMD;
     $tests =~ s/\015//;    # hack for outlook
     $flag = $flag eq 'True' ? 'Yes' : 'No';
     $self->log(LOGDEBUG, "check_spam: finished reading from spamd");
--
1.5.6.3
Re: Qpsmtpd::Transaction::body_fh
Based on my testing, it looks like body_resetpos() behavior would be unaffected by the dup(). I didn't realize this was the case, so my mention of 'if we happen to seek() to the wrong place bad things won't happen' is not exactly accurate. But this is probably for the best; although I would normally expect someone to do '$txn->body_resetpos; my $fh = $txn->body_fh;' -- it is probably best not to break when they do 'my $fh = $txn->body_fh; $txn->body_resetpos;'. So the main accomplishment of calling dup() is that we can pass $fh to a module that calls $fh->close() without clobbering things :) I'll send in a patch soonish.

-Jared

Robert Spier wrote:
> Most plugins call body_resetpos before using the FH.
>
> -R
>
> Jared Johnson wrote:
>> I'm considering submitting a patch that enhances Qpsmtpd::Transaction::body_fh() to (1) call body_spool() if the message was not already spooled to disk; and (2) dup the filehandle before returning it, so that if a plugin that calls body_fh, or an external module that the plugin passes the fh to, happens to seek() to some weird place or close() the fh, bad things won't happen to plugins that use the fh later (body_fh, body_resetpos, body_getline). My question is: are there times when a plugin might be expecting such modifications to body_fh, justifying splitting the second functionality into a separate function, say, body_fh_dup() or so? I'd lean toward making the changes directly to body_fh() but wanted to see if there were any objections to this.
>>
>> -Jared
git/svn sync
Hi, This isn't really a git ML, but perhaps some of you have had similar struggles in dealing with the switch to git and can give me some suggestions. I used to keep my organization's SVN version of QP up-to-date with `svn merge` pretty easily. Now that QP is in Git, I'd like to see if there's a way to continue to effortlessly synchronize with Git. I've looked into git-svn, which looks like it makes it very easy to create an SVN branch with every single Git commit represented as an SVN commit. However, my organization's SVN Czar is not thrilled with the idea of ballooning our history by a thousand or so commits. What I would really like to do is 'merge everything from commit XYZ to HEAD' in a single commit to SVN, with either git or svn presenting any conflicts that might arise. Can anyone suggest how I can accomplish this? Thanks in advance -Jared
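One low-tech approach that avoids git-svn entirely, sketched below: take the net diff between the last synced commit and HEAD, and apply it to the SVN working copy as one revision. Here XYZ stands for the last git commit already reflected in SVN, and the paths are hypothetical:

    git diff XYZ HEAD > /tmp/qpsmtpd-sync.patch    # net change since last sync
    cd ~/src/qpsmtpd-svn-checkout
    patch -p1 < /tmp/qpsmtpd-sync.patch            # conflicts show up as *.rej files
    svn status | grep '^?'                         # eyeball new files, then svn add them
    svn commit -m "sync with upstream git (through commit XYZ)"

The trade-off is that SVN history records one squashed revision and loses the individual upstream commit messages, which sounds like exactly what the SVN czar wants.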
Re: Noise on the Commit List
However, in general the mails github creates seem pretty useless. I find them very useful. I can see exactly when you merge someone else's work and when you make a release. It was just the large flood of things with 5 year old dates that was not useful. I agree that these updates are generally useful, as an easy passive way to keep tabs on master and as quick links to view commits on github -Jared
Re: persistence?
> right now notes is per-connection (is that correct?) so there may be things that would break if the scope of notes were to change.

Well, $self->qp->connection->notes() is per-connection, $transaction->notes() is per-transaction, and (if/when my address notes patch is accepted) $addr->notes() is per-address. But what if we give plugin authors access to hook_connection_notes, hook_transaction_notes, and hook_address_notes, which pass the object in question, allowing the plugin to horse around with the note handling however it wants to, including widening the scope:

    sub hook_transaction_notes_get {
        # store tarpit and some other thing in Cache::FastMmap
        my ($self, $txn, $key) = @_;
        my %cfm_keys;
        @cfm_keys{qw(tarpit otherthing)} = ();
        return DECLINED unless exists $cfm_keys{$key};
        my $cache = Cache::FastMmap->new(...) or return DECLINED;
        return OK, $cache->get($key);
    }

Possibly a hook_notes() could be provided to let you override all scopes of notes, either for convenience or for other strange purposes:

    sub hook_notes_set {
        # I'm picky, so I want notes to be stored in {_my_notes}
        # rather than {_notes}, except for $connection objects
        my ($self, $obj, $key, $value) = @_;
        return DECLINED if ref $obj eq 'Qpsmtpd::Connection';
        $obj->{_my_notes}{$key} = $value;
    }

> What kind of locking and/or transactioning do we want in persistent notes?

I suggest letting the backend handle both. Typically we'll be using something that was designed to let not just multiple QP processes but also multiple non-QP processes interact with the store, so it already has to implement its own locking and atomicity. If we absolutely need to implement our own, we can do so in the plugin.

-Jared
address notes patches
I've finally got around to attempting to learn to navigate Git. Attempting to follow the suggested workflow in development.pod, I've created a QP fork on Github: http://github.com/jaredj/qpsmtpd/tree ... and created three topic branches, each with a single commit, representing my new effort to add address notes and clean up the code just a bit while I'm at it, but without breaking notes('arg', undef):

Shorten/clarify Connection and Transaction notes()
http://github.com/jaredj/qpsmtpd/commit/46a045e3d22a1ef2160835edb4c3c7c1f51f4658

Consistent Connection and Transaction notes() POD
http://github.com/jaredj/qpsmtpd/commit/f15c7d687fc5d8921edaf7d6d0f0add70ec0e553

Add Qpsmtpd::Address::notes() method
http://github.com/jaredj/qpsmtpd/commit/40f6d68b0274891abb7f224b08b2f9bdc9eed031

... and merged them to my fork. Any suggestions for making this process handier for you guys to review suggested changes would be appreciated. Three topic branches may be a bit overkill, but I figured better overkill than underkill :)

I was a bit impressed with Git today; normally when I work without connectivity, I work right up to the point that I would be ready to commit and then either stop or begin working on manually created copies of files, since I can't begin editing SVN history and making SVN branches until I can talk to the central repo. With Git, I was able to put together a decent set of branches and commits in the car on the way to work this morning with no internet connection. Nice :)

-Jared
reset_transaction between AUTH and MAIL_FROM
I've begun working on a custom auth plugin and tried setting some transaction notes after successful authentication, based on the credentials that were used. Of course this didn't work, as the transaction is reset as soon as MAIL FROM is issued. I can probably get around this with connection notes, of course, but I was just wondering: was it intentional for the transaction to be reset between AUTH and MAIL FROM, rather than at the first issue of *either* MAIL FROM *or* AUTH?

-Jared
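The connection-notes workaround looks roughly like this. The hook name is real, but the signature is abridged/assumed and the note name is made up, so check your auth plugin's actual arguments:

    sub hook_auth {
        my ($self, $txn, $mechanism, $user, @rest) = @_;
        # ... verify credentials here ...
        # connection notes survive the transaction reset at MAIL FROM
        $self->qp->connection->notes(auth_user => $user);
        return OK;
    }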
Re: Error: 503 but you already said AUTH
> The attached diff should fix it for you. I'll commit later if nobody complains.

It appears this still hasn't been applied. I just applied it myself and it works for me; is there anything stopping this from being committed? Also note that you might want to wipe out the tab that's in the midst of all those non-tab indents while you're at it :)

-Jared
Re: Error: 503 but you already said AUTH
Does git support 'hooks' like SVN does? Here at work we have some very spiffy SVN hooks that don't let us commit code if it fails perltidy, contains tabs, trailing whitespace, long lines, etc. It can be overridden by the committer, but often it catches things we didn't actually mean to commit. Whitespace problems especially are easy to miss.

-Jared

Hanno Hecker wrote:
> On Fri, 20 Feb 2009 10:21:22 -0600, Jared Johnson jar...@nmgi.com wrote:
>>> The attached diff should fix it for you. I'll commit later if nobody complains.
>> It appears this still hasn't been applied. I just applied it myself and it works for me; is there anything stopping this from being committed?
>
> No, I'm running this on one machine myself without problems.
>
>> Also note that you might want to wipe out the tab that's in the midst of all those non-tab indents while you're at it :)
>
> We should run perltidy now and write a .pod on what our preferred coding styles are (e.g. running perltidy / podtidy before any commit).
>
> Hanno
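It does: per-repository hooks live in .git/hooks/. They're client-side, so an individual committer can skip them with --no-verify (much like the SVN override). A minimal pre-commit sketch along the lines described might be:

    #!/bin/sh
    # .git/hooks/pre-commit: refuse commits whose staged diff introduces
    # trailing whitespace etc. (git's built-in whitespace checks do the work).
    if ! git diff --cached --check; then
        echo "whitespace problems in staged changes; aborting commit" >&2
        exit 1
    fi
    # a crude tab check for added lines (hypothetical local policy)
    if git diff --cached | grep -q "^+.*$(printf '\t')"; then
        echo "tabs in staged changes; aborting commit" >&2
        exit 1
    fi
    exit 0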
Re: persistence?
David Nicol wrote:
> Okay then. Anyone for collaborating on setting one up, with a unified accessor syntax similar to what we have for notes? http://advenge.com/2009/products/downloads/qpsmtpd/plugins/verp is a working plugin that uses the DirDB persistence library to rewrite the envelope return addresses of all traffic except traffic that is exempt due to relaying-okay or the verp_no note getting set. A persistent notes -- maybe "pnotes", like notes, but persistent -- would be more general and could be replaced with a backend as needed. As for lasting between two and three weeks, well, what all is needed?

It sounds like a good idea in general. We currently keep a few persistent things in our database; in the past we've kept some in Cache::FastMmap (we later decided this was premature optimization and did away with it); and we are looking at keeping some things in memcached. A universal accessor for all of these types of persistent store would make it easier to generalize out plugins like, say, greylisting, and still allow people to set up whatever sort of persistent store makes sense.

I would envision the implementation in Qpsmtpd::store(), hook_store, hook_store_get, and a convenience function in Qpsmtpd::Address::store()... How about a fake example implementing an imaginary adjustable tarpit at connect time and RCPT time, based on, say, the frequency of spam for a given connecting IP, and another based on the frequency of spam for a given recipient (not that this is a particularly great idea..):

    sub hook_pre_connection {
        my ($self, $txn) = @_;    # This would be %arg or something if I read the docs
        my $ip = $self->qp->connection->remote_ip;
        _delay_connection($self->store("ip_tarpit_seconds_$ip"));
    }

    sub hook_rcpt {
        my ($self, $txn, $rcpt) = @_;
        # Demonstration of a convenience method for a store keyed on address
        _delay_connection($rcpt->store('tarpit_seconds'));
    }

    sub hook_data_post {
        my ($self, $txn) = @_;
        my $hit;
        for my $rcpt ($txn->recipients) {
            if (_scan_for_spam($txn)) {
                my $old = $rcpt->store('tarpit_seconds') // 0;
                $rcpt->store('tarpit_seconds', $old + 5);
                $hit = 1;
            }
        }
        return DECLINED unless $hit;
        my $ip = $self->qp->connection->remote_ip;
        my $old = $self->store("ip_tarpit_seconds_$ip");
        $self->store("ip_tarpit_seconds_$ip", $old + 5);
    }

    sub hook_store {
        my ($self, $txn, $addr, $key, $value) = @_;
        if ($addr) {
            # A Qpsmtpd::Address or compatible object was passed,
            # probably because we were called with $rcpt->store().
            # I want to do funky stuff with it to build my real key
            $key = $addr->notes('uid') . '_' . $addr->format . "_$key";
        }
        eval {
            my $cache = Cache::FastMmap->new(...);
            $cache->set($key, $value);
        };
        return $@ ? DECLINED : OK;
    }

    sub hook_store_get {
        my ($self, $txn, $addr, $key) = @_;
        if ($addr) { ... }    # juju like above
        my $cache = Cache::FastMmap->new(...);
        return OK, $cache->get($key);
    }

Some things I'm not so sure about:

- Two hooks, one for retrieval and one for storage? I think that's probably the way to go.
- Do we always pass a possible address object to the hooks? Or do we leave it out and let people manually muck with their keys? It seems to me that it would be very useful to give people a means to construct keys in interesting ways without having to expose that on the caller end; but is this the best way to do so? You could go as far as to allow *any* number of arguments, which get passed to the hook, which takes the last argument as the value and uses the rest after $self and $transaction to construct the key. That would probably be a bit overboard, but maybe there are other ideas.

Thoughts?
And, just out of curiosity, where would DirDB be more appropriate to use than memcached or Cache::FastMmap?

-Jared
Re: [PATCH] Add 'address notes'
-    my $self = shift;
-    my $key  = shift;
-    @_ and $self->{_notes}->{$key} = shift;
-    #warn Data::Dumper->Dump([\$self->{_notes}], [qw(notes)]);
-    $self->{_notes}->{$key};
+    my ($self, $key, $value) = @_;
+    $self->{_notes}->{$key} = $value if defined $value;
+    return $self->{_notes}->{$key};

I originally considered these functionally identical, but they are not. The new code, called with, say, $txn->notes('discard', undef), would be evaluated as if it were a 'get' rather than setting the 'discard' note to undef. That seems quite dangerous. I suggest either reverting the language back to the '@_ and' model, or else doing something like:

    my ($self, $key, $value) = @_;
    $self->{_notes}->{$key} = $value if scalar @_ > 2;
    return $self->{_notes}->{$key};

I'd be happy to submit a patch for either sort of language.

Incidentally, I came across this when looking into a bug that was caused when I did $txn->recipients(grep { $_ ne $rcpt } $txn->recipients) -- expecting it to clear the recipient list if grep returned an empty list. Instead, it predictably behaved as a get method, i.e. a no-op. I solved this by adding a clear_recipients() method to Qpsmtpd::Transaction, then thought better of it and added remove_recipients() instead. I'll try to get around to submitting these as well if they seem useful :)

-Jared
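Another way to write the same arity test, sketched here for comparison (mine, not from the patch): slurp the trailing arguments into an array, so an explicit undef still registers as a set.

    sub notes {
        my ($self, $key, @value) = @_;
        $self->{_notes}->{$key} = $value[0] if @value;    # set iff a value arg was passed
        return $self->{_notes}->{$key};
    }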
Re: End of headers hook
> And voila. Hook renamed to data_headers_end, documentation in README.plugins with a BIG warning.

Just a thought -- perhaps the methodology for throwing away further input, as cleverly described by Jose Martinez, should be thrown around and tested and then documented in README.plugins, together with suggestions for adding an appropriate transaction note and doing the real denying in hook_data_post?

-Jared
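The suggested pattern, sketched end to end (the note name and the _headers_look_bad() helper are hypothetical; hook_data_headers_end is the renamed hook from above):

    sub hook_data_headers_end {
        my ($self, $txn) = @_;
        # record the verdict, keep accepting (and discarding) the rest of DATA
        $txn->notes(deny_for_headers => 'rejected based on message headers')
          if $self->_headers_look_bad($txn);    # hypothetical check
        return DECLINED;
    }

    sub hook_data_post {
        my ($self, $txn) = @_;
        my $why = $txn->notes('deny_for_headers');
        return DENY, $why if $why;    # the real denial happens here
        return DECLINED;
    }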
RE: End of headers hook
That sounds intriguing... it may not be very RFC-compliant to actually, say, interrupt a sender halfway through DATA transmission to tell them that you're rejecting based on headers, though. I can't say that has always stopped me...

-Jared

From: Karl Y. Pradene [k...@ailleurs.org]
Sent: Saturday, February 07, 2009 8:25 AM
To: qpsmtpd@perl.org
Subject: End of headers hook

Hi, is it possible to add a hook at the end of headers?

--
Karl Y. Pradene
Re: [PATCH] config() method for Qpsmtpd::Address objects
Robert Spier wrote:
> What's the goal of this? It also needs some documentation or examples.

The idea is to allow plugins to do something like:

    for my $rcpt ($transaction->recipients) {
        do_stuff() if $spam_score > $rcpt->config('spam_threshold');
    }

IOW, per-user prefs. This is the first, obvious prerequisite to this sort of handling. I documented config() in the POD for Qpsmtpd::Address itself; I can add some documentation and examples to README.plugins as well. Is there any other location that could use documentation?

Also, I noticed some more deficiencies in this patch today, namely that Qpsmtpd::config() needs to pass the Qpsmtpd::Address object to run_hooks('rcpt_config'). Although I did some basic testing to make sure it was functional, I suggest holding off on application until the code has seen more real-world use from me and I submit a follow-up. Thanks!

-Jared
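For the curious, a plugin supplying these per-user prefs might look something like the sketch below. The signature is a guess based on this thread's discussion, and the _user_pref() lookup helper is invented; returning DECLINED lets other rcpt_config plugins (or the global config) answer instead:

    sub hook_rcpt_config {
        my ($self, $txn, $addr, $key) = @_;
        return DECLINED unless $key eq 'spam_threshold';
        my $threshold = $self->_user_pref($addr->address, $key);    # hypothetical lookup
        return DECLINED unless defined $threshold;
        return OK, $threshold;
    }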
[PATCH] Make remote_host available in hook_pre_connection
Just one more... I've found it useful to have remote_host available in my pre-connection plugin (specifically for enforcing concurrency limits specified in a rule based on host patterns). Attached is the patch I have applied to make this possible. Now I'm off to bed!

-Jared

Index: qpsmtpd-forkserver
===================================================================
--- qpsmtpd-forkserver (revision 967)
+++ qpsmtpd-forkserver (working copy)
@@ -246,8 +246,13 @@
     # get local/remote hostname, port and ip address
     my ($port, $iaddr, $lport, $laddr, $nto_iaddr, $nto_laddr) =
       Qpsmtpd::TcpServer::lrpip($server, $client, $hisaddr);
 
+    # set enviroment variables
+    ($ENV{TCPLOCALIP}, $ENV{TCPREMOTEIP}, $ENV{TCPREMOTEHOST})
+      = Qpsmtpd::TcpServer::tcpenv($nto_laddr, $nto_iaddr);
+
     my ($rc, @msg) = $qpsmtpd->run_hooks("pre-connection",
                                          remote_ip   => $nto_iaddr,
+                                         remote_host => $ENV{TCPREMOTEHOST},
                                          remote_port => $port,
                                          local_ip    => $nto_laddr,
                                          local_port  => $lport,
@@ -296,9 +301,6 @@
         ::log(LOGINFO, "Connection Timed Out");
         exit;
     };
-    # set enviroment variables
-    ($ENV{TCPLOCALIP}, $ENV{TCPREMOTEIP}, $ENV{TCPREMOTEHOST}) = Qpsmtpd::TcpServer::tcpenv($nto_laddr, $nto_iaddr);
-
     # don't do this!
     #$0 = "qpsmtpd-forkserver: $ENV{TCPREMOTEIP} / $ENV{TCPREMOTEHOST}";

Index: qpsmtpd-prefork
===================================================================
--- qpsmtpd-prefork (revision 967)
+++ qpsmtpd-prefork (working copy)
@@ -647,10 +647,15 @@
     my %children;
     shmem_opt(\%children, undef, $$, $iaddr);
 
+    # set enviroment variables
+    ($ENV{TCPLOCALIP}, $ENV{TCPREMOTEIP}, $ENV{TCPREMOTEHOST})
+      = Qpsmtpd::TcpServer::tcpenv($nto_laddr, $nto_iaddr);
+
     my ($rc, @msg) = $qpsmtpd->run_hooks(
                                "pre-connection",
                                remote_ip   => $nto_iaddr,
+                               remote_host => $ENV{TCPREMOTEHOST},
                                remote_port => $port,
                                local_ip    => $nto_laddr,
                                local_port  => $lport,
@@ -694,9 +699,6 @@
         exit;
     };
 
-    # set enviroment variables
-    ($ENV{TCPLOCALIP}, $ENV{TCPREMOTEIP}, $ENV{TCPREMOTEHOST}) = Qpsmtpd::TcpServer::tcpenv($nto_laddr, $nto_iaddr);
-
     # run qpmsptd functions
     $SIG{__DIE__} = 'DEFAULT';
     eval {
Re: [PATCH] config() method for Qpsmtpd::Address objects
I noticed the previous iteration of this patch would allow cached global config values to override per-recipient config values. The attached patch adds the following change to avoid this:

--- lib/Qpsmtpd.pm (revision 967)
+++ lib/Qpsmtpd.pm (working copy)
@@ -142,7 +142,7 @@
     my ($self, $c, $type) = @_;
     #my $timer = $SAMPLER->("config", undef, 1);
 
-    if ($_config_cache->{$c}) {
+    if ($_config_cache->{$c} and ref $type ne 'Qpsmtpd::Address') {
         return wantarray ? @{$_config_cache->{$c}} : $_config_cache->{$c}->[0];
     }

I don't know if it would be useful/advisable to try to cache per-recip config results keyed to the address or something -- I think that would probably be potentially unsafe. rcpt_config plugins can always do their own caching if they want.

-Jared

Jared Johnson wrote:
> Adding a config method for Qpsmtpd::Address objects turned out to be relatively straightforward. I was reluctant to subclass Qpsmtpd, but this actually seems like it could be useful in the future; at any rate, it gets the job done pretty nicely. This patch will only be useful for people who define 'rcpt_config' plugins, but if anyone shares my sentiments then there will be folks chomping at the bit like I am to write these. Feedback is welcome...
>
> -Jared
>
> Index: lib/Qpsmtpd.pm
> ===================================================================
> --- lib/Qpsmtpd.pm (revision 967)
> +++ lib/Qpsmtpd.pm (working copy)
> @@ -142,7 +142,7 @@
>      my ($self, $c, $type) = @_;
>      #my $timer = $SAMPLER->("config", undef, 1);
>
> -    if ($_config_cache->{$c}) {
> +    if ($_config_cache->{$c} and ref $type ne 'Qpsmtpd::Address') {
>          return wantarray ? @{$_config_cache->{$c}} : $_config_cache->{$c}->[0];
>      }
> @@ -150,7 +150,8 @@
>
>      #warn "SELF->config($c) ", ref $self;
>
> -    my ($rc, @config) = $self->run_hooks_no_respond("config", $c);
> +    my $hook = ref $type eq 'Qpsmtpd::Address' ? 'rcpt_config' : 'config';
> +    my ($rc, @config) = $self->run_hooks_no_respond($hook, $c);
>      @config = () unless $rc == OK;
>
>      if (wantarray) {
> Index: lib/Qpsmtpd/Address.pm
> ===================================================================
> --- lib/Qpsmtpd/Address.pm (revision 967)
> +++ lib/Qpsmtpd/Address.pm (working copy)
> @@ -1,6 +1,8 @@
> -#!/usr/bin/perl -w
>  package Qpsmtpd::Address;
>  use strict;
> +use warnings;
> +use Qpsmtpd;
> +our @ISA = qw(Qpsmtpd);
>
>  =head1 NAME
>
> @@ -317,6 +319,18 @@
>      return $self->{_host};
>  }
>
> +=head2 config($value)
> +
> +Looks up a configuration directive based on this recipient, using any
> +plugins that utilize hook_rcpt_config
> +
> +=cut
> +
> +sub config {
> +    my ($self, $key) = @_;
> +    $self->SUPER::config($key, $self);
> +}
> +
>  sub _addr_cmp {
>      require UNIVERSAL;
>      my ($left, $right, $swap) = @_;
> Index: lib/Qpsmtpd/Plugin.pm
> ===================================================================
> --- lib/Qpsmtpd/Plugin.pm (revision 967)
> +++ lib/Qpsmtpd/Plugin.pm (working copy)
> @@ -4,7 +4,7 @@
>  # more or less in the order they will fire
>  our @hooks = qw(
> -    logging config post-fork pre-connection connect ehlo_parse ehlo
> +    logging config rcpt_config post-fork pre-connection connect ehlo_parse ehlo
>      helo_parse helo auth_parse auth auth-plain auth-login auth-cram-md5
>      rcpt_parse rcpt_pre rcpt mail_parse mail mail_pre data data_post
>      queue_pre queue queue_post vrfy noop