Re: Rewritten URIBL plugin
Jared Johnson wrote:

I think we should probably consider putting support for parsed messages into core, with the parsing done lazily if requested by the API.

I forgot, we did kinda think of a couple of reasons not to want an API. Depending on where you put it, you may find QP in general depending on MIME::Parser even if it never uses it. The benefit of the plugin is that /etc/qpsmtpd/plugins controls whether you wind up having to install, and take up your memory with, MIME::Parser.

One thing we've discussed in the past (at least in my imagination), although not quite figured out how to implement, is making plugins act a little more like normal modules, so that one plugin can use another. So if you're interested in the parsed mime functionality, your plugin can plugin_use util/parsed-mime and the right magic happens.

Oh yeah that's right, someone *did* implement what you're talking about. You can do it with 'plugin inheritance' (which ironically I knew nothing about until looking at the async uribl plugin the other night):

    # /usr/share/qpsmtpd/plugins/mime_parser
    use MIME::Parser;

    sub parse_mime {
        my ( $self, $txn ) = @_;
        return $txn->notes('mime_body') if $txn->notes('mime_body');
        # a bunch of code to create a $msg ...
        $txn->notes( mime_body => $msg );
        return $txn->notes('mime_body');
    }

    # /usr/share/qpsmtpd/plugins/some_plugin_that_might_want_mime_parsing
    sub init {
        my ( $self, $qp, %args ) = @_;
        return unless $qp->config('parse_mime') or $args{parse_mime};
        $self->isa('mime_parser');
        $self->{can_parse_mime} = 1;
    }

    sub hook_data_post {
        my ( $self, $txn ) = @_;
        if ( $self->{can_parse_mime} ) {
            # pass the transaction along, since parse_mime expects it
            $self->some_recursive_mime_function( $self->parse_mime($txn) );
        }
        else {
            $self->regular_body_getline_function($txn);
        }
    }

Voila!
A lazy 'plugin model' that is *also* an API, which doesn't 'use MIME::Parser' until you *want* to use it; furthermore, you don't have to just put the official mime_parser plugin in there. With a little modification (or alternatively some care to use the same filename within an alternate directory) you could use your home-rolled messier-but-more-efficient version that doesn't use MIME::Parser at all.

I could easily modify my own stuff to do this and test it. I'd even be interested in pulling a couple of our own home-grown MIME recursion methods into it. If this is indeed the will of the council ;)

p.s. I'm kind of stoked about plugin inheritance these days. I'm becoming convinced that after some code churn on our side it will allow us to finally switch to async without having to just rewrite all our plugins, switch to the async daemon, and see what happens. And with a little churn on the upstream side, if that manages to happen, I also think it could allow us to un-fork 90% of the QP plugin code we currently have re-written and instead submit patches to QP that have some guarantee to be tested and aren't surrounded by completely useless context :)

-Jared
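The lazy-load-and-cache idea in this message can be sketched without qpsmtpd at all. The following is only an illustration under stated assumptions: FakeTxn and LazyParser are hypothetical names, the notes() method merely imitates a transaction notes store, and the core module MIME::Base64 stands in for MIME::Parser so the sketch runs without extra installs.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a transaction object: just a notes() store.
package FakeTxn;
sub new { return bless { notes => {} }, shift }

sub notes {
    my ( $self, $key, @value ) = @_;
    $self->{notes}{$key} = $value[0] if @value;
    return $self->{notes}{$key};
}

package LazyParser;
our $parses = 0;    # counts how often the expensive path actually runs

sub parse_mime {
    my ( $class, $txn, $raw ) = @_;

    # Reuse the cached result if something already parsed this message.
    my $cached = $txn->notes('mime_body');
    return $cached if $cached;

    # Load the dependency only now, at first use (assumption: MIME::Base64
    # stands in for the heavier MIME::Parser).
    require MIME::Base64;
    $parses++;

    my $msg = { decoded => MIME::Base64::decode_base64($raw) };
    $txn->notes( mime_body => $msg );
    return $msg;
}

package main;

my $txn    = FakeTxn->new;
my $first  = LazyParser->parse_mime( $txn, 'aGVsbG8=' );
my $second = LazyParser->parse_mime( $txn, 'aGVsbG8=' );    # served from notes

print "decoded: $first->{decoded}\n";
print "expensive path ran $LazyParser::parses time(s)\n";
```

Because the module is pulled in with require inside the sub rather than use at the top, a process that never calls parse_mime never loads it.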
Re: [Fwd: DoubleCheck DataFeed access]
Also: we recently contributed a new URIBL plugin to the Qpsmtpd project, which makes use of our pruned TLD lists that use your datafeed data. They had some question of whether this was something you would be comfortable with having distributed publicly. Note no actual URIBL data is distributed, just a list of TLDs that happens to *not* include top-level TLDs that would be extremely unlikely to generate hits against your service. This is used to limit the number of extraneous queries to your public mirrors, which I'm guessing you would consider beneficial. Could you verify whether we have your permission/blessing to distribute such a list gleaned from your data?

Ya, fine. It doesn't sound like it would have significant impact on volume to me, as the top 25 TLDs (including ipv4 volume) that are queried represent 91% of the total query volume, and the top 100 TLDs represent 99% of the volume. If you tell me which TLDs are suppressed, I can give you an idea of query volume savings according to mirror traffic. For example, suppression of .mil and .int would save 4/100th of a percent (0.00039).

Now, if there was a hacked webserver in .mil and spammers used it as a drop page or redirector, our temporary listing of it would never hit for you if you suppress them. I guess you have to weigh the savings versus the potential for abuse. You wouldn't want to suppress a TLD that becomes the next spammer haven and have to scramble to release an update. Thinking about recent history such as .tk (Tokelau), .st (Sao Tome), .im (Isle of Man), and others.
There are 135 excluded tlds:

ac ad af ag ai al an ao aq arpa asia aw ax bb bf bi bj bm bn bo bt bw cd cf cg ci ck coop cr cu cv dj dm do dz edu er et fj fk fm fo ga gf gh gi gl gm gn gov gp gq gt gu gw gy ht int iq jm jo jobs kh ki km kn kp kw ky lb lc lk lr ls lu ly mc mg mh mil ml mm mo mp mq mr mt museum mv mw mz na nc nf ng ni nr om pa pf pg pn ps pw qa re rw sb sd sh sl sm sn sr sv sy sz td tel tf tg tj tm tn travel ug va vg vi vu wf ye yu zm zw ...

I also used the feed data to prune two and three level TLDs from the SURBL list, which is pretty obviously not based on any data:

coop.br coop.tt gov.ae gov.am gov.ar gov.as gov.au gov.az gov.ba gov.bd gov.bh gov.bs gov.by gov.bz gov.ch gov.cl gov.cm gov.co gov.cx gov.cy gov.ec gov.ee gov.eg gov.ge gov.gg gov.gr gov.hk gov.hu gov.ie gov.il gov.im gov.in gov.io gov.ir gov.is gov.it gov.je gov.jp gov.kg gov.kz gov.la gov.li gov.lt gov.lv gov.ma gov.me gov.mk gov.mn gov.mu gov.my gov.np gov.ph gov.pk gov.pl gov.pr gov.pt gov.py gov.rs gov.ru gov.sa gov.sc gov.sg gov.sk gov.st gov.tl gov.to gov.tp gov.tt gov.tv gov.tw gov.ua gov.uk gov.vc gov.ve gov.vn gov.ws gov.za jobs.tt tel.no tel.tr

act.gov.au nsw.gov.au nt.gov.au pa.gov.pl po.gov.pl qld.gov.au sa.gov.au so.gov.pl sr.gov.pl starostwo.gov.pl tas.gov.au ug.gov.pl um.gov.pl upow.gov.pl uw.gov.pl vic.gov.au wa.gov.au

There were also about _1400_ reserved .us TLDs pruned from the list, IIRC.

You make some pretty good points. It may well not be worth the trouble, at least for one-level TLDs.

Thanks,
Jared Johnson
Software Developer
DoubleCheck Email Manager
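For readers following the trade-off discussed here, this is a rough sketch of what "suppressing" a TLD means on the client side: the host is checked against an exclusion set before any URIBL query is sent. The function name skip_lookup is hypothetical and the set below is only a small illustrative subset taken from the lists in this message; it is not how the actual plugin is written.

```perl
use strict;
use warnings;

# Illustrative subset only; the pruned lists in this thread are much longer.
my %excluded_tld = map { $_ => 1 } qw( mil int museum gov.uk nsw.gov.au );

# Return 1 if the host ends in an excluded one-, two-, or three-level TLD,
# meaning no URIBL query would be issued for it.
sub skip_lookup {
    my ($host) = @_;
    my @labels = split /\./, lc $host;
    for my $n ( 1 .. 3 ) {
        last if $n > @labels;
        my $suffix = join '.', @labels[ -$n .. -1 ];
        return 1 if $excluded_tld{$suffix};
    }
    return 0;
}

for my $host (qw( www.army.mil health.nsw.gov.au example.com spam.example.tk )) {
    printf "%-20s %s\n", $host, skip_lookup($host) ? 'skip' : 'query';
}
```

The cost raised in the reply is visible in the sketch: once a suffix is in the set, a listing under it can never be seen, because the query is never made.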
Re: per-recipient configuration
On Wed, Jul 28, 2010 at 2:35 PM, David Nicol davidni...@gmail.com wrote:

On Wed, Jul 28, 2010 at 2:14 PM, Jared Johnson jjohn...@efolder.net wrote:

I think we've had a thread about this before, but how do you see the API for a standard hook for persistence working?

either the tie or overload interface is invoked by the plug-in, [...]

so inside QP, the qpsmtpd::address object would have a known parameter that brings up the per-address persistence hash, which would be a flat hash. Something like

    $object->{persistent} = ( eval { $PERSISTENCEFRAMEWORK->new( Address => $object ) } or {} );

at the end of the address object constructor, possibly even more generic. The persistent element would default to a non-persistent version when $PERSISTENCEFRAMEWORK isn't set to something that works, and when it has been configured, per-address config can be loaded or altered via

    return HARDFAIL if $AddressObject->{persistent}->{AcceptableSpamScore} < $Message->getSpamScore

or such
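The fallback behaviour described above (a plain hash when no persistence framework is configured) can be sketched in isolation. Everything here is an assumption for illustration: the Address and MemoryBackend packages and the $PERSISTENCE_FRAMEWORK variable are hypothetical, and an in-memory hash stands in for a real data store.

```perl
use strict;
use warnings;

package MemoryBackend;    # hypothetical framework class
my %store;                # stands in for a database, keyed by address

sub new {
    my ( $class, %args ) = @_;
    die "no address given\n" unless defined $args{Address};
    return $store{ $args{Address} } ||= {};
}

package Address;
our $PERSISTENCE_FRAMEWORK;    # left undefined means "not configured"

sub new {
    my ( $class, $addr ) = @_;
    my $self = bless { address => $addr }, $class;

    # Calling ->new on an undefined value dies; eval traps that and the
    # object falls back to a plain, non-persistent hash.
    $self->{persistent} =
      eval { $PERSISTENCE_FRAMEWORK->new( Address => $addr ) } || {};
    return $self;
}

package main;

# Not configured: data lives only as long as the object does.
my $plain = Address->new('a@example.com');
$plain->{persistent}{AcceptableSpamScore} = 5;
my $plain_again = Address->new('a@example.com');

# Configured: a second object for the same address sees the stored value.
$Address::PERSISTENCE_FRAMEWORK = 'MemoryBackend';
my $kept = Address->new('a@example.com');
$kept->{persistent}{AcceptableSpamScore} = 5;
my $kept_again = Address->new('a@example.com');

print "without framework: ",
  ( exists $plain_again->{persistent}{AcceptableSpamScore} ? 'kept' : 'lost' ), "\n";
print "with framework: $kept_again->{persistent}{AcceptableSpamScore}\n";
```

Plugin code reads and writes $obj->{persistent} the same way in both cases, which is the point of defaulting to a non-persistent version.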
Re: per-recipient configuration
> so inside QP, the qpsmtpd::address object would have a known parameter that brings up the per-address persistence hash, which would be a flat hash. Something like ..

See, in my mind, per-recipient config and persistent data storage are more separate. Maybe part of the reason I look at it this way is that in my own implementation, I never really write config for a recipient, I only read it (from my persistent storage, the db, of course). I don't see it as necessary to be able to say $rcpt->config( spam_threshold => 10 ) from QP (I'd do it from the UI). Stored things are always written from QP (logging) and sometimes read (auto whitelist). A hook_user_config plugin would even be likely to make use of the persistent storage itself, but I still see the concepts as split, and you could implement and benefit from either one without the other.

When I went to write a reference plugin for hook_user_config, I just thought of one where an admin can just drop config for users into directories on the fs in the event that he wants to override what's already set on the global level; I think the hook structure even fell through to $qp->config() so it could really just begin with an extension of /etc/qpsmtpd and go from there, just like $qp->config() does.

That said, for persistent storage I would like to see a more straightforward API and skeleton API; something like: $txn->get, $txn->store, $rcpt->get, $rcpt->store, and a corresponding hook_get and hook_store that are passed ( $self, $txn, [ $rcpt ] ) so it knows what to key on, though it can even ignore it if it wants. Extra points if QP provides a safe-but-reusing-connections-appropriately-depending-on-the-forking-model DBI handle via $self->qp->dbh, and maybe a $self->qp->cache() for a 'persistent' Cache::FastMmap or Memcached cache, but the plugin could be required to establish the actual data store for itself. Then the plugin goes to town.
It can store everything in a generic way, key => value or whatever, or it could map the key names to your own business logic. The reference persistent storage plugin could still implement the type of store you're talking about, and that would work out of the box for people who just want to have greylisting out of the box or etc; but if I'm reading it correctly, even if it's really awesome it's likely a lot of developers would just have to scrap it in favor of what they're already doing, at which point the more flexibility hook_store plugins have the better.

    # in stock upstream plugins/greylist
    $rcpt->store( greylist => $self->qp->connection->remote_ip );

    # in our internal storage plugin that overrides the generic one
    # I don't think anyone would actually want this in particular :)
    sub init { shift->{dbh} = $dbi->connect(...) }
    sub dbh  { shift->{dbh} }

    sub hook_store {
        my ( $self, $txn, $rcpt, $conf, $value ) = @_;
        return DECLINED unless $conf eq 'greylist';
        return OK unless $rcpt->notes('user_uid');
        my $sth = $self->dbh->prepare_cached(
            'INSERT INTO bobsgreylist (ip,helo,rcpt) VALUES (?,?,?)' );
        $sth->execute( $value, $self->qp->connection->hello_host,
            $rcpt->notes('user_uid') );
        return OK;
    }

Unlike per-recip config, though, we don't have any API etc. written in-house to support generic persistent storage writing; for now we just stick our DBI directly in our plugins, making them even more forked. So this is all purely theoretical and David has the advantage of speaking from some experience :)

-Jared
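The override idea in this message (a site-specific hook_store handles the keys it cares about and declines the rest, leaving them to a generic store) can be sketched as a tiny dispatcher. This is not qpsmtpd's hook machinery: run_store, the two anonymous hooks, and the OK/DECLINED constants defined here are local, hypothetical stand-ins, and arrays and hashes replace the database.

```perl
use strict;
use warnings;

# Local stand-ins for hook return codes (assumption: values chosen here
# only for the sketch, not taken from qpsmtpd).
use constant { OK => 0, DECLINED => 1 };

my @greylist_rows;    # what the site-specific plugin would INSERT
my %generic;          # flat key/value store per recipient

# Hooks are tried in order; the first one that does not decline wins.
my @store_hooks = (

    # Site-specific plugin: only interested in 'greylist'.
    sub {
        my ( $txn, $rcpt, $key, $value ) = @_;
        return DECLINED unless $key eq 'greylist';
        push @greylist_rows, [ $value, $rcpt ];
        return OK;
    },

    # Generic reference plugin: stores anything.
    sub {
        my ( $txn, $rcpt, $key, $value ) = @_;
        $generic{$rcpt}{$key} = $value;
        return OK;
    },
);

sub run_store {
    my (@args) = @_;
    for my $hook (@store_hooks) {
        my $rc = $hook->(@args);
        return $rc unless $rc == DECLINED;
    }
    return DECLINED;
}

run_store( undef, 'bob@example.com', greylist => '192.0.2.1' );
run_store( undef, 'bob@example.com', seen     => 1 );

print "greylist rows: ", scalar(@greylist_rows), "\n";
print "generic keys for bob: ", join( ',', sort keys %{ $generic{'bob@example.com'} } ), "\n";
```

The calling plugin only ever says "store this key"; which backend ends up holding the value is decided by whichever hook claims it.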
Re: per-recipient configuration
> See in my mind, per-recipient config and persistent data storage are more separate. Maybe part of the reason I look at it this way is that in my own implementation, I never really write config for a recipient, I only read it (from my persistent storage, the db, of course). I don't see it necessary to be able to say $rcpt->config( spam_threshold => 10 ) from QP, I'd do it from the UI). Stored things are always written from QP

I have stuff that might wind up looking like

    $rcpt->persistent->{statistics_receivedmsgs_total}++;
    $rcpt->persistent->{statistics_receivedstatsbysender}{$NormalizedSenderAddress}++;

in addition to reading configuration.
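Since the tie interface was mentioned earlier in the thread, here is a minimal sketch of how a tied hash could notice writes such as the counter increment above, so that only changed keys need to be written back. DirtyHash and dirty_keys are hypothetical names, the sketch tracks top-level keys only, and it uses a single shared dirty set for brevity.

```perl
use strict;
use warnings;

package DirtyHash;
require Tie::Hash;
our @ISA = ('Tie::StdHash');    # Tie::StdHash is defined in Tie/Hash.pm

my %dirty;    # top-level keys written since the hash was tied

# Record the key, then let Tie::StdHash do the actual store.
sub STORE {
    my ( $self, $key, $value ) = @_;
    $dirty{$key} = 1;
    return $self->SUPER::STORE( $key, $value );
}

sub dirty_keys { return sort keys %dirty }

package main;

tie my %persistent, 'DirtyHash';

# Counter-style write: each ++ is a FETCH followed by a STORE.
$persistent{statistics_receivedmsgs_total}++ for 1 .. 3;

# Config-style read: no STORE, so the key is not marked dirty.
my $threshold = $persistent{AcceptableSpamScore};

print "total: $persistent{statistics_receivedmsgs_total}\n";
print "dirty: ", join( ',', DirtyHash::dirty_keys() ), "\n";
```

A real backend would flush the dirty keys at the end of the transaction; how and when to do that is left open here.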