-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Theo Van Dinter writes:
> Ok, here are my thoughts about how to do faster updates.  ie: how
> to release rules + scores faster, potentially multiple times a day.
> I currently only think rules + scores ought to be released this way -- people
> aren't going to be comfortable with automated code updates IMO.  Code/plugins
> are best left to full releases.  (plugin support could be easily added later
> on, btw.)
>
> Pseudo-code is below, but here's some background details:
>
> Updates occur from "channels".  The default channel is
> "updates.spamassassin.org", but the user can specify any number of
> channels on the commandline to use additionally.  These can either be
> provided by us (think of "updates" being stable vs "experimental" vs ...),
> or some third party (as long as they provide the same infrastructure...)

cool.

> Updates have version numbers.  The value format of which is irrelevant,
> as long as it's monotonically increasing.  For our updates I was thinking
> SVN revision, but could also do YYYYMMDDVV ala DNS SOA, etc.
>
> Versions are tracked per channel and SpamAssassin version.  To check
> for updates, do a DNS TXT query ala "z.y.x.updates.spamassassin.org",
> where z.y.x refers to the version of SpamAssassin being used, aka:
> x.y.z for 3.0.2, etc.  For simplicity, wildcards can be used on the
> DNS server to match a whole set of releases.  An example:
>
> *.0.3.updates.spamassassin.org        TXT "154203"
> *.1.3.updates.spamassassin.org        TXT "158203"
>
> I haven't decided if that needs to be more machine parsable for future
> expansion.  ie: "v=1 ver=154023 ...."  I can't think of anything off hand
> that would need to go in there so just a version number is probably ok.
>
> For the initial request, mirrors.channel is a TXT record with an URL for
> the MIRRORED.BY (ie: http://spamassassin.apache.org/updates/MIRRORED.BY),
> which contains a list of parent URLs, and an optional list of options
> per mirror.
> ie:
>
> http://spamassassin.apache.org/updates                 weight=20
> http://spamassassin.kluge.net/updates
> http://somemirror.example.com/spamassassin/updates     weight=4
>
> Means there are 3 mirrors, weighted so the apache.org one will be used the
> most (80% of the time), followed by the example.com one (16% of the time),
> followed by the kluge.net one (4% of the time).  Weights are default
> '1', btw.
>
> The directory that is to be mirrored out appropriately looks like:
>
> dir/
>     MIRRORED.BY
>     version.ext
>     version.ext.sha1
>     ...
>     versionn.ext
>     versionn.ext.sha1
>
> with "version.ext.gpg .. versionnn.ext.gpg" available optionally.
> I don't think GPG needs to be required, but for the paranoid amongst us,
> it needs to be available as an option.
>
> At the end, the script outputs a number of channel.cf files, which by
> default will just be read by SpamAssassin at startup (leaving restarting
> spamd up to the admin outside the script, based on exit code...)  If a
> different directory is used, admin can simply include the channel.cf
> file in their local.cf.
>
> There are a few things I haven't fully fleshed out yet:
>
> 1) How to archive the update files together?  I envisioned a similar
> naming convention to our normal rules directory (ie: a bunch of files
> named ##_type.cf), but the script should just expect to download a single
> file which will then be expanded.  I don't want to rely on system calls to
> run an expansion, nor do I want to expect tar or zip to be installed, etc.
>
> 2) How to validate with GPG?  Similar to the archive issue.  Perhaps using
> GnuPG::Interface?  It's really just a wrapper to running gpg from the
> commandline, but at least abstracts the issue for platforms where "gpg" isn't
> what I think it is.
>
> 3) Using "channel.cf" means that it may or may not come after local.cf.
> We should probably use some form of prefix to get it to load beforehand,
> but what?  People should be able to override the channel config if
> they want to.
> I don't know if I want "AA_updates_spamassassin_org.cf"
> as a file.
>
> Pseudo code:
>
> - Script has a list of GPG keys which are allowed to sign update releases.
>   The default is 265FA05B, which is the SA signing key.
> - load Mail::SpamAssassin
> - load Digest::SHA1
> - load LWP
> - Accept commandline options for GPG keys to allow for signing in addition
>   to default (for third-party updates).
> - Accept commandline option for whether or not to use GPG for verification.
> - Accept commandline options for additional channels to use beyond
>   updates.spamassassin.org
> - Accept commandline option for parent directory for updates.  Default is
>   whatever the first site_rules_path value is, ie: /etc/mail/spamassassin.
>   ala: $msa->first_existing_path (@M::SA::site_rules_path);
> - Accept other options such as debug, version, etc.
> - exit code = 255
> - foreach ( @channels ):
>   - Convert channel name to "platform friendly" version?  Is
>     "foo.bar.baz.etc.example.com" ok for all platforms?  I was thinking
>     s/\./_/g

+1 on that.

>   - read /dir/channel.cf and get current version from comment on first line
>   - convert internal SA version to z.y.x format, and query DNS for
>     TXT z.y.x.channel
>     - if no answer, throw error, goto next channel
>   - for version checks, use ^(\d+) for version.  if same channel will have
>     same update version value for different SA versions, can do "1345-3_0".
>   - if version is <= current, goto next channel
>   - if no /dir/channel/MIRRORED.BY file exists:
>     - query DNS for TXT mirrors.channel
>       - if no answer, throw error, goto next channel
>     - grab URI, write to /dir/channel/MIRRORED.BY
>   - read /dir/channel/MIRRORED.BY:
>     - add each parent URI to internal array.  if weight given, add URI that
>       many times.  (this algorithm can be made more efficient, but it's simple
>       for now.)
>   - foreach ( pick_random(@mirrors) ):
>     - grab parent_uri/version.foo ("foo" depends on the "what archive method"
>       issue)
>       - if there's an error, go back and choose another mirror
>     - grab parent_uri/version.foo.sha1 (ditto foo)
>     - do IMS (If-Modified-Since) grab for parent_uri/MIRRORED.BY, missing is ok
>     - if GPG is enabled, grab parent_uri/version.foo.gpg (ditto foo)
>     - an error in either GPG or SHA1 causes an error for the channel, goto
>       next channel
>     - no error means break out of the mirror loop
>     - write files to some temp place (mkdir tmpfile)
>   - if no mirrors work completely, channel fails, goto next channel
>   - validate version.foo.sha1 internally
>     - if failed, fail channel, goto next channel
>   - if GPG is enabled, validate version.foo.gpg (depends on the "how to do
>     gpg" issue)
>     - if failed, fail channel, goto next channel
>     - file fails if signature fails, or if signature is ok but not signed by
>       list of "trusted" keys
>   - remove all files except MIRRORED.BY from /dir/channel
>   - remove /dir/channel.cf
>   - unarchive version.foo into /dir/channel
>     - on error, fail channel, goto next channel
>   - move new MIRRORED.BY to /dir/channel if it exists
>   - remove temp version.foo* files
>   - create new /dir/channel.cf file
>     - first line is comment w/ version of channel
>     - foreach (readdir(/dir/channel)):
>       - add "include /dir/channel/file.cf", only do .cf files
>   - exit code = 0
> - return exit code

btw, I think Coral would be useful as a mirroring infrastructure, too.
http://www.scs.cs.nyu.edu/coral/

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFCCA1AMJF5cimLx9ARAjEwAJ9O5bYxIzFblUP6aOWA1PlGMG2NmACfZ7I/
JnaQO/OYDtGKEbmx1Sec2PU=
=1B9W
-----END PGP SIGNATURE-----
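For illustration only, the naming and version-check steps in the quoted pseudo-code could look roughly like the sketch below. It is written in Python purely to keep it short (the script under discussion would presumably be Perl), and every function name here is hypothetical, not part of SpamAssassin. It assumes that "z.y.x" means the dotted SpamAssassin version with its labels reversed, which is what the wildcard example (*.0.3 for the 3.0 series) suggests, and it does no actual DNS lookup.

```python
import re


def dns_query_name(sa_version, channel):
    # Assumption: "z.y.x" is the SA version with its labels reversed,
    # so 3.0.2 on channel updates.spamassassin.org becomes
    # 2.0.3.updates.spamassassin.org.
    labels = sa_version.split(".")
    return ".".join(reversed(labels)) + "." + channel


def parse_update_version(txt_value):
    # Mirrors the "use ^(\d+) for version" step: only the leading digits
    # count, so a value such as "1345-3_0" parses as 1345.
    match = re.match(r"(\d+)", txt_value)
    return int(match.group(1)) if match else None


def channel_file_stem(channel):
    # The s/\./_/g idea: replace dots to get a platform-friendly name.
    return channel.replace(".", "_")


def needs_update(current, available):
    # "if version is <= current, goto next channel", inverted.
    # A missing TXT answer (None) is treated as "nothing to fetch".
    return available is not None and available > current


if __name__ == "__main__":
    print(dns_query_name("3.0.2", "updates.spamassassin.org"))
    print(parse_update_version("1345-3_0"))
    print(channel_file_stem("updates.spamassassin.org"))
```

Keeping the version as a plain integer makes the "monotonically increasing" requirement a simple numeric comparison, whether the value is an SVN revision or a YYYYMMDDVV serial.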

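Similarly, a minimal sketch of the mirror weighting and SHA1 check from the quoted proposal, again in Python and again with hypothetical function names. It assumes each MIRRORED.BY line is a URL followed by optional whitespace-separated key=value options, and that the .sha1 file holds nothing but the hex digest; neither format is pinned down in the proposal.

```python
import hashlib
import random


def parse_mirrored_by(text):
    # Build the "internal array": each URL appears `weight` times,
    # with a default weight of 1, exactly as the pseudo-code describes.
    pool = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        url, options = fields[0], fields[1:]
        weight = 1
        for option in options:
            key, _, value = option.partition("=")
            if key == "weight":
                weight = int(value)
        pool.extend([url] * weight)
    return pool


def pick_mirror(pool):
    # A uniform choice over the expanded array gives the weighted pick.
    return random.choice(pool)


def sha1_matches(data, expected_hex):
    # Assumption: the .sha1 file contains only the hex digest.
    return hashlib.sha1(data).hexdigest() == expected_hex.strip().lower()


if __name__ == "__main__":
    example = (
        "http://spamassassin.apache.org/updates weight=20\n"
        "http://spamassassin.kluge.net/updates\n"
        "http://somemirror.example.com/spamassassin/updates weight=4\n"
    )
    pool = parse_mirrored_by(example)
    print(len(pool), pick_mirror(pool))
```

With the three example mirrors the array has 25 entries (20 + 1 + 4), which is where the 80% / 16% / 4% split comes from.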