-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Theo Van Dinter writes:
> Ok, here are my thoughts about how to do faster updates.  ie: how
> to release rules + scores faster, potentially multiple times a day.
> I currently only think rules + scores ought to be released this way -- people
> aren't going to be comfortable with automated code updates IMO.  Code/plugins
> are best left to full releases.  (plugin support could be easily added later
> on, btw.)
> 
> Pseudo-code is below, but here's some background details:
> 
> Updates occur from "channels".  The default channel is
> "updates.spamassassin.org", but the user can specify any number of
> channels on the commandline to use additionally.  These can either be
> provided by us (think of "updates" being stable vs "experimental" vs ...),
> or some third party (as long as they provide the same infrastructure...)

cool.

> Updates have version numbers.  The format of the value is irrelevant,
> as long as it's monotonically increasing.  For our updates I was thinking
> SVN revision, but could also do YYYYMMDDVV ala DNS SOA, etc.
> 
> Versions are tracked per channel and SpamAssassin version.  To check
> for updates, do a DNS TXT query ala "z.y.x.updates.spamassassin.org",
> where z.y.x is the reversed version of the SpamAssassin being used, e.g.
> 2.0.3 for 3.0.2.  For simplicity, wildcards can be used on the
> DNS server to match a whole set of releases.  An example:
> 
> *.0.3.updates.spamassassin.org TXT "154203"
> *.1.3.updates.spamassassin.org TXT "158203"
> 
> I haven't decided if that needs to be more machine parsable for future
> expansion.  ie: "v=1 ver=154023 ...."   I can't think of anything off hand
> that would need to go in there so just a version number is probably ok.
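
A minimal sketch of the lookup name and version check described above, in
Python purely for illustration. The function names are made up here, and no
real DNS query is performed; the TXT answer is passed in as a string. The
leading-digits parse follows the ^(\d+) rule from the pseudo-code further
down.

```python
import re


def update_query_name(sa_version: str, channel: str) -> str:
    """Build the TXT query name: the x.y.z version reversed to z.y.x, then the channel."""
    reversed_version = ".".join(reversed(sa_version.split(".")))
    return f"{reversed_version}.{channel}"


def parse_update_version(txt_answer: str):
    """Return the leading run of digits in a TXT answer as an int, or None."""
    match = re.match(r"(\d+)", txt_answer)
    return int(match.group(1)) if match else None


def update_available(txt_answer: str, current_version: int) -> bool:
    """True only if the advertised version is strictly newer than the current one."""
    advertised = parse_update_version(txt_answer)
    return advertised is not None and advertised > current_version
```

With the wildcard records from the example, a 3.0.2 install would query
2.0.3.updates.spamassassin.org and match the *.0.3 record.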
> 
> For the initial request, mirrors.channel is a TXT record with an URL for
> the MIRRORED.BY (ie: http://spamassassin.apache.org/updates/MIRRORED.BY),
> which contains a list of parent URLs, and an optional list of options
> per mirror.  ie:
> 
> http://spamassassin.apache.org/updates weight=20
> http://spamassassin.kluge.net/updates
> http://somemirror.example.com/spamassassin/updates weight=4
> 
> Means there are 3 mirrors, weighted so the apache.org one will be used the
> most (80% of the time), followed by the example.com one (16% of the time),
> followed by the kluge.net one (4% of the time).  Weights are default
> '1', btw.
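
A rough sketch of how the weighted choice could work, again in Python for
illustration only. The parser assumes exactly the layout shown in the example
(a URL followed by optional key=value options); the function names are
hypothetical. It uses random.choices with weights instead of repeating each
URL in an array, which gives the same distribution.

```python
import random

EXAMPLE_MIRRORED_BY = """\
http://spamassassin.apache.org/updates weight=20
http://spamassassin.kluge.net/updates
http://somemirror.example.com/spamassassin/updates weight=4
"""


def parse_mirrored_by(text):
    """Return a list of (url, weight) pairs; a missing weight defaults to 1."""
    mirrors = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        weight = 1
        for option in fields[1:]:
            key, _, value = option.partition("=")
            if key == "weight":
                weight = int(value)
        mirrors.append((fields[0], weight))
    return mirrors


def selection_shares(mirrors):
    """Fraction of the time each mirror is expected to be picked."""
    total = sum(weight for _, weight in mirrors)
    return {url: weight / total for url, weight in mirrors}


def pick_mirror(mirrors, rng=random):
    """Pick one mirror URL at random, honouring the weights."""
    urls = [url for url, _ in mirrors]
    weights = [weight for _, weight in mirrors]
    return rng.choices(urls, weights=weights, k=1)[0]
```

For the three example mirrors the total weight is 25, so the shares come out
as 20/25, 1/25 and 4/25, matching the percentages quoted above.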
> 
> The directory to be mirrored out looks like:
> 
> dir/
>       MIRRORED.BY
>       version.ext
>       version.ext.sha1
>       ...
>       versionn.ext
>       versionn.ext.sha1
> 
> with "version.ext.gpg .. versionn.ext.gpg" available optionally.
> I don't think GPG needs to be required, but for the paranoid amongst us,
> it needs to be available as an option.
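
A small sketch of the SHA1 check, in Python for illustration. The proposal
does not say what a version.ext.sha1 file contains, so this assumes the first
whitespace-separated token is the hex digest (as in sha1sum output); that
layout and the function name are assumptions, not part of the proposal.

```python
import hashlib


def sha1_matches(archive_bytes: bytes, sha1_file_text: str) -> bool:
    """Compare the archive's SHA1 with the digest taken from the .sha1 file.

    Assumes the digest is the first whitespace-separated token in the file.
    """
    fields = sha1_file_text.split()
    if not fields:
        return False  # an empty .sha1 file can never validate
    expected = fields[0].lower()
    actual = hashlib.sha1(archive_bytes).hexdigest()
    return actual == expected
```

A mismatch here would fail the channel, the same as a bad GPG signature.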
> 
> At the end, the script outputs a number of channel.cf files, which by
> default will just be read by SpamAssassin at startup (leaving restarting
> spamd up to the admin outside the script, based on exit code...)  If a
> different directory is used, admin can simply include the channel.cf
> file in their local.cf.
> 
> There are a few things I haven't fully fleshed out yet:
> 
> 1) How to archive the update files together?  I envisioned a similar
> naming convention to our normal rules directory (ie: a bunch of files
> named ##_type.cf), but the script should just expect to download a single
> file which will then be expanded.  I don't want to rely on system calls to
> run an expansion, nor do I want to expect tar or zip to be installed, etc.
> 
> 2) How to validate with GPG?  Similar to the archive issue.  Perhaps using
> GnuPG::Interface?  It's really just a wrapper to running gpg from the
> commandline, but at least abstracts the issue for platforms where "gpg" isn't
> what I think it is.
> 
> 3) Using "channel.cf" means that it may or may not come after local.cf.
> We should probably use some form of prefix to get it to load beforehand,
> but what?  People should be able to override the channel config if
> they want to.  I don't know if I want "AA_updates_spamassassin_org.cf"
> as a file.
> 
> Pseudo code:
> 
> - Script has a list of GPG keys which are allowed to sign update releases.
>   The default is 265FA05B, which is the SA signing key.
> - load Mail::SpamAssassin
> - load Digest::SHA1
> - load LWP
> - Accept commandline options for GPG keys to allow for signing in addition
>   to default (for third-party updates).
> - Accept commandline option for whether or not to use GPG for verification.
> - Accept commandline options for additional channels to use beyond
>   updates.spamassassin.org
> - Accept commandline option for parent directory for updates.  Default is
>   whatever the first site_rules_path value is, ie: /etc/mail/spamassassin.
>   ala: $msa->first_existing_path (@M::SA::site_rules_path);
> - Accept other options such as debug, version, etc.
> - exit code = 255
> - foreach ( @channels ):
>   - Convert channel name to "platform friendly" version?  Is
>     "foo.bar.baz.etc.example.com" ok for all platforms?  I was thinking
>     s/\./_/g

+1 on that.

>   - read /dir/channel.cf and get current version from comment on first line
>   - convert internal SA version to z.y.x format, and query DNS for
>     TXT z.y.x.channel
>   - if no answer, throw error, goto next channel
>   - for version checks, use ^(\d+) for version.  if same channel will have
>     same update version value for different SA versions, can do "1345-3_0".
>   - if version is <= current, goto next channel
>   - if no /dir/channel/MIRRORED.BY file exists:
>     - query DNS for TXT mirrors.channel
>     - if no answer, throw error, goto next channel
>     - grab URI, write to /dir/channel/MIRRORED.BY
>   - read /dir/channel/MIRRORED.BY:
>     - add each parent URI to internal array.  if weight given, add URI that
>       many times.  (this algorithm can be made more efficient, but it's simple
>       for now.)
>   - foreach ( pick_random(@mirrors) ):
>     - grab parent_uri/version.foo ("foo" depends on the "what archive
>       method" issue)
>       - if there's an error, go back and choose another mirror
>     - grab parent_uri/version.foo.sha1 (ditto foo)
>     - do IMS grab for parent_uri/MIRRORED.BY, missing is ok
>     - if GPG is enabled, grab parent_uri/version.foo.gpg (ditto foo)
>     - an error in either GPG or SHA1 causes an error for the channel, goto
>       next channel
>     - no error means break out of the mirror loop
>     - write files to some temp place (mkdir tmpfile)
>     - if no mirrors work completely, channel fails, goto next channel
>   - validate version.foo.sha1 internally
>     - if failed, fail channel, goto next channel
>   - if GPG is enabled, validate version.foo.gpg (depends on the "how to do
>     gpg" issue)
>     - if failed, fail channel, goto next channel
>     - file fails if signature fails, or if signature is ok but not signed by
>       list of "trusted" keys
>   - remove all files except MIRRORED.BY from /dir/channel
>   - remove /dir/channel.cf
>   - unarchive version.foo into /dir/channel
>     - on error, fail channel, goto next channel
>   - move new MIRRORED.BY to /dir/channel if it exists
>   - remove temp version.foo* files
>   - create new /dir/channel.cf file
>     - first line is comment w/ version of channel
>     - foreach (readdir(/dir/channel)):
>       - add "include /dir/channel/file.cf", only do .cf files
>   - exit code = 0
> - return exit code
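
The last steps of the pseudo-code (channel name conversion, writing
channel.cf, and reading the version back from its first line) might look
roughly like this in Python. The exact wording of the version comment is not
specified above, so "# UPDATE version N" is an invented placeholder, as are
the function names. Entries are sorted so the include order is predictable,
which the pseudo-code does not require.

```python
import os
import re


def channel_to_filename(channel: str) -> str:
    """Platform-friendly channel name, i.e. s/\\./_/g."""
    return channel.replace(".", "_")


def write_channel_cf(parent_dir: str, channel: str, version: int) -> str:
    """Write parent_dir/<channel>.cf: a version comment, then one include per .cf file."""
    name = channel_to_filename(channel)
    channel_dir = os.path.join(parent_dir, name)
    cf_path = os.path.join(parent_dir, name + ".cf")
    # Comment format is an assumption; only "version on first line" is given.
    lines = [f"# UPDATE version {version}"]
    for entry in sorted(os.listdir(channel_dir)):
        if entry.endswith(".cf"):  # only .cf files are included
            lines.append(f"include {os.path.join(channel_dir, entry)}")
    with open(cf_path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return cf_path


def read_current_version(cf_path: str):
    """Return the version from the first-line comment, or None if unavailable."""
    try:
        with open(cf_path) as handle:
            first_line = handle.readline()
    except FileNotFoundError:
        return None  # no previous update for this channel
    match = re.search(r"(\d+)", first_line)
    return int(match.group(1)) if match else None
```

Reading the version back from the same file is what lets the next run skip a
channel whose advertised version is <= the current one.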

btw, I think Coral would be useful as a mirroring infrastructure, too.
http://www.scs.cs.nyu.edu/coral/

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFCCA1AMJF5cimLx9ARAjEwAJ9O5bYxIzFblUP6aOWA1PlGMG2NmACfZ7I/
JnaQO/OYDtGKEbmx1Sec2PU=
=1B9W
-----END PGP SIGNATURE-----
