Loren Wilton writes:
> Let me challenge or at least prod around the edges of this a bit to further
> my understanding.
> 
> I think what you are saying is that priority is used (at least in part) to
> do the ordering that is known or believed to be required.  However, there
> seems to be some ordering built into PMS itself, such as firing the net
> rules first and then harvesting the results later.
> 
> That makes me believe that we probably have two methods of doing the same thing:
> some rules are ordered because the PMS code is written to run them in a given
> order, and some rules are ordered because someone assigned a priority
> somewhere.
> 
> I guess I'm mostly wondering if 'priority' as a number (at least one with a
> seemingly rather fine granularity) is necessarily the way to do this.  It is
> certainly general.  But I'm wondering if this is over-general, and can end
> up forcing a rule ordering algorithm to make potentially bad ordering
> decisions.
> 
> Might it be reasonable to do the enforced ordering based on a small set of
> known rule types, and just flag the unusual rules of each special type?  The
> unusual rules that come to mind just at the instant are net, bayes, and awl.
> Maybe there are more, but I can't think what they would be at the moment.
> 
> While it could be argued that an enumeration is just a form of priority that
> doesn't use numbers, it seems to me to have an advantage - you can change
> the order that you look at the enumerated values without having to change
> the values themselves.  Also, it would prevent assigning 'useless'
> classifications such as priorities of 501, 502 and 503 to three user rules.
> 
> An example of why I think an enumeration might be better:  Right now all net
> rules are started first, since they take longest.  But suppose we have a
> rule that will score -100 and the total positive score, including net rules,
> is only 100.  Clearly it makes more sense to evaluate that single -100 rule
> before firing any of the net rules -- if the -100 rule triggers, the net rule
> scores are moot, and we have wasted significant system resources.  Doing
> that with priorities would be awkward.

Why?

    header BIG_NEGATIVE_RULE Foo =~ /bar/
    score BIG_NEGATIVE_RULE  -999
    priority BIG_NEGATIVE_RULE  -10     # negative, so it runs before default (0) rules

seems pretty simple and non-awkward imo ;)

This is certainly the main form of early-exiting that I'd want to
see -- a specific few tests (like USER_IN_WHITELIST) that run first
(using priority) and cause early exit.
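A minimal Python sketch of that model, assuming (as SpamAssassin documents for its `priority` option) that lower values run earlier and the default is 0. The `Rule` class, `scan()` and the `early_exit` flag are hypothetical illustrations, not SpamAssassin's Perl API; `early_exit` just stands in for whatever mechanism would actually stop the scan.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    score: float
    priority: int                   # lower runs earlier; 0 is the default
    test: Callable[[dict], bool]
    early_exit: bool = False        # hypothetical flag: stop scanning when this rule hits


def scan(msg: dict, rules: list) -> tuple:
    """Run rules in increasing priority order, stopping at the first early-exit hit."""
    total = 0.0
    hits = []
    # sorted() is stable, so rules sharing a priority keep their config-file order
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.test(msg):
            total += rule.score
            hits.append(rule.name)
            if rule.early_exit:
                break  # none of the remaining (possibly expensive) rules run
    return total, hits


rules = [
    Rule("NET_RULE", 3.0, 0, lambda m: True),  # stand-in for a slow network test
    Rule("BIG_NEGATIVE_RULE", -999.0, -10,
         lambda m: re.search(r"bar", m.get("Foo", "")) is not None,
         early_exit=True),
]
print(scan({"Foo": "xbarx"}, rules))  # (-999.0, ['BIG_NEGATIVE_RULE'])
```

Because the whitelist-style rule sorts ahead of the default bucket, the network test never runs when it hits.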

The way priority works is that there's only one priority explicitly set in the
config files -- AWL at priority 1000.  Everything else is at the same (default)
priority, except for meta tests, which run at META_TEST_MIN_PRIORITY, and DNSBL
harvesting, which only happens after HARVEST_DNSBL_PRIORITY.
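To make that ordering concrete, here is a hypothetical Python sketch of how those buckets might be walked. The constant values (500 for both) and the `(name, kind, priority)` tuples are assumptions for illustration; the real constants and scheduling logic live in SpamAssassin's Perl code. Network tests are started before the loop and, by assumption, harvested once the loop reaches `HARVEST_DNSBL_PRIORITY`; meta tests are never run below `META_TEST_MIN_PRIORITY`. Note that it keys off rule type, which is exactly the behaviour criticised below.

```python
from collections import defaultdict

# Hypothetical values for illustration only; the real constants are in SpamAssassin's Perl code.
META_TEST_MIN_PRIORITY = 500
HARVEST_DNSBL_PRIORITY = 500


def build_schedule(rules):
    """Group (name, kind, priority) tuples into buckets, returned in run order."""
    buckets = defaultdict(list)
    for name, kind, priority in rules:
        if kind == "meta":
            # meta tests combine other results, so they never run below the floor
            priority = max(priority, META_TEST_MIN_PRIORITY)
        buckets[priority].append(name)
    return sorted(buckets.items())


def run_order(rules):
    """List the steps taken: DNS lookups fire first and are harvested later."""
    dns = [name for name, kind, _ in rules if kind == "dns"]
    steps = [f"start {n}" for n in dns]
    harvested = False
    for priority, names in build_schedule([r for r in rules if r[1] != "dns"]):
        if not harvested and priority >= HARVEST_DNSBL_PRIORITY:
            steps += [f"harvest {n}" for n in dns]
            harvested = True
        steps += names
    if not harvested:  # no bucket reached the harvest point; collect at the end
        steps += [f"harvest {n}" for n in dns]
    return steps


rules = [("AWL", "eval", 1000), ("FOO", "header", 0),
         ("META_X", "meta", 0), ("RCVD_IN_X", "dns", 0)]
print(run_order(rules))
# ['start RCVD_IN_X', 'FOO', 'harvest RCVD_IN_X', 'META_X', 'AWL']
```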

We've been trying to reduce the number of weird effects in the rule
ordering and selection code, for example the fact that DNS tests are run
first and then harvested later -- that *should* be implemented as some
kind of general mechanism, instead of being keyed off the rule type,
because doing it by rule type means that there'll be no way for plugins to
support similar models for future unforeseen rule types without changes to
PerMsgStatus and/or Conf.  Also, AWL was previously not even a real rule
-- it was just hard-coded into PerMsgStatus.  Having priority let us fix
both of these (although the DNS stuff is still only half-implemented ;).
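One way the general mechanism could look, sketched in Python under stated assumptions: instead of the scheduler checking whether a rule is a DNS rule, any rule (including one supplied by a plugin) declares a hypothetical `harvest_at` priority meaning "start me before the scan, collect my result at this priority". `Rule`, `harvest_at` and `schedule()` are illustrative names, not SpamAssassin's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    priority: int = 0                  # when the rule itself runs (lower = earlier)
    harvest_at: Optional[int] = None   # hypothetical: start early, collect at this priority


def schedule(rules):
    """Return the order of events; deferral is declared per rule, never inferred from type."""
    events = []
    for idx, r in enumerate(rules):
        if r.harvest_at is None:
            # phase 1: at a shared priority, harvests (phase 0) come before plain runs
            events.append((r.priority, 1, idx, f"run {r.name}"))
        else:
            # deferred rules ignore `priority`: start before everything, collect later
            events.append((float("-inf"), 0, idx, f"start {r.name}"))
            events.append((r.harvest_at, 0, idx, f"harvest {r.name}"))
    # idx keeps config order stable among events with equal priority and phase
    return [event[-1] for event in sorted(events)]


rules = [
    Rule("FOO"),
    Rule("RCVD_IN_X", harvest_at=500),       # a DNSBL-style lookup
    Rule("AWL", priority=1000),
    Rule("PLUGIN_LOOKUP", harvest_at=200),   # a plugin's own deferred test
]
for step in schedule(rules):
    print(step)
```

Because the scheduler only looks at `harvest_at`, a plugin adding a new kind of slow test would not need changes to the scheduling code itself.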

With a very small number of priorities, and the vast majority of rules
being in one priority "bucket", you can then optimise the rule selection
in that bucket and have more or less the same effect without too much
weird stuff, like side-effects of rule names or rule types.

Btw, yes, short-circuiting breaks Bayes autolearning and the AWL; this is
one of its downsides.  However, by far the biggest downside is that we
have never found a way to do automatic short-circuiting that didn't end
up running at about the same speed as no short-circuiting at all, when
measured against a corpus of mail that includes spam. ;)
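For reference, this is roughly the bound check an automatic short-circuit needs, as a hypothetical Python sketch: stop once the score so far, plus every remaining positive (or negative) score, can no longer cross the spam threshold. The 5.0 threshold is SpamAssassin's default required score; the function and its inputs are illustrative, and `hits` stands in for actually running each rule. It shows what the check involves, not that it pays off: as described above, it has not measured faster in practice, and short-circuiting breaks Bayes autolearning and the AWL.

```python
def scan_with_shortcircuit(order, scores, hits, threshold=5.0):
    """Evaluate rules in `order`; stop once no remaining rule can change the verdict.

    order:  rule names in evaluation order
    scores: rule name -> score
    hits:   rule name -> whether the rule would match (stands in for running it)
    Returns (score so far, number of rules evaluated).
    """
    total = 0.0
    for i, name in enumerate(order):
        if hits[name]:
            total += scores[name]
        rest = order[i + 1:]
        max_possible = total + sum(scores[n] for n in rest if scores[n] > 0)
        min_possible = total + sum(scores[n] for n in rest if scores[n] < 0)
        if max_possible < threshold or min_possible >= threshold:
            return total, i + 1   # verdict is fixed; skip the remaining rules
    return total, len(order)


# Loren's example: a -100 rule against 100 points of possible positive score
scores = {"BIG_NEG": -100.0, "NET_A": 50.0, "NET_B": 50.0}
hits = {"BIG_NEG": True, "NET_A": True, "NET_B": True}
print(scan_with_shortcircuit(["BIG_NEG", "NET_A", "NET_B"], scores, hits))  # (-100.0, 1)
```

Note that the early exit only helps when the cheap, decisive rule is ordered first; when it misses, the loop still has to keep evaluating until the bounds close.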

You should read the discussion that went on when priority was implemented
last year: http://bugzilla.spamassassin.org/show_bug.cgi?id=2912 .  Much
of this was discussed then. 

--j.
