Am 2022-05-17 17:09, schrieb [email protected]:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7987
--- Comment #27 from Michael Storz <[email protected]> ---
(In reply to Henrik Krohns from comment #24)
(In reply to Michael Storz from comment #23)
> Great work! Are you now ready to switch from the brute force algorithm to a
> deterministic algorithm? I am not sure which of the two algorithms would be
> faster. That needs to be tested.
It should not matter if I'm ready or not. If you have something, then
share
it for everyone. :-)
Ok, let's see if the change leads to better performance or not. The
idea is
really simple, no big surprise, no fancy algorithm :-)
Each meta rule needs a counter with the number of dependent rules. For
each
rule, an array is created with all meta rules that depend on that rule.
When a
rule has finished, got_hit/got_mis/rule_ready, it decrements the
counter of
each dependent meta rule. If the counter is 0, then the meta rule is
moved from
meta_pending to meta_ready queue. The meta_ready queue is then
processed by
do_meta_tests.
Alternative: Instead of moving the meta rule, it is executed
immediately. The
do_meta_tests subroutine then becomes redundant (what to do with the
reuse part
then, I have no idea).
In the end, the leftover meta-rules must then be handled with
finish_meta_tests.
A few code fragments for clarification:
sub compile_meta_rules
foreach my $name (keys %{$conf->{tests}}) {
...
$conf->{meta_dep_count}->{$name} = scalar @{$rule_deps{$name}};
foreach my $rulename (@{$rule_deps{$name}}) {
push @{$conf->{rule_dependencies}->{$rulename}}, $name;
}
}
---------------------------------------
got_hit/got_miss/rule_ready
foreach my $meta_rule (@{$conf->{rule_dependencies}->{$rulename}) {
move_meta_p2r($meta_rule) unless
--$conf->{meta_dep_count}->{$meta_rule};
}
foreach my $meta_rule (@{$conf->{rule_dependencies}->{$rulename}) {
run_meta_rule($meta_rule) unless
--$conf->{meta_dep_count}->{$meta_rule};
}
run_meta_rule like do_meta_tests + delete
$pms->{meta_pending}->{$meta_rule}
sub do_meta_tests {
my ($self, $pms, $priority) = @_;
...
return if $self->{am_compiling}; # nothing to compile here
my $mr = $pms->{meta_ready};
my $mt = $pms->{conf}->{meta_tests};
my $h = $pms->{tests_already_hit};
while (my $rulename = shift @$mr) {
# Metasubs look like ($_[1]->{$rulename}||($_[2]->{$rulename}?1:0))
...
my $result = $mt->{$rulename}->($pms, $h, {});
if ($result) {
dbg("rules: ran meta rule $rulename ======> got hit ($result)");
$pms->got_hit($rulename, '', ruletype => 'meta', value =>
$result);
} else {
dbg("rules-all: ran meta rule $rulename, no hit") if
$would_log_rules_all;
$pms->got_miss($rulename); # mark meta done
}
}
}
Performance: The standard ruleset of SpamAssassin includes about 3,000
rules. One third of these are meta rules. The performance of the
evaluation of the meta rules therefore matters.
My code fragments with move_meta_p2r and run_meta_rule would therefore
degrade performance if these were actually implemented as subroutine
calls, since about 1,000 additional subroutine calls would then be made.
The immediate execution of meta rules by run_meta_rule makes little
sense, since we do not short-circuit the boolean expressions of the meta
rules. Short-circuit would only bring something, if then in consequence
large parts of the rules would not be evaluated any more.
Therefore only move_meta_p2r comes into question. move_meta_p2r can be
implemented as the two statements add ready + delete pending.
In do_meta_tests we then have the set of ready meta rules. Instead of
processing them individually, which would mean executing 1,000
subroutine calls, we can now generate subroutines that execute many meta
rules at once, analogous to the procedure for the header tests. This
could reduce the number of calls to a handful and would boost
performance.
Michael