Am 2022-05-17 17:09, schrieb [email protected]:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7987

--- Comment #27 from Michael Storz <[email protected]> ---
(In reply to Henrik Krohns from comment #24)
(In reply to Michael Storz from comment #23)
> Great work! Are you now ready to switch from the brute force algorithm to a
> deterministic algorithm? I am not sure which of the two algorithms would be
> faster. That needs to be tested.

It should not matter if I'm ready or not. If you have something, then share
it for everyone. :-)

Ok, let's see if the change leads to better performance or not. The idea is
really simple, no big surprise, no fancy algorithm :-)

Each meta rule needs a counter with the number of dependent rules. For each rule, an array is created with all meta rules that depend on that rule. When a rule has finished, got_hit/got_mis/rule_ready, it decrements the counter of each dependent meta rule. If the counter is 0, then the meta rule is moved from meta_pending to meta_ready queue. The meta_ready queue is then processed by
do_meta_tests.

Alternative: Instead of moving the meta rule, it is executed immediately. The do_meta_tests subroutine then becomes redundant (what to do with the reuse part
then, I have no idea).

In the end, the leftover meta-rules must then be handled with
finish_meta_tests.

A few code fragments for clarification:

sub compile_meta_rules

  foreach my $name (keys %{$conf->{tests}}) {
    ...
    $conf->{meta_dep_count}->{$name} = scalar @{$rule_deps{$name}};
    foreach my $rulename (@{$rule_deps{$name}}) {
      push @{$conf->{rule_dependencies}->{$rulename}}, $name;
    }
  }

---------------------------------------

got_hit/got_miss/rule_ready

foreach my $meta_rule (@{$conf->{rule_dependencies}->{$rulename}) {
move_meta_p2r($meta_rule) unless --$conf->{meta_dep_count}->{$meta_rule};
}

foreach my $meta_rule (@{$conf->{rule_dependencies}->{$rulename}) {
run_meta_rule($meta_rule) unless --$conf->{meta_dep_count}->{$meta_rule};
}

run_meta_rule like do_meta_tests + delete $pms->{meta_pending}->{$meta_rule}

sub do_meta_tests {
  my ($self, $pms, $priority) = @_;
  ...
  return if $self->{am_compiling}; # nothing to compile here

  my $mr = $pms->{meta_ready};
  my $mt = $pms->{conf}->{meta_tests};
  my $h = $pms->{tests_already_hit};

  while (my $rulename = shift @$mr) {
# Metasubs look like ($_[1]->{$rulename}||($_[2]->{$rulename}?1:0)) ...
    my $result = $mt->{$rulename}->($pms, $h, {});
    if ($result) {
      dbg("rules: ran meta rule $rulename ======> got hit ($result)");
$pms->got_hit($rulename, '', ruletype => 'meta', value => $result);
    } else {
      dbg("rules-all: ran meta rule $rulename, no hit") if
$would_log_rules_all;
      $pms->got_miss($rulename); # mark meta done
    }
  }
}

Performance: The standard ruleset of SpamAssassin includes about 3,000 rules. One third of these are meta rules. The performance of the evaluation of the meta rules therefore matters.

My code fragments with move_meta_p2r and run_meta_rule would therefore degrade performance if these were actually implemented as subroutine calls, since about 1,000 additional subroutine calls would then be made.

The immediate execution of meta rules by run_meta_rule makes little sense, since we do not short-circuit the boolean expressions of the meta rules. Short-circuit would only bring something, if then in consequence large parts of the rules would not be evaluated any more.

Therefore only move_meta_p2r comes into question. move_meta_p2r can be implemented as the two statements add ready + delete pending.

In do_meta_tests we then have the set of ready meta rules. Instead of processing them individually, which would mean executing 1,000 subroutine calls, we can now generate subroutines that execute many meta rules at once, analogous to the procedure for the header tests. This could reduce the number of calls to a handful and would boost performance.

Michael

Reply via email to