Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready

Michael Storz Wed, 18 May 2022 02:03:53 -0700

Am 2022-05-17 17:09, schrieb [email protected]:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7987
--- Comment #27 from Michael Storz <[email protected]> ---
(In reply to Henrik Krohns from comment #24)
(In reply to Michael Storz from comment #23)
> Great work! Are you now ready to switch from the brute force algorithm to a
> deterministic algorithm? I am not sure which of the two algorithms would be
> faster. That needs to be tested.
It should not matter if I'm ready or not. If you have something, thenshare
it for everyone. :-)
Ok, let's see if the change leads to better performance or not. Theidea is
really simple, no big surprise, no fancy algorithm :-)
Each meta rule needs a counter with the number of dependent rules. Foreachrule, an array is created with all meta rules that depend on that rule.When arule has finished, got_hit/got_mis/rule_ready, it decrements thecounter ofeach dependent meta rule. If the counter is 0, then the meta rule ismoved frommeta_pending to meta_ready queue. The meta_ready queue is thenprocessed by
do_meta_tests.
Alternative: Instead of moving the meta rule, it is executedimmediately. Thedo_meta_tests subroutine then becomes redundant (what to do with thereuse part
then, I have no idea).

In the end, the leftover meta-rules must then be handled with
finish_meta_tests.

A few code fragments for clarification:

sub compile_meta_rules

  foreach my $name (keys %{$conf->{tests}}) {
    ...
    $conf->{meta_dep_count}->{$name} = scalar @{$rule_deps{$name}};
    foreach my $rulename (@{$rule_deps{$name}}) {
      push @{$conf->{rule_dependencies}->{$rulename}}, $name;
    }
  }

---------------------------------------

got_hit/got_miss/rule_ready

foreach my $meta_rule (@{$conf->{rule_dependencies}->{$rulename}) {
move_meta_p2r($meta_rule) unless--$conf->{meta_dep_count}->{$meta_rule};
}

foreach my $meta_rule (@{$conf->{rule_dependencies}->{$rulename}) {
run_meta_rule($meta_rule) unless--$conf->{meta_dep_count}->{$meta_rule};
}
run_meta_rule like do_meta_tests + delete$pms->{meta_pending}->{$meta_rule}
sub do_meta_tests {
  my ($self, $pms, $priority) = @_;
  ...
  return if $self->{am_compiling}; # nothing to compile here

  my $mr = $pms->{meta_ready};
  my $mt = $pms->{conf}->{meta_tests};
  my $h = $pms->{tests_already_hit};

  while (my $rulename = shift @$mr) {
# Metasubs look like ($_[1]->{$rulename}||($_[2]->{$rulename}?1:0))...
    my $result = $mt->{$rulename}->($pms, $h, {});
    if ($result) {
      dbg("rules: ran meta rule $rulename ======> got hit ($result)");
$pms->got_hit($rulename, '', ruletype => 'meta', value =>$result);
    } else {
      dbg("rules-all: ran meta rule $rulename, no hit") if
$would_log_rules_all;
      $pms->got_miss($rulename); # mark meta done
    }
  }
}

Performance: The standard ruleset of SpamAssassin includes about 3,000rules. One third of these are meta rules. The performance of theevaluation of the meta rules therefore matters.

My code fragments with move_meta_p2r and run_meta_rule would thereforedegrade performance if these were actually implemented as subroutinecalls, since about 1,000 additional subroutine calls would then be made.

The immediate execution of meta rules by run_meta_rule makes littlesense, since we do not short-circuit the boolean expressions of the metarules. Short-circuit would only bring something, if then in consequencelarge parts of the rules would not be evaluated any more.

Therefore only move_meta_p2r comes into question. move_meta_p2r can beimplemented as the two statements add ready + delete pending.

In do_meta_tests we then have the set of ready meta rules. Instead ofprocessing them individually, which would mean executing 1,000subroutine calls, we can now generate subroutines that execute many metarules at once, analogous to the procedure for the header tests. Thiscould reduce the number of calls to a handful and would boostperformance.


Michael

Re: [Bug 7987] DNSEval.pm,HashBL.pm,URILocalBL.pm: unnecessary use of rule_pending and rule_ready

Reply via email to