You were quite right Henrick, I had insufficiently tested it and while it computed dependencies only once, it did not properly retrieve previously computed dependencies.

I updated the patch:

--- /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-02-07 
15:01:16.124532948 -0500
+++ /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-02-07 
15:03:41.127692566 -0500
@@ -968,24 +968,29 @@
   foreach my $name (keys %{$conf->{tests}}) {
     next unless ($conf->{test_types}->{$name}
                     == $Mail::SpamAssassin::Conf::TYPE_META_TESTS);
-
-    my $deps = [ ];
-    my $alreadydone = { };
-    $self->_meta_deps_recurse($conf, $name, $name, $deps, $alreadydone);
-    $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
+    my $alreadydone = {};
+    $self->_meta_deps_recurse($conf, $name, $name, $alreadydone);
   }
 }
 sub _meta_deps_recurse {
-  my ($self, $conf, $toprule, $name, $deps, $alreadydone) = @_;
+  my ($self, $conf, $toprule, $name, $alreadydone) = @_;
-  # Only do each rule once per top-level meta; avoid infinite recursion
-  return if $alreadydone->{$name};
-  $alreadydone->{$name} = 1;
+  # Avoid recomputing the dependencies of a rule
+  return split(' ', $conf->{meta_dependencies}->{$name}) if defined 
$conf->{meta_dependencies}->{$name};
   # Obviously, don't trace empty or nonexistent rules
   my $rule = $conf->{tests}->{$name};
-  return unless $rule;
+  unless ($rule) {
+      $conf->{meta_dependencies}->{$name} = '';
+      return ( );
+  }
+
+  # Avoid infinite recursion
+  return ( ) if exists $alreadydone->{$name};
+  $alreadydone->{$name} = ( );
+
+  my %deps;
   # Lex the rule into tokens using a rather simple RE method ...
   my $lexer = ARITH_EXPRESSION_LEXER;
@@ -1001,9 +1006,12 @@
     next unless exists $conf_tests->{$token};
     # add and recurse
-    push(@{$deps}, untaint_var($token));
-    $self->_meta_deps_recurse($conf, $toprule, $token, $deps, $alreadydone);
+    $deps{untaint_var($token)} = ( );
+    my @subdeps = $self->_meta_deps_recurse($conf, $toprule, $token, 
$alreadydone);
+    @deps{@subdeps} = ( );
   }
+  $conf->{meta_dependencies}->{$name} = join (' ', keys %deps);
+  return keys %deps;
 }
 sub fix_priorities {


/_meta_deps_recurse/ now returns a hash that acts as a set of dependencies.

These dependencies are stored in /$conf->{meta_dependencies}->{$name}/ as they are calculated.

The result is not exactly the same as it was before, as I had some cases where duplicates in /$conf->{meta_dependencies}->{$name}/ would arise. This is no longer the case. The sets of dependencies stay identical.

I also considered changing /$conf->{meta_dependencies}->{$name}/ to a list instead of a string, but that was not necessary to achieve a 4-fold performance improvement and would require additional refactoring.

What do you think? I am not used to writing Perl code so feel free to correct the style.

On 19-01-08 11 h 24, Kevin A. McGrail wrote:
Thanks Henrik!

On 1/8/2019 11:21 AM, Henrik K wrote:
The code is saving dependencies of each meta rule in this variable:
   $conf->{meta_dependencies}

That variable is used in Check.pm meta tests.

Simply adding debug print after this line:
   $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
   print STDERR "DEPS FOR $name: $conf->{meta_dependencies}->{$name}\n";

And running both versions with spamassassin --lint 2>/tmp/log1 and log2

Shows for example that original version:

DEPS FOR __YOU_WON: __GIVE_MONEY __HAS_WON_01 __MOVE_MONEY __YOU_WON_01 
__YOU_WON_02 __YOU_WON_03 __YOU_WON_04 __YOU_WON_05

New "faster" version:

DEPS FOR __YOU_WON:

So it appears __YOU_WON has no dependencies, thus the patch does not work as
intended, Check.pm fails to check for dependencies, -1 from me.

Contributions are very much appreciated, but I urge everyone to use very
simple debug methods like this to verify even "trivial" things. :-)

Cheers,
Henrik

On Tue, Jan 08, 2019 at 10:37:12AM -0500, Kevin A. McGrail wrote:
It's an interesting issue and a trivial patch.  I'm sure it was done as a
safety valve figuring the cycles weren't a big deal.  I can't think of any
reason to keep checking all these meta dependencies over and over either.

I'm +1 on this.
--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
[1]https://www.linkedin.com/in/kmcgrail - 703.798.0171


On Tue, Jan 8, 2019 at 10:31 AM Olivier Coutu <[3][email protected]>
wrote:


     We use a lot of custom rules (2k+) and lint time was growing out of
     control, over a minute on some machines.

     It appears like $alreadydone, the set of rules that have already been
     linted, is reset for every test. This makes lint time quadratic in the
     number of tests and does not seem to have any use. Moving it out of the
     tests loop reduces my lint time from 21 seconds to 3 seconds with no
     visible side effect.

     This patch moves it out of the tests loop

     --- /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-01-07 
17:54:19.244624440 -0500
     +++ /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-01-07 
17:55:11.948708724 -0500
     @@ -964,13 +964,13 @@
        my ($self) = @_;
        my $conf = $self->{conf};
        $conf->{meta_dependencies} = { };
     +  my $alreadydone = { };

        foreach my $name (keys %{$conf->{tests}}) {
          next unless ($conf->{test_types}->{$name}
                          == $Mail::SpamAssassin::Conf::TYPE_META_TESTS);

          my $deps = [ ];
     -    my $alreadydone = { };
          $self->_meta_deps_recurse($conf, $name, $name, $deps, $alreadydone);
          $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
        }


     The result on ok rulesets and on obviously broken rulesets does not change
     from pre-patch, and I can't see any negative side effects of moving it out
     of the loop.

     I'm not used to posting on dev lists, so please tell me if more information
     is needed or if things should be presented differently.

     -Olivier


References:

[1] https://www.linkedin.com/in/kmcgrail
[3] mailto:[email protected]

Reply via email to