Hi Olivier, I'm focusing my SA time on getting 3.4.3 done but wanted to
mentioned that this patch is getting into non-trivial territory.  Could you
please confirm or file an ICLA?  It's awesome you are looking at
improvements and this helps us consider them freely:
https://www.apache.org/licenses/icla.pdf

Regards,
KAM
--
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171


On Fri, Feb 8, 2019 at 1:34 PM Olivier Coutu <[email protected]>
wrote:

> You were quite right Henrick, I had insufficiently tested it and while it
> computed dependencies only once, it did not properly retrieve previously
> computed dependencies.
>
> I updated the patch:
>
> --- /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-02-07 
> 15:01:16.124532948 -0500
> +++ /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-02-07 
> 15:03:41.127692566 -0500
> @@ -968,24 +968,29 @@
>    foreach my $name (keys %{$conf->{tests}}) {
>      next unless ($conf->{test_types}->{$name}
>                      == $Mail::SpamAssassin::Conf::TYPE_META_TESTS);
> -
> -    my $deps = [ ];
> -    my $alreadydone = { };
> -    $self->_meta_deps_recurse($conf, $name, $name, $deps, $alreadydone);
> -    $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
> +    my $alreadydone = {};
> +    $self->_meta_deps_recurse($conf, $name, $name, $alreadydone);
>    }
>  }
>
>  sub _meta_deps_recurse {
> -  my ($self, $conf, $toprule, $name, $deps, $alreadydone) = @_;
> +  my ($self, $conf, $toprule, $name, $alreadydone) = @_;
>
> -  # Only do each rule once per top-level meta; avoid infinite recursion
> -  return if $alreadydone->{$name};
> -  $alreadydone->{$name} = 1;
> +  # Avoid recomputing the dependencies of a rule
> +  return split(' ', $conf->{meta_dependencies}->{$name}) if defined 
> $conf->{meta_dependencies}->{$name};
>
>    # Obviously, don't trace empty or nonexistent rules
>    my $rule = $conf->{tests}->{$name};
> -  return unless $rule;
> +  unless ($rule) {
> +      $conf->{meta_dependencies}->{$name} = '';
> +      return ( );
> +  }
> +
> +  # Avoid infinite recursion
> +  return ( ) if exists $alreadydone->{$name};
> +  $alreadydone->{$name} = ( );
> +
> +  my %deps;
>
>    # Lex the rule into tokens using a rather simple RE method ...
>    my $lexer = ARITH_EXPRESSION_LEXER;
> @@ -1001,9 +1006,12 @@
>      next unless exists $conf_tests->{$token};
>
>      # add and recurse
> -    push(@{$deps}, untaint_var($token));
> -    $self->_meta_deps_recurse($conf, $toprule, $token, $deps, $alreadydone);
> +    $deps{untaint_var($token)} = ( );
> +    my @subdeps = $self->_meta_deps_recurse($conf, $toprule, $token, 
> $alreadydone);
> +    @deps{@subdeps} = ( );
>    }
> +  $conf->{meta_dependencies}->{$name} = join (' ', keys %deps);
> +  return keys %deps;
>  }
>
>  sub fix_priorities {
>
>
> *_meta_deps_recurse* now returns a hash that acts as a set of
> dependencies.
>
> These dependencies are stored in *$conf->{meta_dependencies}->{$name}* as
> they are calculated.
>
> The result is not exactly the same as it was before, as I had some cases
> where duplicates in *$conf->{meta_dependencies}->{$name}* would arise.
> This is no longer the case. The sets of dependencies stay identical.
>
> I also considered changing *$conf->{meta_dependencies}->{$name}* to a
> list instead of a string, but that was not necessary to achieve a 4-fold
> performance improvement and would require additional refactoring.
>
> What do you think? I am not used to writing Perl code so feel free to
> correct the style.
> On 19-01-08 11 h 24, Kevin A. McGrail wrote:
>
> Thanks Henrik!
>
> On 1/8/2019 11:21 AM, Henrik K wrote:
>
> The code is saving dependencies of each meta rule in this variable:
>   $conf->{meta_dependencies}
>
> That variable is used in Check.pm meta tests.
>
> Simply adding debug print after this line:
>   $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
>   print STDERR "DEPS FOR $name: $conf->{meta_dependencies}->{$name}\n";
>
> And running both versions with spamassassin --lint 2>/tmp/log1 and log2
>
> Shows for example that original version:
>
> DEPS FOR __YOU_WON: __GIVE_MONEY __HAS_WON_01 __MOVE_MONEY __YOU_WON_01 
> __YOU_WON_02 __YOU_WON_03 __YOU_WON_04 __YOU_WON_05
>
> New "faster" version:
>
> DEPS FOR __YOU_WON:
>
> So it appears __YOU_WON has no dependencies, thus the patch does not work as
> intended, Check.pm fails to check for dependencies, -1 from me.
>
> Contributions are very much appreciated, but I urge everyone to use very
> simple debug methods like this to verify even "trivial" things. :-)
>
> Cheers,
> Henrik
>
> On Tue, Jan 08, 2019 at 10:37:12AM -0500, Kevin A. McGrail wrote:
>
> It's an interesting issue and a trivial patch.  I'm sure it was done as a
> safety valve figuring the cycles weren't a big deal.  I can't think of any
> reason to keep checking all these meta dependencies over and over either.
>
> I'm +1 on this.
> --
> Kevin A. McGrail
> VP Fundraising, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> [1]https://www.linkedin.com/in/kmcgrail - 703.798.0171
>
>
> On Tue, Jan 8, 2019 at 10:31 AM Olivier Coutu <[3][email protected]> 
> <[3][email protected]>
> wrote:
>
>
>     We use a lot of custom rules (2k+) and lint time was growing out of
>     control, over a minute on some machines.
>
>     It appears like $alreadydone, the set of rules that have already been
>     linted, is reset for every test. This makes lint time quadratic in the
>     number of tests and does not seem to have any use. Moving it out of the
>     tests loop reduces my lint time from 21 seconds to 3 seconds with no
>     visible side effect.
>
>     This patch moves it out of the tests loop
>
>     --- /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-01-07 
> 17:54:19.244624440 -0500
>     +++ /usr/share/perl5/Mail/SpamAssassin/Conf/Parser.pm   2019-01-07 
> 17:55:11.948708724 -0500
>     @@ -964,13 +964,13 @@
>        my ($self) = @_;
>        my $conf = $self->{conf};
>        $conf->{meta_dependencies} = { };
>     +  my $alreadydone = { };
>
>        foreach my $name (keys %{$conf->{tests}}) {
>          next unless ($conf->{test_types}->{$name}
>                          == $Mail::SpamAssassin::Conf::TYPE_META_TESTS);
>
>          my $deps = [ ];
>     -    my $alreadydone = { };
>          $self->_meta_deps_recurse($conf, $name, $name, $deps, $alreadydone);
>          $conf->{meta_dependencies}->{$name} = join (' ', @{$deps});
>        }
>
>
>     The result on ok rulesets and on obviously broken rulesets does not change
>     from pre-patch, and I can't see any negative side effects of moving it out
>     of the loop.
>
>     I'm not used to posting on dev lists, so please tell me if more 
> information
>     is needed or if things should be presented differently.
>
>     -Olivier
>
>
> References:
>
> [1] https://www.linkedin.com/in/kmcgrail
> [3] mailto:[email protected] <[email protected]>
>
>

Reply via email to