https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7689

            Bug ID: 7689
           Summary: reduce lint time from quadratic to linear
           Product: Spamassassin
           Version: 3.4.2
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: spamassassin
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: Undefined

Created attachment 5642
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5642&action=edit
Patch to make lint time linear in number of rules instead of quadratic

c.f. Patch to reduce lint time in
https://mail-archives.apache.org/mod_mbox/spamassassin-dev/201902.mbox/browser

Lint checking does not re-use calculated dependencies, rendering the lint time
quadratic in the number of chained dependencies. This patch changes that time
to linear.

/_meta_deps_recurse/ now returns a hash that acts as a set of dependencies.

These dependencies are stored in /$conf->{meta_dependencies}->{$name}/ 
as they are calculated.

The result is not exactly the same as it was before, as I had some cases 
where duplicates in /$conf->{meta_dependencies}->{$name}/ would arise. 
This is no longer the case. The sets of dependencies stay identical.

I also considered changing /$conf->{meta_dependencies}->{$name}/ to a 
list instead of a string, but that was not necessary to achieve a 4-fold 
performance improvement and would require additional refactoring.

In practice, my lint time with a large number of dependencies goes from 20
seconds to 5 seconds on an unloaded machine with a 2.1 GHZ E5-2450 CPU. Memory
usage is mostly unaffected.

I have done tests with the standard ruleset. Since the number of chained
dependancies is low, the lint time is mostly unaffected, from 1.30s to 1.08s.

-- 
You are receiving this mail because:
You are the assignee for the bug.

Reply via email to