Re: gnulib-tool.py: lists vs. sets

Bruno Haible Sat, 06 Apr 2024 06:42:10 -0700

Hi Collin,

> I was looking at simplifying some list comprehensions yesterday and
> noticed that using sets + removing unnecessary sorted() calls in
> GLModuleTable reduces the time of ./import-tests/test-all.sh from 4.8
> seconds to 4.5 seconds consistently. A truly life changing amount of
> time.
> 
> You can probably notice this by looking in
> GLModuleTable.transitive_closure and
> GLModuleTable.transitive_closure_separately. Most of the time we are
> checking if a module is in a list, O(N). It makes more sense to use a
> set, O(1).
> 
> The only issue with that is it becomes annoying to deal with the dummy
> module. Is there a reason why that one doesn't get sorted?


Two comments:

* gnulib-tool.py is between 2x and 300x faster than gnulib-tool.sh, depending
  on the arguments. This means, the goal of speedup has been fully achieved.
  The other goals (clear code, maintainability, etc.) are still present.

  This means, if you have a 7% speed improvement and it does not impede
  the other goals, nor changes the results, then apply it. But when you start
  seeing that something else is in its way, then leave it.

* Lists have a lookup time of O(N) on average, sets of O(1) or O(log N),
  depending on implementation. But lists preserve the order.

  A common trick to have both advantages — in the case that there are no
  removals — is to keep a list _and_ a set in two variables (or two data
  fields of the same class). Do the insertions on both. Do the lookups on
  the set.

  This technique can also be generalized to support removals, but that is
  a bit more tricky.

Bruno

Re: gnulib-tool.py: lists vs. sets

Reply via email to