Re: gnulib-tool.py: Optimize module set lookups

2024-04-11 Thread Collin Funk
Hi Bruno,

On 4/11/24 12:51 PM, Bruno Haible wrote:
> The func_transitive_closure function in the shell implementation can take
> a lot of time. So I wondered whether in the Python implementation, there
> is room for speedup at this place as well. And indeed, there is.

Nice work. I noticed this too, which was my reasoning for wanting to
change GLModuleSystem to use sets to store the modules instead of
lists.

Doing so would break the sorting required for the test cases though.
All modules are sorted except for the dummy module which is placed at
the end.

I ended up just changing the avoided modules since it was easy and
every module + it's dependencies are checked:

while inmodules:
inmodules_this_round = inmodules
inmodules = []   # Accumulator, queue for next round
for module in inmodules_this_round:
if module not in self.avoids:  # Important line here.
outmodules.append(module)

Collin



gnulib-tool.py: Optimize module set lookups

2024-04-11 Thread Bruno Haible
The func_transitive_closure function in the shell implementation can take
a lot of time. So I wondered whether in the Python implementation, there
is room for speedup at this place as well. And indeed, there is.

Profiling the test-create-testdir-4.sh unit test, I saw this
profiler output:

+ 43843917 function calls (43842963 primitive calls) in 201.018 seconds
+
+   Ordered by: internal time
+
+   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
+   11  181.248   16.477  181.248   16.477 {built-in method posix.waitpid}
+ 250090653.6660.0003.6660.000 GLModuleSystem.py:225(__eq__)
+970052.7830.0006.6640.000 GLModuleSystem.py:184(__init__)
+ 78911.5040.0004.4720.001 GLModuleSystem.py:915()
+   1075971.0560.0001.0560.000 {built-in method io.open}
+   3270941.0430.0001.5090.000 posixpath.py:338(normpath)
+   3166960.8570.0001.1080.000 posixpath.py:71(join)
+   1940490.6100.0000.6100.000 {method 'read' of 
'_io.BufferedReader' objects}
+   2463960.5670.0000.5670.000 {built-in method posix.stat}
+970130.4540.0000.5870.000 codecs.py:681(__init__)
+   3165050.4280.0003.0480.000 constants.py:269(joinpath)
+970130.4160.0002.0050.000 codecs.py:871(open)
+ 15830.3540.000   16.8040.011 
GLModuleSystem.py:814(transitive_closure)
+10.2990.2990.9030.903 GLModuleSystem.py:954()
+970070.2840.0000.9860.000 codecs.py:451(read)
+970050.2400.000   10.3760.000 GLModuleSystem.py:99(find)
+440610.2280.000   10.8060.000 
GLModuleSystem.py:537(getDependenciesWithConditions)
+  27865080.2120.0000.2120.000 {method 'append' of 'list' 
objects}
+   1022960.1930.0001.3530.000 GLFileSystem.py:77(lookup)
+   4892390.1900.0000.2710.000 GLModuleSystem.py:257(__hash__)
+10.1780.178  200.996  200.996 main.py:120(main)

So, I changed a list to a set in two places. As expected, this nearly
annihilates the 25 million GLModule.__eq__ invocations:

+ 19010970 function calls (19010016 primitive calls) in 195.304 seconds
+
+   Ordered by: internal time
+
+   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
+   11  181.039   16.458  181.039   16.458 {built-in method posix.waitpid}
+970052.7700.0006.6760.000 GLModuleSystem.py:184(__init__)
+   1075971.0640.0001.0640.000 {built-in method io.open}
+   3270941.0160.0001.4860.000 posixpath.py:338(normpath)
+   3166960.8570.0001.1110.000 posixpath.py:71(join)
+   1940490.5850.0000.5850.000 {method 'read' of 
'_io.BufferedReader' objects}
+   2463960.5560.0000.5560.000 {built-in method posix.stat}
+970130.4540.0000.5970.000 codecs.py:681(__init__)
+   3165050.4240.0003.0260.000 constants.py:269(joinpath)
+970130.4140.0002.0230.000 codecs.py:871(open)
+ 15830.3310.000   12.2040.008 
GLModuleSystem.py:814(transitive_closure)
+970070.2840.0000.9620.000 codecs.py:451(read)
+970050.2430.000   10.3750.000 GLModuleSystem.py:99(find)
+440610.2240.000   10.7950.000 
GLModuleSystem.py:537(getDependenciesWithConditions)
+  27865140.2130.0000.2130.000 {method 'append' of 'list' 
objects}
+   5044890.1970.0000.2770.000 GLModuleSystem.py:257(__hash__)
+   1022960.1900.0001.3430.000 GLFileSystem.py:77(lookup)
+10.1870.187  195.283  195.283 main.py:120(main)

The speed improvement is also measurable without a profiler:

Before:
$ GNULIB_TOOL_IMPL=py time ./test-create-testdir-4.sh
real 192.48s
user 131.35s
sys  51.53s

After:
$ GNULIB_TOOL_IMPL=py time ./test-create-testdir-4.sh
real 189.47s
user 128.31s
sys  51.64s


2024-04-11  Bruno Haible  

gnulib-tool.py: Optimize module set lookups.
* gnulib-tool.py (profiler_args): New variable.
* pygnulib/GLModuleSystem.py (GLModuleTable.transitive_closure): Turn
handledmodules into a set.
(GLModuleTable.transitive_closure_separately): For the 'in' test, use
a set variable main_modules_set.

diff --git a/gnulib-tool.py b/gnulib-tool.py
index bb2837a3d5..cdcd316909 100755
--- a/gnulib-tool.py
+++ b/gnulib-tool.py
@@ -144,4 +144,7 @@ else
   func_fatal_error "python3 not found; try setting GNULIB_TOOL_IMPL=sh"
 fi
 
-exec python3 "$gnulib_dir/.gnulib-tool.py" "$@"
+profiler_args=
+# For profiling, cf. <https://docs.python.org/3/library/profile.html>.
+#profiler_args="-m cProfile -s tottime"
+exec python3 $profiler_args "$gnulib_dir/.gnulib-tool.py" "$@"
diff --git a/pygnulib/GLModuleS