In GLModuleSystem.py Makefile.am snippets are searched twice for the
file names stored in 'lib_SOURCES += ...' variable. Instead of writing
the regular expression twice (differently) I have moved it to a private
variable & function.

Indirectly this also fixes two issues. The first is the same issue as
yesterdays patch. Variables with a trailing comment, e.g.

    lib_SOURCES += file.c file.h # comment

are not matched. As far as I can tell none of the module descriptions
have this comment style but gnulib-tool.sh handles it (line 3453-3466).
So gnulib-tool.py should too.

The second issue I have mentioned previously, I forget where. In some
nested loops we have "break" statements that are ineffective. Unlike
shell, in Python we can't use "break 2" to exit the outer loop.

In this case we had 3 loops nested, something like this:

     for module in modules:
         for value in lib_SOURCES_var:
            for file_name in value:
                if not file_name.endswith('.h'):
                   condition = True
                   break

So the break only stops the current loop 'lib_SOURCES' match of a
snippet. But we should stop looping entirely if even one module
description sets condition to True.

Since the use of a private functions removes a level of nesting, I have
added an 'if condition: break'. Not too messy and since the outer loop
goes over all modules (can be 1000+) it seems worth it.

Collin

>From 26d7603143116626cacba37b988fa26cc7dba1b8 Mon Sep 17 00:00:00 2001
From: Collin Funk <collin.fu...@gmail.com>
Date: Sun, 2 Jun 2024 22:32:39 -0700
Subject: [PATCH] gnulib-tool.py: Refactor duplicated regular expressions.

* pygnulib/GLModuleSystem.py (_LIB_SOURCES_PATTERN): New variable.
(_extract_lib_SOURCES): New function.
(GLModule.getAutomakeSnippet_Unconditional): Use the new function.
(GLModuleTable.add_dummy): Likewise. Add a second break statement to
stop unnecessary looping.
---
 ChangeLog                  |  9 +++++++++
 pygnulib/GLModuleSystem.py | 40 ++++++++++++++++++++++++--------------
 2 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 269293ca79..e0ed08a9af 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2024-06-02  Collin Funk  <collin.fu...@gmail.com>
+
+	gnulib-tool.py: Refactor duplicated regular expressions.
+	* pygnulib/GLModuleSystem.py (_LIB_SOURCES_PATTERN): New variable.
+	(_extract_lib_SOURCES): New function.
+	(GLModule.getAutomakeSnippet_Unconditional): Use the new function.
+	(GLModuleTable.add_dummy): Likewise. Add a second break statement to
+	stop unnecessary looping.
+
 2024-06-02  Bruno Haible  <br...@clisp.org>
 
 	c-strtod, c-strtof, c-strtold: Fix link error on AIX.
diff --git a/pygnulib/GLModuleSystem.py b/pygnulib/GLModuleSystem.py
index 5b8d331dd6..5d67c5b6df 100644
--- a/pygnulib/GLModuleSystem.py
+++ b/pygnulib/GLModuleSystem.py
@@ -40,6 +40,21 @@
 from .GLFileSystem import GLFileSystem
 
 
+# Regular expression matching a 'lib_SOURCES += ...' variable with the file
+# names in capture group 1.
+_LIB_SOURCES_PATTERN = re.compile(r'^lib_SOURCES[\t ]*\+=[\t ]*(.+?)[\t ]*(?:#|$)', re.MULTILINE)
+
+
+def _extract_lib_SOURCES(snippet: str) -> list[str]:
+    '''Return the file names specified in the lib_SOURCES variable of
+    a Makefile.am snippet.'''
+    lines = [ line.group(1)
+              for line in re.finditer(_LIB_SOURCES_PATTERN, snippet) ]
+    return [ file_name
+             for line in lines
+             for file_name in line.split() ]
+
+
 #===============================================================================
 # Define GLModuleSystem class
 #===============================================================================
@@ -571,13 +586,8 @@ def getAutomakeSnippet_Unconditional(self) -> str:
                 # Synthesize an EXTRA_DIST augmentation.
                 snippet = self.getAutomakeSnippet_Conditional()
                 snippet = combine_lines(snippet)
-                pattern = re.compile(r'^lib_SOURCES[\t ]*\+=[\t ]*(.*)$', re.MULTILINE)
-                mentioned_files = set(pattern.findall(snippet))
-                if mentioned_files:
-                    # Get all the file names from 'lib_SOURCES += ...'.
-                    mentioned_files = { filename
-                                        for line in mentioned_files
-                                        for filename in line.split() }
+                # Get all the file names from 'lib_SOURCES += ...'.
+                mentioned_files = _extract_lib_SOURCES(snippet)
                 all_files = self.getFiles()
                 lib_files = filter_filelist('\n', all_files,
                                             'lib/', '', 'lib/', '')
@@ -985,14 +995,14 @@ def add_dummy(self, modules: list[GLModule]) -> list[GLModule]:
                     # Extract the value of unconditional "lib_SOURCES += ..." augmentations.
                     snippet = combine_lines(snippet)
                     snippet = self.remove_if_blocks(snippet)
-                    pattern = re.compile(r'^lib_SOURCES[\t ]*\+=[\t ]*([^#]+?)$', re.MULTILINE)
-                    for matching_rhs in re.finditer(pattern, snippet):
-                        files = matching_rhs.group(1).split()
-                        for file in files:
-                            # Ignore .h files since they are not compiled.
-                            if not file.endswith('.h'):
-                                have_lib_sources = True
-                                break
+                    file_names = _extract_lib_SOURCES(snippet)
+                    for file_name in file_names:
+                        # Ignore .h files since they are not compiled.
+                        if not file_name.endswith('.h'):
+                            have_lib_sources = True
+                            break
+                    if have_lib_sources:
+                        break
         # Add the dummy module, to make sure the library will be non-empty.
         if not have_lib_sources:
             dummy = self.modulesystem.find('dummy')
-- 
2.45.1

Reply via email to