Hi Bruno,

I haven't pushed this patch yet, but it goes in the correct direction,
in my opinion.

Here we were talking about the rewrite_filename functions:

     https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00268.html

I asked about the 'test=lib/' filename replacement under mode ==
'copy-file' and you said:

> 'tests=lib/' should not occur in the scope of mode == 'copy-file'.
> But this string is an indicator inside gnulib-tool; users of gnulib-tool
> should not be allowed to pass it.

Does it make more sense to check for a filename starting with
'tests=lib/' and error out or is it safe to assume the input is clean?
Ideally I would like to use a module-level function or @staticmethod
there. Less repeated code to maintain.

The second thing I am not 100% satisfied with is the creation of
GLFileTable objects. In GLImport.prepare(), for example, it takes
three lines. The setters also take two arguments which feels a bit
unorthodox to me. After this patch in GLImport.prepare():

    # Construct the file table.
    filetable = GLFileTable(sorted(set(filelist)))
    filetable.set_old_files(self.cache, old_files)
    filetable.set_new_files(self.config, new_files)

The reason I did this is that, from what I understand, --import and
--create-testdir (and their respective classes) are essentially the
same for the most part. The main difference here is that
--create-testdir doesn't require reading a gnulib-cache.m4 file.
Therefore, there is no need for filetable.old_files. After this patch
in GLTestDir.execute():

    # Setup the file table.
    filetable = GLFileTable(filelist)
    filetable.set_new_files(self.config, filelist)

I feel that handling all different combinations for
GLFileTable.__init__() would get annoying. As an example, what is the
correct behavior when old_files is passed but a GLConfig with
directories read from gnulib-cache.m4 is not? That seems like invalid
input to me.

I feel like two @classmethod constructors would be a fine solution. Or
maybe inheritance would be better? To illustrate here is GLFileTable's
instance variables and their usage:

    class GLFileTable:
        # Used in GLImport & GLTestDir.
        all_files: list[str]
        new_files: list[tuple[str, str]]

        # Used in GLImport only.
        old_files: list[tuple[str, str]]
        added_files: list[str]
        removed_files: list[str]

Any opinions on these ideas? If not, I will probably sleep on it and
push an improved patch.

Collin
From 3d937ef8f553e91183d10e6bb328a47f4dfa539d Mon Sep 17 00:00:00 2001
From: Collin Funk <collin.fu...@gmail.com>
Date: Wed, 24 Apr 2024 22:36:56 -0700
Subject: [PATCH] gnulib-tool.py: Reduce code duplication in filename
 transformations.

* pygnulib/GLFileTable.py (GLFileTable.__init__): Add missing type
checks for argument.
(GLFileTable.rewrite_filename): New function derived from removed
GLImport and GLTestDir functions. Accept a single filename instead of a
list.
(GLFileTable.set_old_files, GLFileTable.set_new_files): New functions to
mutate the GLFileTable using the GLFileTable.rewrite_filename function.
* pygnulib/GLImport.py (GLImport.prepare): Use the new GLFileTable
functions.
(GLImport.rewrite_new_files, GLImport.rewrite_old_files): Remove
functions.
* pygnulib/GLTestDir.py (GLImport.execute): Use the new GLFileTable
functions.
(GLImport.rewrite_files): Remove function.
---
 ChangeLog               | 18 ++++++++
 pygnulib/GLFileTable.py | 69 +++++++++++++++++++++++++++++
 pygnulib/GLImport.py    | 98 +++--------------------------------------
 pygnulib/GLTestDir.py   | 49 ++-------------------
 4 files changed, 96 insertions(+), 138 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index ee17881d2a..9c47cd38ee 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,21 @@
+2024-04-24  Collin Funk  <collin.fu...@gmail.com>
+
+	gnulib-tool.py: Reduce code duplication in filename transformations.
+	* pygnulib/GLFileTable.py (GLFileTable.__init__): Add missing type
+	checks for argument.
+	(GLFileTable.rewrite_filename): New function derived from removed
+	GLImport and GLTestDir functions. Accept a single filename instead of a
+	list.
+	(GLFileTable.set_old_files, GLFileTable.set_new_files): New functions to
+	mutate the GLFileTable using the GLFileTable.rewrite_filename function.
+	* pygnulib/GLImport.py (GLImport.prepare): Use the new GLFileTable
+	functions.
+	(GLImport.rewrite_new_files, GLImport.rewrite_old_files): Remove
+	functions.
+	* pygnulib/GLTestDir.py (GLImport.execute): Use the new GLFileTable
+	functions.
+	(GLImport.rewrite_files): Remove function.
+
 2024-04-24  Collin Funk  <collin.fu...@gmail.com>
 
 	gnulib-tool.py: Add a new GLFileTable class.
diff --git a/pygnulib/GLFileTable.py b/pygnulib/GLFileTable.py
index 6180e0d318..71442c1e5b 100644
--- a/pygnulib/GLFileTable.py
+++ b/pygnulib/GLFileTable.py
@@ -15,13 +15,21 @@
 
 from __future__ import annotations
 
+import os.path
+from .constants import substart
+from .GLConfig import GLConfig
+
 
 class GLFileTable:
     '''The GLFileTable class stores file information for the duration of the
     operation being executed.'''
 
     all_files: list[str]
+    # old_files is a table with two columns: (rewritten-file-name original-file-name),
+    # representing the files according to the last gnulib-tool invocation.
     old_files: list[tuple[str, str]]
+    # new_files is a table with two columns: (rewritten-file-name original-file-name),
+    # representing the files after this gnulib-tool invocation.
     new_files: list[tuple[str, str]]
     added_files: list[str]
     removed_files: list[str]
@@ -29,8 +37,69 @@ class GLFileTable:
     def __init__(self, all_files: list[str]) -> None:
         '''Create a GLFileTable with initialized fields.
         - all_files, a list of all files being operated on.'''
+        if type(all_files) is not list:
+            raise TypeError(f'all_files must be a list, not {type(all_files).__name__}')
+        for filename in all_files:
+            if type(filename) is not str:
+                raise TypeError(f'Each filename in all_files must be a str, not {type(filename).__name__}')
         self.all_files = all_files
         self.old_files = []
         self.new_files = []
         self.added_files = []
         self.removed_files = []
+
+    def rewrite_filename(self, config: GLConfig, filename: str) -> str:
+        '''Replace auxdir, docbase, sourcebase, m4base and testsbase from
+        default to their version from config.
+        - config, The GLConfig object with replacement directories,
+        - filename, The filename to modify.'''
+        if type(filename) is not str:
+            raise TypeError(f'filename must be a str, not {type(filename).__name__}')
+        auxdir = config['auxdir']
+        docbase = config['docbase']
+        sourcebase = config['sourcebase']
+        m4base = config['m4base']
+        testsbase = config['testsbase']
+        if filename.startswith('build-aux/'):
+            result = substart('build-aux/', '%s/' % auxdir, filename)
+        elif filename.startswith('doc/'):
+            result = substart('doc/', '%s/' % docbase, filename)
+        elif filename.startswith('lib/'):
+            result = substart('lib/', '%s/' % sourcebase, filename)
+        elif filename.startswith('m4/'):
+            result = substart('m4/', '%s/' % m4base, filename)
+        elif filename.startswith('tests/'):
+            result = substart('tests/', '%s/' % testsbase, filename)
+        elif filename.startswith('tests=lib/'):
+            result = substart('tests=lib/', '%s/' % testsbase, filename)
+        elif filename.startswith('top/'):
+            result = substart('top/', '', filename)
+        else:  # filename is not a special file
+            result = filename
+        return os.path.normpath(result)
+
+    def set_old_files(self, cache: GLConfig, filenames: list[str]) -> None:
+        '''Setup the old_files in the file table.
+        - cache, the GLConfig with directories read from gnulib-cache.m4,
+        - filenames, a list of file names read from gnulib-cache.m4.'''
+        if type(filenames) is not list:
+            raise TypeError(f'filenames must be a list, not {type(filenames).__name__}')
+        for filename in filenames:
+            if type(filename) is not str:
+                raise TypeError(f'Each filename in filenames must be a str, not {type(filename).__name__}')
+        old_files = { (self.rewrite_filename(cache, filename), filename)
+                      for filename in filenames }
+        self.old_files = sorted(set(old_files), key=lambda pair: pair[0])
+
+    def set_new_files(self, config: GLConfig, filenames: list[str]) -> None:
+        '''Setup the new_files in the file table.
+        - config, the GLConfig with directories passed from the command-line,
+        - filenames, a list of file names added after this gnulib-tool invocation.'''
+        if type(filenames) is not list:
+            raise TypeError(f'filenames must be a list, not {type(filenames).__name__}')
+        for filename in filenames:
+            if type(filename) is not str:
+                raise TypeError(f'Each filename in filenames must be a str, not {type(filename).__name__}')
+        new_files = { (self.rewrite_filename(config, filename), filename)
+                      for filename in filenames }
+        self.new_files = sorted(set(new_files), key=lambda pair: pair[0])
diff --git a/pygnulib/GLImport.py b/pygnulib/GLImport.py
index cf1ebd10ec..7582a6cd0a 100644
--- a/pygnulib/GLImport.py
+++ b/pygnulib/GLImport.py
@@ -303,80 +303,6 @@ def __repr__(self) -> str:
         result = '<pygnulib.GLImport %s>' % hex(id(self))
         return result
 
-    def rewrite_old_files(self, files: list[str]) -> list[str]:
-        '''Replace auxdir, docbase, sourcebase, m4base and testsbase from default
-        to their version from cached config.'''
-        if type(files) is not list:
-            raise TypeError('files argument must has list type, not %s'
-                            % type(files).__name__)
-        for file in files:
-            if type(file) is not str:
-                raise TypeError('each file must be a string instance')
-        files = sorted(set(files))
-        files = [ '%s%s' % (file, os.path.sep)
-                  for file in files ]
-        auxdir = self.cache['auxdir']
-        docbase = self.cache['docbase']
-        sourcebase = self.cache['sourcebase']
-        m4base = self.cache['m4base']
-        testsbase = self.cache['testsbase']
-        result = []
-        for file in files:
-            if file.startswith('build-aux/'):
-                path = substart('build-aux/', '%s/' % auxdir, file)
-            elif file.startswith('doc/'):
-                path = substart('doc/', '%s/' % docbase, file)
-            elif file.startswith('lib/'):
-                path = substart('lib/', '%s/' % sourcebase, file)
-            elif file.startswith('m4/'):
-                path = substart('m4/', '%s/' % m4base, file)
-            elif file.startswith('tests/'):
-                path = substart('tests/', '%s/' % testsbase, file)
-            elif file.startswith('tests=lib/'):
-                path = substart('tests=lib/', '%s/' % testsbase, file)
-            elif file.startswith('top/'):
-                path = substart('top/', '', file)
-            else:  # file is not a special file
-                path = file
-            result.append(os.path.normpath(path))
-        return sorted(set(result))
-
-    def rewrite_new_files(self, files: list[str]) -> list[str]:
-        '''Replace auxdir, docbase, sourcebase, m4base and testsbase from
-        default to their version from config.'''
-        if type(files) is not list:
-            raise TypeError('files argument must has list type, not %s'
-                            % type(files).__name__)
-        for file in files:
-            if type(file) is not str:
-                raise TypeError('each file must be a string instance')
-        files = sorted(set(files))
-        auxdir = self.config['auxdir']
-        docbase = self.config['docbase']
-        sourcebase = self.config['sourcebase']
-        m4base = self.config['m4base']
-        testsbase = self.config['testsbase']
-        result = []
-        for file in files:
-            if file.startswith('build-aux/'):
-                path = substart('build-aux/', '%s/' % auxdir, file)
-            elif file.startswith('doc/'):
-                path = substart('doc/', '%s/' % docbase, file)
-            elif file.startswith('lib/'):
-                path = substart('lib/', '%s/' % sourcebase, file)
-            elif file.startswith('m4/'):
-                path = substart('m4/', '%s/' % m4base, file)
-            elif file.startswith('tests/'):
-                path = substart('tests/', '%s/' % testsbase, file)
-            elif file.startswith('tests=lib/'):
-                path = substart('tests=lib/', '%s/' % testsbase, file)
-            elif file.startswith('top/'):
-                path = substart('top/', '', file)
-            else:  # file is not a special file
-                path = file
-            result.append(os.path.normpath(path))
-        return sorted(set(result))
-
     def actioncmd(self) -> str:
         '''Return command-line invocation comment.'''
         modules = self.config.getModules()
@@ -943,31 +869,17 @@ def prepare(self) -> tuple[GLFileTable, dict[str, tuple[re.Pattern, str] | None]
         # old_files is the list of files according to the last gnulib-tool invocation.
         # new_files is the list of files after this gnulib-tool invocation.
 
-        # Construct tables and transformers.
+        # Construct the transformers table.
         transformers = dict()
         transformers['lib'] = sed_transform_lib_file
         transformers['aux'] = sed_transform_build_aux_file
         transformers['main'] = sed_transform_main_lib_file
         transformers['tests'] = sed_transform_testsrelated_lib_file
-        old_table = []
-        new_table = []
-        for src in old_files:
-            dest = self.rewrite_old_files([src])[-1]
-            old_table.append(tuple([dest, src]))
-        for src in new_files:
-            dest = self.rewrite_new_files([src])[-1]
-            new_table.append(tuple([dest, src]))
-        old_table = sorted(set(old_table))
-        new_table = sorted(set(new_table))
-        # old_table is a table with two columns: (rewritten-file-name original-file-name),
-        # representing the files according to the last gnulib-tool invocation.
-        # new_table is a table with two columns: (rewritten-file-name original-file-name),
-        # representing the files after this gnulib-tool invocation.
-
-        # Prepare the filetable.
+
+        # Construct the file table.
         filetable = GLFileTable(sorted(set(filelist)))
-        filetable.old_files = sorted(set(old_table), key=lambda pair: pair[0])
-        filetable.new_files = sorted(set(new_table), key=lambda pair: pair[0])
+        filetable.set_old_files(self.cache, old_files)
+        filetable.set_new_files(self.config, new_files)
 
         # Return the result.
         result = tuple([filetable, transformers])
diff --git a/pygnulib/GLTestDir.py b/pygnulib/GLTestDir.py
index 0ac2f32f69..632ab3dbeb 100644
--- a/pygnulib/GLTestDir.py
+++ b/pygnulib/GLTestDir.py
@@ -122,42 +122,6 @@ def __init__(self, config: GLConfig, testdir: str) -> None:
         self.config.resetWitnessCMacro()
         self.config.resetVCFiles()
 
-    def rewrite_files(self, files: list[str]) -> list[str]:
-        '''Replace auxdir, docbase, sourcebase, m4base and testsbase from
-        default to their version from config.'''
-        if type(files) is not list:
-            raise TypeError('files argument must have list type, not %s'
-                            % type(files).__name__)
-        for file in files:
-            if type(file) is not str:
-                raise TypeError('each file must be a string instance')
-        files = sorted(set(files))
-        auxdir = self.config['auxdir']
-        docbase = self.config['docbase']
-        sourcebase = self.config['sourcebase']
-        m4base = self.config['m4base']
-        testsbase = self.config['testsbase']
-        result = []
-        for file in files:
-            if file.startswith('build-aux/'):
-                path = substart('build-aux/', '%s/' % auxdir, file)
-            elif file.startswith('doc/'):
-                path = substart('doc/', '%s/' % docbase, file)
-            elif file.startswith('lib/'):
-                path = substart('lib/', '%s/' % sourcebase, file)
-            elif file.startswith('m4/'):
-                path = substart('m4/', '%s/' % m4base, file)
-            elif file.startswith('tests/'):
-                path = substart('tests/', '%s/' % testsbase, file)
-            elif file.startswith('tests=lib/'):
-                path = substart('tests=lib/', '%s/' % testsbase, file)
-            elif file.startswith('top/'):
-                path = substart('top/', '', file)
-            else:  # file is not a special file
-                path = file
-            result.append(os.path.normpath(path))
-        return sorted(set(result))
-
     def execute(self) -> None:
         '''Create a scratch package with the given modules.'''
         auxdir = self.config['auxdir']
@@ -341,22 +305,17 @@ def execute(self) -> None:
 
         # Setup the file table.
         filetable = GLFileTable(filelist)
+        filetable.set_new_files(self.config, filelist)
 
         # Create directories.
-        directories = [ joinpath(self.testdir, os.path.dirname(file))
-                        for file in self.rewrite_files(filetable.all_files) ]
-        directories = sorted(set(directories))
+        directories = sorted({ joinpath(self.testdir, os.path.dirname(pair[0]))
+                               for pair in filetable.new_files })
         for directory in directories:
             if not os.path.isdir(directory):
                 os.makedirs(directory)
 
         # Copy files or make symbolic links or hard links.
-        for src in filetable.all_files:
-            dest = self.rewrite_files([src])[-1]
-            filetable.new_files.append(tuple([dest, src]))
-        for row in filetable.new_files:
-            src = row[1]
-            dest = row[0]
+        for (dest, src) in filetable.new_files:
             destpath = joinpath(self.testdir, dest)
             if src.startswith('tests=lib/'):
                 src = substart('tests=lib/', 'lib/', src)
-- 
2.44.0

Reply via email to