On 5/21/24 5:32 AM, Bruno Haible wrote:
> * If the file ends with a non-empty line without a newline, getline()
>   returns a string that does not end in a newline.
>   Quoting 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/getline.html:
>    "including the delimiter character if one was encountered before EOF".

Oops, Thanks. I'm not sure why I was thinking that the line would
*always* end in the delimiter. Good catch.

> * The following code does not work with tokens that consist of a single
>   character:
> 
>           while (start < end && isspace ((unsigned char) *end))
> 
>           if (start < end && IS_ABSOLUTE_FILE_NAME (start))
> 
>   The condition "start < end" here tests whether the token has at least
>   two characters. Which is probably not what you intended. (In this case,
>   it is not fatal, since absolute file names of non-directories always
>   consist of at least 2 characters. But anyway.)

Actually, it was intentional. Perhaps I have a strange way of thinking
of things. That plus non-standard functions leave a lot of room for
"creativity". :)

I was thinking that "/" and "c:/" (on Windows) will never refer to an
actual shell. So no point in caring about it.

However with this in my /etc/shells:

================
/bin/bash
/
===============

When printing all shells glibc prints both lines. Mine only prints
"/bin/bash".

Strange function, in my opinion. However, lets trust the BSD and glibc
developers who wrote it over me. :) No need to change it's behavior.

>   The cause is that you are working with start and end both being
>   inclusive, that is, with a string of length end-start+1. There are 4
>   possible intervals for two variables start and length:
>     [start,end]
>     [start,end)
>     (start,end]
>     (start,end)
>   If you use sometimes [start,end], sometimes [start,end), you cannot
>   remember working idioms and you will regularly make mistakes.
>   The best way to avoid such mistakes is to work with [start,end)
>   intervals each time you work with pointers into arrays.
>   Then the condition "start < end" tests for a non-empty subsequence,
>   and you can remember and reuse idioms.

Interesting way to think about it, thanks. Do you have a strong math
background? It has been a while since I looked at that interval
notation.

Typically the way I think of things is using a "cursor", which I guess
is just a pointer to a position in a known data structure.

So in that patch we have something like this:

  start                    end
    ^                       ^
    |                       |
   [1] [2] [3] [4] [5] [6] [7] [8]
   '/' 'b' 'i' 'n' '/' 's' 'h' '\0'

Which would be [start,end] since *start and *end are part of the
string.

I've attached the patch which I think follows your suggestion of
[start, end):

  start                        end
    ^                           ^
    |                           |
   [1] [2] [3] [4] [5] [6] [7] [8]
   '/' 'b' 'i' 'n' '/' 's' 'h' '\0'

I guess that probably makes everything easier for others to follow. I
didn't think about it too much until you mentioned it.

Feel free to correct any misunderstandings I may have in that
explination + the patch.

Collin
From 45911704a56dcb2fcdfade0cfb9989d1df4a40e7 Mon Sep 17 00:00:00 2001
From: Collin Funk <collin.fu...@gmail.com>
Date: Tue, 21 May 2024 16:35:09 -0700
Subject: [PATCH] getusershell: Split file by lines instead of spaces.

* lib/getusershell.c: Include string.h and filename.h
(GNULIB_GETUSERSHELL_SINGLE_THREAD): Remove conditional to include
unlocked stdio functions that are no longer used.
(readname): Remove function.
(getusershell): Use getline and process the string instead of using
readname. Return the first absolute file name.
* modules/getusershell (Depends-on): Remove unlocked-io-internal.
Add getline and filename.
* doc/multithread.texi (Multithreading Optimizations): Don't mention
GNULIB_GETUSERSHELL_SINGLE_THREAD.
---
 ChangeLog            | 14 ++++++++
 doc/multithread.texi |  4 ---
 lib/getusershell.c   | 83 ++++++++++++++++++++++----------------------
 modules/getusershell |  3 +-
 4 files changed, 57 insertions(+), 47 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 099a249271..cb2bfbdb74 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,17 @@
+2024-05-21  Collin Funk  <collin.fu...@gmail.com>
+
+	getusershell: Split file by lines instead of spaces.
+	* lib/getusershell.c: Include string.h and filename.h
+	(GNULIB_GETUSERSHELL_SINGLE_THREAD): Remove conditional to include
+	unlocked stdio functions that are no longer used.
+	(readname): Remove function.
+	(getusershell): Use getline and process the string instead of using
+	readname. Return the first absolute file name.
+	* modules/getusershell (Depends-on): Remove unlocked-io-internal.
+	Add getline and filename.
+	* doc/multithread.texi (Multithreading Optimizations): Don't mention
+	GNULIB_GETUSERSHELL_SINGLE_THREAD.
+
 2024-05-21  Bruno Haible  <br...@clisp.org>
 
 	trim tests: Avoid test failure on Solaris 11 OmniOS.
diff --git a/doc/multithread.texi b/doc/multithread.texi
index 7d8126e8f3..424e1d2a75 100644
--- a/doc/multithread.texi
+++ b/doc/multithread.texi
@@ -293,10 +293,6 @@ @node Multithreading Optimizations
 You can get this macro defined by including the Gnulib module
 @code{wchar-single}.
 @item
-You may define the C macro @code{GNULIB_GETUSERSHELL_SINGLE_THREAD}, if all the
-programs in your package invoke the functions @code{setusershell},
-@code{getusershell}, @code{endusershell} only from a single thread.
-@item
 You may define the C macro @code{GNULIB_EXCLUDE_SINGLE_THREAD}, if all the
 programs in your package invoke the functions of the @code{exclude} module
 only from a single thread.
diff --git a/lib/getusershell.c b/lib/getusershell.c
index 4da5bff37f..356b8023d5 100644
--- a/lib/getusershell.c
+++ b/lib/getusershell.c
@@ -34,17 +34,13 @@
 #endif
 
 #include <stdlib.h>
+#include <string.h>
 #include <ctype.h>
 
 #include "stdio--.h"
+#include "filename.h"
 #include "xalloc.h"
 
-#if GNULIB_GETUSERSHELL_SINGLE_THREAD
-# include "unlocked-io.h"
-#endif
-
-static idx_t readname (char **, idx_t *, FILE *);
-
 #if ! defined ADDITIONAL_DEFAULT_SHELLS && defined __MSDOS__
 # define ADDITIONAL_DEFAULT_SHELLS \
   "c:/dos/command.com", "c:/windows/command.com", "c:/command.com",
@@ -70,7 +66,7 @@ static FILE *shellstream = NULL;
 static char *line = NULL;
 
 /* Number of bytes allocated for 'line'. */
-static idx_t line_size = 0;
+static size_t line_size = 0;
 
 /* Return an entry from the shells file, ignoring comment lines.
    If the file doesn't exist, use the list in DEFAULT_SHELLS (above).
@@ -99,12 +95,46 @@ getusershell (void)
         }
     }
 
-  while (readname (&line, &line_size, shellstream))
+  for (;;)
     {
-      if (*line != '#')
-        return line;
+      ssize_t nread = getline (&line, &line_size, shellstream);
+
+      /* End of file.  */
+      if (nread == -1)
+        return NULL;
+      /* Skip empty lines. */
+      else if (nread > 1)
+        {
+          char *start = line;
+          char *comment = strchr (start, '#');
+          char *end;
+
+          if (comment != NULL)
+            {
+              /* Trim the comment mark.  */
+              comment = '\0';
+              end = comment;
+            }
+          else
+            {
+              /* Trim the newline.  */
+              end = start + nread;
+              if (end[-1] == '\n')
+                *--end = '\0';
+            }
+
+          /* Skip leading whitespace.  */
+          while (start < end && isspace ((unsigned char) start[0]))
+            start++;
+          /* Trim trailing whitespace.  */
+          while (start < end && isspace ((unsigned char) end[-1]))
+            *--end = '\0';
+
+          /* Only return absolute file names. */
+          if (start < end && IS_ABSOLUTE_FILE_NAME (start))
+            return start;
+        }
     }
-  return NULL;                  /* End of file. */
 }
 
 /* Rewind the shells file. */
@@ -129,37 +159,6 @@ endusershell (void)
     }
 }
 
-/* Read a line from STREAM, removing any newline at the end.
-   Place the result in *NAME, which is malloc'd
-   and/or realloc'd as necessary and can start out NULL,
-   and whose size is passed and returned in *SIZE.
-
-   Return the number of bytes placed in *NAME
-   if some nonempty sequence was found, otherwise 0.  */
-
-static idx_t
-readname (char **name, idx_t *size, FILE *stream)
-{
-  int c;
-  size_t name_index = 0;
-
-  /* Skip blank space.  */
-  while ((c = getc (stream)) != EOF && isspace (c))
-    /* Do nothing. */ ;
-
-  for (;;)
-    {
-      if (*size <= name_index)
-        *name = xpalloc (*name, size, 1, -1, sizeof **name);
-      if (c == EOF || isspace (c))
-        break;
-      (*name)[name_index++] = c;
-      c = getc (stream);
-    }
-  (*name)[name_index] = '\0';
-  return name_index;
-}
-
 #ifdef TEST
 int
 main (void)
diff --git a/modules/getusershell b/modules/getusershell
index 2f47c62e3d..a4a40852ac 100644
--- a/modules/getusershell
+++ b/modules/getusershell
@@ -9,7 +9,8 @@ Depends-on:
 unistd
 extensions
 fopen-safer          [test $HAVE_GETUSERSHELL = 0 || test $REPLACE_GETUSERSHELL = 1]
-unlocked-io-internal [test $HAVE_GETUSERSHELL = 0 || test $REPLACE_GETUSERSHELL = 1]
+getline              [test $HAVE_GETUSERSHELL = 0 || test $REPLACE_GETUSERSHELL = 1]
+filename             [test $HAVE_GETUSERSHELL = 0 || test $REPLACE_GETUSERSHELL = 1]
 xalloc               [test $HAVE_GETUSERSHELL = 0 || test $REPLACE_GETUSERSHELL = 1]
 
 configure.ac:
-- 
2.45.1

Reply via email to