Hi Paul,

Trimming CC and adding Bernhard and [email protected] in case anyone
else wants to add input.

Paul Eggert <[email protected]> writes:

> Thanks; I installed the attached somewhat fancier patch into Gnulib.
>
> From efd5c380ff8062541d5fd98b050ecd3cb295917c Mon Sep 17 00:00:00 2001
> From: Paul Eggert <[email protected]>
> Date: Sun, 13 Apr 2025 18:01:08 -0700
> Subject: [PATCH] regex: match current Emacs behavior
>
> * config/srclist.txt: Comment out regex.h, since we now
> disagree with glibc.
> * lib/regex.h (RE_SYNTAX_EMACS):
> Match Emacs 21+ behavior, not Emacs 20-.
> * m4/regex.m4 (gl_REGEX): Check for this Emacs fix.
> ---
>  ChangeLog          | 9 +++++++++
>  config/srclist.txt | 2 +-
>  doc/regex.texi     | 3 ++-
>  lib/regex.h        | 8 ++++----
>  m4/regex.m4        | 6 +++++-
>  5 files changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/ChangeLog b/ChangeLog
> index 2ea548d13f..c20c151757 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,12 @@
> +2025-04-13  Paul Eggert  <[email protected]>
> +
> +     regex: match current Emacs behavior
> +     * config/srclist.txt: Comment out regex.h, since we now
> +     disagree with glibc.
> +     * lib/regex.h (RE_SYNTAX_EMACS):
> +     Match Emacs 21+ behavior, not Emacs 20-.
> +     * m4/regex.m4 (gl_REGEX): Check for this Emacs fix.
> +
>  2025-04-13  Bruno Haible  <[email protected]>
>  
>       getlogin_r tests: Avoid writing to a literal string.
> diff --git a/config/srclist.txt b/config/srclist.txt
> index 173f23edaf..62816dcf4a 100644
> --- a/config/srclist.txt
> +++ b/config/srclist.txt
> @@ -68,7 +68,7 @@ $LIBCSRC malloc/scratch_buffer_set_array_size.c     lib/malloc
>  #$LIBCSRC misc/sys/cdefs.h           lib
>  #$LIBCSRC posix/regcomp.c            lib
>  $LIBCSRC posix/regex.c                       lib
> -$LIBCSRC posix/regex.h                       lib
> +#$LIBCSRC posix/regex.h                      lib
>  #$LIBCSRC posix/regex_internal.c     lib
>  #$LIBCSRC posix/regex_internal.h     lib
>  #$LIBCSRC posix/regexec.c            lib
> diff --git a/doc/regex.texi b/doc/regex.texi
> index cba1e13520..925b0db639 100644
> --- a/doc/regex.texi
> +++ b/doc/regex.texi
> @@ -316,7 +316,8 @@ regular expressions.
>  The predefined syntaxes---taken directly from @file{regex.h}---are:
>  
>  @smallexample
> -#define RE_SYNTAX_EMACS 0
> +# define RE_SYNTAX_EMACS                                                \
> +  (RE_CHAR_CLASSES | RE_INTERVALS)
>  
>  #define RE_SYNTAX_AWK                                                   \
>    (RE_BACKSLASH_ESCAPE_IN_LISTS | RE_DOT_NOT_NULL                       \
> diff --git a/lib/regex.h b/lib/regex.h
> index 67a3aa70a5..c4c6089a8c 100644
> --- a/lib/regex.h
> +++ b/lib/regex.h
> @@ -66,9 +66,8 @@ typedef unsigned long int active_reg_t;
>  
>  /* The following bits are used to determine the regexp syntax we
>     recognize.  The set/not-set meanings are chosen so that Emacs syntax
> -   remains the value 0.  The bits are given in alphabetical order, and
> -   the definitions shifted by one from the previous bit; thus, when we
> -   add or remove a bit, only one other definition need change.  */
> +   is the value 0 for Emacs 20 (2000) and earlier, and the value
> +   RE_SYNTAX_EMACS for Emacs 21 (2001) and later.  */
>  typedef unsigned long int reg_syntax_t;
>  
>  #ifdef __USE_GNU
> @@ -215,7 +214,8 @@ extern reg_syntax_t re_syntax_options;
>     (The [[[ comments delimit what gets put into the Texinfo file, so
>     don't delete them!)  */
>  /* [[[begin syntaxes]]] */
> -# define RE_SYNTAX_EMACS 0
> +# define RE_SYNTAX_EMACS                                             \
> +  (RE_CHAR_CLASSES | RE_INTERVALS)
>  
>  # define RE_SYNTAX_AWK                                                       \
>    (RE_BACKSLASH_ESCAPE_IN_LISTS   | RE_DOT_NOT_NULL                  \
> diff --git a/m4/regex.m4 b/m4/regex.m4
> index 80dfb8e1e5..1b2012fe00 100644
> --- a/m4/regex.m4
> +++ b/m4/regex.m4
> @@ -1,5 +1,5 @@
>  # regex.m4
> -# serial 78
> +# serial 79
>  dnl Copyright (C) 1996-2001, 2003-2025 Free Software Foundation, Inc.
>  dnl This file is free software; the Free Software Foundation
>  dnl gives unlimited permission to copy and/or distribute it,
> @@ -53,6 +53,10 @@ AC_DEFUN([gl_REGEX],
>              /* Exit with distinguishable exit code.  */
>              static void sigabrt_no_core (int sig) { raise (SIGTERM); }
>              #endif
> +
> +            #if RE_SYNTAX_EMACS != (RE_CHAR_CLASSES | RE_INTERVALS)
> +            # error "RE_SYNTAX_EMACS does not match Emacs behavior"
> +            #endif
>            ]],
>            [[int result = 0;
>              static struct re_pattern_buffer regex;

Bernhard, I built findutils with GNULIB_SRCDIR set to my local
clone. This uses the latest Gnulib commit instead of the one specified
by the submodule.

This patch causes the following 'make check' failure in findutils:

    ./../doc/regexprops.texi /tmp/check-regexprops.wUz52k differ: char 1649, line 45
    ./../doc/regexprops.texi is out of date.
    Updated output is saved in regexprops.texi.new
    FAIL check-regexprops (exit status: 1)

Upon further inspection, it fails because lib/regexprops.c produces
output that differs from what is in the repository. In
lib/regextype.c we have:

    static struct tagRegexTypeMap regex_map[] =
      {
      [...]
       { "emacs",                 CONTEXT_ALL,       RE_SYNTAX_EMACS },
      [...]
      };

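With this patch, RE_SYNTAX_EMACS changes from 0 to
(RE_CHAR_CLASSES | RE_INTERVALS), so the generated description of the
"emacs" syntax changes too. A diff of the generated documentation
confirms this.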

What is the proper way to fix this? My thinking is to first update the
findutils uses and then copy regexprops.texi.new to regexprops.texi,
since the new value of RE_SYNTAX_EMACS is more correct based on this
thread. The same file will also have to be copied to Gnulib's
doc/regexprops-generic.texi, if I understand correctly.

What do you think?

Collin
