gbranden pushed a commit to branch master
in repository groff.

commit 39c1176bfa9ba84d519b015b1453f316e013d1de
Author: G. Branden Robinson <[email protected]>
AuthorDate: Mon Jan 27 18:52:39 2025 -0600

    [troff]: Fix Savannah #66686 (`\w|foo|` blues).
    
    * src/roff/troff/input.cpp (is_char_usable_as_delimiter): Restore `|`
      character as an invalid delimiter when not in compatibility mode.
      This would regress the fix for Savannah #66481, but...
    
      (do_overstrike, do_bracket, do_name_test, do_zero_width_output)
      (read_size, do_register, do_width, do_device_extension)
      (read_drawing_command): Throw warning in "delimiter" category and
      explain ambiguity of delimiter instead of emitting error and refusing
      further interpretation of the escape sequence being parsed.  Leave
      behind "#if"ed code for restoration of former stricter behavior in a
      future groff release (which would fix Savannah #66009).
    
      (is_conditional_expression_true): Stop special-casing an exception for
      `|` that permitted it to be used as a formatted output comparison
      operator.  Savannah #66481 complained only about groff's rejection of
      `|` to delimit the argument to the `\w` (width measurement) escape
      sequence, not in general, and was seen in some man pages.  The usage
      Paul Eggert reported remains accepted, albeit warned about, per
      `do_width()` above.
    
    * src/roff/groff/tests/check-delimiter-validity.sh: Update test
      expectations.  We now expect `|` to be invalid once again to delimit a
      line-drawing escape sequence.
    
    Fixes <https://savannah.gnu.org/bugs/?66686>.  Thanks to Dave Kemper for
    the report.  Savannah #66526 is implicated.
    
    The uses of `\w|whatever|` cited by Eggert appear to be a chunk of
    boilerplate passed around by some man page authors for GNU man pages.
    Eggert's "tz" package doesn't itself use `|` as a delimiter, but I did
    verify their use in gawk, grep, and rcs.
    
    gawk:
    .if !\n(.g \{\
    .       if !\w|\*(lq| \{\
    .               ds lq ``
    .               if \w'\(lq' .ds lq "\(lq
    .       \}
    .       if !\w|\*(rq| \{\
    .               ds rq ''
    .               if \w'\(rq' .ds rq "\(rq
    .       \}
    .\}
    
    grep:
    .if !\w|\*(lq| \{\
    .\" groff an-old.tmac does not seem to be in use, so define lq and rq.
    .       ie \n(.g \{\
    .               ds lq \(lq\"
    .               ds rq \(rq\"
    .       \}
    .       el \{\
    .               ds lq ``
    .               ds rq ''
    .       \}
    .\}
    
    rcs:
    .if !\n(.g \{\
    .       if !\w|\*(lq| \{\
    .               ds lq ``
    .               if \w'\(lq' .ds lq "\(lq
    .       \}
    .       if !\w|\*(rq| \{\
    .               ds rq ''
    .               if \w'\(rq' .ds rq "\(rq
    .       \}
    .\}
    
    One observes that only grep(1) doesn't feature-gate its use of `\w|foo|`
    behind a test of the `.g` register (groff-compatible formatter) for
    falsity.  It is therefore the only one of these three that would have
    provoked a warning.  (What the foregoing do is reimplement the `lq` and
    `rq` quotation strings, an extension from 4BSD (1980)--my guess is to
    compensate for System V-based troffs missing them).
    
    The rcs man pages otherwise use `\w` pretty extensively (which can be
    its own portability worry), but in every other case with the unambiguous
    `'` delimiter.
---
 ChangeLog                                        | 31 ++++++++++
 src/roff/groff/tests/check-delimiter-validity.sh |  4 +-
 src/roff/troff/input.cpp                         | 77 +++++++++++++++++++++---
 3 files changed, 100 insertions(+), 12 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index f8e3a1c49..f76bcfcc9 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,34 @@
+2025-01-28  G. Branden Robinson <[email protected]>
+
+       * src/roff/troff/input.cpp (is_char_usable_as_delimiter):
+       Restore `|` character as an invalid delimiter when not in
+       compatibility mode.  This would regress the fix for Savannah
+       #66481, but...
+
+       (do_overstrike, do_bracket, do_name_test, do_zero_width_output)
+       (read_size, do_register, do_width, do_device_extension)
+       (read_drawing_command): Throw warning in "delimiter" category
+       and explain ambiguity of delimiter instead of emitting error and
+       refusing further interpretation of the escape sequence being
+       parsed.  Leave behind "#if"ed code for restoration of former
+       stricter behavior in a future groff release (which would fix
+       Savannah #66009).
+
+       (is_conditional_expression_true): Stop special-casing an
+       exception for `|` that permitted it to be used as a formatted
+       output comparison operator.  Savannah #66481 complained only
+       about groff's rejection of `|` to delimit the argument to the
+       `\w` (width measurement) escape sequence, not in general, and
+       was seen in some man pages.  The usage Paul Eggert reported
+       remains accepted, albeit warned about, per `do_width()` above.
+
+       * src/roff/groff/tests/check-delimiter-validity.sh: Update test
+       expectations.  We now expect `|` to be invalid once again to
+       delimit a line-drawing escape sequence.
+
+       Fixes <https://savannah.gnu.org/bugs/?66686>.  Thanks to Dave
+       Kemper for the report.  Savannah #66526 is implicated.
+
 2025-01-27  G. Branden Robinson <[email protected]>
 
        * src/roff/troff/env.cpp (environment::choose_breakpoint):
diff --git a/src/roff/groff/tests/check-delimiter-validity.sh 
b/src/roff/groff/tests/check-delimiter-validity.sh
index 6ecddddbd..c2e107182 100755
--- a/src/roff/groff/tests/check-delimiter-validity.sh
+++ b/src/roff/groff/tests/check-delimiter-validity.sh
@@ -28,7 +28,7 @@ wail () {
 }
 
 for c in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z \
-         a b c d e f g h i j k l m n o p q r s t u v w x y z '|'
+         a b c d e f g h i j k l m n o p q r s t u v w x y z
 do
   echo "checking validity of '$c' as delimiter in normal mode" \
     >&2
@@ -38,7 +38,7 @@ do
   echo "$output" | grep -Fqx ___ || wail
 done
 
-for c in 0 1 2 3 4 5 6 7 8 9 + - / '*' % '<' '>' = '&' : '(' ')' .
+for c in 0 1 2 3 4 5 6 7 8 9 + - / '*' % '<' '>' = '&' : '(' ')' . '|'
 do
   echo "checking invalidity of '$c' as delimiter in normal mode" \
     >&2
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index ab50f1f54..9f82c2a84 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -1600,10 +1600,17 @@ node *do_overstrike() // \o
   int start_level = input_stack::get_level();
   token start_token;
   start_token.next();
+  if (!start_token.is_usable_as_delimiter())
+    warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+           " delimiter; it is ambiguous because it is also meaningful"
+           " in a numeric expression", start_token.description());
+  // TODO: groff 1.25?
+#if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */)) {
     delete osnode;
     return 0 /* nullptr */;
   }
+#endif
   for (;;) {
     tok.next();
     if (tok.is_newline() || tok.is_eof()) {
@@ -1645,10 +1652,17 @@ static node *do_bracket() // \b
   int start_level = input_stack::get_level();
   token start_token;
   start_token.next();
+  if (!start_token.is_usable_as_delimiter())
+    warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+           " delimiter; it is ambiguous because it is also meaningful"
+           " in a numeric expression", start_token.description());
+  // TODO: groff 1.25?
+#if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */)) {
     delete bracketnode;
     return 0 /* nullptr */;
   }
+#endif
   for (;;) {
     tok.next();
     if (tok.is_newline() || tok.is_eof()) {
@@ -1680,8 +1694,15 @@ static const char *do_name_test() // \A
   int start_level = input_stack::get_level();
   token start_token;
   start_token.next();
+  if (!start_token.is_usable_as_delimiter())
+    warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+           " delimiter; it is ambiguous because it is also meaningful"
+           " in a numeric expression", start_token.description());
+  // TODO: groff 1.25?
+#if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */))
     return 0 /* nullptr */;
+#endif
   bool got_bad_char = false;
   bool got_some_char = false;
   for (;;) {
@@ -1790,10 +1811,17 @@ static node *do_zero_width_output() // \Z
   int start_level = input_stack::get_level();
   token start_token;
   start_token.next();
+  if (!start_token.is_usable_as_delimiter())
+    warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+           " delimiter; it is ambiguous because it is also meaningful"
+           " in a numeric expression", start_token.description());
+  // TODO: groff 1.25?
+#if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */)) {
     delete rev;
     return 0 /* nullptr */;
   }
+#endif
   for (;;) {
     tok.next();
     if (tok.is_newline() || tok.is_eof()) {
@@ -2614,12 +2642,8 @@ static bool is_char_usable_as_delimiter(int c)
   case '(':
   case ')':
   case '.':
+  case '|':
     return false;
-  // TODO: In groff 1.25, style-warn on '|' and [A-Za-z].
-  // TODO: In groff 1.26, promote style warning to error with
-  // deprecation message.
-  // TODO: In groff 1.27, make '|' and letters return false.
-  // See Savannah #66481.
   default:
     return true;
   }
@@ -5561,8 +5585,15 @@ static bool read_size(int *x)
     }
     val *= sizescale;
   }
-  else if (!tok.is_usable_as_delimiter(true /* report error */))
+  else if (!tok.is_usable_as_delimiter())
+    warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+           " delimiter; it is ambiguous because it is also meaningful"
+           " in a numeric expression", tok.description());
+    // TODO: groff 1.25?
+#if 0
+  else if (!to.is_usable_as_delimiter(true /* report error */))
     return false;
+#endif
   else {
     token start(tok);
     tok.next();
@@ -5693,8 +5724,15 @@ static void do_register() // \R
 {
   token start_token;
   start_token.next();
-  if (!start_token.is_usable_as_delimiter(true /* report error */))
+  if (!start_token.is_usable_as_delimiter())
+    warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+           " delimiter; it is ambiguous because it is also meaningful"
+           " in a numeric expression", start_token.description());
+  // TODO: groff 1.25?
+#if 0
+  if (!start_token.is_usable_as_delimiter(true /* report error */)) {
     return;
+#endif
   tok.next();
   symbol nm = get_long_name(true /* required */);
   if (nm.is_null())
@@ -5726,8 +5764,15 @@ static void do_width() // \w
   int start_level = input_stack::get_level();
   token start_token;
   start_token.next();
+  if (!start_token.is_usable_as_delimiter())
+    warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+           " delimiter; it is ambiguous because it is also meaningful"
+           " in a numeric expression", start_token.description());
+  // TODO: groff 1.25?
+#if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */))
     return;
+#endif
   environment env(curenv);
   environment *oldenv = curenv;
   curenv = &env;
@@ -6005,8 +6050,15 @@ static node *do_device_extension() // \X
   int start_level = input_stack::get_level();
   token start_token;
   start_token.next();
+  if (!start_token.is_usable_as_delimiter())
+    warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+           " delimiter; it is ambiguous because it is also meaningful"
+           " in a numeric expression", start_token.description());
+  // TODO: groff 1.25?
+#if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */))
     return 0 /* nullptr */;
+#endif
   macro mac;
   if ((curdiv == topdiv) && (topdiv->before_first_page_status > 0))
     topdiv->begin_page();
@@ -6457,9 +6509,7 @@ static bool is_conditional_expression_true()
   }
   else if (tok.is_space())
     result = false;
-  // Treat `|` specially for AT&T troff compatibility, where it _isn't_
-  // a delimiter in this context; see Savannah #66526.
-  else if (tok.is_usable_as_delimiter() && (tok.ch() != '|')) {
+  else if (tok.is_usable_as_delimiter()) {
     // Perform (formatted) output comparison.
     token delim = tok;
     int delim_level = input_stack::get_level();
@@ -9498,8 +9548,15 @@ static node *read_drawing_command()
 {
   token start_token;
   start_token.next();
+  if (!start_token.is_usable_as_delimiter())
+    warning(WARN_DELIM, "interpreting %1 as an escape sequence"
+           " delimiter; it is ambiguous because it is also meaningful"
+           " in a numeric expression", start_token.description());
+  // TODO: groff 1.25?
+#if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */))
     return 0 /* nullptr */;
+#endif
   else {
     tok.next();
     if (tok == start_token)

_______________________________________________
groff-commit mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/groff-commit

Reply via email to