gbranden pushed a commit to branch master
in repository groff.

commit 8b83fd3dd84e02a134d3c35619fd8d14a225039a
Author: G. Branden Robinson <[email protected]>
AuthorDate: Sun Nov 2 09:47:42 2025 -0600

    [troff]: Fix Savannah #67408.
    
    As I've observed before in the groff 1.24 development cycle, the
    delimiter situation is a mess.  AT&T troff actually has three different,
    context-dependent sets of delimiters; getting too adventurous, though,
    leads to core dumps, as with `\D0l 1 20' or `.if %0%0% .tm true`.  Last
    November, I attempted to streamline groff's delimiter handling, but
    inadvertently broke use of compatibility mode.  I still think it's
    valuable for GNU troff users to be able to rely upon a single set of
    delimiters when not exercising AT&T compatibility mode, so change the
    formatter to handle four cases of delimiter context: "groff" (not in
    compatibility mode), AT&T string expressions (as used in `\o` or `\b` or
    the `tl` request), AT&T numeric expressions (`\s`, the first argument to
    `\l` or `\L`), and output comparison operators.
    
    * src/roff/troff/token.h: Introduce new `delimiter_context` enumerated
      type.
    
      (class token): Add parameter to `is_usable_as_delimiter()` member
      function, defaulting its value to `DELIMITER_GROFF_EXPRESSION` since
      we expect that to be the most common case.
    
    * src/roff/troff/input.cpp (is_usable_as_delimiter): Implement the three
      special cases for AT&T troff compatibility.
    
      (do_overstrike)
      (do_bracket)
      (do_name_test)
      (do_expr_test)
      (do_zero_width_output)
      (get_line_arg)
      (read_size)
      (do_register)
      (do_width)
      (do_device_extension)
      (read_drawing_command)
      (read_delimited_number): Distinguish compatibility mode.  If in it and
      given invalid delimiter, throw warning in category "delim" and unwind
      the operation, behaving more like AT&T troff.  (Outside of
      compatibility mode, GNU troff uses the delimiter anyway, but warns of
      its ambiguity.)
    
      (is_conditional_expression_true): Refactor, introducing new Boolean
      local variable `perform_output_comparison`.  Throw syntax warning if
      `v` conditional operator used in compatibility mode, as no variant of
      AT&T troff available to me recognizes it.  Distinguish compatibility
      mode once the letteral conditional expression operators are ruled out.
    
    Fixes <https://savannah.gnu.org/bugs/?67408>.
---
 ChangeLog                |  51 ++++++++++
 src/roff/troff/input.cpp | 243 ++++++++++++++++++++++++++++++++++++++---------
 src/roff/troff/token.h   |  10 +-
 3 files changed, 258 insertions(+), 46 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 6985b7e34..522a08a27 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,54 @@
+2025-11-02  G. Branden Robinson <[email protected]>
+
+       [troff]: Fix Savannah #67408.  As I've observed before in the
+       groff 1.24 development cycle, the delimiter situation is a mess.
+       AT&T troff actually has three different, context-dependent sets
+       of delimiters; getting too adventurous, though, leads to core
+       dumps, as with `\D0l 1 20' or `.if %0%0% .tm true`.  Last
+       November, I attempted to streamline groff's delimiter handling,
+       but inadvertently broke compatibility mode.  I still think it's
+       valuable for GNU troff users to be able to rely upon a single
+       set of delimiters when not exercising AT&T compatibility mode,
+       so change the formatter to handle four cases of delimiter
+       context: "groff" (not in compatibility mode), AT&T string
+       expressions (as used in `\o` or `\b` or the `tl` request), AT&T
+       numeric expressions (`\s`, the first argument to `\l` or `\L`),
+       and output comparison operators.
+
+       * src/roff/troff/token.h: Introduce new `delimiter_context`
+       enumerated type.
+       (class token): Add parameter to `is_usable_as_delimiter()`
+       member function, defaulting its value to
+       `DELIMITER_GROFF_EXPRESSION` since we expect that to be the most
+       common case.
+
+       * src/roff/troff/input.cpp (is_usable_as_delimiter): Implement
+       the three special cases for AT&T troff compatibility.
+       (do_overstrike)
+       (do_bracket)
+       (do_name_test)
+       (do_expr_test)
+       (do_zero_width_output)
+       (get_line_arg)
+       (read_size)
+       (do_register)
+       (do_width)
+       (do_device_extension)
+       (read_drawing_command)
+       (read_delimited_number): Distinguish compatibility mode.  If in
+       it and given invalid delimiter, throw warning in category
+       "delim" and unwind the operation, behaving more like AT&T troff.
+       {Outside of compatibility mode, GNU troff uses the delimiter
+       anyway, but warns of its ambiguity.}
+       (is_conditional_expression_true): Refactor, introducing new
+       Boolean local variable `perform_output_comparison`.  Throw
+       syntax warning if `v` conditional operator used in compatibility
+       mode, as no variant of AT&T troff available to me recognizes it.
+       Distinguish compatibility mode once the letteral conditional
+       expression operators are ruled out.
+
+       Fixes <https://savannah.gnu.org/bugs/?67408>.
+
 2025-10-31  G. Branden Robinson <[email protected]>
 
        * src/roff/groff/tests/check-delimiter-validity.sh: Expand test
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 2167eaf9b..bafc62d1b 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -1625,10 +1625,19 @@ node *do_overstrike() // \o
   int start_level = input_stack::get_level();
   token start_token;
   start_token.next();
-  if (!start_token.is_usable_as_delimiter())
+  if (!want_att_compat && !start_token.is_usable_as_delimiter())
     warning(WARN_DELIM, "interpreting %1 as an escape sequence"
            " delimiter; it is ambiguous because it can also begin a"
            " numeric expression", start_token.description());
+  else if (want_att_compat
+           && !start_token.is_usable_as_delimiter(false,
+                 DELIMITER_ATT_STRING_EXPRESSION)) {
+    warning(WARN_DELIM, "overstriking escape sequence"
+           " does not accept %1 as a delimiter",
+           start_token.description());
+    delete osnode;
+    return 0 /* nullptr */;
+  }
   // TODO: groff 1.24.0 release + 2 years?
 #if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */)) {
@@ -1677,10 +1686,19 @@ static node *do_bracket() // \b
   int start_level = input_stack::get_level();
   token start_token;
   start_token.next();
-  if (!start_token.is_usable_as_delimiter())
+  if (!want_att_compat && !start_token.is_usable_as_delimiter())
     warning(WARN_DELIM, "interpreting %1 as an escape sequence"
            " delimiter; it is ambiguous because it can also begin a"
            " numeric expression", start_token.description());
+  else if (want_att_compat
+           && !start_token.is_usable_as_delimiter(false,
+                 DELIMITER_ATT_STRING_EXPRESSION)) {
+    warning(WARN_DELIM, "bracket-building escape sequence"
+           " does not accept %1 as a delimiter",
+           start_token.description());
+    delete bracketnode;
+    return 0 /* nullptr */;
+  }
   // TODO: groff 1.24.0 release + 2 years?
 #if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */)) {
@@ -1719,10 +1737,18 @@ static const char *do_name_test() // \A
   int start_level = input_stack::get_level();
   token start_token;
   start_token.next();
-  if (!start_token.is_usable_as_delimiter())
+  if (!want_att_compat && !start_token.is_usable_as_delimiter())
     warning(WARN_DELIM, "interpreting %1 as an escape sequence"
            " delimiter; it is ambiguous because it can also begin a"
            " numeric expression", start_token.description());
+  else if (want_att_compat
+           && !start_token.is_usable_as_delimiter(false,
+                 DELIMITER_ATT_STRING_EXPRESSION)) {
+    warning(WARN_DELIM, "name test escape sequence"
+           " does not accept %1 as a delimiter",
+           start_token.description());
+    return 0 /* nullptr */;
+  }
   // TODO: groff 1.24.0 release + 2 years?
 #if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */))
@@ -1758,8 +1784,17 @@ static const char *do_expr_test() // \B
   token start_token;
   start_token.next();
   int start_level = input_stack::get_level();
-  if (!start_token.is_usable_as_delimiter(true /* report error */))
+  if (!want_att_compat
+      && !start_token.is_usable_as_delimiter(true /* report error */))
+    return 0 /* nullptr */;
+  else if (want_att_compat
+           && !start_token.is_usable_as_delimiter(false,
+                 DELIMITER_ATT_NUMERIC_EXPRESSION)) {
+    warning(WARN_DELIM, "numeric expression test escape sequence"
+           " does not accept %1 as a delimiter",
+           start_token.description());
     return 0 /* nullptr */;
+  }
   tok.next();
   // disable all warning and error messages temporarily
   unsigned int saved_warning_mask = warning_mask;
@@ -1836,10 +1871,18 @@ static node *do_zero_width_output() // \Z
   int start_level = input_stack::get_level();
   token start_token;
   start_token.next();
-  if (!start_token.is_usable_as_delimiter())
+  if (!want_att_compat && !start_token.is_usable_as_delimiter())
     warning(WARN_DELIM, "interpreting %1 as an escape sequence"
            " delimiter; it is ambiguous because it can also begin a"
            " numeric expression", start_token.description());
+  else if (want_att_compat
+           && !start_token.is_usable_as_delimiter(false,
+                 DELIMITER_ATT_STRING_EXPRESSION)) {
+    warning(WARN_DELIM, "zero-width sequence escape sequence"
+           " does not accept %1 as a delimiter",
+           start_token.description());
+    return 0 /* nullptr */;
+  }
   // TODO: groff 1.24.0 release + 2 years?
 #if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */)) {
@@ -2676,12 +2719,39 @@ static bool is_char_usable_as_delimiter(int c)
 }
 
 // Is the token a valid delimiter (like `'`)?
-bool token::is_usable_as_delimiter(bool report_error)
+bool token::is_usable_as_delimiter(bool report_error,
+                                  enum delimiter_context context)
 {
   bool is_valid = false;
   switch (type) {
   case TOKEN_CHAR:
-    is_valid = is_char_usable_as_delimiter(c);
+    if (!want_att_compat)
+      is_valid = is_char_usable_as_delimiter(c);
+    else {
+      assert(context != DELIMITER_GROFF_EXPRESSION);
+      switch (context) {
+      case DELIMITER_ATT_STRING_EXPRESSION:
+       if (csgraph(c))
+         is_valid = true;
+       break;
+      case DELIMITER_ATT_NUMERIC_EXPRESSION:
+       if (csalnum(c) || ('.' == c) || ('|' == c))
+         is_valid = true;
+       break;
+      case DELIMITER_ATT_OUTPUT_COMPARISON_EXPRESSION:
+       if (csupper(c)
+           || (cslower(c)
+               && (c != 'e')
+               && (c != 'n')
+               && (c != 'o')
+               && (c != 't')))
+         is_valid = true;
+       break;
+      default:
+       assert(0 == "unhandled case of `context` (enum dcontext)");
+       break;
+      }
+    }
     if (!is_valid && report_error)
       error("character '%1' is not allowed as a delimiter",
            static_cast<char>(c));
@@ -5663,20 +5733,30 @@ static bool read_delimited_number(units *n,
 {
   token start_token;
   start_token.next();
-  if (start_token.is_usable_as_delimiter(true /* report error */)) {
-    tok.next();
-    if (read_measurement(n, si, prev_value)) {
-      if (start_token != tok) {
-       // token::description() writes to static, class-wide storage, so
-       // we must allocate a copy of it before issuing the next
-       // diagnostic.
-       char *delimdesc = strdup(start_token.description());
-       warning(WARN_DELIM, "closing delimiter does not match;"
-               " expected %1, got %2", delimdesc, tok.description());
-       free(delimdesc);
-      }
-      return true;
+  bool is_valid = false;
+  if (!want_att_compat && start_token.is_usable_as_delimiter())
+    is_valid = true;
+  else if (want_att_compat
+           && start_token.is_usable_as_delimiter(false,
+                 DELIMITER_ATT_NUMERIC_EXPRESSION))
+    is_valid = true;
+  if (!is_valid) {
+    warning(WARN_DELIM, "cannot use %1 to delimit a numeric expression",
+           start_token.description());
+    return false;
+  }
+  tok.next();
+  if (read_measurement(n, si, prev_value)) {
+    if (start_token != tok) {
+      // token::description() writes to static, class-wide storage, so
+      // we must allocate a copy of it before issuing the next
+      // diagnostic.
+      char *delimdesc = strdup(start_token.description());
+      warning(WARN_DELIM, "closing delimiter does not match;"
+             " expected %1, got %2", delimdesc, tok.description());
+      free(delimdesc);
     }
+    return true;
   }
   return false;
 }
@@ -5685,20 +5765,30 @@ static bool read_delimited_number(units *n, unsigned 
char si)
 {
   token start_token;
   start_token.next();
-  if (start_token.is_usable_as_delimiter(true /* report error */)) {
-    tok.next();
-    if (read_measurement(n, si)) {
-      if (start_token != tok) {
-       // token::description() writes to static, class-wide storage, so
-       // we must allocate a copy of it before issuing the next
-       // diagnostic.
-       char *delimdesc = strdup(start_token.description());
-       warning(WARN_DELIM, "closing delimiter does not match;"
-               " expected %1, got %2", delimdesc, tok.description());
-       free(delimdesc);
-      }
-      return true;
+  bool is_valid = false;
+  if (!want_att_compat && start_token.is_usable_as_delimiter())
+    is_valid = true;
+  else if (want_att_compat
+           && start_token.is_usable_as_delimiter(false,
+                 DELIMITER_ATT_NUMERIC_EXPRESSION))
+    is_valid = true;
+  if (!is_valid) {
+    warning(WARN_DELIM, "cannot use %1 to delimit a numeric expression",
+           start_token.description());
+    return false;
+  }
+  tok.next();
+  if (read_measurement(n, si)) {
+    if (start_token != tok) {
+      // token::description() writes to static, class-wide storage, so
+      // we must allocate a copy of it before issuing the next
+      // diagnostic.
+      char *delimdesc = strdup(start_token.description());
+      warning(WARN_DELIM, "closing delimiter does not match;"
+             " expected %1, got %2", delimdesc, tok.description());
+      free(delimdesc);
     }
+    return true;
   }
   return false;
 }
@@ -5709,8 +5799,17 @@ static bool get_line_arg(units *n, unsigned char si, 
charinfo **cp)
   token start_token;
   start_token.next();
   int start_level = input_stack::get_level();
-  if (!start_token.is_usable_as_delimiter(true /* report error */))
+  if (!want_att_compat
+      && !start_token.is_usable_as_delimiter(true /* report error */))
+    return false;
+  else if (want_att_compat
+           && !start_token.is_usable_as_delimiter(true,
+                 DELIMITER_ATT_NUMERIC_EXPRESSION)) {
+    warning(WARN_DELIM, "line-drawing escape sequence"
+           " does not accept %1 as a delimiter",
+           start_token.description());
     return false;
+  }
   tok.next();
   if (read_measurement(n, si)) {
     if (tok.is_dummy() || tok.is_transparent_dummy())
@@ -5799,10 +5898,18 @@ static bool read_size(int *x) // \s
     }
     val *= sizescale;
   }
-  else if (!tok.is_usable_as_delimiter())
+  else if (!want_att_compat && !tok.is_usable_as_delimiter())
     warning(WARN_DELIM, "interpreting %1 as an escape sequence"
            " delimiter; it is ambiguous because it can also begin a"
            " numeric expression", tok.description());
+  else if (want_att_compat
+           && !tok.is_usable_as_delimiter(false,
+                 DELIMITER_ATT_STRING_EXPRESSION)) {
+    warning(WARN_DELIM, "type size escape sequence"
+           " does not accept %1 as a delimiter",
+           tok.description());
+    return false;
+  }
   // TODO: groff 1.24.0 release + 2 years?
 #if 0
   else if (!tok.is_usable_as_delimiter(true /* report error */))
@@ -5936,10 +6043,18 @@ static void do_register() // \R
 {
   token start_token;
   start_token.next();
-  if (!start_token.is_usable_as_delimiter())
+  if (!want_att_compat && !start_token.is_usable_as_delimiter())
     warning(WARN_DELIM, "interpreting %1 as an escape sequence"
            " delimiter; it is ambiguous because it can also begin a"
            " numeric expression", start_token.description());
+  else if (want_att_compat
+           && !start_token.is_usable_as_delimiter(false,
+                 DELIMITER_ATT_STRING_EXPRESSION)) {
+    warning(WARN_DELIM, "register assignment escape sequence"
+           " does not accept %1 as a delimiter",
+           start_token.description());
+    return;
+  }
   // TODO: groff 1.24.0 release + 2 years?
 #if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */)) {
@@ -5976,10 +6091,18 @@ static void do_width() // \w
   int start_level = input_stack::get_level();
   token start_token;
   start_token.next();
-  if (!start_token.is_usable_as_delimiter())
+  if (!want_att_compat && !start_token.is_usable_as_delimiter())
     warning(WARN_DELIM, "interpreting %1 as an escape sequence"
            " delimiter; it is ambiguous because it is also meaningful"
            " in a numeric expression", start_token.description());
+  else if (want_att_compat
+           && !start_token.is_usable_as_delimiter(false,
+                 DELIMITER_ATT_STRING_EXPRESSION)) {
+    warning(WARN_DELIM, "width computation escape sequence"
+           " does not accept %1 as a delimiter",
+           start_token.description());
+    return;
+  }
   // TODO: groff 1.24.0 release + 2 years?
 #if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */))
@@ -6278,10 +6401,18 @@ static node *do_device_extension() // \X
   int start_level = input_stack::get_level();
   token start_token;
   start_token.next();
-  if (!start_token.is_usable_as_delimiter())
+  if (!want_att_compat && !start_token.is_usable_as_delimiter())
     warning(WARN_DELIM, "interpreting %1 as an escape sequence"
            " delimiter; it is ambiguous because it can also begin a"
            " numeric expression", start_token.description());
+  else if (want_att_compat
+           && !start_token.is_usable_as_delimiter(false,
+                 DELIMITER_ATT_STRING_EXPRESSION)) {
+    warning(WARN_DELIM, "device extension command escape sequence"
+           " does not accept %1 as a delimiter",
+           start_token.description());
+    return 0 /* nullptr */;
+  }
   // TODO: groff 1.24.0 release + 2 years?
 #if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */))
@@ -6692,6 +6823,7 @@ static std::stack<bool> if_else_stack;
 
 static bool is_conditional_expression_true()
 {
+  bool perform_output_comparison = false;
   bool want_test_sense_inverted = false;
   while (tok.is_space())
     tok.next();
@@ -6709,9 +6841,11 @@ static bool is_conditional_expression_true()
     case 'd':
     case 'm':
     case 'r':
+    case 'v':
       warning(WARN_SYNTAX,
              "conditional operator '%1' used in compatibility mode",
              c);
+             // TODO: "; treating as output comparison delimiter", c);
       break;
     default:
       break;
@@ -6724,10 +6858,6 @@ static bool is_conditional_expression_true()
     tok.next();
     result = in_nroff_mode;
   }
-  else if (c == 'v') {
-    tok.next();
-    result = false;
-  }
   else if (c == 'o') {
     result = (topdiv->get_page_number() & 1);
     tok.next();
@@ -6736,6 +6866,8 @@ static bool is_conditional_expression_true()
     result = !(topdiv->get_page_number() & 1);
     tok.next();
   }
+  // TODO: else if (!want_att_compat) {
+  // Check for GNU troff extension conditional operators.
   else if ((c == 'd') || (c == 'r')) {
     tok.next();
     symbol nm = get_name(true /* required */);
@@ -6786,11 +6918,22 @@ static bool is_conditional_expression_true()
     }
     result = is_abstract_style(nm);
   }
+  // vtroff extension
+  else if (c == 'v') {
+    tok.next();
+    result = false;
+  }
   else if (tok.is_space())
     result = false;
-  else if (tok.is_usable_as_delimiter())
-    result = are_comparands_equal();
+  else if (!want_att_compat
+          && tok.is_usable_as_delimiter())
+    perform_output_comparison = true;
+  else if (want_att_compat
+          && tok.is_usable_as_delimiter(false /* report error */,
+                 DELIMITER_ATT_OUTPUT_COMPARISON_EXPRESSION))
+    perform_output_comparison = true;
   else {
+    // Evaluate numeric expression.
     units n;
     if (!read_measurement(&n, 'u')) {
       skip_branch();
@@ -6799,6 +6942,8 @@ static bool is_conditional_expression_true()
     else
       result = n > 0;
   }
+  if (perform_output_comparison)
+    result = are_comparands_equal();
   if (want_test_sense_inverted)
     result = !result;
   if (result)
@@ -9831,10 +9976,18 @@ static node *read_drawing_command() // \D
 {
   token start_token;
   start_token.next();
-  if (!start_token.is_usable_as_delimiter())
+  if (!want_att_compat && !start_token.is_usable_as_delimiter())
     warning(WARN_DELIM, "interpreting %1 as an escape sequence"
            " delimiter; it is ambiguous because it can also begin a"
            " numeric expression", start_token.description());
+  else if (want_att_compat
+           && !start_token.is_usable_as_delimiter(false,
+                 DELIMITER_ATT_STRING_EXPRESSION)) {
+    warning(WARN_DELIM, "drawing command escape sequence"
+           " does not accept %1 as a delimiter",
+           start_token.description());
+    return 0 /* nullptr */;
+  }
   // TODO: groff 1.24.0 release + 2 years?
 #if 0
   if (!start_token.is_usable_as_delimiter(true /* report error */))
diff --git a/src/roff/troff/token.h b/src/roff/troff/token.h
index 290d1387f..4119301b1 100644
--- a/src/roff/troff/token.h
+++ b/src/roff/troff/token.h
@@ -26,6 +26,13 @@ class charinfo;
 struct node;
 class vunits;
 
+enum delimiter_context {
+  DELIMITER_GROFF_EXPRESSION,
+  DELIMITER_ATT_STRING_EXPRESSION,
+  DELIMITER_ATT_NUMERIC_EXPRESSION,
+  DELIMITER_ATT_OUTPUT_COMPARISON_EXPRESSION
+};
+
 class token {
   symbol nm;
   node *nd;
@@ -86,7 +93,8 @@ public:
   bool is_tab();
   bool is_leader();
   bool is_backspace();
-  bool is_usable_as_delimiter(bool /* report_error */ = false);
+  bool is_usable_as_delimiter(bool /* report_error */ = false,
+    enum delimiter_context /* context */ = DELIMITER_GROFF_EXPRESSION);
   bool is_dummy();
   bool is_transparent_dummy();
   bool is_transparent();

_______________________________________________
groff-commit mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/groff-commit

Reply via email to