Hi!

The following patch attempts to implement the C++23 paper P2246R1,
Character encoding of diagnostic text.
Initially I thought there was nothing to do, but this patch shows
that there is (and I wonder whether we should backport it to release
branches).  Note that the patch is on top of the cpp_translate_string
libcpp addition from the reflection patchset (though that is quite a
small change that could be backported too).

We have various different encodings in play in GCC.
There is -finput-charset=, defaulting to SOURCE_CHARSET, which is
almost always UTF-8 (but in theory could be UTF-EBCDIC if that really
works).  libcpp initially converts the source from the input charset
to SOURCE_CHARSET.  Then we have -fexec-charset=, again defaulting to
SOURCE_CHARSET, -fwide-exec-charset=, then UTF-8, UTF-16 and UTF-32
for u8, u and U string literals and character constants, and finally
the user uses some character set in the terminal in which gcc is
running.
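
To make the distinction between the literal encodings concrete, here is
a minimal sketch (not part of the patch).  The code_units helper and the
ordinary_units variable are names made up for this illustration.  The
character is written as a UCN so that the example does not depend on
-finput-charset=.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating NUL.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units (const CharT (&)[N])
{
  return N - 1;
}

// U+00E1 (LATIN SMALL LETTER A WITH ACUTE) in the fixed encodings.
static_assert (code_units (u8"\u00e1") == 2, "UTF-8: two code units");
static_assert (code_units (u"\u00e1") == 1, "UTF-16: one code unit");
static_assert (code_units (U"\u00e1") == 1, "UTF-32: one code unit");

// The ordinary literal follows -fexec-charset=, so its length is not
// fixed: two code units for UTF-8, one for a single-byte charset such
// as IBM1047.
constexpr std::size_t ordinary_units = code_units ("\u00e1");
```

Only the last line changes with -fexec-charset=, and that is the
encoding the rest of this mail is concerned with.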

Now, I think we mostly just emit diagnostics in SOURCE_CHARSET.
There is the identifier_to_locale function, which uses UCNs if the
LC_CTYPE CODESET is not UTF-8-ish, but I think we don't use it all the
time.  Even then, there is really no support for outputting from
SOURCE_CHARSET UTF-8 to non-ASCII compatible terminal charsets.
So for now let's pretend that we are emitting diagnostics to a UTF-8
capable terminal.

When reporting errors about identifiers in the source (which are in
SOURCE_CHARSET), we just emit those.  The paper talks about
deprecated & nodiscard attribute messages, static_assert and #error
(and for C++26 it would also talk about #warning, delete (reason) and
static_assert with constexpr user messages).  #error/#warning work
fine on UTF-8 terminals, and so do delete (reason) (we don't translate
the string literal from SOURCE_CHARSET to exec-charset in that case),
static_assert with a string literal (again, not translated) and the
__attribute__ form of the deprecated attribute (again,
!parser->translate_strings_p).
What doesn't work properly are C++11 attributes (standard or gnu::):
we do translate those to exec-charset, except for the C++26 standard
deprecated/nodiscard (which aren't translated).  static_assert with
user messages doesn't work either; those really have to be in
exec-charset, because we have no control over how the user constructs
the messages during constexpr evaluation.
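
To illustrate why an exec-charset message shows up as garbage, here is a
small sketch (not part of the patch) of what "foo" looks like with an
EBCDIC execution charset.  Both helper names are hypothetical.  The
sketch assumes an ASCII-compatible host and only handles the lowercase
letters, which EBCDIC code pages such as IBM1047 place at 0x81-0x89,
0x91-0x99 and 0xA2-0xA9.

```cpp
#include <string>

// Hypothetical helper, for illustration only: map one EBCDIC lowercase
// letter to ASCII; anything else becomes '?'.
inline char ebcdic_lower_to_ascii (unsigned char c)
{
  if (c >= 0x81 && c <= 0x89)
    return static_cast<char> ('a' + (c - 0x81));	// a-i
  if (c >= 0x91 && c <= 0x99)
    return static_cast<char> ('j' + (c - 0x91));	// j-r
  if (c >= 0xA2 && c <= 0xA9)
    return static_cast<char> ('s' + (c - 0xA2));	// s-z
  return '?';
}

// Convert a whole byte string with the helper above.
inline std::string ebcdic_to_ascii (const std::string &in)
{
  std::string out;
  for (unsigned char c : in)
    out += ebcdic_lower_to_ascii (c);
  return out;
}
```

With -fexec-charset=IBM1047 the literal "foo" is stored as the bytes
0x86 0x96 0x96, so ebcdic_to_ascii ("\x86\x96\x96") gives back "foo",
while printing the raw bytes to a UTF-8 terminal does not.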

So, for C++11 attributes whose first argument is a CPP_STRING, this
patch temporarily disables translation of that string, which fixes
[[gnu::deprecated ("foo")]], [[gnu::unavailable ("foo")]]
and for C++ < 26 also [[deprecated ("foo")]] and [[nodiscard ("foo")]].
The other change converts the custom user static_assert messages (and
also inline asm strings) back from exec-charset to SOURCE_CHARSET.
Without this patch, the worst case for diagnostics is that we show
garbage, but for inline asm we actually fail to assemble when users
use constexpr-created string views with non-ASCII exec charsets.
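
The "temporarily disables translation" part relies on a scoped override
of parser->translate_strings_p.  Below is a self-contained sketch of
that idiom, not the GCC code itself: scoped_override, toy_parser and
parse_message_argument are simplified stand-ins invented for this
illustration.

```cpp
// Simplified stand-in for a scoped override: remember the old value,
// install the new one, and restore the old one on scope exit.
template <typename T>
class scoped_override
{
  T &ref_;
  T saved_;

public:
  scoped_override (T &ref, T value) : ref_ (ref), saved_ (ref)
  {
    ref_ = value;
  }
  ~scoped_override () { ref_ = saved_; }
  scoped_override (const scoped_override &) = delete;
  scoped_override &operator= (const scoped_override &) = delete;
};

// Hypothetical parser state holding only the flag of interest.
struct toy_parser
{
  bool translate_strings_p = true;
};

// Returns the flag value in effect while the "argument" is parsed.
inline bool parse_message_argument (toy_parser &p)
{
  scoped_override<bool> guard (p.translate_strings_p, false);
  return p.translate_strings_p;
}
```

parse_message_argument returns false and the flag is true again once it
returns, so later string literals are translated as usual.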

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-11-15  Jakub Jelinek  <[email protected]>

        PR c++/102613
        * parser.cc: Implement C++23 P2246R1 - Character encoding of
        diagnostic text.
        (cp_parser_parenthesized_expression_list): For std attribute
        argument where the first argument is CPP_STRING, ensure the
        string is not translated.
        * semantics.cc: Include c-family/c-pragma.h.
        (cexpr_str::extract): Use cpp_translate_string to translate
        string from ordinary literal encoding to SOURCE_CHARSET.

        * g++.dg/cpp1z/constexpr-asm-6.C: New test.
        * g++.dg/cpp23/charset2.C: New test.
        * g++.dg/cpp23/charset3.C: New test.
        * g++.dg/cpp23/charset4.C: New test.
        * g++.dg/cpp23/charset5.C: New test.

--- gcc/cp/parser.cc.jj 2025-11-15 11:57:34.059327587 +0100
+++ gcc/cp/parser.cc    2025-11-15 15:40:21.993487924 +0100
@@ -9214,6 +9214,17 @@ cp_parser_parenthesized_expression_list
                expression_list->quick_push (arg);
            goto get_comma;
          }
+       else if (is_attribute_list == normal_attr
+                && cp_lexer_next_token_is (parser->lexer, CPP_STRING)
+                && (cp_lexer_nth_token_is (parser->lexer, 2, CPP_COMMA)
+                    || cp_lexer_nth_token_is (parser->lexer, 2, CPP_CLOSE_PAREN)))
+         {
+           auto t = make_temp_override (parser->translate_strings_p, false);
+           expr
+             = cp_parser_parenthesized_expression_list_elt (parser, cast_p,
+                                                            allow_expansion_p,
+                                                            non_constant_p);
+         }
        else
          expr
            = cp_parser_parenthesized_expression_list_elt (parser, cast_p,
--- gcc/cp/semantics.cc.jj      2025-11-14 11:00:14.368583778 +0100
+++ gcc/cp/semantics.cc 2025-11-15 14:23:30.124110011 +0100
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.
 #include "memmodel.h"
 #include "gimplify.h"
 #include "contracts.h"
+#include "c-family/c-pragma.h"
 
 /* There routines provide a modular interface to perform many parsing
    operations.  They may therefore be used during actual parsing, or
@@ -12645,6 +12646,24 @@ cexpr_str::extract (location_t location,
              return false;
            }
        }
+      /* Convert the string from execution charset to SOURCE_CHARSET.  */
+      cpp_string istr, ostr;
+      istr.len = len;
+      istr.text = (const unsigned char *) msg;
+      if (!cpp_translate_string (parse_in, &istr, &ostr, CPP_STRING, true))
+       {
+         error_at (location, "could not convert constexpr string from "
+                             "ordinary literal encoding to source character "
+                             "set");
+         return false;
+       }
+      else
+       {
+         if (buf)
+           XDELETEVEC (buf);
+         msg = buf = const_cast <char *> ((const char *) ostr.text);
+         len = ostr.len;
+       }
     }
   else
     {
--- gcc/testsuite/g++.dg/cpp1z/constexpr-asm-6.C.jj     2025-11-15 15:57:07.150186175 +0100
+++ gcc/testsuite/g++.dg/cpp1z/constexpr-asm-6.C        2025-11-15 15:58:06.754337405 +0100
@@ -0,0 +1,34 @@
+/* { dg-do compile { target c++17 } } */
+/* { dg-skip-if "requires hosted libstdc++ for string" { ! hostedlib } } */
+// { dg-require-iconv "IBM1047" }
+// { dg-options "-fexec-charset=IBM1047" }
+
+#include <string>
+
+constexpr std::string_view genfoo ()
+{
+  return "foo %1,%0";
+}
+
+constexpr std::string_view genoutput ()
+{
+  return "=r";
+}
+
+constexpr std::string_view geninput ()
+{
+  return "r";
+}
+
+constexpr std::string_view genclobber ()
+{
+  return "memory";
+}
+
+void f()
+{
+  int a;
+  asm((genfoo ()) : (genoutput ()) (a) : (geninput ()) (1) : (genclobber ()));
+}
+
+/* { dg-final { scan-assembler "foo" } } */
--- gcc/testsuite/g++.dg/cpp23/charset2.C.jj    2025-11-15 13:08:13.117197980 +0100
+++ gcc/testsuite/g++.dg/cpp23/charset2.C       2025-11-15 15:18:11.934390647 +0100
@@ -0,0 +1,36 @@
+// P2246R1
+// { dg-do compile { target c++23 } }
+// { dg-require-iconv "IBM1047" }
+// { dg-options "-pedantic-errors -fexec-charset=IBM1047" }
+
+[[deprecated ("foo")]] int d;          // { dg-message "declared here" }
+int e = d;                             // { dg-warning "'d' is deprecated: foo" }
+static_assert (false, "bar");          // { dg-error "static assertion failed: bar" }
+#error "baz"                           // { dg-error "#error \"baz\"" }
+[[nodiscard ("qux")]] int foo ();      // { dg-message "declared here" }
+void
+bar ()
+{
+  foo ();                              // { dg-warning "ignoring return value of 'int foo\\\(\\\)', declared with attribute 'nodiscard': 'qux'" }
+}
+#if __cplusplus > 202302L
+#warning "fred"                                // { dg-warning "#warning \"fred\"" "" { target c++26 } }
+#endif
+#if __cpp_static_assert >= 202306L
+struct A { constexpr int size () const { return 5; }
+           constexpr const char *data () const { return "xyzzy"; } };
+static_assert (false, A {});           // { dg-error "static assertion failed: xyzzy" "" { target c++26 } }
+#endif
+#if __cpp_deleted_function >= 202403L
+int baz () = delete ("garply");                // { dg-message "declared here" "" { target c++26 } }
+void
+plugh ()
+{
+  baz ();                              // { dg-error "use of deleted function 'int baz\\\(\\\)': garply" "" { target c++26 } }
+}
+#endif
+namespace [[deprecated ("corge")]] ND  // { dg-message "declared here" }
+{
+  int i;
+};
+int j = ND::i;                         // { dg-warning "'ND' is deprecated: corge" }
--- gcc/testsuite/g++.dg/cpp23/charset3.C.jj    2025-11-15 15:14:07.942858892 +0100
+++ gcc/testsuite/g++.dg/cpp23/charset3.C       2025-11-15 15:20:02.961812431 +0100
@@ -0,0 +1,24 @@
+// P2246R1
+// { dg-do compile { target c++11 } }
+// { dg-require-iconv "IBM1047" }
+// { dg-options "-fexec-charset=IBM1047" }
+
+[[gnu::deprecated ("foo")]] int d;     // { dg-message "declared here" }
+int e = d;                             // { dg-warning "'d' is deprecated: foo" }
+[[gnu::unavailable ("bar")]] int f;    // { dg-message "declared here" }
+int g = f;                             // { dg-error "'f' is unavailable: bar" }
+__attribute__((deprecated ("baz"))) int h; // { dg-message "declared here" }
+int i = h;                             // { dg-warning "'h' is deprecated: baz" }
+__attribute__((unavailable ("qux"))) int j;    // { dg-message "declared here" }
+int k = j;                             // { dg-error "'j' is unavailable: qux" }
+#warning "fred"                                // { dg-warning "#warning \"fred\"" }
+namespace [[gnu::deprecated ("corge")]] ND // { dg-message "declared here" }
+{
+  int l;
+};
+int m = ND::l;                         // { dg-warning "'ND' is deprecated: corge" }
+namespace __attribute__((deprecated ("xyzzy"))) NE // { dg-message "declared here" }
+{
+  int l;
+};
+int n = NE::l;                         // { dg-warning "'NE' is deprecated: xyzzy" }
--- gcc/testsuite/g++.dg/cpp23/charset4.C.jj    2025-11-15 15:42:15.496875146 +0100
+++ gcc/testsuite/g++.dg/cpp23/charset4.C       2025-11-15 15:50:32.221810008 +0100
@@ -0,0 +1,36 @@
+// P2246R1
+// { dg-do compile { target c++23 } }
+// { dg-require-iconv "UTF-8" }
+// { dg-options "-pedantic-errors -fexec-charset=UTF-8" }
+
+[[deprecated ("áæ)")]] int d;          // { dg-message "declared here" }
+int e = d;                             // { dg-warning "'d' is deprecated: áæ" }
+static_assert (false, "áæ");           // { dg-error "static assertion failed: áæ" }
+#error "áæ"                            // { dg-error "#error \"áæ\"" }
+[[nodiscard ("áæ")]] int foo ();       // { dg-message "declared here" }
+void
+bar ()
+{
+  foo ();                              // { dg-warning "ignoring return value of 'int foo\\\(\\\)', declared with attribute 'nodiscard': 'áæ'" }
+}
+#if __cplusplus > 202302L
+#warning "áæ"                          // { dg-warning "#warning \"áæ\"" "" { target c++26 } }
+#endif
+#if __cpp_static_assert >= 202306L
+struct A { constexpr int size () const { return sizeof ("áæ") - 1; }
+           constexpr const char *data () const { return "áæ"; } };
+static_assert (false, A {});           // { dg-error "static assertion failed: áæ" "" { target c++26 } }
+#endif
+#if __cpp_deleted_function >= 202403L
+int baz () = delete ("áæ");            // { dg-message "declared here" "" { target c++26 } }
+void
+plugh ()
+{
+  baz ();                              // { dg-error "use of deleted function 'int baz\\\(\\\)': áæ" "" { target c++26 } }
+}
+#endif
+namespace [[deprecated ("áæ")]] ND     // { dg-message "declared here" }
+{
+  int i;
+};
+int j = ND::i;                         // { dg-warning "'ND' is deprecated: áæ" }
--- gcc/testsuite/g++.dg/cpp23/charset5.C.jj    2025-11-15 15:52:31.409112764 +0100
+++ gcc/testsuite/g++.dg/cpp23/charset5.C       2025-11-15 15:53:51.547971577 +0100
@@ -0,0 +1,24 @@
+// P2246R1
+// { dg-do compile { target c++11 } }
+// { dg-require-iconv "UTF-8" }
+// { dg-options "-fexec-charset=UTF-8" }
+
+[[gnu::deprecated ("áæ")]] int d;      // { dg-message "declared here" }
+int e = d;                             // { dg-warning "'d' is deprecated: áæ" }
+[[gnu::unavailable ("áæ")]] int f;     // { dg-message "declared here" }
+int g = f;                             // { dg-error "'f' is unavailable: áæ" }
+__attribute__((deprecated ("áæ"))) int h; // { dg-message "declared here" }
+int i = h;                             // { dg-warning "'h' is deprecated: áæ" }
+__attribute__((unavailable ("áæ"))) int j;     // { dg-message "declared here" }
+int k = j;                             // { dg-error "'j' is unavailable: áæ" }
+#warning "áæ"                          // { dg-warning "#warning \"áæ\"" }
+namespace [[gnu::deprecated ("áæ")]] ND // { dg-message "declared here" }
+{
+  int l;
+};
+int m = ND::l;                         // { dg-warning "'ND' is deprecated: áæ" }
+namespace __attribute__((deprecated ("áæ"))) NE // { dg-message "declared here" }
+{
+  int l;
+};
+int n = NE::l;                         // { dg-warning "'NE' is deprecated: áæ" }

        Jakub
