Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-13 Thread Tom Honermann via Gcc-patches

On 6/11/21 12:53 PM, Jakub Jelinek wrote:
> On Fri, Jun 11, 2021 at 12:20:48PM -0400, Tom Honermann wrote:
> > I'm open to whatever signaling mechanism would be preferred.  It took me a
> > while to settle on _CHAR8_T_SOURCE as the mechanism to propose as I didn't
> > find much for other precedents.
> >
> > I agree that having _CHAR8_T_SOURCE be implied by the -fchar8_t option is
> > unusual with respect to other feature test macros.  Is that what you find to
> > be weird and inconsistent?
> >
> > Predefining __SIZEOF_CHAR8_T__ would be consistent with __SIZEOF_WCHAR_T__,
> > but kind of strange too since the size is always 1.
> >
> > Perhaps a better approach would be to follow the __CHAR16_TYPE__ and
> > __CHAR32_TYPE__ precedent and define __CHAR8_TYPE__ to unsigned char.  That
> > is likewise a bit strange since the type would always be unsigned char, but
> > it does provide a bit more symmetry.  That could potentially have some use
> > as well; for C++, it could be defined as char8_t and thereby reflect the
> > difference between the two languages.  Perhaps it could be useful in the
> > future as well if WG14 were to add distinct char8_t, char16_t, and char32_t
> > types as C++ did (I'm not offering any prediction regarding the likelihood
> > of that happening).
>
> C++ already predefines
> #define __CHAR8_TYPE__ unsigned char
> #define __CHAR16_TYPE__ short unsigned int
> #define __CHAR32_TYPE__ unsigned int
> for -std={c,gnu}++2{0,a,3,b} or -fchar8_t (unless -fno-char8_t), so I agree
> just making sure __CHAR8_TYPE__ is defined to unsigned char even for C
> is best.
> And you probably don't need to do anything in the C patch for it,
> void
> c_stddef_cpp_builtins(void)
> {
>   builtin_define_with_value ("__SIZE_TYPE__", SIZE_TYPE, 0);
> ...
>   if (flag_char8_t)
>     builtin_define_with_value ("__CHAR8_TYPE__", CHAR8_TYPE, 0);
>   builtin_define_with_value ("__CHAR16_TYPE__", CHAR16_TYPE, 0);
>   builtin_define_with_value ("__CHAR32_TYPE__", CHAR32_TYPE, 0);
> will do that.

Thank you; I had forgotten that I had already done that work.  I
confirmed that the proposed changes result in __CHAR8_TYPE__ being
defined (the tests included with the patch already enforced it).

Tom.

> Jakub





Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-11 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 11, 2021 at 12:20:48PM -0400, Tom Honermann wrote:
> I'm open to whatever signaling mechanism would be preferred.  It took me a
> while to settle on _CHAR8_T_SOURCE as the mechanism to propose as I didn't
> find much for other precedents.
> 
> I agree that having _CHAR8_T_SOURCE be implied by the -fchar8_t option is
> unusual with respect to other feature test macros.  Is that what you find to
> be weird and inconsistent?
> 
> Predefining __SIZEOF_CHAR8_T__ would be consistent with __SIZEOF_WCHAR_T__,
> but kind of strange too since the size is always 1.
> 
> Perhaps a better approach would be to follow the __CHAR16_TYPE__ and
> __CHAR32_TYPE__ precedent and define __CHAR8_TYPE__ to unsigned char.  That
> is likewise a bit strange since the type would always be unsigned char, but
> it does provide a bit more symmetry.  That could potentially have some use
> as well; for C++, it could be defined as char8_t and thereby reflect the
> difference between the two languages.  Perhaps it could be useful in the
> future as well if WG14 were to add distinct char8_t, char16_t, and char32_t
> types as C++ did (I'm not offering any prediction regarding the likelihood
> of that happening).

C++ already predefines
#define __CHAR8_TYPE__ unsigned char
#define __CHAR16_TYPE__ short unsigned int
#define __CHAR32_TYPE__ unsigned int
for -std={c,gnu}++2{0,a,3,b} or -fchar8_t (unless -fno-char8_t), so I agree
just making sure __CHAR8_TYPE__ is defined to unsigned char even for C
is best.
And you probably don't need to do anything in the C patch for it,
void
c_stddef_cpp_builtins(void)
{
  builtin_define_with_value ("__SIZE_TYPE__", SIZE_TYPE, 0);
...
  if (flag_char8_t)
    builtin_define_with_value ("__CHAR8_TYPE__", CHAR8_TYPE, 0);
  builtin_define_with_value ("__CHAR16_TYPE__", CHAR16_TYPE, 0);
  builtin_define_with_value ("__CHAR32_TYPE__", CHAR32_TYPE, 0);
will do that.

Jakub
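
For illustration only (this is not part of the patch): if __CHAR8_TYPE__ is
predefined as described above, user code could test for it much as the
__SIZEOF_INT128__ check elsewhere in this thread tests for __int128.  The
sketch below assumes a fallback to unsigned char when the macro is absent;
my_char8_t, HAVE_CHAR8_TYPE_MACRO and my_c8len are hypothetical names invented
for the example.

```c
#include <stddef.h>

/* Prefer the compiler-provided type when __CHAR8_TYPE__ is predefined.
   Otherwise fall back to unsigned char, the type N2653 proposes for
   char8_t in C.  All names below are hypothetical.  */
#ifdef __CHAR8_TYPE__
typedef __CHAR8_TYPE__ my_char8_t;
#define HAVE_CHAR8_TYPE_MACRO 1
#else
typedef unsigned char my_char8_t;
#define HAVE_CHAR8_TYPE_MACRO 0
#endif

/* Count the code units (not code points) in a NUL-terminated
   UTF-8 sequence.  */
static size_t
my_c8len (const my_char8_t *s)
{
  size_t n = 0;
  while (s[n] != 0)
    n++;
  return n;
}
```

Either branch yields a one-byte type, so callers do not need to care which
one was taken; HAVE_CHAR8_TYPE_MACRO is only there for diagnostics.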



Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-11 Thread Tom Honermann via Gcc-patches

On 6/11/21 12:01 PM, Jakub Jelinek wrote:
> On Fri, Jun 11, 2021 at 11:52:41AM -0400, Tom Honermann via Gcc-patches wrote:
> > On 6/7/21 5:11 PM, Joseph Myers wrote:
> > > On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:
> > >
> > > > When -fchar8_t support is enabled for non-C++ modes, the
> > > > _CHAR8_T_SOURCE macro is predefined.  This is the mechanism proposed
> > > > to glibc to opt-in to declarations of the char8_t typedef and c8rtomb
> > > > and mbrtoc8 functions proposed in N2653.  See [2].
> > >
> > > I don't think glibc should have such a feature test macro, and I don't
> > > think GCC should define such feature test macros either - _*_SOURCE macros
> > > are generally for the *user* to define to decide what namespace they want
> > > visible, not for the compiler to define.  Without proliferating new
> > > language dialects, __STDC_VERSION__ ought to be sufficient to communicate
> > > from the compiler to the library (including to GCC's own headers such as
> > > stdatomic.h).
> >
> > In general I agree, but I think an exception is warranted in this case for a
> > few reasons:
> >
> > 1. The feature includes both core language changes (the change of type
> >    for u8 string literals) and library changes.  The library changes
> >    are not actually dependent on the core language change, but they are
> >    intended to be used together.
> > 2. Existing use of the char8_t identifier can be found in existing open
> >    source projects and likely exists in some closed source projects as
> >    well.  An opt-in approach avoids conflict and the need to
> >    conditionalize code based on gcc version.
> > 3. An opt-in approach enables evaluation of the feature prior to any
> >    WG14 approval.
>
> But calling it _CHAR8_T_SOURCE is weird and inconsistent with everything
> else.
> In C++, there is __cpp_char8_t 201811L predefined macro for char8_t.
> Using that in C is not right, sure.
> Often we use __SIZEOF_type__ macros not just for sizeof(), but also for
> presence check of the types, like
> #ifdef __SIZEOF_INT128__
> __int128 i;
> #else
> long long i;
> #endif
> etc., while char8_t has sizeof (char8_t) == 1, perhaps predefining
> __SIZEOF_CHAR8_T__ 1
> instead of _CHAR8_T_SOURCE would be better?

I'm open to whatever signaling mechanism would be preferred.  It took me
a while to settle on _CHAR8_T_SOURCE as the mechanism to propose as I
didn't find much for other precedents.

I agree that having _CHAR8_T_SOURCE be implied by the -fchar8_t option
is unusual with respect to other feature test macros.  Is that what you
find to be weird and inconsistent?

Predefining __SIZEOF_CHAR8_T__ would be consistent with
__SIZEOF_WCHAR_T__, but kind of strange too since the size is always 1.

Perhaps a better approach would be to follow the __CHAR16_TYPE__ and
__CHAR32_TYPE__ precedent and define __CHAR8_TYPE__ to unsigned char.
That is likewise a bit strange since the type would always be unsigned
char, but it does provide a bit more symmetry.  That could potentially
have some use as well; for C++, it could be defined as char8_t and
thereby reflect the difference between the two languages.  Perhaps it
could be useful in the future as well if WG14 were to add distinct
char8_t, char16_t, and char32_t types as C++ did (I'm not offering any
prediction regarding the likelihood of that happening).

Tom.

> Jakub





Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-11 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 11, 2021 at 11:52:41AM -0400, Tom Honermann via Gcc-patches wrote:
> On 6/7/21 5:11 PM, Joseph Myers wrote:
> > On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:
> > 
> > > When -fchar8_t support is enabled for non-C++ modes, the
> > > _CHAR8_T_SOURCE macro is predefined.  This is the mechanism proposed
> > > to glibc to opt-in to declarations of the char8_t typedef and c8rtomb
> > > and mbrtoc8 functions proposed in N2653.  See [2].
> > I don't think glibc should have such a feature test macro, and I don't
> > think GCC should define such feature test macros either - _*_SOURCE macros
> > are generally for the *user* to define to decide what namespace they want
> > visible, not for the compiler to define.  Without proliferating new
> > language dialects, __STDC_VERSION__ ought to be sufficient to communicate
> > from the compiler to the library (including to GCC's own headers such as
> > stdatomic.h).
> > 
> In general I agree, but I think an exception is warranted in this case for a
> few reasons:
> 
> 1. The feature includes both core language changes (the change of type
>    for u8 string literals) and library changes.  The library changes
>    are not actually dependent on the core language change, but they are
>    intended to be used together.
> 2. Existing use of the char8_t identifier can be found in existing open
>    source projects and likely exists in some closed source projects as
>    well.  An opt-in approach avoids conflict and the need to
>    conditionalize code based on gcc version.
> 3. An opt-in approach enables evaluation of the feature prior to any
>    WG14 approval.

But calling it _CHAR8_T_SOURCE is weird and inconsistent with everything
else.
In C++, there is __cpp_char8_t 201811L predefined macro for char8_t.
Using that in C is not right, sure.
Often we use __SIZEOF_type__ macros not just for sizeof(), but also for
presence check of the types, like
#ifdef __SIZEOF_INT128__
__int128 i;
#else
long long i;
#endif
etc., while char8_t has sizeof (char8_t) == 1, perhaps predefining
__SIZEOF_CHAR8_T__ 1
instead of _CHAR8_T_SOURCE would be better?

Jakub



Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-11 Thread Tom Honermann via Gcc-patches

On 6/7/21 5:12 PM, Joseph Myers wrote:

> Also, it seems odd to add a new field to cpp_options without any code in
> libcpp that uses the value of that field.

Ah, thank you.  That appears to be leftover code from prior 
experimentation and I failed to identify it as such when preparing the 
patch.  I'll provide a revised patch.


Tom.



Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-11 Thread Tom Honermann via Gcc-patches

On 6/7/21 5:11 PM, Joseph Myers wrote:
> On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:
>
> > When -fchar8_t support is enabled for non-C++ modes, the
> > _CHAR8_T_SOURCE macro is predefined.  This is the mechanism proposed
> > to glibc to opt-in to declarations of the char8_t typedef and c8rtomb
> > and mbrtoc8 functions proposed in N2653.  See [2].
>
> I don't think glibc should have such a feature test macro, and I don't
> think GCC should define such feature test macros either - _*_SOURCE macros
> are generally for the *user* to define to decide what namespace they want
> visible, not for the compiler to define.  Without proliferating new
> language dialects, __STDC_VERSION__ ought to be sufficient to communicate
> from the compiler to the library (including to GCC's own headers such as
> stdatomic.h).

In general I agree, but I think an exception is warranted in this case 
for a few reasons:


1. The feature includes both core language changes (the change of type
   for u8 string literals) and library changes.  The library changes
   are not actually dependent on the core language change, but they are
   intended to be used together.
2. Existing use of the char8_t identifier can be found in existing open
   source projects and likely exists in some closed source projects as
   well.  An opt-in approach avoids conflict and the need to
   conditionalize code based on gcc version.
3. An opt-in approach enables evaluation of the feature prior to any
   WG14 approval.

Tom.
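
To make reason 2 concrete, here is a minimal sketch of the kind of conflict
an opt-in gate avoids.  It is not taken from the glibc patches: the header
fragment, the LIB_DECLARES_CHAR8_T macro and the c8_is_ascii helper are
hypothetical, and _CHAR8_T_SOURCE is used only because it is the macro
proposed in this thread.

```c
/* --- Hypothetical library header fragment ------------------------- */
/* Declare char8_t only when the translation unit has opted in via the
   macro proposed in this thread.  */
#ifdef _CHAR8_T_SOURCE
typedef unsigned char char8_t;
#define LIB_DECLARES_CHAR8_T 1
#else
#define LIB_DECLARES_CHAR8_T 0
#endif

/* --- Pre-existing project code ------------------------------------ */
/* A legacy project typedef that uses the same identifier with a
   different type.  If the library typedef above were unconditionally
   visible, this would be a conflicting redeclaration.  */
#if !LIB_DECLARES_CHAR8_T
typedef char char8_t;
#endif

/* Works with either definition because it inspects the raw byte.  */
static int
c8_is_ascii (char8_t c)
{
  return (unsigned char) c < 0x80;
}
```

Without the opt-in macro the legacy typedef keeps compiling unchanged; with
it, the project has to drop its own definition, which is the kind of
conditionalization the opt-in approach lets a project defer.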



Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-07 Thread Joseph Myers
Also, it seems odd to add a new field to cpp_options without any code in 
libcpp that uses the value of that field.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-07 Thread Joseph Myers
On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:

> When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE macro
> is predefined.  This is the mechanism proposed to glibc to opt-in to
> declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions proposed
> in N2653.  See [2].

I don't think glibc should have such a feature test macro, and I don't 
think GCC should define such feature test macros either - _*_SOURCE macros 
are generally for the *user* to define to decide what namespace they want 
visible, not for the compiler to define.  Without proliferating new 
language dialects, __STDC_VERSION__ ought to be sufficient to communicate 
from the compiler to the library (including to GCC's own headers such as 
stdatomic.h).

-- 
Joseph S. Myers
jos...@codesourcery.com
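
As a rough illustration of the alternative described above, a header could
key its declarations off __STDC_VERSION__ alone.  This is a sketch, not code
from GCC or glibc: the SKETCH_* names are hypothetical, and the threshold
assumes that any value above C17's 201710L denotes a C2X-or-later mode.

```c
/* Hypothetical header fragment: gate on the language version rather
   than on a dedicated feature test macro.  */
#if defined (__STDC_VERSION__) && __STDC_VERSION__ > 201710L
#define SKETCH_HAS_C2X 1
#else
#define SKETCH_HAS_C2X 0
#endif

#if SKETCH_HAS_C2X
/* Stands in for the char8_t declaration a real header would provide.  */
typedef unsigned char sketch_char8_t;
#endif

/* Report which branch was taken, e.g. for a configure-style probe.  */
static const char *
sketch_mode (void)
{
  return SKETCH_HAS_C2X ? "c2x-or-later" : "pre-c2x";
}
```

The trade-off raised later in the thread still applies: a version check
offers no way to opt in before the feature is part of a published standard
mode.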


[PATCH 1/3]: C N2653 char8_t: Language support

2021-06-06 Thread Tom Honermann via Gcc-patches
This patch implements the core language and compiler dependent library 
changes proposed in WG14 N2653 [1] for C.  The changes include:

- Use of the existing -fchar8_t and -fno-char8_t options to opt-in to
  (or opt-out of) the following changes when compiling C code.
- Change of type for UTF-8 string literals from array of char to array
  of char8_t (unsigned char).
- A new atomic_char8_t typedef.
- A new ATOMIC_CHAR8_T_LOCK_FREE macro defined in terms of the
  predefined __GCC_ATOMIC_CHAR8_T_LOCK_FREE macro.

When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE 
macro is predefined.  This is the mechanism proposed to glibc to opt-in 
to declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions 
proposed in N2653.  See [2].
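
For readers who want to observe the literal type change, the following
sketch (not part of the patch) classifies the element type of a string
literal with C11 _Generic.  U8_ELEM_KIND and the ELEM_* constants are
hypothetical names; the const-qualified associations are included only so
the macro also works when string literals are made const, for example under
-Wwrite-strings.

```c
/* Result codes for U8_ELEM_KIND (hypothetical names).  */
enum { ELEM_OTHER = -1, ELEM_CHAR = 0, ELEM_UCHAR = 1 };

/* "(lit) + 0" forces array-to-pointer decay so that _Generic selects
   on a pointer type rather than an array type.  */
#define U8_ELEM_KIND(lit)                         \
  _Generic ((lit) + 0,                            \
            char *: ELEM_CHAR,                    \
            const char *: ELEM_CHAR,              \
            unsigned char *: ELEM_UCHAR,          \
            const unsigned char *: ELEM_UCHAR,    \
            default: ELEM_OTHER)

/* Nonzero when u8 literals have unsigned char (char8_t) elements in
   the current compilation mode.  */
static int
u8_literal_is_char8 (void)
{
  return U8_ELEM_KIND (u8"x") == ELEM_UCHAR;
}
```

With the change described above enabled, u8_literal_is_char8 would be
expected to return nonzero; without it, u8 literals remain arrays of char
and it returns zero.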


Tested on Linux x86_64.

gcc/ChangeLog:

2021-05-31  Tom Honermann  

 * ginclude/stdatomic.h (atomic_char8_t, ATOMIC_CHAR8_T_LOCK_FREE):
   New typedef and macro.

gcc/c/ChangeLog:

2021-05-31  Tom Honermann  

 * c-parser.c (c_parser_string_literal): Use char8_t as the type of
   CPP_UTF8STRING when char8_t support is enabled.
 * c-typeck.c (digest_init): Handle initialization of an array
   of character type by a string literal with type array of
   unsigned char.

gcc/c-family/ChangeLog:

2021-05-31  Tom Honermann  

 * c-cppbuiltin.c (c_cpp_builtins): Define _CHAR8_T_SOURCE if
   char8_t support is enabled in non-C++ language modes.
 * c-lex.c (lex_string): Use char8_t as the type of
   CPP_UTF8STRING when char8_t support is enabled.
 * c-opts.c (c_common_handle_option): Inform the preprocessor if
   char8_t support is enabled.
 * c.opt (fchar8_t): Enable for C language modes.

libcpp/ChangeLog:

2021-05-31  Tom Honermann  

 * include/cpplib.h (cpp_options): Add char8.

Tom.

[1]: WG14 N2653
 "char8_t: A type for UTF-8 characters and strings (Revision 1)"
 http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

[2]: C++20 P0482R6 and C2X N2653: support for char8_t, mbrtoc8(), and 
c8rtomb().
 [Patch 0]: 
https://sourceware.org/pipermail/libc-alpha/2021-June/127230.html
 [Patch 1]: 
https://sourceware.org/pipermail/libc-alpha/2021-June/127231.html
 [Patch 2]: 
https://sourceware.org/pipermail/libc-alpha/2021-June/127232.html
 [Patch 3]: 
https://sourceware.org/pipermail/libc-alpha/2021-June/127233.html
commit c4260c7c49822522945377cc2fb93ee9830cefc8
Author: Tom Honermann 
Date:   Sat Feb 13 09:02:34 2021 -0500

N2653 char8_t for C: Language support

This patch implements the core language and compiler dependent library
changes proposed in WG14 N2653 for C.  The changes include:
- Use of the existing -fchar8_t and -fno-char8_t options to opt-in to
  (or opt-out of) the following changes when compiling C code.
- Change of type for UTF-8 string literals from array of char to
  array of char8_t (unsigned char).
- A new atomic_char8_t typedef.
- A new ATOMIC_CHAR8_T_LOCK_FREE macro defined in terms of the
  predefined __GCC_ATOMIC_CHAR8_T_LOCK_FREE macro.

When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE
macro is predefined.  This is the mechanism proposed to glibc to opt-in
to declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions
proposed in N2653.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 42b7604c9ac..3e944ec2b86 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1467,6 +1467,11 @@ c_cpp_builtins (cpp_reader *pfile)
   if (flag_iso)
     cpp_define (pfile, "__STRICT_ANSI__");
 
+  /* Express intent for char8_t support in C (not C++) to the C library if
+     requested.  */
+  if (!c_dialect_cxx () && flag_char8_t)
+    cpp_define (pfile, "_CHAR8_T_SOURCE");
+
   if (!flag_signed_char)
     cpp_define (pfile, "__CHAR_UNSIGNED__");
 
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index c44e7a13489..e30e44e9f5c 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -1335,7 +1335,14 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
 	default:
 	case CPP_STRING:
 	case CPP_UTF8STRING:
-	  value = build_string (1, "");
+	  if (type == CPP_UTF8STRING && flag_char8_t)
+	    {
+	      value = build_string (TYPE_PRECISION (char8_type_node)
+				    / TYPE_PRECISION (char_type_node),
+				    "");  /* char8_t is 8 bits */
+	    }
+	  else
+	    value = build_string (1, "");
 	  break;
 	case CPP_STRING16:
 	  value = build_string (TYPE_PRECISION (char16_type_node)
diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 60b5802722c..eefc607dac6 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -718,6 +718,10 @@ c_common_handle_option (size_t scode, const char *arg, HOST_WIDE_INT value,
     case OPT_v:
       verbose = true;
       break;
+
+    case OPT_fchar8_t:
+  c