Re: C11: should we use char32_t for unicode code points?

Jeff Davis Sun, 26 Oct 2025 12:43:31 -0700

On Sat, 2025-10-25 at 16:21 +1300, Thomas Munro wrote:
> I
> guess you could static_assert(__STDC_UTF_32__, "char32_t must use
> UTF-32 encoding").


Done.

>   It's also defined as at least, not exactly, 32
> bits but we already require the machine to have uint32_t so it must
> be
> exactly 32 bits for us, and we could static_assert(sizeof(char32_t)
> ==
> 4) for good measure.

What would be the problem if it were larger than 32 bits?

I don't mind adding the asserts, but it's slightly awkward because
StaticAssertDecl() isn't defined yet at the point we are including
uchar.h.

> IIUC you're proposing that all the stuff that only works when
> database
> encoding is UTF8 should be flipped over to the new type, and that
> seems like a really good idea to me: remaining uses of pg_wchar would
> be warnings that the encoding is only conditionally known.

Exactly. The idea is to make pg_wchar stand out more as a platform-
dependent (or encoding-dependent) representation, and remove the doubt
when someone sees char32_t.

>   It'd be
> documentation without new type safety though: for example I think you
> missed a spot, the return type of the definition of utf8_to_unicode()
> (I didn't search exhaustively).

Right, it's not offering type safety. Fixed the omission.

> Do you consider explicit casts between eg pg_wchar and char32_t to be
> useful documentation for humans, when coercion should just work?  I
> kinda thought we were trying to cut down on useless casts, they might
> signal something but can also hide bugs.

The patch doesn't add any explicit casts, except in to_char32() and
to_pg_wchar(), so I assume that the callsites of those functions are
what you meant by "explicit casts"?

We can get rid of those functions if you want. The main reason they
exist is for a place to comment on the safety of converting pg_wchar to
char32_t. I can put that somewhere else, though.

>   Should the few places that
> deal in surrogates be using char16_t instead?

Yes, done.

> I wonder if the XXX_libc_mb() functions that contain our hard-coded
> assumptions that libc's wchar_t values are in UTF-16 or UTF-32 should
> use your to_char32_t() too (probably with a longer name
> pg_wchar_to_char32_t() if it's in a header for wider use).

I don't think those functions do depend on UTF-32. iswalpha(), etc.,
take a wint_t, which is just a wchar_t that can also be WEOF.

And if we don't use to_char32/to_pg_wchar in there, I don't see much
need for it outside of pg_locale_builtin.c, but if the need arises we
can move it to a header file and give it a longer name.

>   That'd
> highlight the exact points at which we make that assumption and
> centralise the assertion about database encoding, and then the code
> that compares with various known cut-off values would be clearly in
> the char32_t world.

The asserts about UTF-8 in pg_locale_libc.c are there because the
previous code only took those code paths for UTF-8, and I preserved
that. Also there is some code that depends on UTF-8 for decoding, but I
don't think anything in there depends on UTF-32 specifically.

> There is one small practical problem though: Apple hasn't got around
> to supplying <uchar.h> in its C SDK yet.  It's there for C++ only,
> and
>  isn't needed for the type in C++ anyway.  I don't think that alone
> warrants a new name wart, as the standard tells us it must match
> uint32_least32_t so we can just define it ourselves if
> !defined(__cplusplus__) && !defined(HAVE_UCHAR_H), until Apple gets
> around to that.

Thank you, I added a configure test for uchar.h and some more
preprocessor logic in c.h.

> Since it confused me briefly: Apple does provide <unicode/uchar.h>
> but
> that's a coincidentally named ICU header, and on that subject I see
> that ICU hasn't adopted these types yet but there are some hints that
> they're thinking about it; meanwhile their C++ interfaces have begun
> to document that they are acceptable in a few template functions.

Even when they fully move to char32_t, we will still have to support
the older ICU versions for a long time.

> All other target systems have it AFAICS.  Windows: tested by CI,
> MinGW: found discussion, *BSD, Solaris, Illumos: found man pages.

Great, thank you!

> They solve the "size and encoding of wchar_t is
> undefined" problem

One thing I never understood about this is that it's our code that
converts from the server encoding to pg_wchar (e.g.
pg_latin12wchar_with_len()), so we must understand the representation
of pg_wchar. And we cast directly from pg_wchar to wchar_t, so we
understand the encoding of wchar_t, too, right?


>  In passing, we seem to have a couple of mentions of "pg_wchar_t"
> (bogus _t) in existing comments.

Thank you. I'll fix that separately.

Regards,
        Jeff Davis

From a289df81514b1f784f9242a1c214bcc0749b6f9c Mon Sep 17 00:00:00 2001
From: Jeff Davis <[email protected]>
Date: Tue, 21 Oct 2025 13:16:47 -0700
Subject: [PATCH v2] Use C11 char16_t and char32_t for Unicode code points.

Reviewed-by: Tatsuo Ishii <[email protected]>
Reviewed-by: Thomas Munro <[email protected]>
Discussion: https://postgr.es/m/bedcc93d06203dfd89815b10f815ca2de8626e85.camel%40j-davis.com
---
 configure.ac                                  |  1 +
 meson.build                                   |  1 +
 src/backend/parser/parser.c                   |  8 +--
 src/backend/parser/scan.l                     |  8 +--
 src/backend/utils/adt/jsonpath_scan.l         |  6 +-
 src/backend/utils/adt/pg_locale_builtin.c     | 44 ++++++++++-----
 src/backend/utils/adt/varlena.c               | 40 ++++++-------
 src/backend/utils/mb/mbutils.c                |  4 +-
 src/common/saslprep.c                         | 48 ++++++++--------
 src/common/unicode/case_test.c                | 23 ++++----
 src/common/unicode/category_test.c            |  3 +-
 .../unicode/generate-norm_test_table.pl       |  4 +-
 .../unicode/generate-unicode_case_table.pl    |  7 +--
 .../generate-unicode_category_table.pl        |  8 +--
 src/common/unicode/norm_test.c                |  6 +-
 src/common/unicode_case.c                     | 56 +++++++++----------
 src/common/unicode_category.c                 | 50 ++++++++---------
 src/common/unicode_norm.c                     | 56 +++++++++----------
 src/fe_utils/mbprint.c                        | 10 ++--
 src/include/c.h                               | 19 +++++++
 src/include/common/unicode_case.h             | 10 ++--
 src/include/common/unicode_case_table.h       | 13 ++---
 src/include/common/unicode_category.h         | 46 ++++++++-------
 src/include/common/unicode_category_table.h   |  8 +--
 src/include/common/unicode_norm.h             |  6 +-
 src/include/mb/pg_wchar.h                     | 32 +++++------
 src/tools/pgindent/typedefs.list              |  2 +
 27 files changed, 276 insertions(+), 243 deletions(-)

diff --git a/configure.ac b/configure.ac
index e44943aa6fe..6ab2e157531 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1513,6 +1513,7 @@ AC_CHECK_HEADERS(m4_normalize([
 	sys/signalfd.h
 	sys/ucred.h
 	termios.h
+	uchar.h
 	ucred.h
 	xlocale.h
 ]))
diff --git a/meson.build b/meson.build
index 395416a6060..c3128c7554f 100644
--- a/meson.build
+++ b/meson.build
@@ -2613,6 +2613,7 @@ header_checks = [
   'sys/signalfd.h',
   'sys/ucred.h',
   'termios.h',
+  'uchar.h',
   'ucred.h',
   'xlocale.h',
 ]
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 33a040506b4..a3679f8e86c 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -339,7 +339,7 @@ hexval(unsigned char c)
 
 /* is Unicode code point acceptable? */
 static void
-check_unicode_value(pg_wchar c)
+check_unicode_value(char32_t c)
 {
 	if (!is_valid_unicode_codepoint(c))
 		ereport(ERROR,
@@ -376,7 +376,7 @@ str_udeescape(const char *str, char escape,
 	char	   *new,
 			   *out;
 	size_t		new_len;
-	pg_wchar	pair_first = 0;
+	char16_t	pair_first = 0;
 	ScannerCallbackState scbstate;
 
 	/*
@@ -420,7 +420,7 @@ str_udeescape(const char *str, char escape,
 					 isxdigit((unsigned char) in[3]) &&
 					 isxdigit((unsigned char) in[4]))
 			{
-				pg_wchar	unicode;
+				char32_t	unicode;
 
 				unicode = (hexval(in[1]) << 12) +
 					(hexval(in[2]) << 8) +
@@ -457,7 +457,7 @@ str_udeescape(const char *str, char escape,
 					 isxdigit((unsigned char) in[6]) &&
 					 isxdigit((unsigned char) in[7]))
 			{
-				pg_wchar	unicode;
+				char32_t	unicode;
 
 				unicode = (hexval(in[2]) << 20) +
 					(hexval(in[3]) << 16) +
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 08990831fe8..a67815339b7 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -121,7 +121,7 @@ static void addlitchar(unsigned char ychar, core_yyscan_t yyscanner);
 static char *litbufdup(core_yyscan_t yyscanner);
 static unsigned char unescape_single_char(unsigned char c, core_yyscan_t yyscanner);
 static int	process_integer_literal(const char *token, YYSTYPE *lval, int base);
-static void addunicode(pg_wchar c, yyscan_t yyscanner);
+static void addunicode(char32_t c, yyscan_t yyscanner);
 
 #define yyerror(msg)  scanner_yyerror(msg, yyscanner)
 
@@ -640,7 +640,7 @@ other			.
 					addlit(yytext, yyleng, yyscanner);
 				}
 <xe>{xeunicode} {
-					pg_wchar	c = strtoul(yytext + 2, NULL, 16);
+					char32_t	c = strtoul(yytext + 2, NULL, 16);
 
 					/*
 					 * For consistency with other productions, issue any
@@ -668,7 +668,7 @@ other			.
 					POP_YYLLOC();
 				}
 <xeu>{xeunicode} {
-					pg_wchar	c = strtoul(yytext + 2, NULL, 16);
+					char32_t	c = strtoul(yytext + 2, NULL, 16);
 
 					/* Remember start of overall string token ... */
 					PUSH_YYLLOC();
@@ -1376,7 +1376,7 @@ process_integer_literal(const char *token, YYSTYPE *lval, int base)
 }
 
 static void
-addunicode(pg_wchar c, core_yyscan_t yyscanner)
+addunicode(char32_t c, core_yyscan_t yyscanner)
 {
 	ScannerCallbackState scbstate;
 	char		buf[MAX_UNICODE_EQUIVALENT_STRING + 1];
diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l
index c7aab83eeb4..8c3a0a9c642 100644
--- a/src/backend/utils/adt/jsonpath_scan.l
+++ b/src/backend/utils/adt/jsonpath_scan.l
@@ -574,7 +574,7 @@ hexval(char c, int *result, struct Node *escontext, yyscan_t yyscanner)
 
 /* Add given unicode character to scanstring */
 static bool
-addUnicodeChar(int ch, struct Node *escontext, yyscan_t yyscanner)
+addUnicodeChar(char32_t ch, struct Node *escontext, yyscan_t yyscanner)
 {
 	if (ch == 0)
 	{
@@ -607,7 +607,7 @@ addUnicodeChar(int ch, struct Node *escontext, yyscan_t yyscanner)
 
 /* Add unicode character, processing any surrogate pairs */
 static bool
-addUnicode(int ch, int *hi_surrogate, struct Node *escontext, yyscan_t yyscanner)
+addUnicode(char32_t ch, int *hi_surrogate, struct Node *escontext, yyscan_t yyscanner)
 {
 	if (is_utf16_surrogate_first(ch))
 	{
@@ -655,7 +655,7 @@ parseUnicode(char *s, int l, struct Node *escontext, yyscan_t yyscanner)
 
 	for (i = 2; i < l; i += 2)	/* skip '\u' */
 	{
-		int			ch = 0;
+		char32_t		ch = 0;
 		int			j,
 					si;
 
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 3dc611b50e1..1021e0d129b 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -15,7 +15,6 @@
 #include "catalog/pg_collation.h"
 #include "common/unicode_case.h"
 #include "common/unicode_category.h"
-#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "utils/builtins.h"
 #include "utils/pg_locale.h"
@@ -35,6 +34,23 @@ struct WordBoundaryState
 	bool		prev_alnum;
 };
 
+/*
+ * In UTF-8, pg_wchar is guaranteed to be the code point value.
+ */
+static inline char32_t
+to_char32(pg_wchar wc)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+	return (char32_t) wc;
+}
+
+static inline pg_wchar
+to_pg_wchar(char32_t c32)
+{
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+	return (pg_wchar) c32;
+}
+
 /*
  * Simple word boundary iterator that draws boundaries each time the result of
  * pg_u_isalnum() changes.
@@ -47,7 +63,7 @@ initcap_wbnext(void *state)
 	while (wbstate->offset < wbstate->len &&
 		   wbstate->str[wbstate->offset] != '\0')
 	{
-		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
+		char32_t	u = utf8_to_unicode((unsigned char *) wbstate->str +
 										wbstate->offset);
 		bool		curr_alnum = pg_u_isalnum(u, wbstate->posix);
 
@@ -112,61 +128,61 @@ strfold_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 static bool
 wc_isdigit_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_isdigit(wc, !locale->builtin.casemap_full);
+	return pg_u_isdigit(to_char32(wc), !locale->builtin.casemap_full);
 }
 
 static bool
 wc_isalpha_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_isalpha(wc);
+	return pg_u_isalpha(to_char32(wc));
 }
 
 static bool
 wc_isalnum_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_isalnum(wc, !locale->builtin.casemap_full);
+	return pg_u_isalnum(to_char32(wc), !locale->builtin.casemap_full);
 }
 
 static bool
 wc_isupper_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_isupper(wc);
+	return pg_u_isupper(to_char32(wc));
 }
 
 static bool
 wc_islower_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_islower(wc);
+	return pg_u_islower(to_char32(wc));
 }
 
 static bool
 wc_isgraph_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_isgraph(wc);
+	return pg_u_isgraph(to_char32(wc));
 }
 
 static bool
 wc_isprint_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_isprint(wc);
+	return pg_u_isprint(to_char32(wc));
 }
 
 static bool
 wc_ispunct_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_ispunct(wc, !locale->builtin.casemap_full);
+	return pg_u_ispunct(to_char32(wc), !locale->builtin.casemap_full);
 }
 
 static bool
 wc_isspace_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_isspace(wc);
+	return pg_u_isspace(to_char32(wc));
 }
 
 static bool
 wc_isxdigit_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return pg_u_isxdigit(wc, !locale->builtin.casemap_full);
+	return pg_u_isxdigit(to_char32(wc), !locale->builtin.casemap_full);
 }
 
 static bool
@@ -179,13 +195,13 @@ char_is_cased_builtin(char ch, pg_locale_t locale)
 static pg_wchar
 wc_toupper_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return unicode_uppercase_simple(wc);
+	return to_pg_wchar(unicode_uppercase_simple(to_char32(wc)));
 }
 
 static pg_wchar
 wc_tolower_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return unicode_lowercase_simple(wc);
+	return to_pg_wchar(unicode_lowercase_simple(to_char32(wc)));
 }
 
 static const struct ctype_methods ctype_methods_builtin = {
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 2c398cd9e5c..8d735786e51 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -5419,12 +5419,12 @@ unicode_assigned(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errmsg("Unicode categorization can only be performed if server encoding is UTF8")));
 
-	/* convert to pg_wchar */
+	/* convert to char32_t */
 	size = pg_mbstrlen_with_len(VARDATA_ANY(input), VARSIZE_ANY_EXHDR(input));
 	p = (unsigned char *) VARDATA_ANY(input);
 	for (int i = 0; i < size; i++)
 	{
-		pg_wchar	uchar = utf8_to_unicode(p);
+		char32_t	uchar = utf8_to_unicode(p);
 		int			category = unicode_category(uchar);
 
 		if (category == PG_U_UNASSIGNED)
@@ -5443,24 +5443,24 @@ unicode_normalize_func(PG_FUNCTION_ARGS)
 	char	   *formstr = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	UnicodeNormalizationForm form;
 	int			size;
-	pg_wchar   *input_chars;
-	pg_wchar   *output_chars;
+	char32_t   *input_chars;
+	char32_t   *output_chars;
 	unsigned char *p;
 	text	   *result;
 	int			i;
 
 	form = unicode_norm_form_from_string(formstr);
 
-	/* convert to pg_wchar */
+	/* convert to char32_t */
 	size = pg_mbstrlen_with_len(VARDATA_ANY(input), VARSIZE_ANY_EXHDR(input));
-	input_chars = palloc((size + 1) * sizeof(pg_wchar));
+	input_chars = palloc((size + 1) * sizeof(char32_t));
 	p = (unsigned char *) VARDATA_ANY(input);
 	for (i = 0; i < size; i++)
 	{
 		input_chars[i] = utf8_to_unicode(p);
 		p += pg_utf_mblen(p);
 	}
-	input_chars[i] = (pg_wchar) '\0';
+	input_chars[i] = (char32_t) '\0';
 	Assert((char *) p == VARDATA_ANY(input) + VARSIZE_ANY_EXHDR(input));
 
 	/* action */
@@ -5468,7 +5468,7 @@ unicode_normalize_func(PG_FUNCTION_ARGS)
 
 	/* convert back to UTF-8 string */
 	size = 0;
-	for (pg_wchar *wp = output_chars; *wp; wp++)
+	for (char32_t *wp = output_chars; *wp; wp++)
 	{
 		unsigned char buf[4];
 
@@ -5480,7 +5480,7 @@ unicode_normalize_func(PG_FUNCTION_ARGS)
 	SET_VARSIZE(result, size + VARHDRSZ);
 
 	p = (unsigned char *) VARDATA_ANY(result);
-	for (pg_wchar *wp = output_chars; *wp; wp++)
+	for (char32_t *wp = output_chars; *wp; wp++)
 	{
 		unicode_to_utf8(*wp, p);
 		p += pg_utf_mblen(p);
@@ -5509,8 +5509,8 @@ unicode_is_normalized(PG_FUNCTION_ARGS)
 	char	   *formstr = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	UnicodeNormalizationForm form;
 	int			size;
-	pg_wchar   *input_chars;
-	pg_wchar   *output_chars;
+	char32_t   *input_chars;
+	char32_t   *output_chars;
 	unsigned char *p;
 	int			i;
 	UnicodeNormalizationQC quickcheck;
@@ -5519,16 +5519,16 @@ unicode_is_normalized(PG_FUNCTION_ARGS)
 
 	form = unicode_norm_form_from_string(formstr);
 
-	/* convert to pg_wchar */
+	/* convert to char32_t */
 	size = pg_mbstrlen_with_len(VARDATA_ANY(input), VARSIZE_ANY_EXHDR(input));
-	input_chars = palloc((size + 1) * sizeof(pg_wchar));
+	input_chars = palloc((size + 1) * sizeof(char32_t));
 	p = (unsigned char *) VARDATA_ANY(input);
 	for (i = 0; i < size; i++)
 	{
 		input_chars[i] = utf8_to_unicode(p);
 		p += pg_utf_mblen(p);
 	}
-	input_chars[i] = (pg_wchar) '\0';
+	input_chars[i] = (char32_t) '\0';
 	Assert((char *) p == VARDATA_ANY(input) + VARSIZE_ANY_EXHDR(input));
 
 	/* quick check (see UAX #15) */
@@ -5542,11 +5542,11 @@ unicode_is_normalized(PG_FUNCTION_ARGS)
 	output_chars = unicode_normalize(form, input_chars);
 
 	output_size = 0;
-	for (pg_wchar *wp = output_chars; *wp; wp++)
+	for (char32_t *wp = output_chars; *wp; wp++)
 		output_size++;
 
 	result = (size == output_size) &&
-		(memcmp(input_chars, output_chars, size * sizeof(pg_wchar)) == 0);
+		(memcmp(input_chars, output_chars, size * sizeof(char32_t)) == 0);
 
 	PG_RETURN_BOOL(result);
 }
@@ -5602,7 +5602,7 @@ unistr(PG_FUNCTION_ARGS)
 	int			len;
 	StringInfoData str;
 	text	   *result;
-	pg_wchar	pair_first = 0;
+	char16_t	pair_first = 0;
 	char		cbuf[MAX_UNICODE_EQUIVALENT_STRING + 1];
 
 	instr = VARDATA_ANY(input_text);
@@ -5626,7 +5626,7 @@ unistr(PG_FUNCTION_ARGS)
 			else if ((len >= 5 && isxdigits_n(instr + 1, 4)) ||
 					 (len >= 6 && instr[1] == 'u' && isxdigits_n(instr + 2, 4)))
 			{
-				pg_wchar	unicode;
+				char32_t	unicode;
 				int			offset = instr[1] == 'u' ? 2 : 1;
 
 				unicode = hexval_n(instr + offset, 4);
@@ -5662,7 +5662,7 @@ unistr(PG_FUNCTION_ARGS)
 			}
 			else if (len >= 8 && instr[1] == '+' && isxdigits_n(instr + 2, 6))
 			{
-				pg_wchar	unicode;
+				char32_t	unicode;
 
 				unicode = hexval_n(instr + 2, 6);
 
@@ -5697,7 +5697,7 @@ unistr(PG_FUNCTION_ARGS)
 			}
 			else if (len >= 10 && instr[1] == 'U' && isxdigits_n(instr + 2, 8))
 			{
-				pg_wchar	unicode;
+				char32_t	unicode;
 
 				unicode = hexval_n(instr + 2, 8);
 
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 886ecbad871..fb629ed5c8f 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -862,7 +862,7 @@ perform_default_encoding_conversion(const char *src, int len,
  * may call this outside any transaction, or in an aborted transaction.
  */
 void
-pg_unicode_to_server(pg_wchar c, unsigned char *s)
+pg_unicode_to_server(char32_t c, unsigned char *s)
 {
 	unsigned char c_as_utf8[MAX_MULTIBYTE_CHAR_LEN + 1];
 	int			c_as_utf8_len;
@@ -924,7 +924,7 @@ pg_unicode_to_server(pg_wchar c, unsigned char *s)
  * but simply return false on conversion failure.
  */
 bool
-pg_unicode_to_server_noerror(pg_wchar c, unsigned char *s)
+pg_unicode_to_server_noerror(char32_t c, unsigned char *s)
 {
 	unsigned char c_as_utf8[MAX_MULTIBYTE_CHAR_LEN + 1];
 	int			c_as_utf8_len;
diff --git a/src/common/saslprep.c b/src/common/saslprep.c
index 97beb47940b..101e8d65a4d 100644
--- a/src/common/saslprep.c
+++ b/src/common/saslprep.c
@@ -47,7 +47,7 @@
 
 /* Prototypes for local functions */
 static int	codepoint_range_cmp(const void *a, const void *b);
-static bool is_code_in_table(pg_wchar code, const pg_wchar *map, int mapsize);
+static bool is_code_in_table(char32_t code, const char32_t *map, int mapsize);
 static int	pg_utf8_string_len(const char *source);
 
 /*
@@ -64,7 +64,7 @@ static int	pg_utf8_string_len(const char *source);
  *
  * These are all mapped to the ASCII space character (U+00A0).
  */
-static const pg_wchar non_ascii_space_ranges[] =
+static const char32_t non_ascii_space_ranges[] =
 {
 	0x00A0, 0x00A0,
 	0x1680, 0x1680,
@@ -79,7 +79,7 @@ static const pg_wchar non_ascii_space_ranges[] =
  *
  * If any of these appear in the input, they are removed.
  */
-static const pg_wchar commonly_mapped_to_nothing_ranges[] =
+static const char32_t commonly_mapped_to_nothing_ranges[] =
 {
 	0x00AD, 0x00AD,
 	0x034F, 0x034F,
@@ -114,7 +114,7 @@ static const pg_wchar commonly_mapped_to_nothing_ranges[] =
  * tables, so one code might originate from multiple source tables.
  * Adjacent ranges have also been merged together, to save space.
  */
-static const pg_wchar prohibited_output_ranges[] =
+static const char32_t prohibited_output_ranges[] =
 {
 	0x0000, 0x001F,				/* C.2.1 */
 	0x007F, 0x00A0,				/* C.1.2, C.2.1, C.2.2 */
@@ -155,7 +155,7 @@ static const pg_wchar prohibited_output_ranges[] =
 };
 
 /* A.1 Unassigned code points in Unicode 3.2 */
-static const pg_wchar unassigned_codepoint_ranges[] =
+static const char32_t unassigned_codepoint_ranges[] =
 {
 	0x0221, 0x0221,
 	0x0234, 0x024F,
@@ -556,7 +556,7 @@ static const pg_wchar unassigned_codepoint_ranges[] =
 };
 
 /* D.1 Characters with bidirectional property "R" or "AL" */
-static const pg_wchar RandALCat_codepoint_ranges[] =
+static const char32_t RandALCat_codepoint_ranges[] =
 {
 	0x05BE, 0x05BE,
 	0x05C0, 0x05C0,
@@ -595,7 +595,7 @@ static const pg_wchar RandALCat_codepoint_ranges[] =
 };
 
 /* D.2 Characters with bidirectional property "L" */
-static const pg_wchar LCat_codepoint_ranges[] =
+static const char32_t LCat_codepoint_ranges[] =
 {
 	0x0041, 0x005A,
 	0x0061, 0x007A,
@@ -968,8 +968,8 @@ static const pg_wchar LCat_codepoint_ranges[] =
 static int
 codepoint_range_cmp(const void *a, const void *b)
 {
-	const pg_wchar *key = (const pg_wchar *) a;
-	const pg_wchar *range = (const pg_wchar *) b;
+	const char32_t *key = (const char32_t *) a;
+	const char32_t *range = (const char32_t *) b;
 
 	if (*key < range[0])
 		return -1;				/* less than lower bound */
@@ -980,14 +980,14 @@ codepoint_range_cmp(const void *a, const void *b)
 }
 
 static bool
-is_code_in_table(pg_wchar code, const pg_wchar *map, int mapsize)
+is_code_in_table(char32_t code, const char32_t *map, int mapsize)
 {
 	Assert(mapsize % 2 == 0);
 
 	if (code < map[0] || code > map[mapsize - 1])
 		return false;
 
-	if (bsearch(&code, map, mapsize / 2, sizeof(pg_wchar) * 2,
+	if (bsearch(&code, map, mapsize / 2, sizeof(char32_t) * 2,
 				codepoint_range_cmp))
 		return true;
 	else
@@ -1046,8 +1046,8 @@ pg_utf8_string_len(const char *source)
 pg_saslprep_rc
 pg_saslprep(const char *input, char **output)
 {
-	pg_wchar   *input_chars = NULL;
-	pg_wchar   *output_chars = NULL;
+	char32_t   *input_chars = NULL;
+	char32_t   *output_chars = NULL;
 	int			input_size;
 	char	   *result;
 	int			result_size;
@@ -1055,7 +1055,7 @@ pg_saslprep(const char *input, char **output)
 	int			i;
 	bool		contains_RandALCat;
 	unsigned char *p;
-	pg_wchar   *wp;
+	char32_t   *wp;
 
 	/* Ensure we return *output as NULL on failure */
 	*output = NULL;
@@ -1080,10 +1080,10 @@ pg_saslprep(const char *input, char **output)
 	input_size = pg_utf8_string_len(input);
 	if (input_size < 0)
 		return SASLPREP_INVALID_UTF8;
-	if (input_size >= MaxAllocSize / sizeof(pg_wchar))
+	if (input_size >= MaxAllocSize / sizeof(char32_t))
 		goto oom;
 
-	input_chars = ALLOC((input_size + 1) * sizeof(pg_wchar));
+	input_chars = ALLOC((input_size + 1) * sizeof(char32_t));
 	if (!input_chars)
 		goto oom;
 
@@ -1093,7 +1093,7 @@ pg_saslprep(const char *input, char **output)
 		input_chars[i] = utf8_to_unicode(p);
 		p += pg_utf_mblen(p);
 	}
-	input_chars[i] = (pg_wchar) '\0';
+	input_chars[i] = (char32_t) '\0';
 
 	/*
 	 * The steps below correspond to the steps listed in [RFC3454], Section
@@ -1107,7 +1107,7 @@ pg_saslprep(const char *input, char **output)
 	count = 0;
 	for (i = 0; i < input_size; i++)
 	{
-		pg_wchar	code = input_chars[i];
+		char32_t	code = input_chars[i];
 
 		if (IS_CODE_IN_TABLE(code, non_ascii_space_ranges))
 			input_chars[count++] = 0x0020;
@@ -1118,7 +1118,7 @@ pg_saslprep(const char *input, char **output)
 		else
 			input_chars[count++] = code;
 	}
-	input_chars[count] = (pg_wchar) '\0';
+	input_chars[count] = (char32_t) '\0';
 	input_size = count;
 
 	if (input_size == 0)
@@ -1138,7 +1138,7 @@ pg_saslprep(const char *input, char **output)
 	 */
 	for (i = 0; i < input_size; i++)
 	{
-		pg_wchar	code = input_chars[i];
+		char32_t	code = input_chars[i];
 
 		if (IS_CODE_IN_TABLE(code, prohibited_output_ranges))
 			goto prohibited;
@@ -1170,7 +1170,7 @@ pg_saslprep(const char *input, char **output)
 	contains_RandALCat = false;
 	for (i = 0; i < input_size; i++)
 	{
-		pg_wchar	code = input_chars[i];
+		char32_t	code = input_chars[i];
 
 		if (IS_CODE_IN_TABLE(code, RandALCat_codepoint_ranges))
 		{
@@ -1181,12 +1181,12 @@ pg_saslprep(const char *input, char **output)
 
 	if (contains_RandALCat)
 	{
-		pg_wchar	first = input_chars[0];
-		pg_wchar	last = input_chars[input_size - 1];
+		char32_t	first = input_chars[0];
+		char32_t	last = input_chars[input_size - 1];
 
 		for (i = 0; i < input_size; i++)
 		{
-			pg_wchar	code = input_chars[i];
+			char32_t	code = input_chars[i];
 
 			if (IS_CODE_IN_TABLE(code, LCat_codepoint_ranges))
 				goto prohibited;
diff --git a/src/common/unicode/case_test.c b/src/common/unicode/case_test.c
index fdfb62e8552..00d4f85e5a5 100644
--- a/src/common/unicode/case_test.c
+++ b/src/common/unicode/case_test.c
@@ -24,6 +24,7 @@
 #include "common/unicode_case.h"
 #include "common/unicode_category.h"
 #include "common/unicode_version.h"
+#include "mb/pg_wchar.h"
 
 /* enough to hold largest source or result string, including NUL */
 #define BUFSZ 256
@@ -54,7 +55,7 @@ initcap_wbnext(void *state)
 	while (wbstate->offset < wbstate->len &&
 		   wbstate->str[wbstate->offset] != '\0')
 	{
-		pg_wchar	u = utf8_to_unicode((unsigned char *) wbstate->str +
+		char32_t	u = utf8_to_unicode((unsigned char *) wbstate->str +
 										wbstate->offset);
 		bool		curr_alnum = pg_u_isalnum(u, wbstate->posix);
 
@@ -77,16 +78,16 @@ initcap_wbnext(void *state)
 #ifdef USE_ICU
 
 static void
-icu_test_simple(pg_wchar code)
+icu_test_simple(char32_t code)
 {
-	pg_wchar	lower = unicode_lowercase_simple(code);
-	pg_wchar	title = unicode_titlecase_simple(code);
-	pg_wchar	upper = unicode_uppercase_simple(code);
-	pg_wchar	fold = unicode_casefold_simple(code);
-	pg_wchar	iculower = u_tolower(code);
-	pg_wchar	icutitle = u_totitle(code);
-	pg_wchar	icuupper = u_toupper(code);
-	pg_wchar	icufold = u_foldCase(code, U_FOLD_CASE_DEFAULT);
+	char32_t	lower = unicode_lowercase_simple(code);
+	char32_t	title = unicode_titlecase_simple(code);
+	char32_t	upper = unicode_uppercase_simple(code);
+	char32_t	fold = unicode_casefold_simple(code);
+	char32_t	iculower = u_tolower(code);
+	char32_t	icutitle = u_totitle(code);
+	char32_t	icuupper = u_toupper(code);
+	char32_t	icufold = u_foldCase(code, U_FOLD_CASE_DEFAULT);
 
 	if (lower != iculower || title != icutitle || upper != icuupper ||
 		fold != icufold)
@@ -172,7 +173,7 @@ test_icu(void)
 	int			successful = 0;
 	int			skipped_mismatch = 0;
 
-	for (pg_wchar code = 0; code <= 0x10ffff; code++)
+	for (char32_t code = 0; code <= 0x10ffff; code++)
 	{
 		pg_unicode_category category = unicode_category(code);
 
diff --git a/src/common/unicode/category_test.c b/src/common/unicode/category_test.c
index 5d37ba39196..1e8c1f7905f 100644
--- a/src/common/unicode/category_test.c
+++ b/src/common/unicode/category_test.c
@@ -22,6 +22,7 @@
 
 #include "common/unicode_category.h"
 #include "common/unicode_version.h"
+#include "mb/pg_wchar.h"
 
 static int	pg_unicode_version = 0;
 #ifdef USE_ICU
@@ -59,7 +60,7 @@ icu_test()
 	int			pg_skipped_codepoints = 0;
 	int			icu_skipped_codepoints = 0;
 
-	for (pg_wchar code = 0; code <= 0x10ffff; code++)
+	for (char32_t code = 0; code <= 0x10ffff; code++)
 	{
 		uint8_t		pg_category = unicode_category(code);
 		uint8_t		icu_category = u_charType(code);
diff --git a/src/common/unicode/generate-norm_test_table.pl b/src/common/unicode/generate-norm_test_table.pl
index 1b401be9409..1a8b908ff33 100644
--- a/src/common/unicode/generate-norm_test_table.pl
+++ b/src/common/unicode/generate-norm_test_table.pl
@@ -47,8 +47,8 @@ print $OUTPUT <<HEADER;
 typedef struct
 {
 	int			linenum;
-	pg_wchar	input[50];
-	pg_wchar	output[4][50];
+	char32_t	input[50];
+	char32_t	output[4][50];
 } pg_unicode_test;
 
 /* test table */
diff --git a/src/common/unicode/generate-unicode_case_table.pl b/src/common/unicode/generate-unicode_case_table.pl
index 5d9ddd62803..f71eb25c94e 100644
--- a/src/common/unicode/generate-unicode_case_table.pl
+++ b/src/common/unicode/generate-unicode_case_table.pl
@@ -270,7 +270,6 @@ print $OT <<"EOS";
  */
 
 #include "common/unicode_case.h"
-#include "mb/pg_wchar.h"
 
 /*
  * The maximum number of codepoints that can result from case mapping
@@ -297,7 +296,7 @@ typedef enum
 typedef struct
 {
 	int16		conditions;
-	pg_wchar	map[NCaseKind][MAX_CASE_EXPANSION];
+	char32_t	map[NCaseKind][MAX_CASE_EXPANSION];
 } pg_special_case;
 
 /*
@@ -430,7 +429,7 @@ foreach my $kind ('lower', 'title', 'upper', 'fold')
  * The entry case_map_${kind}[case_index(codepoint)] is the mapping for the
  * given codepoint.
  */
-static const pg_wchar case_map_$kind\[$index\] =
+static const char32_t case_map_$kind\[$index\] =
 {
 EOS
 
@@ -502,7 +501,7 @@ print $OT <<"EOS";
  * the offset into the mapping tables.
  */
 static inline uint16
-case_index(pg_wchar cp)
+case_index(char32_t cp)
 {
 	/* Fast path for codepoints < $fastpath_limit */
 	if (cp < $fastpath_limit)
diff --git a/src/common/unicode/generate-unicode_category_table.pl b/src/common/unicode/generate-unicode_category_table.pl
index abab5cd9696..7e094b13720 100644
--- a/src/common/unicode/generate-unicode_category_table.pl
+++ b/src/common/unicode/generate-unicode_category_table.pl
@@ -366,15 +366,15 @@ print $OT <<"EOS";
  */
 typedef struct
 {
-	uint32		first;			/* Unicode codepoint */
-	uint32		last;			/* Unicode codepoint */
+	char32_t	first;			/* Unicode codepoint */
+	char32_t	last;			/* Unicode codepoint */
 	uint8		category;		/* General Category */
 } pg_category_range;
 
 typedef struct
 {
-	uint32		first;			/* Unicode codepoint */
-	uint32		last;			/* Unicode codepoint */
+	char32_t	first;			/* Unicode codepoint */
+	char32_t	last;			/* Unicode codepoint */
 } pg_unicode_range;
 
 typedef struct
diff --git a/src/common/unicode/norm_test.c b/src/common/unicode/norm_test.c
index 25bc59463f2..058817f1719 100644
--- a/src/common/unicode/norm_test.c
+++ b/src/common/unicode/norm_test.c
@@ -20,7 +20,7 @@
 #include "norm_test_table.h"
 
 static char *
-print_wchar_str(const pg_wchar *s)
+print_wchar_str(const char32_t *s)
 {
 #define BUF_DIGITS 50
 	static char buf[BUF_DIGITS * 11 + 1];
@@ -41,7 +41,7 @@ print_wchar_str(const pg_wchar *s)
 }
 
 static int
-pg_wcscmp(const pg_wchar *s1, const pg_wchar *s2)
+pg_wcscmp(const char32_t *s1, const char32_t *s2)
 {
 	for (;;)
 	{
@@ -65,7 +65,7 @@ main(int argc, char **argv)
 	{
 		for (int form = 0; form < 4; form++)
 		{
-			pg_wchar   *result;
+			char32_t   *result;
 
 			result = unicode_normalize(form, test->input);
 
diff --git a/src/common/unicode_case.c b/src/common/unicode_case.c
index 073faf6a0d5..e5e494db43c 100644
--- a/src/common/unicode_case.c
+++ b/src/common/unicode_case.c
@@ -30,7 +30,7 @@ enum CaseMapResult
 /*
  * Map for each case kind.
  */
-static const pg_wchar *const casekind_map[NCaseKind] =
+static const char32_t *const casekind_map[NCaseKind] =
 {
 	[CaseLower] = case_map_lower,
 	[CaseTitle] = case_map_title,
@@ -38,42 +38,42 @@ static const pg_wchar *const casekind_map[NCaseKind] =
 	[CaseFold] = case_map_fold,
 };
 
-static pg_wchar find_case_map(pg_wchar ucs, const pg_wchar *map);
+static char32_t find_case_map(char32_t ucs, const char32_t *map);
 static size_t convert_case(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 						   CaseKind str_casekind, bool full, WordBoundaryNext wbnext,
 						   void *wbstate);
-static enum CaseMapResult casemap(pg_wchar u1, CaseKind casekind, bool full,
+static enum CaseMapResult casemap(char32_t u1, CaseKind casekind, bool full,
 								  const char *src, size_t srclen, size_t srcoff,
-								  pg_wchar *simple, const pg_wchar **special);
+								  char32_t *simple, const char32_t **special);
 
-pg_wchar
-unicode_lowercase_simple(pg_wchar code)
+char32_t
+unicode_lowercase_simple(char32_t code)
 {
-	pg_wchar	cp = find_case_map(code, case_map_lower);
+	char32_t	cp = find_case_map(code, case_map_lower);
 
 	return cp != 0 ? cp : code;
 }
 
-pg_wchar
-unicode_titlecase_simple(pg_wchar code)
+char32_t
+unicode_titlecase_simple(char32_t code)
 {
-	pg_wchar	cp = find_case_map(code, case_map_title);
+	char32_t	cp = find_case_map(code, case_map_title);
 
 	return cp != 0 ? cp : code;
 }
 
-pg_wchar
-unicode_uppercase_simple(pg_wchar code)
+char32_t
+unicode_uppercase_simple(char32_t code)
 {
-	pg_wchar	cp = find_case_map(code, case_map_upper);
+	char32_t	cp = find_case_map(code, case_map_upper);
 
 	return cp != 0 ? cp : code;
 }
 
-pg_wchar
-unicode_casefold_simple(pg_wchar code)
+char32_t
+unicode_casefold_simple(char32_t code)
 {
-	pg_wchar	cp = find_case_map(code, case_map_fold);
+	char32_t	cp = find_case_map(code, case_map_fold);
 
 	return cp != 0 ? cp : code;
 }
@@ -231,10 +231,10 @@ convert_case(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 
 	while ((srclen < 0 || srcoff < srclen) && src[srcoff] != '\0')
 	{
-		pg_wchar	u1 = utf8_to_unicode((unsigned char *) src + srcoff);
+		char32_t	u1 = utf8_to_unicode((unsigned char *) src + srcoff);
 		int			u1len = unicode_utf8len(u1);
-		pg_wchar	simple = 0;
-		const pg_wchar *special = NULL;
+		char32_t	simple = 0;
+		const char32_t *special = NULL;
 		enum CaseMapResult casemap_result;
 
 		if (str_casekind == CaseTitle)
@@ -265,8 +265,8 @@ convert_case(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			case CASEMAP_SIMPLE:
 				{
 					/* replace with single character */
-					pg_wchar	u2 = simple;
-					pg_wchar	u2len = unicode_utf8len(u2);
+					char32_t	u2 = simple;
+					char32_t	u2len = unicode_utf8len(u2);
 
 					Assert(special == NULL);
 					if (result_len + u2len <= dstsize)
@@ -280,7 +280,7 @@ convert_case(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 				Assert(simple == 0);
 				for (int i = 0; i < MAX_CASE_EXPANSION && special[i]; i++)
 				{
-					pg_wchar	u2 = special[i];
+					char32_t	u2 = special[i];
 					size_t		u2len = unicode_utf8len(u2);
 
 					if (result_len + u2len <= dstsize)
@@ -320,7 +320,7 @@ check_final_sigma(const unsigned char *str, size_t len, size_t offset)
 	{
 		if ((str[i] & 0x80) == 0 || (str[i] & 0xC0) == 0xC0)
 		{
-			pg_wchar	curr = utf8_to_unicode(str + i);
+			char32_t	curr = utf8_to_unicode(str + i);
 
 			if (pg_u_prop_case_ignorable(curr))
 				continue;
@@ -344,7 +344,7 @@ check_final_sigma(const unsigned char *str, size_t len, size_t offset)
 	{
 		if ((str[i] & 0x80) == 0 || (str[i] & 0xC0) == 0xC0)
 		{
-			pg_wchar	curr = utf8_to_unicode(str + i);
+			char32_t	curr = utf8_to_unicode(str + i);
 
 			if (pg_u_prop_case_ignorable(curr))
 				continue;
@@ -394,9 +394,9 @@ check_special_conditions(int conditions, const char *str, size_t len,
  * character without modification.
  */
 static enum CaseMapResult
-casemap(pg_wchar u1, CaseKind casekind, bool full,
+casemap(char32_t u1, CaseKind casekind, bool full,
 		const char *src, size_t srclen, size_t srcoff,
-		pg_wchar *simple, const pg_wchar **special)
+		char32_t *simple, const char32_t **special)
 {
 	uint16		idx;
 
@@ -434,8 +434,8 @@ casemap(pg_wchar u1, CaseKind casekind, bool full,
  * Find entry in simple case map.
  * If the entry does not exist, 0 will be returned.
  */
-static pg_wchar
-find_case_map(pg_wchar ucs, const pg_wchar *map)
+static char32_t
+find_case_map(char32_t ucs, const char32_t *map)
 {
 	/* Fast path for codepoints < 0x80 */
 	if (ucs < 0x80)
diff --git a/src/common/unicode_category.c b/src/common/unicode_category.c
index 4136c4d4f92..aab667a7bb4 100644
--- a/src/common/unicode_category.c
+++ b/src/common/unicode_category.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  * unicode_category.c
  *		Determine general category and character properties of Unicode
- *		characters. Encoding must be UTF8, where we assume that the pg_wchar
+ *		characters. Encoding must be UTF8, where we assume that the char32_t
  *		representation is a code point.
  *
  * Portions Copyright (c) 2017-2025, PostgreSQL Global Development Group
@@ -76,13 +76,13 @@
 #define PG_U_CHARACTER_TAB	0x09
 
 static bool range_search(const pg_unicode_range *tbl, size_t size,
-						 pg_wchar code);
+						 char32_t code);
 
 /*
  * Unicode general category for the given codepoint.
  */
 pg_unicode_category
-unicode_category(pg_wchar code)
+unicode_category(char32_t code)
 {
 	int			min = 0;
 	int			mid;
@@ -108,7 +108,7 @@ unicode_category(pg_wchar code)
 }
 
 bool
-pg_u_prop_alphabetic(pg_wchar code)
+pg_u_prop_alphabetic(char32_t code)
 {
 	if (code < 0x80)
 		return unicode_opt_ascii[code].properties & PG_U_PROP_ALPHABETIC;
@@ -119,7 +119,7 @@ pg_u_prop_alphabetic(pg_wchar code)
 }
 
 bool
-pg_u_prop_lowercase(pg_wchar code)
+pg_u_prop_lowercase(char32_t code)
 {
 	if (code < 0x80)
 		return unicode_opt_ascii[code].properties & PG_U_PROP_LOWERCASE;
@@ -130,7 +130,7 @@ pg_u_prop_lowercase(pg_wchar code)
 }
 
 bool
-pg_u_prop_uppercase(pg_wchar code)
+pg_u_prop_uppercase(char32_t code)
 {
 	if (code < 0x80)
 		return unicode_opt_ascii[code].properties & PG_U_PROP_UPPERCASE;
@@ -141,7 +141,7 @@ pg_u_prop_uppercase(pg_wchar code)
 }
 
 bool
-pg_u_prop_cased(pg_wchar code)
+pg_u_prop_cased(char32_t code)
 {
 	uint32		category_mask;
 
@@ -156,7 +156,7 @@ pg_u_prop_cased(pg_wchar code)
 }
 
 bool
-pg_u_prop_case_ignorable(pg_wchar code)
+pg_u_prop_case_ignorable(char32_t code)
 {
 	if (code < 0x80)
 		return unicode_opt_ascii[code].properties & PG_U_PROP_CASE_IGNORABLE;
@@ -167,7 +167,7 @@ pg_u_prop_case_ignorable(pg_wchar code)
 }
 
 bool
-pg_u_prop_white_space(pg_wchar code)
+pg_u_prop_white_space(char32_t code)
 {
 	if (code < 0x80)
 		return unicode_opt_ascii[code].properties & PG_U_PROP_WHITE_SPACE;
@@ -178,7 +178,7 @@ pg_u_prop_white_space(pg_wchar code)
 }
 
 bool
-pg_u_prop_hex_digit(pg_wchar code)
+pg_u_prop_hex_digit(char32_t code)
 {
 	if (code < 0x80)
 		return unicode_opt_ascii[code].properties & PG_U_PROP_HEX_DIGIT;
@@ -189,7 +189,7 @@ pg_u_prop_hex_digit(pg_wchar code)
 }
 
 bool
-pg_u_prop_join_control(pg_wchar code)
+pg_u_prop_join_control(char32_t code)
 {
 	if (code < 0x80)
 		return unicode_opt_ascii[code].properties & PG_U_PROP_JOIN_CONTROL;
@@ -208,7 +208,7 @@ pg_u_prop_join_control(pg_wchar code)
  */
 
 bool
-pg_u_isdigit(pg_wchar code, bool posix)
+pg_u_isdigit(char32_t code, bool posix)
 {
 	if (posix)
 		return ('0' <= code && code <= '9');
@@ -217,19 +217,19 @@ pg_u_isdigit(pg_wchar code, bool posix)
 }
 
 bool
-pg_u_isalpha(pg_wchar code)
+pg_u_isalpha(char32_t code)
 {
 	return pg_u_prop_alphabetic(code);
 }
 
 bool
-pg_u_isalnum(pg_wchar code, bool posix)
+pg_u_isalnum(char32_t code, bool posix)
 {
 	return pg_u_isalpha(code) || pg_u_isdigit(code, posix);
 }
 
 bool
-pg_u_isword(pg_wchar code)
+pg_u_isword(char32_t code)
 {
 	uint32		category_mask = PG_U_CATEGORY_MASK(unicode_category(code));
 
@@ -240,32 +240,32 @@ pg_u_isword(pg_wchar code)
 }
 
 bool
-pg_u_isupper(pg_wchar code)
+pg_u_isupper(char32_t code)
 {
 	return pg_u_prop_uppercase(code);
 }
 
 bool
-pg_u_islower(pg_wchar code)
+pg_u_islower(char32_t code)
 {
 	return pg_u_prop_lowercase(code);
 }
 
 bool
-pg_u_isblank(pg_wchar code)
+pg_u_isblank(char32_t code)
 {
 	return code == PG_U_CHARACTER_TAB ||
 		unicode_category(code) == PG_U_SPACE_SEPARATOR;
 }
 
 bool
-pg_u_iscntrl(pg_wchar code)
+pg_u_iscntrl(char32_t code)
 {
 	return unicode_category(code) == PG_U_CONTROL;
 }
 
 bool
-pg_u_isgraph(pg_wchar code)
+pg_u_isgraph(char32_t code)
 {
 	uint32		category_mask = PG_U_CATEGORY_MASK(unicode_category(code));
 
@@ -276,7 +276,7 @@ pg_u_isgraph(pg_wchar code)
 }
 
 bool
-pg_u_isprint(pg_wchar code)
+pg_u_isprint(char32_t code)
 {
 	pg_unicode_category category = unicode_category(code);
 
@@ -287,7 +287,7 @@ pg_u_isprint(pg_wchar code)
 }
 
 bool
-pg_u_ispunct(pg_wchar code, bool posix)
+pg_u_ispunct(char32_t code, bool posix)
 {
 	uint32		category_mask;
 
@@ -308,13 +308,13 @@ pg_u_ispunct(pg_wchar code, bool posix)
 }
 
 bool
-pg_u_isspace(pg_wchar code)
+pg_u_isspace(char32_t code)
 {
 	return pg_u_prop_white_space(code);
 }
 
 bool
-pg_u_isxdigit(pg_wchar code, bool posix)
+pg_u_isxdigit(char32_t code, bool posix)
 {
 	if (posix)
 		return (('0' <= code && code <= '9') ||
@@ -478,7 +478,7 @@ unicode_category_abbrev(pg_unicode_category category)
  * given table.
  */
 static bool
-range_search(const pg_unicode_range *tbl, size_t size, pg_wchar code)
+range_search(const pg_unicode_range *tbl, size_t size, char32_t code)
 {
 	int			min = 0;
 	int			mid;
diff --git a/src/common/unicode_norm.c b/src/common/unicode_norm.c
index 6654b4cbc49..489d99cd5ab 100644
--- a/src/common/unicode_norm.c
+++ b/src/common/unicode_norm.c
@@ -69,7 +69,7 @@ conv_compare(const void *p1, const void *p2)
  * lookup, while the frontend version uses a binary search.
  */
 static const pg_unicode_decomposition *
-get_code_entry(pg_wchar code)
+get_code_entry(char32_t code)
 {
 #ifndef FRONTEND
 	int			h;
@@ -109,7 +109,7 @@ get_code_entry(pg_wchar code)
  * Get the combining class of the given codepoint.
  */
 static uint8
-get_canonical_class(pg_wchar code)
+get_canonical_class(char32_t code)
 {
 	const pg_unicode_decomposition *entry = get_code_entry(code);
 
@@ -130,15 +130,15 @@ get_canonical_class(pg_wchar code)
  * Note: the returned pointer can point to statically allocated buffer, and
  * is only valid until next call to this function!
  */
-static const pg_wchar *
+static const char32_t *
 get_code_decomposition(const pg_unicode_decomposition *entry, int *dec_size)
 {
-	static pg_wchar x;
+	static char32_t x;
 
 	if (DECOMPOSITION_IS_INLINE(entry))
 	{
 		Assert(DECOMPOSITION_SIZE(entry) == 1);
-		x = (pg_wchar) entry->dec_index;
+		x = (char32_t) entry->dec_index;
 		*dec_size = 1;
 		return &x;
 	}
@@ -156,7 +156,7 @@ get_code_decomposition(const pg_unicode_decomposition *entry, int *dec_size)
  * are, in turn, decomposable.
  */
 static int
-get_decomposed_size(pg_wchar code, bool compat)
+get_decomposed_size(char32_t code, bool compat)
 {
 	const pg_unicode_decomposition *entry;
 	int			size = 0;
@@ -318,7 +318,7 @@ recompose_code(uint32 start, uint32 code, uint32 *result)
  * in the array result.
  */
 static void
-decompose_code(pg_wchar code, bool compat, pg_wchar **result, int *current)
+decompose_code(char32_t code, bool compat, char32_t **result, int *current)
 {
 	const pg_unicode_decomposition *entry;
 	int			i;
@@ -337,7 +337,7 @@ decompose_code(pg_wchar code, bool compat, pg_wchar **result, int *current)
 					v,
 					tindex,
 					sindex;
-		pg_wchar   *res = *result;
+		char32_t   *res = *result;
 
 		sindex = code - SBASE;
 		l = LBASE + sindex / (VCOUNT * TCOUNT);
@@ -369,7 +369,7 @@ decompose_code(pg_wchar code, bool compat, pg_wchar **result, int *current)
 	if (entry == NULL || DECOMPOSITION_SIZE(entry) == 0 ||
 		(!compat && DECOMPOSITION_IS_COMPAT(entry)))
 	{
-		pg_wchar   *res = *result;
+		char32_t   *res = *result;
 
 		res[*current] = code;
 		(*current)++;
@@ -382,7 +382,7 @@ decompose_code(pg_wchar code, bool compat, pg_wchar **result, int *current)
 	decomp = get_code_decomposition(entry, &dec_size);
 	for (i = 0; i < dec_size; i++)
 	{
-		pg_wchar	lcode = (pg_wchar) decomp[i];
+		char32_t	lcode = (char32_t) decomp[i];
 
 		/* Leave if no more decompositions */
 		decompose_code(lcode, compat, result, current);
@@ -398,17 +398,17 @@ decompose_code(pg_wchar code, bool compat, pg_wchar **result, int *current)
  * malloc. Or NULL if we run out of memory. In backend, the returned
  * string is palloc'd instead, and OOM is reported with ereport().
  */
-pg_wchar *
-unicode_normalize(UnicodeNormalizationForm form, const pg_wchar *input)
+char32_t *
+unicode_normalize(UnicodeNormalizationForm form, const char32_t *input)
 {
 	bool		compat = (form == UNICODE_NFKC || form == UNICODE_NFKD);
 	bool		recompose = (form == UNICODE_NFC || form == UNICODE_NFKC);
-	pg_wchar   *decomp_chars;
-	pg_wchar   *recomp_chars;
+	char32_t   *decomp_chars;
+	char32_t   *recomp_chars;
 	int			decomp_size,
 				current_size;
 	int			count;
-	const pg_wchar *p;
+	const char32_t *p;
 
 	/* variables for recomposition */
 	int			last_class;
@@ -425,7 +425,7 @@ unicode_normalize(UnicodeNormalizationForm form, const pg_wchar *input)
 	for (p = input; *p; p++)
 		decomp_size += get_decomposed_size(*p, compat);
 
-	decomp_chars = (pg_wchar *) ALLOC((decomp_size + 1) * sizeof(pg_wchar));
+	decomp_chars = (char32_t *) ALLOC((decomp_size + 1) * sizeof(char32_t));
 	if (decomp_chars == NULL)
 		return NULL;
 
@@ -448,9 +448,9 @@ unicode_normalize(UnicodeNormalizationForm form, const pg_wchar *input)
 	 */
 	for (count = 1; count < decomp_size; count++)
 	{
-		pg_wchar	prev = decomp_chars[count - 1];
-		pg_wchar	next = decomp_chars[count];
-		pg_wchar	tmp;
+		char32_t	prev = decomp_chars[count - 1];
+		char32_t	next = decomp_chars[count];
+		char32_t	tmp;
 		const uint8 prevClass = get_canonical_class(prev);
 		const uint8 nextClass = get_canonical_class(next);
 
@@ -487,7 +487,7 @@ unicode_normalize(UnicodeNormalizationForm form, const pg_wchar *input)
 	 * longer than the decomposed one, so make the allocation of the output
 	 * string based on that assumption.
 	 */
-	recomp_chars = (pg_wchar *) ALLOC((decomp_size + 1) * sizeof(pg_wchar));
+	recomp_chars = (char32_t *) ALLOC((decomp_size + 1) * sizeof(char32_t));
 	if (!recomp_chars)
 	{
 		FREE(decomp_chars);
@@ -501,9 +501,9 @@ unicode_normalize(UnicodeNormalizationForm form, const pg_wchar *input)
 
 	for (count = 1; count < decomp_size; count++)
 	{
-		pg_wchar	ch = decomp_chars[count];
+		char32_t	ch = decomp_chars[count];
 		int			ch_class = get_canonical_class(ch);
-		pg_wchar	composite;
+		char32_t	composite;
 
 		if (last_class < ch_class &&
 			recompose_code(starter_ch, ch, &composite))
@@ -524,7 +524,7 @@ unicode_normalize(UnicodeNormalizationForm form, const pg_wchar *input)
 			recomp_chars[target_pos++] = ch;
 		}
 	}
-	recomp_chars[target_pos] = (pg_wchar) '\0';
+	recomp_chars[target_pos] = (char32_t) '\0';
 
 	FREE(decomp_chars);
 
@@ -540,7 +540,7 @@ unicode_normalize(UnicodeNormalizationForm form, const pg_wchar *input)
 #ifndef FRONTEND
 
 static const pg_unicode_normprops *
-qc_hash_lookup(pg_wchar ch, const pg_unicode_norminfo *norminfo)
+qc_hash_lookup(char32_t ch, const pg_unicode_norminfo *norminfo)
 {
 	int			h;
 	uint32		hashkey;
@@ -571,7 +571,7 @@ qc_hash_lookup(pg_wchar ch, const pg_unicode_norminfo *norminfo)
  * Look up the normalization quick check character property
  */
 static UnicodeNormalizationQC
-qc_is_allowed(UnicodeNormalizationForm form, pg_wchar ch)
+qc_is_allowed(UnicodeNormalizationForm form, char32_t ch)
 {
 	const pg_unicode_normprops *found = NULL;
 
@@ -595,7 +595,7 @@ qc_is_allowed(UnicodeNormalizationForm form, pg_wchar ch)
 }
 
 UnicodeNormalizationQC
-unicode_is_normalized_quickcheck(UnicodeNormalizationForm form, const pg_wchar *input)
+unicode_is_normalized_quickcheck(UnicodeNormalizationForm form, const char32_t *input)
 {
 	uint8		lastCanonicalClass = 0;
 	UnicodeNormalizationQC result = UNICODE_NORM_QC_YES;
@@ -610,9 +610,9 @@ unicode_is_normalized_quickcheck(UnicodeNormalizationForm form, const pg_wchar *
 	if (form == UNICODE_NFD || form == UNICODE_NFKD)
 		return UNICODE_NORM_QC_MAYBE;
 
-	for (const pg_wchar *p = input; *p; p++)
+	for (const char32_t *p = input; *p; p++)
 	{
-		pg_wchar	ch = *p;
+		char32_t	ch = *p;
 		uint8		canonicalClass;
 		UnicodeNormalizationQC check;
 
diff --git a/src/fe_utils/mbprint.c b/src/fe_utils/mbprint.c
index eb3eeee9925..abffdbe18a2 100644
--- a/src/fe_utils/mbprint.c
+++ b/src/fe_utils/mbprint.c
@@ -49,20 +49,20 @@ pg_get_utf8_id(void)
  *
  * No error checks here, c must point to a long-enough string.
  */
-static pg_wchar
+static char32_t
 utf8_to_unicode(const unsigned char *c)
 {
 	if ((*c & 0x80) == 0)
-		return (pg_wchar) c[0];
+		return (char32_t) c[0];
 	else if ((*c & 0xe0) == 0xc0)
-		return (pg_wchar) (((c[0] & 0x1f) << 6) |
+		return (char32_t) (((c[0] & 0x1f) << 6) |
 						   (c[1] & 0x3f));
 	else if ((*c & 0xf0) == 0xe0)
-		return (pg_wchar) (((c[0] & 0x0f) << 12) |
+		return (char32_t) (((c[0] & 0x0f) << 12) |
 						   ((c[1] & 0x3f) << 6) |
 						   (c[2] & 0x3f));
 	else if ((*c & 0xf8) == 0xf0)
-		return (pg_wchar) (((c[0] & 0x07) << 18) |
+		return (char32_t) (((c[0] & 0x07) << 18) |
 						   ((c[1] & 0x3f) << 12) |
 						   ((c[2] & 0x3f) << 6) |
 						   (c[3] & 0x3f));
diff --git a/src/include/c.h b/src/include/c.h
index 9ab5e617995..54fa20b0e83 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -513,6 +513,25 @@ typedef void (*pg_funcptr_t) (void);
 
 #include <stdbool.h>
 
+/*
+ * char16_t and char32_t
+ *      Unicode code points.
+ */
+#ifndef __cplusplus
+#ifdef HAVE_UCHAR_H
+#include <uchar.h>
+#ifndef __STDC_UTF_16__
+#error "char16_t must use UTF-16 encoding"
+#endif
+#ifndef __STDC_UTF_32__
+#error "char32_t must use UTF-32 encoding"
+#endif
+#else
+typedef uint16_t char16_t;
+typedef uint32_t char32_t;
+#endif
+#endif
+
 
 /* ----------------------------------------------------------------
  *				Section 3:	standard system types
diff --git a/src/include/common/unicode_case.h b/src/include/common/unicode_case.h
index 41e2c1f4b33..6bcffd349c2 100644
--- a/src/include/common/unicode_case.h
+++ b/src/include/common/unicode_case.h
@@ -14,14 +14,12 @@
 #ifndef UNICODE_CASE_H
 #define UNICODE_CASE_H
 
-#include "mb/pg_wchar.h"
-
 typedef size_t (*WordBoundaryNext) (void *wbstate);
 
-pg_wchar	unicode_lowercase_simple(pg_wchar code);
-pg_wchar	unicode_titlecase_simple(pg_wchar code);
-pg_wchar	unicode_uppercase_simple(pg_wchar code);
-pg_wchar	unicode_casefold_simple(pg_wchar code);
+char32_t	unicode_lowercase_simple(char32_t code);
+char32_t	unicode_titlecase_simple(char32_t code);
+char32_t	unicode_uppercase_simple(char32_t code);
+char32_t	unicode_casefold_simple(char32_t code);
 size_t		unicode_strlower(char *dst, size_t dstsize, const char *src,
 							 ssize_t srclen, bool full);
 size_t		unicode_strtitle(char *dst, size_t dstsize, const char *src,
diff --git a/src/include/common/unicode_case_table.h b/src/include/common/unicode_case_table.h
index d5311786582..0a14fb2d97b 100644
--- a/src/include/common/unicode_case_table.h
+++ b/src/include/common/unicode_case_table.h
@@ -18,7 +18,6 @@
  */
 
 #include "common/unicode_case.h"
-#include "mb/pg_wchar.h"
 
 /*
  * The maximum number of codepoints that can result from case mapping
@@ -45,7 +44,7 @@ typedef enum
 typedef struct
 {
 	int16		conditions;
-	pg_wchar	map[NCaseKind][MAX_CASE_EXPANSION];
+	char32_t	map[NCaseKind][MAX_CASE_EXPANSION];
 } pg_special_case;
 
 /*
@@ -166,7 +165,7 @@ static const pg_special_case special_case[106] =
  * The entry case_map_lower[case_index(codepoint)] is the mapping for the
  * given codepoint.
  */
-static const pg_wchar case_map_lower[1704] =
+static const char32_t case_map_lower[1704] =
 {
 	0x000000,					/* reserved */
 	0x000000,					/* U+000000 */
@@ -1879,7 +1878,7 @@ static const pg_wchar case_map_lower[1704] =
  * The entry case_map_title[case_index(codepoint)] is the mapping for the
  * given codepoint.
  */
-static const pg_wchar case_map_title[1704] =
+static const char32_t case_map_title[1704] =
 {
 	0x000000,					/* reserved */
 	0x000000,					/* U+000000 */
@@ -3592,7 +3591,7 @@ static const pg_wchar case_map_title[1704] =
  * The entry case_map_upper[case_index(codepoint)] is the mapping for the
  * given codepoint.
  */
-static const pg_wchar case_map_upper[1704] =
+static const char32_t case_map_upper[1704] =
 {
 	0x000000,					/* reserved */
 	0x000000,					/* U+000000 */
@@ -5305,7 +5304,7 @@ static const pg_wchar case_map_upper[1704] =
  * The entry case_map_fold[case_index(codepoint)] is the mapping for the
  * given codepoint.
  */
-static const pg_wchar case_map_fold[1704] =
+static const char32_t case_map_fold[1704] =
 {
 	0x000000,					/* reserved */
 	0x000000,					/* U+000000 */
@@ -13522,7 +13521,7 @@ static const uint16 case_map[4778] =
  * the offset into the mapping tables.
  */
 static inline uint16
-case_index(pg_wchar cp)
+case_index(char32_t cp)
 {
 	/* Fast path for codepoints < 0x0588 */
 	if (cp < 0x0588)
diff --git a/src/include/common/unicode_category.h b/src/include/common/unicode_category.h
index 8fd8b67a416..684143d3c8a 100644
--- a/src/include/common/unicode_category.h
+++ b/src/include/common/unicode_category.h
@@ -14,8 +14,6 @@
 #ifndef UNICODE_CATEGORY_H
 #define UNICODE_CATEGORY_H
 
-#include "mb/pg_wchar.h"
-
 /*
  * Unicode General Category Values
  *
@@ -61,31 +59,31 @@ typedef enum pg_unicode_category
 	PG_U_FINAL_PUNCTUATION = 29 /* Pf */
 } pg_unicode_category;
 
-extern pg_unicode_category unicode_category(pg_wchar code);
+extern pg_unicode_category unicode_category(char32_t code);
 extern const char *unicode_category_string(pg_unicode_category category);
 extern const char *unicode_category_abbrev(pg_unicode_category category);
 
-extern bool pg_u_prop_alphabetic(pg_wchar code);
-extern bool pg_u_prop_lowercase(pg_wchar code);
-extern bool pg_u_prop_uppercase(pg_wchar code);
-extern bool pg_u_prop_cased(pg_wchar code);
-extern bool pg_u_prop_case_ignorable(pg_wchar code);
-extern bool pg_u_prop_white_space(pg_wchar code);
-extern bool pg_u_prop_hex_digit(pg_wchar code);
-extern bool pg_u_prop_join_control(pg_wchar code);
+extern bool pg_u_prop_alphabetic(char32_t code);
+extern bool pg_u_prop_lowercase(char32_t code);
+extern bool pg_u_prop_uppercase(char32_t code);
+extern bool pg_u_prop_cased(char32_t code);
+extern bool pg_u_prop_case_ignorable(char32_t code);
+extern bool pg_u_prop_white_space(char32_t code);
+extern bool pg_u_prop_hex_digit(char32_t code);
+extern bool pg_u_prop_join_control(char32_t code);
 
-extern bool pg_u_isdigit(pg_wchar code, bool posix);
-extern bool pg_u_isalpha(pg_wchar code);
-extern bool pg_u_isalnum(pg_wchar code, bool posix);
-extern bool pg_u_isword(pg_wchar code);
-extern bool pg_u_isupper(pg_wchar code);
-extern bool pg_u_islower(pg_wchar code);
-extern bool pg_u_isblank(pg_wchar code);
-extern bool pg_u_iscntrl(pg_wchar code);
-extern bool pg_u_isgraph(pg_wchar code);
-extern bool pg_u_isprint(pg_wchar code);
-extern bool pg_u_ispunct(pg_wchar code, bool posix);
-extern bool pg_u_isspace(pg_wchar code);
-extern bool pg_u_isxdigit(pg_wchar code, bool posix);
+extern bool pg_u_isdigit(char32_t code, bool posix);
+extern bool pg_u_isalpha(char32_t code);
+extern bool pg_u_isalnum(char32_t code, bool posix);
+extern bool pg_u_isword(char32_t code);
+extern bool pg_u_isupper(char32_t code);
+extern bool pg_u_islower(char32_t code);
+extern bool pg_u_isblank(char32_t code);
+extern bool pg_u_iscntrl(char32_t code);
+extern bool pg_u_isgraph(char32_t code);
+extern bool pg_u_isprint(char32_t code);
+extern bool pg_u_ispunct(char32_t code, bool posix);
+extern bool pg_u_isspace(char32_t code);
+extern bool pg_u_isxdigit(char32_t code, bool posix);
 
 #endif							/* UNICODE_CATEGORY_H */
diff --git a/src/include/common/unicode_category_table.h b/src/include/common/unicode_category_table.h
index 95a1c65da7e..466a41b72b0 100644
--- a/src/include/common/unicode_category_table.h
+++ b/src/include/common/unicode_category_table.h
@@ -20,15 +20,15 @@
  */
 typedef struct
 {
-	uint32		first;			/* Unicode codepoint */
-	uint32		last;			/* Unicode codepoint */
+	char32_t	first;			/* Unicode codepoint */
+	char32_t	last;			/* Unicode codepoint */
 	uint8		category;		/* General Category */
 } pg_category_range;
 
 typedef struct
 {
-	uint32		first;			/* Unicode codepoint */
-	uint32		last;			/* Unicode codepoint */
+	char32_t	first;			/* Unicode codepoint */
+	char32_t	last;			/* Unicode codepoint */
 } pg_unicode_range;
 
 typedef struct
diff --git a/src/include/common/unicode_norm.h b/src/include/common/unicode_norm.h
index 5bc3b79e78e..516c192cc4c 100644
--- a/src/include/common/unicode_norm.h
+++ b/src/include/common/unicode_norm.h
@@ -14,8 +14,6 @@
 #ifndef UNICODE_NORM_H
 #define UNICODE_NORM_H
 
-#include "mb/pg_wchar.h"
-
 typedef enum
 {
 	UNICODE_NFC = 0,
@@ -32,8 +30,8 @@ typedef enum
 	UNICODE_NORM_QC_MAYBE = -1,
 } UnicodeNormalizationQC;
 
-extern pg_wchar *unicode_normalize(UnicodeNormalizationForm form, const pg_wchar *input);
+extern char32_t *unicode_normalize(UnicodeNormalizationForm form, const char32_t *input);
 
-extern UnicodeNormalizationQC unicode_is_normalized_quickcheck(UnicodeNormalizationForm form, const pg_wchar *input);
+extern UnicodeNormalizationQC unicode_is_normalized_quickcheck(UnicodeNormalizationForm form, const char32_t *input);
 
 #endif							/* UNICODE_NORM_H */
diff --git a/src/include/mb/pg_wchar.h b/src/include/mb/pg_wchar.h
index 4b4a9974b75..4d84bdc81e4 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/mb/pg_wchar.h
@@ -532,25 +532,25 @@ typedef uint32 (*utf_local_conversion_func) (uint32 code);
  * Some handy functions for Unicode-specific tests.
  */
 static inline bool
-is_valid_unicode_codepoint(pg_wchar c)
+is_valid_unicode_codepoint(char32_t c)
 {
 	return (c > 0 && c <= 0x10FFFF);
 }
 
 static inline bool
-is_utf16_surrogate_first(pg_wchar c)
+is_utf16_surrogate_first(char32_t c)
 {
 	return (c >= 0xD800 && c <= 0xDBFF);
 }
 
 static inline bool
-is_utf16_surrogate_second(pg_wchar c)
+is_utf16_surrogate_second(char32_t c)
 {
 	return (c >= 0xDC00 && c <= 0xDFFF);
 }
 
-static inline pg_wchar
-surrogate_pair_to_codepoint(pg_wchar first, pg_wchar second)
+static inline char32_t
+surrogate_pair_to_codepoint(char16_t first, char16_t second)
 {
 	return ((first & 0x3FF) << 10) + 0x10000 + (second & 0x3FF);
 }
@@ -561,20 +561,20 @@ surrogate_pair_to_codepoint(pg_wchar first, pg_wchar second)
  *
  * No error checks here, c must point to a long-enough string.
  */
-static inline pg_wchar
+static inline char32_t
 utf8_to_unicode(const unsigned char *c)
 {
 	if ((*c & 0x80) == 0)
-		return (pg_wchar) c[0];
+		return (char32_t) c[0];
 	else if ((*c & 0xe0) == 0xc0)
-		return (pg_wchar) (((c[0] & 0x1f) << 6) |
+		return (char32_t) (((c[0] & 0x1f) << 6) |
 						   (c[1] & 0x3f));
 	else if ((*c & 0xf0) == 0xe0)
-		return (pg_wchar) (((c[0] & 0x0f) << 12) |
+		return (char32_t) (((c[0] & 0x0f) << 12) |
 						   ((c[1] & 0x3f) << 6) |
 						   (c[2] & 0x3f));
 	else if ((*c & 0xf8) == 0xf0)
-		return (pg_wchar) (((c[0] & 0x07) << 18) |
+		return (char32_t) (((c[0] & 0x07) << 18) |
 						   ((c[1] & 0x3f) << 12) |
 						   ((c[2] & 0x3f) << 6) |
 						   (c[3] & 0x3f));
@@ -588,7 +588,7 @@ utf8_to_unicode(const unsigned char *c)
  * unicode_utf8len(c) bytes available.
  */
 static inline unsigned char *
-unicode_to_utf8(pg_wchar c, unsigned char *utf8string)
+unicode_to_utf8(char32_t c, unsigned char *utf8string)
 {
 	if (c <= 0x7F)
 	{
@@ -620,7 +620,7 @@ unicode_to_utf8(pg_wchar c, unsigned char *utf8string)
  * Number of bytes needed to represent the given char in UTF8.
  */
 static inline int
-unicode_utf8len(pg_wchar c)
+unicode_utf8len(char32_t c)
 {
 	if (c <= 0x7F)
 		return 1;
@@ -676,8 +676,8 @@ extern int	pg_valid_server_encoding(const char *name);
 extern bool is_encoding_supported_by_icu(int encoding);
 extern const char *get_encoding_name_for_icu(int encoding);
 
-extern unsigned char *unicode_to_utf8(pg_wchar c, unsigned char *utf8string);
-extern pg_wchar utf8_to_unicode(const unsigned char *c);
+extern unsigned char *unicode_to_utf8(char32_t c, unsigned char *utf8string);
+extern char32_t utf8_to_unicode(const unsigned char *c);
 extern bool pg_utf8_islegal(const unsigned char *source, int length);
 extern int	pg_utf_mblen(const unsigned char *s);
 extern int	pg_mule_mblen(const unsigned char *s);
@@ -739,8 +739,8 @@ extern char *pg_server_to_client(const char *s, int len);
 extern char *pg_any_to_server(const char *s, int len, int encoding);
 extern char *pg_server_to_any(const char *s, int len, int encoding);
 
-extern void pg_unicode_to_server(pg_wchar c, unsigned char *s);
-extern bool pg_unicode_to_server_noerror(pg_wchar c, unsigned char *s);
+extern void pg_unicode_to_server(char32_t c, unsigned char *s);
+extern bool pg_unicode_to_server_noerror(char32_t c, unsigned char *s);
 
 extern unsigned short BIG5toCNS(unsigned short big5, unsigned char *lc);
 extern unsigned short CNStoBIG5(unsigned short cns, unsigned char lc);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 43fe3bcd593..ce382c546e2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3506,6 +3506,8 @@ cb_cleanup_dir
 cb_options
 cb_tablespace
 cb_tablespace_mapping
+char16_t
+char32_t
 check_agg_arguments_context
 check_function_callback
 check_network_data
-- 
2.43.0

Re: C11: should we use char32_t for unicode code points?

Reply via email to