Remember the discussion we had in July 2023 about error handling while
parsing/scanning multibyte strings? Paul coined the terms "MEE"
(multi-byte encoding error) and "SEE" (single-byte encoding error).
<https://lists.gnu.org/archive/html/bug-gnulib/2023-07/msg00145.html>

Now I have become interested in
  - whether the mb*iter* modules actually implement MEE,
  - what the behavioural difference between MEE and SEE is, function by
    function.

As a first step to understanding this, I'm enhancing the unit tests
to cover incomplete characters, both at the end of the string and
inside a string.


2026-05-25  Bruno Haible  <[email protected]>

        trim tests: Enhance tests.
        * tests/test-trim.c (main): Add test cases with incomplete characters.

2026-05-25  Bruno Haible  <[email protected]>

        mbmemcasecmp tests: Enhance tests.
        * tests/test-mbmemcasecmp.h (test_utf_8): Add test cases with incomplete
        characters.

2026-05-25  Bruno Haible  <[email protected]>

        mbspcasecmp tests: Enhance tests.
        * tests/test-mbspcasecmp.c (test_ascii): New function, extracted from
        main.
        (test_utf_8): Likewise. Add test cases with incomplete characters.
        (main): Invoke them. Accept a numeric argument.
        * tests/test-mbspcasecmp-4.sh: Renamed from tests/test-mbspcasecmp.sh.
        * tests/test-mbspcasecmp-3.sh: New file, based on
        tests/test-mbmemcasecmp-3.sh.
        * modules/mbspcasecmp-tests (Files): Update after rename. Add
        locale-en.m4, locale-fr.m4.
        (configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
        (Makefile.am): Arrange to run test-mbspcasecmp-3.sh,
        test-mbspcasecmp-4.sh, instead of test-mbspcasecmp.sh.

2026-05-25  Bruno Haible  <[email protected]>

        mbsncasecmp tests: Enhance tests.
        * tests/test-mbsncasecmp.c (test_ascii): New function, extracted from
        main.
        (test_utf_8): Likewise. Add test cases with incomplete characters.
        (main): Invoke them. Accept a numeric argument.
        * tests/test-mbsncasecmp-4.sh: Renamed from tests/test-mbsncasecmp.sh.
        * tests/test-mbsncasecmp-3.sh: New file, based on
        tests/test-mbmemcasecmp-3.sh.
        * modules/mbsncasecmp-tests (Files): Update after rename. Add
        locale-en.m4, locale-fr.m4.
        (configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
        (Makefile.am): Arrange to run test-mbsncasecmp-3.sh,
        test-mbsncasecmp-4.sh, instead of test-mbsncasecmp.sh.

2026-05-25  Bruno Haible  <[email protected]>

        mbscasecmp tests: Enhance tests.
        * tests/test-mbscasecmp.c (test_ascii): New function, extracted from
        main.
        (test_utf_8): Likewise. Add test cases with incomplete characters.
        (main): Invoke them. Accept a numeric argument.
        * tests/test-mbscasecmp-4.sh: Renamed from tests/test-mbscasecmp.sh.
        * tests/test-mbscasecmp-3.sh: New file, based on
        tests/test-mbmemcasecmp-3.sh.
        * modules/mbscasecmp-tests (Files): Update after rename. Add
        locale-en.m4, locale-fr.m4.
        (configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
        (Makefile.am): Arrange to run test-mbscasecmp-3.sh,
        test-mbscasecmp-4.sh, instead of test-mbscasecmp.sh.

2026-05-25  Bruno Haible  <[email protected]>

        mbs_endswith tests: Enhance tests.
        * tests/test-mbs_endswith2.c (main): Add more test cases. Add more
        comments.
        * tests/test-mbs_endswith1.c: Update comments.
        * tests/test-mbs_endswith3.c: Likewise.

2026-05-25  Bruno Haible  <[email protected]>

        mbs_startswith tests: Enhance tests.
        * tests/test-mbs_startswith2.c (OR): New macro, copied from
        tests/test-mbsnlen.c.
        (main): Add more test cases. Add more comments.
        * tests/test-mbs_startswith1.c: Update comments.
        * tests/test-mbs_startswith3.c: Likewise.

>From 900e90c433d17a467a26522b051e3f527102b289 Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 25 May 2026 17:55:56 +0200
Subject: [PATCH 1/7] mbs_startswith tests: Enhance tests.

* tests/test-mbs_startswith2.c (OR): New macro, copied from
tests/test-mbsnlen.c.
(main): Add more test cases. Add more comments.
* tests/test-mbs_startswith1.c: Update comments.
* tests/test-mbs_startswith3.c: Likewise.
---
 ChangeLog                    |  9 ++++
 tests/test-mbs_startswith1.c |  4 +-
 tests/test-mbs_startswith2.c | 79 +++++++++++++++++++++++++++++++++++-
 tests/test-mbs_startswith3.c |  2 +-
 4 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 556c9bf5b5..3c72ed6dc8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2026-05-25  Bruno Haible  <[email protected]>
+
+	mbs_startswith tests: Enhance tests.
+	* tests/test-mbs_startswith2.c (OR): New macro, copied from
+	tests/test-mbsnlen.c.
+	(main): Add more test cases. Add more comments.
+	* tests/test-mbs_startswith1.c: Update comments.
+	* tests/test-mbs_startswith3.c: Likewise.
+
 2026-05-24  Paul Eggert  <[email protected]>
 
 	regex: pacify 16.1.1 -Wanalyzer-out-of-bounds
diff --git a/tests/test-mbs_startswith1.c b/tests/test-mbs_startswith1.c
index a1b89fa4a5..4c2e8b409d 100644
--- a/tests/test-mbs_startswith1.c
+++ b/tests/test-mbs_startswith1.c
@@ -1,4 +1,4 @@
-/* Test of mbs_startswith() function.
+/* Test of mbs_startswith() function in the "C" locale.
    Copyright (C) 2025-2026 Free Software Foundation, Inc.
 
    This program is free software: you can redistribute it and/or modify
@@ -27,7 +27,7 @@
 int
 main ()
 {
-  /* This test is executed in the C locale.  */
+  /* This test is executed in the "C" locale.  */
 
   ASSERT (mbs_startswith ("", ""));
   ASSERT (mbs_startswith ("abc", ""));
diff --git a/tests/test-mbs_startswith2.c b/tests/test-mbs_startswith2.c
index 38c53dd12b..0ab6a9eeaf 100644
--- a/tests/test-mbs_startswith2.c
+++ b/tests/test-mbs_startswith2.c
@@ -1,4 +1,4 @@
-/* Test of mbs_startswith() function.
+/* Test of mbs_startswith() function in a UTF-8 locale.
    Copyright (C) 2025-2026 Free Software Foundation, Inc.
 
    This program is free software: you can redistribute it and/or modify
@@ -25,6 +25,20 @@
 
 #include "macros.h"
 
+/* The mcel-based implementation of mbsnlen behaves differently than the
+   original one.  Namely, for invalid/incomplete byte sequences:
+   Where we ideally should have multi-byte-per-encoding-error (MEE) behaviour
+   everywhere, mcel implements single-byte-per-encoding-error (SEE) behaviour.
+   See <https://lists.gnu.org/archive/html/bug-gnulib/2023-07/msg00131.html>,
+       <https://lists.gnu.org/archive/html/bug-gnulib/2023-07/msg00145.html>.
+   Therefore, here we have different expected results, depending on the
+   implementation.  */
+#if GNULIB_MCEL_PREFER
+# define OR(a,b) b
+#else
+# define OR(a,b) a
+#endif
+
 int
 main ()
 {
@@ -70,26 +84,89 @@ main ()
   /* Test cases with invalid or incomplete characters.  */
 
   /* A valid character should not match an invalid character.  */
+  /* "\301\247" = 0xC1 0xA7 is invalid.
+     In fact, "\301" = 0xC1 is already invalid, see
+     https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf page 125 table 3-7.
+   */
   ASSERT (!mbs_startswith ("\303\247", "\301\247"));
   ASSERT (!mbs_startswith ("\301\247", "\303\247"));
 
   /* A valid character should not match an incomplete character.  */
+  /* "\343\247" = 0xE3 0xA7 is incomplete, "\343\247\214" = U+39CC is valid.  */
   ASSERT (!mbs_startswith ("\303\247", "\343\247"));
   ASSERT (!mbs_startswith ("\343\247", "\303\247"));
+  ASSERT (!mbs_startswith ("\343\247\214", "\343\247"));
+  ASSERT (!mbs_startswith ("\343\247\214", "\343"));
 
   /* An invalid character should not match an incomplete character.  */
+  /* "\301\247" = 0xC1 0xA7 is invalid.
+     In fact, "\301" = 0xC1 is already invalid, see
+     https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf page 125 table 3-7.
+   */
+  /* "\343\247" = 0xE3 0xA7 is incomplete, "\343\247\214" = U+39CC is valid.  */
   ASSERT (!mbs_startswith ("\301\247", "\343\247"));
   ASSERT (!mbs_startswith ("\343\247", "\301\247"));
 
+  /* Incomplete characters.  See
+     https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
+     page 128 table 3-11.  */
+  /* "\341\200\240" = 0xE1 0x80 0xA0 = U+1020.  */
+  ASSERT (!mbs_startswith ("\341\200\240", "\341\200"));
+  ASSERT (!mbs_startswith ("\341\200\240", "\341"));
+  ASSERT (mbs_startswith ("\341\200", "\341") == OR(false,true));
+  /* "\360\221\222\240" = 0xF0 0x91 0x92 0xA0 = U+114A0.  */
+  ASSERT (!mbs_startswith ("\360\221\222\240", "\360\221\222"));
+  ASSERT (!mbs_startswith ("\360\221\222\240", "\360\221"));
+  ASSERT (!mbs_startswith ("\360\221\222\240", "\360"));
+  ASSERT (mbs_startswith ("\360\221\222", "\360\221") == OR(false,true));
+  ASSERT (mbs_startswith ("\360\221\222", "\360") == OR(false,true));
+  ASSERT (mbs_startswith ("\360\221", "\360") == OR(false,true));
+
+  /* "\355\240\200" = 0xED 0xA0 0x80 = U+D800 is invalid.
+     In fact, "\355\240" = 0xED 0xA0 is already invalid, see
+     https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf page 125 table 3-7
+     and page 128 table 3-9.  */
+#if 0
+  /* mbs_startswith ("\355\240\200", "\355\240") returns
+     - true on musl libc, macOS, Solaris 11.4, Cygwin, mingw, MSVC
+       and with GNULIB_MCEL_PREFER on newer glibc, FreeBSD, NetBSD, OpenBSD,
+     - false on older glibc (CentOS 5), Solaris 11 OpenIndiana/OmniOS,
+       and with !GNULIB_MCEL_PREFER on newer glibc, FreeBSD, NetBSD, OpenBSD. */
+  ASSERT (!mbs_startswith ("\355\240\200", "\355\240"));
+#endif
+#if 0
+  /* mbs_startswith ("\355\240\200", "\355") returns
+     - true on newer glibc, musl libc, macOS, FreeBSD, NetBSD, OpenBSD,
+       Solaris 11.4, Cygwin, mingw, MSVC,
+     - false on older glibc (CentOS 5), Solaris 11 OpenIndiana/OmniOS.  */
+  ASSERT (!mbs_startswith ("\355\240\200", "\355"));
+#endif
+#if GNULIB_MCEL_PREFER
+  /* Single-byte encoding error (SEE) */
+  ASSERT (mbs_startswith ("\355\240", "\355"));
+#elif 0
+  /* Multi-byte encoding error (MEE) */
+  /* mbs_startswith ("\355\240", "\355") returns
+     - true on musl libc, macOS, Solaris 11.4, Cygwin, mingw, MSVC,
+     - false on glibc, FreeBSD, NetBSD, OpenBSD, Solaris 11 OpenIndiana/OmniOS.
+   */
+  ASSERT (!mbs_startswith ("\355\240", "\355"));
+#endif
+
   /* Two invalid characters should match only if they are identical.  */
+  /* "\301\246" = 0xC1 0xA6 is invalid.  */
+  /* "\301\247" = 0xC1 0xA7 is invalid.  */
   ASSERT (!mbs_startswith ("\301\246", "\301\247"));
   ASSERT (!mbs_startswith ("\301\247", "\301\246"));
   ASSERT (mbs_startswith ("\301\247", "\301\247"));
 
   /* Two incomplete characters should match only if they are identical.  */
+  /* "\343\246" = 0xE3 0xA6 is incomplete, "\343\246\214" = U+398C is valid.  */
+  /* "\343\247" = 0xE3 0xA7 is incomplete, "\343\247\214" = U+39CC is valid.  */
   ASSERT (!mbs_startswith ("\343\246", "\343\247"));
   ASSERT (!mbs_startswith ("\343\247", "\343\246"));
   ASSERT (mbs_startswith ("\343\247", "\343\247"));
+  ASSERT (mbs_startswith ("\343\247", "\343") == OR(false,true));
 
   return test_exit_status;
 }
diff --git a/tests/test-mbs_startswith3.c b/tests/test-mbs_startswith3.c
index 1965070401..11d87562c7 100644
--- a/tests/test-mbs_startswith3.c
+++ b/tests/test-mbs_startswith3.c
@@ -1,4 +1,4 @@
-/* Test of mbs_startswith() function.
+/* Test of mbs_startswith() function in a GB18030 locale.
    Copyright (C) 2025-2026 Free Software Foundation, Inc.
 
    This program is free software: you can redistribute it and/or modify
-- 
2.54.0

>From 26621d07249663c0cfba331ee5295efd59bef0f7 Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 25 May 2026 17:56:44 +0200
Subject: [PATCH 2/7] mbs_endswith tests: Enhance tests.

* tests/test-mbs_endswith2.c (main): Add more test cases. Add more
comments.
* tests/test-mbs_endswith1.c: Update comments.
* tests/test-mbs_endswith3.c: Likewise.
---
 ChangeLog                  |  8 ++++++++
 tests/test-mbs_endswith1.c |  2 +-
 tests/test-mbs_endswith2.c | 30 +++++++++++++++++++++++++++++-
 tests/test-mbs_endswith3.c |  2 +-
 4 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 3c72ed6dc8..02ad380e55 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+2026-05-25  Bruno Haible  <[email protected]>
+
+	mbs_endswith tests: Enhance tests.
+	* tests/test-mbs_endswith2.c (main): Add more test cases. Add more
+	comments.
+	* tests/test-mbs_endswith1.c: Update comments.
+	* tests/test-mbs_endswith3.c: Likewise.
+
 2026-05-25  Bruno Haible  <[email protected]>
 
 	mbs_startswith tests: Enhance tests.
diff --git a/tests/test-mbs_endswith1.c b/tests/test-mbs_endswith1.c
index 63722b0137..7742efbc42 100644
--- a/tests/test-mbs_endswith1.c
+++ b/tests/test-mbs_endswith1.c
@@ -1,4 +1,4 @@
-/* Test of mbs_endswith() function.
+/* Test of mbs_endswith() function in the "C" locale.
    Copyright (C) 2025-2026 Free Software Foundation, Inc.
 
    This program is free software: you can redistribute it and/or modify
diff --git a/tests/test-mbs_endswith2.c b/tests/test-mbs_endswith2.c
index 01c12f47ab..17ccc1f6e0 100644
--- a/tests/test-mbs_endswith2.c
+++ b/tests/test-mbs_endswith2.c
@@ -1,4 +1,4 @@
-/* Test of mbs_endswith() function.
+/* Test of mbs_endswith() function in a UTF-8 locale.
    Copyright (C) 2025-2026 Free Software Foundation, Inc.
 
    This program is free software: you can redistribute it and/or modify
@@ -65,23 +65,51 @@ main ()
   /* Test cases with invalid or incomplete characters.  */
 
   /* A valid character should not match an invalid character.  */
+  /* "\301\247" = 0xC1 0xA7 is invalid.
+     In fact, "\301" = 0xC1 is already invalid, see
+     https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf page 125 table 3-7.
+   */
   ASSERT (!mbs_endswith ("\303\247", "\301\247"));
   ASSERT (!mbs_endswith ("\301\247", "\303\247"));
 
   /* A valid character should not match an incomplete character.  */
+  /* "\343\247" = 0xE3 0xA7 is incomplete, "\343\247\214" = U+39CC is valid.  */
   ASSERT (!mbs_endswith ("\303\247", "\343\247"));
   ASSERT (!mbs_endswith ("\343\247", "\303\247"));
 
   /* An invalid character should not match an incomplete character.  */
+  /* "\301\247" = 0xC1 0xA7 is invalid.
+     In fact, "\301" = 0xC1 is already invalid, see
+     https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf page 125 table 3-7.
+   */
+  /* "\343\247" = 0xE3 0xA7 is incomplete, "\343\247\214" = U+39CC is valid.  */
   ASSERT (!mbs_endswith ("\301\247", "\343\247"));
   ASSERT (!mbs_endswith ("\343\247", "\301\247"));
 
+  /* Incomplete characters.  See
+     https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
+     page 128 table 3-11.  */
+  /* "\341\200\240" = 0xE1 0x80 0xA0 = U+1020.  */
+  ASSERT (!mbs_endswith ("\341\200\240", "\200\240"));
+  ASSERT (!mbs_endswith ("\341\200\240", "\240"));
+  /* "\360\221\222\240" = 0xF0 0x91 0x92 0xA0 = U+114A0.  */
+  ASSERT (!mbs_endswith ("\360\221\222\240", "\221\222\240"));
+  ASSERT (!mbs_endswith ("\360\221\222\240", "\222\240"));
+  ASSERT (!mbs_endswith ("\360\221\222\240", "\240"));
+
   /* Two invalid characters should match only if they are identical.  */
+  /* "\301\246" = 0xC1 0xA6 is invalid.
+     "\301\247" = 0xC1 0xA7 is invalid.
+     In fact, "\301" = 0xC1 is already invalid, see
+     https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf page 125 table 3-7.
+   */
   ASSERT (!mbs_endswith ("\301\246", "\301\247"));
   ASSERT (!mbs_endswith ("\301\247", "\301\246"));
   ASSERT (mbs_endswith ("\301\247", "\301\247"));
 
   /* Two incomplete characters should match only if they are identical.  */
+  /* "\343\246" = 0xE3 0xA6 is incomplete, "\343\246\214" = U+398C is valid.  */
+  /* "\343\247" = 0xE3 0xA7 is incomplete, "\343\247\214" = U+39CC is valid.  */
   ASSERT (!mbs_endswith ("\343\246", "\343\247"));
   ASSERT (!mbs_endswith ("\343\247", "\343\246"));
   ASSERT (mbs_endswith ("\343\247", "\343\247"));
diff --git a/tests/test-mbs_endswith3.c b/tests/test-mbs_endswith3.c
index ad1e24f5e8..e1abd1195e 100644
--- a/tests/test-mbs_endswith3.c
+++ b/tests/test-mbs_endswith3.c
@@ -1,4 +1,4 @@
-/* Test of mbs_endswith() function.
+/* Test of mbs_endswith() function in a GB18030 locale.
    Copyright (C) 2025-2026 Free Software Foundation, Inc.
 
    This program is free software: you can redistribute it and/or modify
-- 
2.54.0

From 2e73a29a97c7fc0e7b3d5737cd84172cb82b4069 Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 25 May 2026 18:32:18 +0200
Subject: [PATCH 5/7] mbspcasecmp tests: Enhance tests.

* tests/test-mbspcasecmp.c (test_ascii): New function, extracted from
main.
(test_utf_8): Likewise. Add test cases with incomplete characters.
(main): Invoke them. Accept a numeric argument.
* tests/test-mbspcasecmp-4.sh: Renamed from tests/test-mbspcasecmp.sh.
* tests/test-mbspcasecmp-3.sh: New file, based on
tests/test-mbmemcasecmp-3.sh.
* modules/mbspcasecmp-tests (Files): Update after rename. Add
locale-en.m4, locale-fr.m4.
(configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
(Makefile.am): Arrange to run test-mbspcasecmp-3.sh,
test-mbspcasecmp-4.sh, instead of test-mbspcasecmp.sh.
---
 ChangeLog                                     |  16 +++
 modules/mbspcasecmp-tests                     |  14 ++-
 tests/test-mbspcasecmp-3.sh                   |  23 ++++
 ...t-mbspcasecmp.sh => test-mbspcasecmp-4.sh} |   2 +-
 tests/test-mbspcasecmp.c                      | 114 +++++++++++++++---
 5 files changed, 145 insertions(+), 24 deletions(-)
 create mode 100755 tests/test-mbspcasecmp-3.sh
 rename tests/{test-mbspcasecmp.sh => test-mbspcasecmp-4.sh} (89%)

diff --git a/ChangeLog b/ChangeLog
index 14ce0d68fd..c5b5e39291 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@
+2026-05-25  Bruno Haible  <[email protected]>
+
+	mbspcasecmp tests: Enhance tests.
+	* tests/test-mbspcasecmp.c (test_ascii): New function, extracted from
+	main.
+	(test_utf_8): Likewise. Add test cases with incomplete characters.
+	(main): Invoke them. Accept a numeric argument.
+	* tests/test-mbspcasecmp-4.sh: Renamed from tests/test-mbspcasecmp.sh.
+	* tests/test-mbspcasecmp-3.sh: New file, based on
+	tests/test-mbmemcasecmp-3.sh.
+	* modules/mbspcasecmp-tests (Files): Update after rename. Add
+	locale-en.m4, locale-fr.m4.
+	(configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
+	(Makefile.am): Arrange to run test-mbspcasecmp-3.sh,
+	test-mbspcasecmp-4.sh, instead of test-mbspcasecmp.sh.
+
 2026-05-25  Bruno Haible  <[email protected]>
 
 	mbsncasecmp tests: Enhance tests.
diff --git a/modules/mbspcasecmp-tests b/modules/mbspcasecmp-tests
index e82a37eb23..4ca8c4d95e 100644
--- a/modules/mbspcasecmp-tests
+++ b/modules/mbspcasecmp-tests
@@ -1,7 +1,10 @@
 Files:
-tests/test-mbspcasecmp.sh
+tests/test-mbspcasecmp-3.sh
+tests/test-mbspcasecmp-4.sh
 tests/test-mbspcasecmp.c
 tests/macros.h
+m4/locale-en.m4
+m4/locale-fr.m4
 m4/locale-tr.m4
 m4/codeset.m4
 
@@ -9,10 +12,15 @@ Depends-on:
 setlocale
 
 configure.ac:
+gt_LOCALE_EN_UTF8
+gt_LOCALE_FR_UTF8
 gt_LOCALE_TR_UTF8
 
 Makefile.am:
-TESTS += test-mbspcasecmp.sh
-TESTS_ENVIRONMENT += LOCALE_TR_UTF8='@LOCALE_TR_UTF8@'
+TESTS += test-mbspcasecmp-3.sh test-mbspcasecmp-4.sh
+TESTS_ENVIRONMENT += \
+  LOCALE_EN_UTF8='@LOCALE_EN_UTF8@' \
+  LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' \
+  LOCALE_TR_UTF8='@LOCALE_TR_UTF8@'
 check_PROGRAMS += test-mbspcasecmp
 test_mbspcasecmp_LDADD = $(LDADD) $(LIBUNISTRING) $(SETLOCALE_LIB) $(MBRTOWC_LIB) $(LIBC32CONV)
diff --git a/tests/test-mbspcasecmp-3.sh b/tests/test-mbspcasecmp-3.sh
new file mode 100755
index 0000000000..dc4619a0c3
--- /dev/null
+++ b/tests/test-mbspcasecmp-3.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+# Test whether a specific UTF-8 locale is installed.
+: "${LOCALE_EN_UTF8=en_US.UTF-8}"
+: "${LOCALE_FR_UTF8=fr_FR.UTF-8}"
+if test "$LOCALE_EN_UTF8" = none && test $LOCALE_FR_UTF8 = none; then
+  if test -f /usr/bin/localedef; then
+    echo "Skipping test: no english or french Unicode locale is installed"
+  else
+    echo "Skipping test: no english or french Unicode locale is supported"
+  fi
+  exit 77
+fi
+
+# It's sufficient to test in one of the two locales.
+if test $LOCALE_FR_UTF8 != none; then
+  testlocale=$LOCALE_FR_UTF8
+else
+  testlocale="$LOCALE_EN_UTF8"
+fi
+
+LC_ALL="$testlocale" \
+${CHECKER} ./test-mbspcasecmp${EXEEXT} 3
diff --git a/tests/test-mbspcasecmp.sh b/tests/test-mbspcasecmp-4.sh
similarity index 89%
rename from tests/test-mbspcasecmp.sh
rename to tests/test-mbspcasecmp-4.sh
index 1e390755f1..daef45b62c 100755
--- a/tests/test-mbspcasecmp.sh
+++ b/tests/test-mbspcasecmp-4.sh
@@ -12,4 +12,4 @@ if test $LOCALE_TR_UTF8 = none; then
 fi
 
 LC_ALL=$LOCALE_TR_UTF8 \
-${CHECKER} ./test-mbspcasecmp${EXEEXT}
+${CHECKER} ./test-mbspcasecmp${EXEEXT} 4
diff --git a/tests/test-mbspcasecmp.c b/tests/test-mbspcasecmp.c
index 8839a7444d..a1407164cd 100644
--- a/tests/test-mbspcasecmp.c
+++ b/tests/test-mbspcasecmp.c
@@ -24,13 +24,9 @@
 
 #include "macros.h"
 
-int
-main ()
+static void
+test_ascii (void)
 {
-  /* configure should already have checked that the locale is supported.  */
-  if (setlocale (LC_ALL, "") == NULL)
-    return 1;
-
   {
     const char string[] = "paragraph";
     ASSERT (mbspcasecmp (string, "Paragraph") == string + 9);
@@ -60,31 +56,109 @@ main ()
     const char string[] = "paragraph";
     ASSERT (mbspcasecmp (string, "para") == string + 4);
   }
+}
 
+static void
+test_utf_8 (bool turkish)
+{
   /* The following tests shows how mbspcasecmp() is different from
      strncasecmp().  */
 
+  if (turkish)
+    {
+      {
+        const char string[] = "\303\266zg\303\274rt\303\274k"; /* özgürtük */
+        ASSERT (mbspcasecmp (string, "\303\226ZG\303\234R") == string + 7); /* özgür */
+      }
+
+      {
+        const char string[] = "\303\226ZG\303\234Rt\303\274k"; /* özgürtük */
+        ASSERT (mbspcasecmp (string, "\303\266zg\303\274r") == string + 7); /* özgür */
+      }
+
+      /* This test shows how strings of different size can compare equal.  */
+
+      {
+        const char string[] = "turkishtime";
+        ASSERT (mbspcasecmp (string, "TURK\304\260SH") == string + 7);
+      }
+
+      {
+        const char string[] = "TURK\304\260SHK\303\234LT\303\234R";
+        ASSERT (mbspcasecmp (string, "turkish") == string + 8);
+      }
+    }
+
+  /* Incomplete characters.  See
+     https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
+     page 128 table 3-11.  */
+
+  /* "\341\200\240" = 0xE1 0x80 0xA0 = U+1020.  */
   {
-    const char string[] = "\303\266zg\303\274rt\303\274k"; /* özgürtük */
-    ASSERT (mbspcasecmp (string, "\303\226ZG\303\234R") == string + 7); /* özgür */
+    const char string[] = "\341\200";
+    ASSERT (mbspcasecmp (string, "\341\200") == string + 2);
   }
-
   {
-    const char string[] = "\303\226ZG\303\234Rt\303\274k"; /* özgürtük */
-    ASSERT (mbspcasecmp (string, "\303\266zg\303\274r") == string + 7); /* özgür */
+    const char string[] = "\341\200X";
+    ASSERT (mbspcasecmp (string, "\341\200x") == string + 3);
   }
-
-  /* This test shows how strings of different size can compare equal.  */
-
   {
-    const char string[] = "turkishtime";
-    ASSERT (mbspcasecmp (string, "TURK\304\260SH") == string + 7);
+    const char string[] = "\341";
+    ASSERT (mbspcasecmp (string, "\341") == string + 1);
   }
-
   {
-    const char string[] = "TURK\304\260SHK\303\234LT\303\234R";
-    ASSERT (mbspcasecmp (string, "turkish") == string + 8);
+    const char string[] = "\341X";
+    ASSERT (mbspcasecmp (string, "\341x") == string + 2);
   }
+  /* "\360\221\222\240" = 0xF0 0x91 0x92 0xA0 = U+114A0.  */
+  {
+    const char string[] = "\360\221\222";
+    ASSERT (mbspcasecmp (string, "\360\221\222") == string + 3);
+  }
+  {
+    const char string[] = "\360\221\222X";
+    ASSERT (mbspcasecmp (string, "\360\221\222x") == string + 4);
+  }
+  {
+    const char string[] = "\360\221";
+    ASSERT (mbspcasecmp (string, "\360\221") == string + 2);
+  }
+  {
+    const char string[] = "\360\221X";
+    ASSERT (mbspcasecmp (string, "\360\221x") == string + 3);
+  }
+  {
+    const char string[] = "\360";
+    ASSERT (mbspcasecmp (string, "\360") == string + 1);
+  }
+  {
+    const char string[] = "\360X";
+    ASSERT (mbspcasecmp (string, "\360x") == string + 2);
+  }
+}
+
+int
+main (int argc, char *argv[])
+{
+  /* configure should already have checked that the locale is supported.  */
+  if (setlocale (LC_ALL, "") == NULL)
+    return 1;
+
+  test_ascii ();
+
+  if (argc > 1)
+    switch (argv[1][0])
+      {
+      case '3':
+        /* Locale encoding is UTF-8, locale is not Turkish.  */
+        test_utf_8 (false);
+        return test_exit_status;
+
+      case '4':
+        /* Locale encoding is UTF-8, locale is Turkish.  */
+        test_utf_8 (true);
+        return test_exit_status;
+      }
 
-  return test_exit_status;
+  return 1;
 }
-- 
2.54.0

From 1d411d24777b1defd3a065300da16b31586ef85f Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 25 May 2026 18:27:18 +0200
Subject: [PATCH 4/7] mbsncasecmp tests: Enhance tests.

* tests/test-mbsncasecmp.c (test_ascii): New function, extracted from
main.
(test_utf_8): Likewise. Add test cases with incomplete characters.
(main): Invoke them. Accept a numeric argument.
* tests/test-mbsncasecmp-4.sh: Renamed from tests/test-mbsncasecmp.sh.
* tests/test-mbsncasecmp-3.sh: New file, based on
tests/test-mbmemcasecmp-3.sh.
* modules/mbsncasecmp-tests (Files): Update after rename. Add
locale-en.m4, locale-fr.m4.
(configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
(Makefile.am): Arrange to run test-mbsncasecmp-3.sh,
test-mbsncasecmp-4.sh, instead of test-mbsncasecmp.sh.
---
 ChangeLog                                     | 16 +++++
 modules/mbsncasecmp-tests                     | 14 +++-
 tests/test-mbsncasecmp-3.sh                   | 23 +++++++
 ...t-mbsncasecmp.sh => test-mbsncasecmp-4.sh} |  2 +-
 tests/test-mbsncasecmp.c                      | 68 +++++++++++++++----
 5 files changed, 107 insertions(+), 16 deletions(-)
 create mode 100755 tests/test-mbsncasecmp-3.sh
 rename tests/{test-mbsncasecmp.sh => test-mbsncasecmp-4.sh} (89%)

diff --git a/ChangeLog b/ChangeLog
index 5940e0d95f..14ce0d68fd 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@
+2026-05-25  Bruno Haible  <[email protected]>
+
+	mbsncasecmp tests: Enhance tests.
+	* tests/test-mbsncasecmp.c (test_ascii): New function, extracted from
+	main.
+	(test_utf_8): Likewise. Add test cases with incomplete characters.
+	(main): Invoke them. Accept a numeric argument.
+	* tests/test-mbsncasecmp-4.sh: Renamed from tests/test-mbsncasecmp.sh.
+	* tests/test-mbsncasecmp-3.sh: New file, based on
+	tests/test-mbmemcasecmp-3.sh.
+	* modules/mbsncasecmp-tests (Files): Update after rename. Add
+	locale-en.m4, locale-fr.m4.
+	(configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
+	(Makefile.am): Arrange to run test-mbsncasecmp-3.sh,
+	test-mbsncasecmp-4.sh, instead of test-mbsncasecmp.sh.
+
 2026-05-25  Bruno Haible  <[email protected]>
 
 	mbscasecmp tests: Enhance tests.
diff --git a/modules/mbsncasecmp-tests b/modules/mbsncasecmp-tests
index 5ed84188ea..c41804a2d5 100644
--- a/modules/mbsncasecmp-tests
+++ b/modules/mbsncasecmp-tests
@@ -1,7 +1,10 @@
 Files:
-tests/test-mbsncasecmp.sh
+tests/test-mbsncasecmp-3.sh
+tests/test-mbsncasecmp-4.sh
 tests/test-mbsncasecmp.c
 tests/macros.h
+m4/locale-en.m4
+m4/locale-fr.m4
 m4/locale-tr.m4
 m4/codeset.m4
 
@@ -9,10 +12,15 @@ Depends-on:
 setlocale
 
 configure.ac:
+gt_LOCALE_EN_UTF8
+gt_LOCALE_FR_UTF8
 gt_LOCALE_TR_UTF8
 
 Makefile.am:
-TESTS += test-mbsncasecmp.sh
-TESTS_ENVIRONMENT += LOCALE_TR_UTF8='@LOCALE_TR_UTF8@'
+TESTS += test-mbsncasecmp-3.sh test-mbsncasecmp-4.sh
+TESTS_ENVIRONMENT += \
+  LOCALE_EN_UTF8='@LOCALE_EN_UTF8@' \
+  LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' \
+  LOCALE_TR_UTF8='@LOCALE_TR_UTF8@'
 check_PROGRAMS += test-mbsncasecmp
 test_mbsncasecmp_LDADD = $(LDADD) $(LIBUNISTRING) $(SETLOCALE_LIB) $(MBRTOWC_LIB) $(LIBC32CONV)
diff --git a/tests/test-mbsncasecmp-3.sh b/tests/test-mbsncasecmp-3.sh
new file mode 100755
index 0000000000..f5bee7f298
--- /dev/null
+++ b/tests/test-mbsncasecmp-3.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+# Test whether a specific UTF-8 locale is installed.
+: "${LOCALE_EN_UTF8=en_US.UTF-8}"
+: "${LOCALE_FR_UTF8=fr_FR.UTF-8}"
+if test "$LOCALE_EN_UTF8" = none && test $LOCALE_FR_UTF8 = none; then
+  if test -f /usr/bin/localedef; then
+    echo "Skipping test: no english or french Unicode locale is installed"
+  else
+    echo "Skipping test: no english or french Unicode locale is supported"
+  fi
+  exit 77
+fi
+
+# It's sufficient to test in one of the two locales.
+if test $LOCALE_FR_UTF8 != none; then
+  testlocale=$LOCALE_FR_UTF8
+else
+  testlocale="$LOCALE_EN_UTF8"
+fi
+
+LC_ALL="$testlocale" \
+${CHECKER} ./test-mbsncasecmp${EXEEXT} 3
diff --git a/tests/test-mbsncasecmp.sh b/tests/test-mbsncasecmp-4.sh
similarity index 89%
rename from tests/test-mbsncasecmp.sh
rename to tests/test-mbsncasecmp-4.sh
index baf1e542bd..c7cf85c969 100755
--- a/tests/test-mbsncasecmp.sh
+++ b/tests/test-mbsncasecmp-4.sh
@@ -12,4 +12,4 @@ if test $LOCALE_TR_UTF8 = none; then
 fi
 
 LC_ALL=$LOCALE_TR_UTF8 \
-${CHECKER} ./test-mbsncasecmp${EXEEXT}
+${CHECKER} ./test-mbsncasecmp${EXEEXT} 4
diff --git a/tests/test-mbsncasecmp.c b/tests/test-mbsncasecmp.c
index 1858483f81..fb98f01354 100644
--- a/tests/test-mbsncasecmp.c
+++ b/tests/test-mbsncasecmp.c
@@ -24,13 +24,9 @@
 
 #include "macros.h"
 
-int
-main ()
+static void
+test_ascii (void)
 {
-  /* configure should already have checked that the locale is supported.  */
-  if (setlocale (LC_ALL, "") == NULL)
-    return 1;
-
   ASSERT (mbsncasecmp ("paragraph", "Paragraph", 1000000) == 0);
   ASSERT (mbsncasecmp ("paragraph", "Paragraph", 9) == 0);
 
@@ -54,16 +50,64 @@ main ()
   ASSERT (mbsncasecmp ("paragraph", "para", 9) > 0);
   ASSERT (mbsncasecmp ("paragraph", "para", 5) > 0);
   ASSERT (mbsncasecmp ("paragraph", "para", 4) == 0);
+}
 
+static void
+test_utf_8 (bool turkish)
+{
   /* The following tests shows how mbsncasecmp() is different from
      strncasecmp().  */
 
-  ASSERT (mbsncasecmp ("\303\266zg\303\274r", "\303\226ZG\303\234R", 99) == 0); /* özgür */
-  ASSERT (mbsncasecmp ("\303\226ZG\303\234R", "\303\266zg\303\274r", 99) == 0); /* özgür */
+  if (turkish)
+    {
+      ASSERT (mbsncasecmp ("\303\266zg\303\274r", "\303\226ZG\303\234R", 99) == 0); /* özgür */
+      ASSERT (mbsncasecmp ("\303\226ZG\303\234R", "\303\266zg\303\274r", 99) == 0); /* özgür */
+
+      /* This test shows how strings of different size can compare equal.  */
+      ASSERT (mbsncasecmp ("turkish", "TURK\304\260SH", 7) == 0);
+      ASSERT (mbsncasecmp ("TURK\304\260SH", "turkish", 7) == 0);
+    }
+
+  /* Incomplete characters.  See
+     https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
+     page 128 table 3-11.  */
+
+  /* "\341\200\240" = 0xE1 0x80 0xA0 = U+1020.  */
+  ASSERT (mbsncasecmp ("\341\200", "\341\200", 99) == 0);
+  ASSERT (mbsncasecmp ("\341\200X", "\341\200x", 99) == 0);
+  ASSERT (mbsncasecmp ("\341", "\341", 99) == 0);
+  ASSERT (mbsncasecmp ("\341X", "\341x", 99) == 0);
+  /* "\360\221\222\240" = 0xF0 0x91 0x92 0xA0 = U+114A0.  */
+  ASSERT (mbsncasecmp ("\360\221\222", "\360\221\222", 99) == 0);
+  ASSERT (mbsncasecmp ("\360\221\222X", "\360\221\222x", 99) == 0);
+  ASSERT (mbsncasecmp ("\360\221", "\360\221", 99) == 0);
+  ASSERT (mbsncasecmp ("\360\221X", "\360\221x", 99) == 0);
+  ASSERT (mbsncasecmp ("\360", "\360", 99) == 0);
+  ASSERT (mbsncasecmp ("\360X", "\360x", 99) == 0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  /* configure should already have checked that the locale is supported.  */
+  if (setlocale (LC_ALL, "") == NULL)
+    return 1;
+
+  test_ascii ();
+
+  if (argc > 1)
+    switch (argv[1][0])
+      {
+      case '3':
+        /* Locale encoding is UTF-8, locale is not Turkish.  */
+        test_utf_8 (false);
+        return test_exit_status;
 
-  /* This test shows how strings of different size can compare equal.  */
-  ASSERT (mbsncasecmp ("turkish", "TURK\304\260SH", 7) == 0);
-  ASSERT (mbsncasecmp ("TURK\304\260SH", "turkish", 7) == 0);
+      case '4':
+        /* Locale encoding is UTF-8, locale is Turkish.  */
+        test_utf_8 (true);
+        return test_exit_status;
+      }
 
-  return test_exit_status;
+  return 1;
 }
-- 
2.54.0

From 25b66bfe9de7c305f641bd815f13be03159bfaec Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 25 May 2026 18:20:40 +0200
Subject: [PATCH 3/7] mbscasecmp tests: Enhance tests.

* tests/test-mbscasecmp.c (test_ascii): New function, extracted from
main.
(test_utf_8): Likewise. Add test cases with incomplete characters.
(main): Invoke them. Accept a numeric argument.
* tests/test-mbscasecmp-4.sh: Renamed from tests/test-mbscasecmp.sh.
* tests/test-mbscasecmp-3.sh: New file, based on
tests/test-mbmemcasecmp-3.sh.
* modules/mbscasecmp-tests (Files): Update after rename. Add
locale-en.m4, locale-fr.m4.
(configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
(Makefile.am): Arrange to run test-mbscasecmp-3.sh,
test-mbscasecmp-4.sh, instead of test-mbscasecmp.sh.
---
 ChangeLog                                     | 16 +++++
 modules/mbscasecmp-tests                      | 14 +++-
 tests/test-mbscasecmp-3.sh                    | 23 +++++++
 ...est-mbscasecmp.sh => test-mbscasecmp-4.sh} |  2 +-
 tests/test-mbscasecmp.c                       | 68 +++++++++++++++----
 5 files changed, 107 insertions(+), 16 deletions(-)
 create mode 100755 tests/test-mbscasecmp-3.sh
 rename tests/{test-mbscasecmp.sh => test-mbscasecmp-4.sh} (89%)

diff --git a/ChangeLog b/ChangeLog
index 02ad380e55..5940e0d95f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,19 @@
+2026-05-25  Bruno Haible  <[email protected]>
+
+	mbscasecmp tests: Enhance tests.
+	* tests/test-mbscasecmp.c (test_ascii): New function, extracted from
+	main.
+	(test_utf_8): Likewise. Add test cases with incomplete characters.
+	(main): Invoke them. Accept a numeric argument.
+	* tests/test-mbscasecmp-4.sh: Renamed from tests/test-mbscasecmp.sh.
+	* tests/test-mbscasecmp-3.sh: New file, based on
+	tests/test-mbmemcasecmp-3.sh.
+	* modules/mbscasecmp-tests (Files): Update after rename. Add
+	locale-en.m4, locale-fr.m4.
+	(configure.ac): Invoke gt_LOCALE_EN_UTF8, gt_LOCALE_FR_UTF8.
+	(Makefile.am): Arrange to run test-mbscasecmp-3.sh,
+	test-mbscasecmp-4.sh, instead of test-mbscasecmp.sh.
+
 2026-05-25  Bruno Haible  <[email protected]>
 
 	mbs_endswith tests: Enhance tests.
diff --git a/modules/mbscasecmp-tests b/modules/mbscasecmp-tests
index 61282af862..bdbb0cf17b 100644
--- a/modules/mbscasecmp-tests
+++ b/modules/mbscasecmp-tests
@@ -1,7 +1,10 @@
 Files:
-tests/test-mbscasecmp.sh
+tests/test-mbscasecmp-3.sh
+tests/test-mbscasecmp-4.sh
 tests/test-mbscasecmp.c
 tests/macros.h
+m4/locale-en.m4
+m4/locale-fr.m4
 m4/locale-tr.m4
 m4/codeset.m4
 
@@ -9,10 +12,15 @@ Depends-on:
 setlocale
 
 configure.ac:
+gt_LOCALE_EN_UTF8
+gt_LOCALE_FR_UTF8
 gt_LOCALE_TR_UTF8
 
 Makefile.am:
-TESTS += test-mbscasecmp.sh
-TESTS_ENVIRONMENT += LOCALE_TR_UTF8='@LOCALE_TR_UTF8@'
+TESTS += test-mbscasecmp-3.sh test-mbscasecmp-4.sh
+TESTS_ENVIRONMENT += \
+  LOCALE_EN_UTF8='@LOCALE_EN_UTF8@' \
+  LOCALE_FR_UTF8='@LOCALE_FR_UTF8@' \
+  LOCALE_TR_UTF8='@LOCALE_TR_UTF8@'
 check_PROGRAMS += test-mbscasecmp
 test_mbscasecmp_LDADD = $(LDADD) $(LIBUNISTRING) $(SETLOCALE_LIB) $(MBRTOWC_LIB) $(LIBC32CONV)
diff --git a/tests/test-mbscasecmp-3.sh b/tests/test-mbscasecmp-3.sh
new file mode 100755
index 0000000000..72ee7d4738
--- /dev/null
+++ b/tests/test-mbscasecmp-3.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+# Test whether a specific UTF-8 locale is installed.
+: "${LOCALE_EN_UTF8=en_US.UTF-8}"
+: "${LOCALE_FR_UTF8=fr_FR.UTF-8}"
+if test "$LOCALE_EN_UTF8" = none && test $LOCALE_FR_UTF8 = none; then
+  if test -f /usr/bin/localedef; then
+    echo "Skipping test: no english or french Unicode locale is installed"
+  else
+    echo "Skipping test: no english or french Unicode locale is supported"
+  fi
+  exit 77
+fi
+
+# It's sufficient to test in one of the two locales.
+if test $LOCALE_FR_UTF8 != none; then
+  testlocale=$LOCALE_FR_UTF8
+else
+  testlocale="$LOCALE_EN_UTF8"
+fi
+
+LC_ALL="$testlocale" \
+${CHECKER} ./test-mbscasecmp${EXEEXT} 3
diff --git a/tests/test-mbscasecmp.sh b/tests/test-mbscasecmp-4.sh
similarity index 89%
rename from tests/test-mbscasecmp.sh
rename to tests/test-mbscasecmp-4.sh
index 73e62b5f50..e5c5a90b17 100755
--- a/tests/test-mbscasecmp.sh
+++ b/tests/test-mbscasecmp-4.sh
@@ -12,4 +12,4 @@ if test $LOCALE_TR_UTF8 = none; then
 fi
 
 LC_ALL=$LOCALE_TR_UTF8 \
-${CHECKER} ./test-mbscasecmp${EXEEXT}
+${CHECKER} ./test-mbscasecmp${EXEEXT} 4
diff --git a/tests/test-mbscasecmp.c b/tests/test-mbscasecmp.c
index 1c12691dea..f309d3e517 100644
--- a/tests/test-mbscasecmp.c
+++ b/tests/test-mbscasecmp.c
@@ -24,13 +24,9 @@
 
 #include "macros.h"
 
-int
-main ()
+static void
+test_ascii (void)
 {
-  /* configure should already have checked that the locale is supported.  */
-  if (setlocale (LC_ALL, "") == NULL)
-    return 1;
-
   ASSERT (mbscasecmp ("paragraph", "Paragraph") == 0);
 
   ASSERT (mbscasecmp ("paragrapH", "parAgRaph") == 0);
@@ -40,16 +36,64 @@ main ()
 
   ASSERT (mbscasecmp ("para", "paragraph") < 0);
   ASSERT (mbscasecmp ("paragraph", "para") > 0);
+}
 
+static void
+test_utf_8 (bool turkish)
+{
   /* The following tests shows how mbscasecmp() is different from
      strcasecmp().  */
 
-  ASSERT (mbscasecmp ("\303\266zg\303\274r", "\303\226ZG\303\234R") == 0); /* özgür */
-  ASSERT (mbscasecmp ("\303\226ZG\303\234R", "\303\266zg\303\274r") == 0); /* özgür */
+  if (turkish)
+    {
+      ASSERT (mbscasecmp ("\303\266zg\303\274r", "\303\226ZG\303\234R") == 0); /* özgür */
+      ASSERT (mbscasecmp ("\303\226ZG\303\234R", "\303\266zg\303\274r") == 0); /* özgür */
+
+      /* This test shows how strings of different size can compare equal.  */
+      ASSERT (mbscasecmp ("turkish", "TURK\304\260SH") == 0);
+      ASSERT (mbscasecmp ("TURK\304\260SH", "turkish") == 0);
+    }
+
+  /* Incomplete characters.  See
+     https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
+     page 128 table 3-11.  */
+
+  /* "\341\200\240" = 0xE1 0x80 0xA0 = U+1020.  */
+  ASSERT (mbscasecmp ("\341\200", "\341\200") == 0);
+  ASSERT (mbscasecmp ("\341\200X", "\341\200x") == 0);
+  ASSERT (mbscasecmp ("\341", "\341") == 0);
+  ASSERT (mbscasecmp ("\341X", "\341x") == 0);
+  /* "\360\221\222\240" = 0xF0 0x91 0x92 0xA0 = U+114A0.  */
+  ASSERT (mbscasecmp ("\360\221\222", "\360\221\222") == 0);
+  ASSERT (mbscasecmp ("\360\221\222X", "\360\221\222x") == 0);
+  ASSERT (mbscasecmp ("\360\221", "\360\221") == 0);
+  ASSERT (mbscasecmp ("\360\221X", "\360\221x") == 0);
+  ASSERT (mbscasecmp ("\360", "\360") == 0);
+  ASSERT (mbscasecmp ("\360X", "\360x") == 0);
+}
+
+int
+main (int argc, char *argv[])
+{
+  /* configure should already have checked that the locale is supported.  */
+  if (setlocale (LC_ALL, "") == NULL)
+    return 1;
+
+  test_ascii ();
+
+  if (argc > 1)
+    switch (argv[1][0])
+      {
+      case '3':
+        /* Locale encoding is UTF-8, locale is not Turkish.  */
+        test_utf_8 (false);
+        return test_exit_status;
 
-  /* This test shows how strings of different size can compare equal.  */
-  ASSERT (mbscasecmp ("turkish", "TURK\304\260SH") == 0);
-  ASSERT (mbscasecmp ("TURK\304\260SH", "turkish") == 0);
+      case '4':
+        /* Locale encoding is UTF-8, locale is Turkish.  */
+        test_utf_8 (true);
+        return test_exit_status;
+      }
 
-  return test_exit_status;
+  return 1;
 }
-- 
2.54.0

From 8d19402b5bd78976c08312da1e387d16c8fb8ff9 Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 25 May 2026 18:36:12 +0200
Subject: [PATCH 6/7] mbmemcasecmp tests: Enhance tests.

* tests/test-mbmemcasecmp.h (test_utf_8): Add test cases with incomplete
characters.
---
 ChangeLog                 |  6 ++++++
 tests/test-mbmemcasecmp.h | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index c5b5e39291..766a5860a5 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2026-05-25  Bruno Haible  <[email protected]>
+
+	mbmemcasecmp tests: Enhance tests.
+	* tests/test-mbmemcasecmp.h (test_utf_8): Add test cases with incomplete
+	characters.
+
 2026-05-25  Bruno Haible  <[email protected]>
 
 	mbspcasecmp tests: Enhance tests.
diff --git a/tests/test-mbmemcasecmp.h b/tests/test-mbmemcasecmp.h
index c2175815b2..ff19c70b5c 100644
--- a/tests/test-mbmemcasecmp.h
+++ b/tests/test-mbmemcasecmp.h
@@ -395,4 +395,40 @@ test_utf_8 (int (*my_casecmp) (const char *, size_t, const char *, size_t), bool
     ASSERT (my_casecmp (input, countof (input), casefolded_decomposed, countof (casefolded_decomposed)) == 0);
   }
   #endif
+
+  /* Incomplete characters.  See
+     https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
+     page 128 table 3-11.  */
+  /* 0xE1 0x80 0xA0 = U+1020.  */
+  {
+    static const char input1[] = { 0xE1, 0x80, 'x', 0xE1, 0x80 };
+    static const char input2[] = { 0xE1, 0x80, 'X', 0xE1, 0x80 };
+
+    ASSERT (my_casecmp (input1, countof (input1), input2, countof (input2)) == 0);
+  }
+  {
+    static const char input1[] = { 0xE1, 'x', 0xE1 };
+    static const char input2[] = { 0xE1, 'X', 0xE1 };
+
+    ASSERT (my_casecmp (input1, countof (input1), input2, countof (input2)) == 0);
+  }
+  /* 0xF0 0x91 0x92 0xA0 = U+114A0.  */
+  {
+    static const char input1[] = { 0xF0, 0x91, 0x92, 'x', 0xF0, 0x91, 0x92 };
+    static const char input2[] = { 0xF0, 0x91, 0x92, 'X', 0xF0, 0x91, 0x92 };
+
+    ASSERT (my_casecmp (input1, countof (input1), input2, countof (input2)) == 0);
+  }
+  {
+    static const char input1[] = { 0xF0, 0x91, 'x', 0xF0, 0x91 };
+    static const char input2[] = { 0xF0, 0x91, 'X', 0xF0, 0x91 };
+
+    ASSERT (my_casecmp (input1, countof (input1), input2, countof (input2)) == 0);
+  }
+  {
+    static const char input1[] = { 0xF0, 'x', 0xF0 };
+    static const char input2[] = { 0xF0, 'X', 0xF0 };
+
+    ASSERT (my_casecmp (input1, countof (input1), input2, countof (input2)) == 0);
+  }
 }
-- 
2.54.0

From 070f9259d67373ef7530e8a2523b04258920a449 Mon Sep 17 00:00:00 2001
From: Bruno Haible <[email protected]>
Date: Mon, 25 May 2026 18:37:25 +0200
Subject: [PATCH 7/7] trim tests: Enhance tests.

* tests/test-trim.c (main): Add test cases with incomplete characters.
---
 ChangeLog         |  5 +++++
 tests/test-trim.c | 30 ++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 766a5860a5..0611c6c7a1 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2026-05-25  Bruno Haible  <[email protected]>
+
+	trim tests: Enhance tests.
+	* tests/test-trim.c (main): Add test cases with incomplete characters.
+
 2026-05-25  Bruno Haible  <[email protected]>
 
 	mbmemcasecmp tests: Enhance tests.
diff --git a/tests/test-trim.c b/tests/test-trim.c
index 745c7492dd..27a7a193c4 100644
--- a/tests/test-trim.c
+++ b/tests/test-trim.c
@@ -133,6 +133,36 @@ main (int argc, char *argv[])
           ASSERT (streq (result, "\302\267foo"));
           free (result);
         }
+        /* Incomplete characters.  See
+           https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf
+           page 128 table 3-11.  */
+        /* "\341\200\240" = 0xE1 0x80 0xA0 = U+1020.  */
+        {
+          char *result = trim ("\342\200\202\341\200\342\200\202");
+          ASSERT (streq (result, "\341\200"));
+          free (result);
+        }
+        {
+          char *result = trim ("\342\200\202\341\342\200\202");
+          ASSERT (streq (result, "\341"));
+          free (result);
+        }
+        /* "\360\221\222\240" = 0xF0 0x91 0x92 0xA0 = U+114A0.  */
+        {
+          char *result = trim ("\342\200\202\360\221\222\342\200\202");
+          ASSERT (streq (result, "\360\221\222"));
+          free (result);
+        }
+        {
+          char *result = trim ("\342\200\202\360\221\342\200\202");
+          ASSERT (streq (result, "\360\221"));
+          free (result);
+        }
+        {
+          char *result = trim ("\342\200\202\360\342\200\202");
+          ASSERT (streq (result, "\360"));
+          free (result);
+        }
         return test_exit_status;
 
       case '3':
-- 
2.54.0

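For readers who want to check the byte sequences used in the new test
cases by hand, here is a minimal sketch, not part of the patches. It
looks only at the UTF-8 byte patterns and does not call any gnulib
function or depend on the locale. The names utf8_expected_length,
utf8_is_incomplete, utf8_decode_one and utf8_self_check are hypothetical
helpers invented for this illustration. As a simplification, it does not
enforce the restricted second-byte ranges of the Unicode well-formedness
table (for example, 0xE0 must be followed by 0xA0..0xBF).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes of a UTF-8 sequence that starts
   with LEAD, or 0 if LEAD cannot start a sequence.  */
static size_t
utf8_expected_length (unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if (lead >= 0xC2 && lead <= 0xDF)
    return 2;
  if (lead >= 0xE0 && lead <= 0xEF)
    return 3;
  if (lead >= 0xF0 && lead <= 0xF4)
    return 4;
  return 0;
}

/* Hypothetical helper: return 1 if the N bytes at S are a proper prefix
   of a multibyte sequence (valid lead byte, only continuation bytes
   after it, but fewer bytes than the lead byte announces), else 0.  */
static int
utf8_is_incomplete (const char *s, size_t n)
{
  if (n == 0)
    return 0;
  size_t expected = utf8_expected_length ((unsigned char) s[0]);
  if (expected == 0 || n >= expected)
    return 0;
  for (size_t i = 1; i < n; i++)
    if (((unsigned char) s[i] & 0xC0) != 0x80)
      return 0;
  return 1;
}

/* Hypothetical helper: decode the first character of the N bytes at S.
   Return the code point, or -1 if the sequence is truncated or has a
   byte that is not a continuation byte where one is required.  */
static long
utf8_decode_one (const char *s, size_t n)
{
  if (n == 0)
    return -1;
  size_t len = utf8_expected_length ((unsigned char) s[0]);
  if (len == 0 || n < len)
    return -1;
  if (len == 1)
    return (unsigned char) s[0];
  /* Keep only the payload bits of the lead byte.  */
  long cp = (unsigned char) s[0] & (0xFF >> (len + 1));
  for (size_t i = 1; i < len; i++)
    {
      unsigned char b = (unsigned char) s[i];
      if ((b & 0xC0) != 0x80)
        return -1;
      cp = (cp << 6) | (b & 0x3F);
    }
  return cp;
}

/* Checks taken from the comments in the test cases.  */
static void
utf8_self_check (void)
{
  /* 0xE1 0x80 0xA0 = U+1020; its 1- and 2-byte prefixes are incomplete.  */
  assert (utf8_decode_one ("\341\200\240", 3) == 0x1020);
  assert (utf8_is_incomplete ("\341\200", 2));
  assert (utf8_is_incomplete ("\341", 1));
  /* 0xF0 0x91 0x92 0xA0 = U+114A0; likewise for its prefixes.  */
  assert (utf8_decode_one ("\360\221\222\240", 4) == 0x114A0);
  assert (utf8_is_incomplete ("\360\221\222", 3));
  assert (utf8_is_incomplete ("\360\221", 2));
  assert (utf8_is_incomplete ("\360", 1));
  /* A complete character is not incomplete.  */
  assert (!utf8_is_incomplete ("\341\200\240", 3));
}
```

Calling utf8_self_check () from a small main function is enough to
confirm the code point values quoted in the test comments. How the
mbs* functions themselves treat such prefixes is what the new tests
exercise; this sketch makes no claim about that.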