On 02/04/2026 22:38, Pádraig Brady wrote:
Anyway, I tested your change and it works really well.
I needed to remove the '#undef mbrtoc32' from mcel.h to
get the win there, of course. Again I get the same 2.6x win
as seen with my previous patch:
$ yes $(yes éééááé | head -n9 | paste -s -d,) |
head -n1M > mb.in
$ time LC_ALL=C.UTF-8 src/cut-before -c1 mb.in >/dev/null
real 0m1.582s
$ time LC_ALL=C.UTF-8 src/cut-after -c1 mb.in >/dev/null
real 0m0.592s
I pushed the attached to remove the '#undef mbrtoc32' from mcel.h.
thanks,
Padraig
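
For readers unfamiliar with the code path being timed, here is a minimal
sketch of a per-character decoding loop built on mbrtoc32. It is not taken
from mcel.h or cut; count_chars is a hypothetical helper, and counting an
invalid or incomplete sequence as one byte is an assumption made for this
illustration. It only shows why the cost of each mbrtoc32 call matters when
the function is called once per character of input.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper, not part of gnulib or coreutils: count the
   characters in the N bytes starting at S, decoding one character
   per mbrtoc32 call in the current locale.  An invalid or incomplete
   sequence is counted as a single byte (an assumption of this sketch).  */
static size_t
count_chars (const char *s, size_t n)
{
  size_t count = 0;
  const char *lim = s + n;

  while (s < lim)
    {
      /* Start every character from the initial conversion state.  */
      mbstate_t mbs;
      memset (&mbs, 0, sizeof mbs);

      char32_t ch;
      size_t len = mbrtoc32 (&ch, s, lim - s, &mbs);

      /* 0 means a NUL byte was decoded; (size_t) -1 and (size_t) -2
         signal an encoding error and an incomplete sequence.
         In all of these cases, consume exactly one byte.  */
      if (len == 0 || len > (size_t) -1 / 2)
        len = 1;

      s += len;
      count++;
    }

  return count;
}
```

A caller would typically run setlocale (LC_ALL, "") first and then pass each
input line to count_chars. Under a UTF-8 locale a two-byte "é" should then
count as one character, while in the C locale the result depends on how the
mbrtoc32 implementation treats bytes with the high bit set.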
From 305e58f2f803035bccfe051c29d8ffd3d13dfdbc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Sun, 5 Apr 2026 12:30:21 +0100
Subject: [PATCH] mcel: remove forced use of GLIBC's mbrtoc32
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This is a performance win on GLIBC,
as tested on the coreutils multi-byte update to cut(1):
$ yes $(yes éééááé | head -n9 | paste -s -d,) |
head -n1M > mb.in
$ time LC_ALL=C.UTF-8 src/cut-before -c1 mb.in >/dev/null
real 0m1.582s
$ time LC_ALL=C.UTF-8 src/cut-after -c1 mb.in >/dev/null
real 0m0.592s
* lib/mcel.h: While GLIBC's mbrtoc32 is functional for mcel,
it is seen to be 2.6x slower than gnulib's implementation
due to GLIBC's per call locale handling.
---
ChangeLog | 7 +++++++
lib/mcel.h | 7 -------
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 55cf2efc34..9586cedb01 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2026-04-05 Pádraig Brady <[email protected]>
+
+ mcel: remove forced use of GLIBC's mbrtoc32
+ * lib/mcel.h: While GLIBC's mbrtoc32 is functional for mcel,
+ it is seen to be 2.6x slower than gnulib's implementation
+ due to GLIBC's per call locale handling.
+
2026-04-04 Bruno Haible <[email protected]>
posix_spawn-internal: Remove a FIXME.
diff --git a/lib/mcel.h b/lib/mcel.h
index 757a97593f..5eedd5b610 100644
--- a/lib/mcel.h
+++ b/lib/mcel.h
@@ -217,13 +217,6 @@ mcel_isbasic (char c)
return _GL_LIKELY (0 <= c && c < MCEL_ERR_MIN);
}
-/* With mcel there should be no need for the performance overhead of
- replacing glibc mbrtoc32, as callers shouldn't care whether the
- C locale treats a byte with the high bit set as an encoding error. */
-#ifdef __GLIBC__
-# undef mbrtoc32
-#endif
-
/* Scan bytes from P inclusive to LIM exclusive. P must be less than LIM.
Return the character or encoding error starting at P. */
MCEL_INLINE mcel_t
--
2.53.0