On 02/04/2026 22:38, Pádraig Brady wrote:
Anyway, I tested your change and it works really well.
Of course, I needed to remove the '#undef mbrtoc32' from mcel.h
to get the win there.  With that removed, I again see the same
2.6x speedup as with my previous patch:

    $ yes $(yes éééááé | head -n9 | paste -s -d,) |
      head -n1M > mb.in

    $ time LC_ALL=C.UTF-8 src/cut-before -c1 mb.in >/dev/null
    real    0m1.582s

    $ time LC_ALL=C.UTF-8 src/cut-after -c1 mb.in >/dev/null
    real    0m0.592s

I pushed the attached patch to remove the '#undef mbrtoc32' from mcel.h.
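
For reference, the 2.6x figure is simply the ratio of the two "real"
times above.  A minimal sketch of that arithmetic follows; the
`speedup` helper is hypothetical and not part of coreutils or gnulib:

```shell
# Hypothetical helper: print the ratio of two wall-clock times (seconds).
speedup() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.2fx\n", before / after }'
}

# "real" times reported for src/cut-before and src/cut-after above.
speedup 1.582 0.592
```

This prints 2.67x, which matches the roughly 2.6x quoted above.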

thanks,
Padraig
From 305e58f2f803035bccfe051c29d8ffd3d13dfdbc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Sun, 5 Apr 2026 12:30:21 +0100
Subject: [PATCH] mcel: remove forced use of GLIBC's mbrtoc32
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This is a performance win on GLIBC,
as tested on the coreutils multi-byte update to cut(1):

   $ yes $(yes éééááé | head -n9 | paste -s -d,) |
     head -n1M > mb.in

   $ time LC_ALL=C.UTF-8 src/cut-before -c1 mb.in >/dev/null
   real    0m1.582s

   $ time LC_ALL=C.UTF-8 src/cut-after -c1 mb.in >/dev/null
   real    0m0.592s

* lib/mcel.h: While GLIBC's mbrtoc32 is functional for mcel,
it is seen to be 2.6x slower than gnulib's implementation
due to GLIBC's per call locale handling.
---
 ChangeLog  | 7 +++++++
 lib/mcel.h | 7 -------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 55cf2efc34..9586cedb01 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2026-04-05  Pádraig Brady  <[email protected]>
+
+	mcel: remove forced use of GLIBC's mbrtoc32
+	* lib/mcel.h: While GLIBC's mbrtoc32 is functional for mcel,
+	it is seen to be 2.6x slower than gnulib's implementation
+	due to GLIBC's per call locale handling.
+
 2026-04-04  Bruno Haible  <[email protected]>
 
 	posix_spawn-internal: Remove a FIXME.
diff --git a/lib/mcel.h b/lib/mcel.h
index 757a97593f..5eedd5b610 100644
--- a/lib/mcel.h
+++ b/lib/mcel.h
@@ -217,13 +217,6 @@ mcel_isbasic (char c)
   return _GL_LIKELY (0 <= c && c < MCEL_ERR_MIN);
 }
 
-/* With mcel there should be no need for the performance overhead of
-   replacing glibc mbrtoc32, as callers shouldn't care whether the
-   C locale treats a byte with the high bit set as an encoding error.  */
-#ifdef __GLIBC__
-# undef mbrtoc32
-#endif
-
 /* Scan bytes from P inclusive to LIM exclusive.  P must be less than LIM.
    Return the character or encoding error starting at P.  */
 MCEL_INLINE mcel_t
-- 
2.53.0
