PS: Just for completeness. I wrote:

> One of the approaches was to write code like
>
>   if (MB_CUR_MAX > 1)
>     {
>       ...code for multibyte locales...
>     }
>   else
>     {
>       ...code for unibyte locales...
>     }
>
> Jim did not like this one because it duplicates the logic. (And right he
> is. I like to say that code duplication is a professional mistake.)
>
> Another approach that I proposed was to write code with the mbchar.h
> module from Gnulib. This does not duplicate the logic, but it came with
> a performance penalty for the unibyte locales; Jim rejected it for this
> reason. At that time, most of the locales were unibyte locales. Still
> today, the "C" locale is unibyte and is used in many places. Therefore
> this argument is still valid today.
>
> Another approach that I proposed was the one used by the 'fnmatch'
> module in Gnulib: Move the core loop out to a separate file, and
> parameterize this file so that it can be used in two modes: for the
> unibyte case, working on types such as 'char', and for the multibyte
> case, working on types such as 'wchar_t'. (Nowadays that should be
> 'char32_t', not 'wchar_t'.) Jim rejected this approach as well. (Or
> maybe Paul did? I don't remember in detail.)
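To make that third, parameterized approach concrete, here is a minimal
sketch in the spirit of Gnulib's fnmatch_loop.c. The file name
str-count.h, the macros UNIT and FUNC, and the per-character logic are
hypothetical placeholders, not the actual Gnulib code:

  /* str-count.h -- parameterized core loop, included twice below.
     The includer must define UNIT (the character type) and
     FUNC (the name of the generated function).  */
  static size_t
  FUNC (const UNIT *s, size_t n)
  {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
      if (s[i] != 0)    /* placeholder for the real per-character logic */
        count++;
    return count;
  }

  /* main.c -- instantiate the core loop for both cases.  */
  #include <stddef.h>
  #include <uchar.h>    /* for char32_t */

  #define UNIT char
  #define FUNC count_unibyte
  #include "str-count.h"
  #undef UNIT
  #undef FUNC

  #define UNIT char32_t
  #define FUNC count_multibyte
  #include "str-count.h"
  #undef UNIT
  #undef FUNC

The core loop is compiled twice, once per character type, so the logic
exists only once in the source and there is no per-character function-call
overhead; the price is the somewhat unusual two-file structure.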
A fourth possible approach is the polymorphic / object-oriented approach:
Define a struct containing function pointers that define the processing of
each multibyte character.

  struct vtable { ... several function pointers ... };

  struct vtable vtable;
  if (MB_CUR_MAX > 1)
    {
      ... Initialize vtable with functions for the multibyte case ...
    }
  else
    {
      ... Initialize vtable with functions for the unibyte case ...
    }
  ... Do the processing loop, using vtable ...

This approach avoids code duplication. It also avoids a separate .h file.
But it cannot achieve the speed goals: the overhead of the added function
calls (at least one per multibyte character) makes it impossible to attain
the previous speed in the "C" locale. So, this approach should only be
chosen if minimizing binary code size matters more than speed.
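To make the trade-off concrete, here is a minimal sketch of this vtable
approach, assuming a single next_char operation and using mbrlen() for the
multibyte case; all names are hypothetical:

  #include <stdlib.h>    /* for MB_CUR_MAX */
  #include <string.h>    /* for memset */
  #include <wchar.h>     /* for mbrlen, mbstate_t */

  /* One function pointer per operation; here just one, which returns
     the number of bytes that the character at S occupies (at most N).  */
  struct vtable
  {
    size_t (*next_char) (const char *s, size_t n);
  };

  static size_t
  unibyte_next_char (const char *s, size_t n)
  {
    (void) s; (void) n;
    return 1;            /* one byte == one character */
  }

  static size_t
  multibyte_next_char (const char *s, size_t n)
  {
    mbstate_t state;
    memset (&state, 0, sizeof state);
    size_t ret = mbrlen (s, n, &state);
    /* Treat the null character and invalid or incomplete sequences
       as occupying a single byte.  */
    return (ret == 0 || ret == (size_t) -1 || ret == (size_t) -2) ? 1 : ret;
  }

  static size_t
  count_characters (const char *s, size_t n)
  {
    struct vtable vtable;
    if (MB_CUR_MAX > 1)
      vtable.next_char = multibyte_next_char;
    else
      vtable.next_char = unibyte_next_char;

    size_t count = 0;
    while (n > 0)
      {
        /* This indirect call, once per character, is the overhead
           that makes the approach slower in the "C" locale.  */
        size_t consumed = vtable.next_char (s, n);
        s += consumed;
        n -= consumed;
        count++;
      }
    return count;
  }

Bruno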