On 09/03/2026 21:09, Pádraig Brady wrote:
On 09/03/2026 20:36, Bruno Haible via GNU coreutils General Discussion wrote:
Hi,

The expand/mb and unexpand/mb tests fail on several platforms, as reported
by the multi-platform CI.

Find attached two zips:
    - one with the expand/mb failure logs,
    - one with the unexpand/mb failure logs.

Oops, they assume the presence of the en_US.UTF-8 locale.
I've adjusted master to use $LOCALE_FR_UTF8 if available.

There are other issues that I need to think a bit about.

On Alpine at least, \u3000 is _not_ blank, while on Linux it is.
I.e., c32isblank(0x3000) returns true on Linux.
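
For anyone wanting to check their own platform, here is a minimal probe sketch.
It is only an approximation: probe_blank() is a hypothetical helper, and it uses
ISO C iswblank() as a stand-in for gnulib's c32isblank(), assuming wchar_t holds
Unicode code points (which I believe is the case on glibc and musl).

```c
#include <locale.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not part of coreutils or gnulib.
   Select locale NAME, then report whether CH is classified as blank.
   Returns 1 if blank, 0 if not, and -1 if the locale could not be set.
   Assumption: iswblank() approximates c32isblank() on this platform.  */
static int
probe_blank (const char *name, wint_t ch)
{
  if (! setlocale (LC_ALL, name))
    return -1;
  return iswblank (ch) != 0;
}
```

Calling probe_blank ("en_US.UTF-8", 0x3000) from a small main() and printing the
result should show the difference between systems, provided that locale is
installed; the answer depends on the libc, so no expected value is given here.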

\u3000 is ideographic space, i.e. a space generally used in East Asian text
so that alignment is maintained. Since it's a space, and not a non-breaking
space, it should be treated as a blank character IMHO.

Now we had similar issues with Solaris, where it considered non-breaking space
characters as blank characters, so we defined c32isnbspace() to cater for that.

The following tweak gets it to pass on both systems,
but isspace() is too encompassing.

diff --git a/src/unexpand.c b/src/unexpand.c
index 16d0f0031..82e3ab99a 100644
--- a/src/unexpand.c
+++ b/src/unexpand.c
@@ -176,7 +176,7 @@ unexpand (void)

           if (convert)
             {
-              bool blank = !! (c32isblank (g.ch) && ! c32isnbspace (g.ch));
+              bool blank = !! (g.ch != '\n' && c32isspace (g.ch) && ! c32isnbspace (g.ch));

               if (blank)
                 {

Maybe we should define a c32isvertspace() { g.ch=='\n' || ... } to allow:

  bool blank = c32isspace() && ! c32isvertspace() && ! c32isnbspace();
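
A rough, self-contained sketch of that idea follows. All three function names
are hypothetical, iswspace() stands in for gnulib's c32isspace(), and the code
point lists are assumptions for illustration (in particular, which characters
ought to count as vertical space is exactly the open question).

```c
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical: whitespace that moves to another line.
   Assumed set: LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR.  */
static bool
is_vert_space (wint_t ch)
{
  return (ch == '\n' || ch == '\v' || ch == '\f' || ch == '\r'
          || ch == 0x0085 || ch == 0x2028 || ch == 0x2029);
}

/* Hypothetical stand-in for c32isnbspace(); the code points listed
   here are an assumption, not copied from the coreutils source.  */
static bool
is_nb_space (wint_t ch)
{
  return ch == 0x00A0 || ch == 0x2007 || ch == 0x202F || ch == 0x2060;
}

/* Horizontal blank: any space that is neither vertical nor non-breaking.
   iswspace() is locale dependent, so results for non-ASCII characters
   vary with the current locale and the libc.  */
static bool
is_horiz_blank (wint_t ch)
{
  return iswspace (ch) && ! is_vert_space (ch) && ! is_nb_space (ch);
}
```

The point of this shape is that the result no longer depends on how a given
libc defines the blank class, only on its space class plus two fixed
exclusion lists.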

I do think both Solaris and Alpine are wrong in this regard though,
so I need to think a bit more about whether skipping the tests is appropriate
(thus honoring system differences in the c32isblank() implementation),
or whether trying to get consistent behavior across systems like this is
appropriate.

Any input appreciated, as I'm mostly hand waving on this.

cheers,
Padraig
