On 27/12/2023 20:52, Jim Meyering wrote:
Thanks. Nice improvement.
I compared --help and --version output, which showed no difference
other than the intended ones.

Thanks for the review. Pushed.


make check passed, modulo this unrelated failure on Fedora 38:

numfmt.pl: test lcl-dbl-to-human-1: stdout mismatch, comparing
lcl-dbl-to-human-1.1 (expected) and lcl-dbl-to-human-1.O (actual)
*** lcl-dbl-to-human-1.1        Wed Dec 27 12:47:48 2023
--- lcl-dbl-to-human-1.O        Wed Dec 27 12:47:48 2023
***************
*** 1 ****
! 1,1K
--- 1 ----
! 1,1k

Oh interesting.
I suspect that's due to glibc-all-langpacks not being installed on your system.
That would mean a regular space was used as the thousands grouping char,
and thus the test was run for you, and not for me.

I see a similar skipping issue in tests/sort/sort-h-thousands-sep.sh,
and in fact that was hiding a sort bug introduced in v9.1 !

Hopefully the 3 patches attached addresses these issues.
Paul are you Ok with the change to sort in the last patch?

thanks,
Pádraig.
From 4a29be29401b7e10c8d628eb6469f41249b0c771 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Wed, 27 Dec 2023 22:47:48 +0000
Subject: [PATCH 1/3] tests: run locale tests on more systems

* tests/misc/numfmt.pl: Determine the thousands grouping character
in use, rather than skipping locale tests when it's not a space.
For example fr_FR.UTF-8 uses "NARROW NO-BREAK SPACE" as the grouping
char on modern glibc systems at least.
* tests/sort/sort-h-thousands-sep.sh: Likewise.
---
 tests/misc/numfmt.pl               | 28 +++++++++++++++++++---------
 tests/sort/sort-h-thousands-sep.sh | 23 ++++++++++++-----------
 2 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/tests/misc/numfmt.pl b/tests/misc/numfmt.pl
index 3047f806d..b54f827c3 100755
--- a/tests/misc/numfmt.pl
+++ b/tests/misc/numfmt.pl
@@ -1011,6 +1011,15 @@ my @Limit_Tests =
 (system "$prog ---debug 1 2>&1|grep 'MAX_UNSCALED_DIGITS: 18' > /dev/null") == 0
   and push @Tests, @Limit_Tests;
 
+my $lg = ' ';
+if ($locale ne 'C')
+  {
+    open(LOC_GRP, "env LC_ALL=$locale printf \"%'d\" 1111|tr -d 1|")
+      or die "Can't fork command: $!";
+    $lg = <LOC_GRP>;
+    close(LOC_GRP) || die "Failed to read locale grouping from printf";
+  }
+
 my @Locale_Tests =
   (
      # Locale that supports grouping, but without '--grouping' parameter
@@ -1018,11 +1027,12 @@ my @Locale_Tests =
              {ENV=>"LC_ALL=$locale"}],
 
      # Locale with grouping
-     ['lcl-grp-2', '--from=si --grouping 7M',   {OUT=>"7 000 000"},
+     ['lcl-grp-2', '--from=si --grouping 7M',   {OUT=>"7${lg}000${lg}000"},
              {ENV=>"LC_ALL=$locale"}],
 
      # Locale with grouping and debug - no debug warning message
-     ['lcl-grp-3', '--from=si --debug --grouping 7M',   {OUT=>"7 000 000"},
+     ['lcl-grp-3', '--from=si --debug --grouping 7M',
+             {OUT=>"7${lg}000${lg}000"},
              {ENV=>"LC_ALL=$locale"}],
 
      # Input with locale'd decimal-point
@@ -1033,21 +1043,21 @@ my @Locale_Tests =
              {ENV=>"LC_ALL=$locale"}],
 
      # Format + Grouping
-     ['lcl-fmt-1', '--format "%\'f" 50000',{OUT=>"50 000"},
+     ['lcl-fmt-1', '--format "%\'f" 50000',{OUT=>"50${lg}000"},
              {ENV=>"LC_ALL=$locale"}],
-     ['lcl-fmt-2', '--format "--%\'10f--" 50000', {OUT=>"--    50 000--"},
+     ['lcl-fmt-2', '--format "--%\'10f--" 50000', {OUT=>"--    50${lg}000--"},
              {ENV=>"LC_ALL=$locale"}],
-     ['lcl-fmt-3', '--format "--%\'-10f--" 50000',{OUT=>"--50 000    --"},
+     ['lcl-fmt-3', '--format "--%\'-10f--" 50000',{OUT=>"--50${lg}000    --"},
              {ENV=>"LC_ALL=$locale"}],
      ['lcl-fmt-4', '--format "--%-10f--" --to=si 5000000',
              {OUT=>"--5,0M      --"},
              {ENV=>"LC_ALL=$locale"}],
      # handle zero/grouping in combination
-     ['lcl-fmt-5', '--format="%\'06f" 1234',{OUT=>"01 234"},
+     ['lcl-fmt-5', '--format="%\'06f" 1234',{OUT=>"01${lg}234"},
              {ENV=>"LC_ALL=$locale"}],
-     ['lcl-fmt-6', '--format="%0\'6f" 1234',{OUT=>"01 234"},
+     ['lcl-fmt-6', '--format="%0\'6f" 1234',{OUT=>"01${lg}234"},
              {ENV=>"LC_ALL=$locale"}],
-     ['lcl-fmt-7', '--format="%0\'\'6f" 1234',{OUT=>"01 234"},
+     ['lcl-fmt-7', '--format="%0\'\'6f" 1234',{OUT=>"01${lg}234"},
              {ENV=>"LC_ALL=$locale"}],
 
   );
@@ -1059,7 +1069,7 @@ if ($locale ne 'C')
       or die "Can't fork command: $!";
     my $loc_num = <LOC_NUM>;
     close(LOC_NUM) || die "Failed to read grouped number from printf";
-    if ($loc_num ne '1 234')
+    if ($loc_num ne "1${lg}234")
       {
         warn "skipping locale grouping tests as 1234 groups like $loc_num\n";
         $locale = 'C';
diff --git a/tests/sort/sort-h-thousands-sep.sh b/tests/sort/sort-h-thousands-sep.sh
index 5d1ff4f33..447f99f5c 100755
--- a/tests/sort/sort-h-thousands-sep.sh
+++ b/tests/sort/sort-h-thousands-sep.sh
@@ -21,25 +21,26 @@ print_ver_ sort
 
 TEST_LOCALE='sv_SE'
 
-test "$(LC_ALL="$TEST_LOCALE" locale thousands_sep)" = ' ' ||
+lg="$(LC_ALL="$TEST_LOCALE" locale thousands_sep)"
+test "$lg" ||
   skip_ 'The Swedish locale with blank thousands separator is unavailable.'
 
 tee exp1 exp3 > in << _EOF_
-1       1k      1 M     4 003   1M
-2k      2M      2 k     4 002   2
-3M      3       3 G     4 001   3k
+1       1k      1 M     4${lg}003   1M
+2k      2M      2 k     4${lg}002   2
+3M      3       3 G     4${lg}001   3k
 _EOF_
-
+cp in /tmp/pb.in
 cat > exp2 << _EOF_
-3M      3       3 G     4 001   3k
-1       1k      1 M     4 003   1M
-2k      2M      2 k     4 002   2
+3M      3       3 G     4${lg}001   3k
+1       1k      1 M     4${lg}003   1M
+2k      2M      2 k     4${lg}002   2
 _EOF_
 
 cat > exp5 << _EOF_
-3M      3       3 G     4 001   3k
-2k      2M      2 k     4 002   2
-1       1k      1 M     4 003   1M
+3M      3       3 G     4${lg}001   3k
+2k      2M      2 k     4${lg}002   2
+1       1k      1 M     4${lg}003   1M
 _EOF_
 
 for i in 1 2 3 5; do
-- 
2.43.0

From 5e857871c6302116412cd7f3b761e57df89753f8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Wed, 27 Dec 2023 23:37:17 +0000
Subject: [PATCH 2/3] tests: numfmt: fix test related to lower case 'k' SI unit

* tests/misc/numfmt.pl: Following on from v9.4-86-g615167cc4,
adjust this test accordingly.  This test was being skipped
on some systems, and so only noticed now.
Reported by Jim Meyering.
---
 tests/misc/numfmt.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/misc/numfmt.pl b/tests/misc/numfmt.pl
index b54f827c3..03a94d101 100755
--- a/tests/misc/numfmt.pl
+++ b/tests/misc/numfmt.pl
@@ -1039,7 +1039,7 @@ my @Locale_Tests =
      ['lcl-stdtod-1', '--from=si 12,2K', {OUT=>"12200"},
              {ENV=>"LC_ALL=$locale"}],
 
-     ['lcl-dbl-to-human-1', '--to=si 1100', {OUT=>"1,1K"},
+     ['lcl-dbl-to-human-1', '--to=si 1100', {OUT=>"1,1k"},
              {ENV=>"LC_ALL=$locale"}],
 
      # Format + Grouping
-- 
2.43.0

From 1e4bc09dfde518d09333c8d453ad2b9e191cdab4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Thu, 28 Dec 2023 00:02:42 +0000
Subject: [PATCH 3/3] sort: fix thousands grouping handling on single byte
 locales

* gl/lib/strnumcmp-in.h (numcompare): After commit v9.0-8-g6cafb122f,
we need to treat characters as signed to avoid invalid comparisons
between negative integers and unsigned characters.
* NEWS: Mention the bug fix.
---
 NEWS                  | 5 +++++
 gl/lib/strnumcmp-in.h | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/NEWS b/NEWS
index 26da9993e..61ba3cf19 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,11 @@ GNU coreutils NEWS                                    -*- outline -*-
   numfmt options like --suffix no longer have an arbitrary 127-byte limit.
   [bug introduced with numfmt in coreutils-8.21]
 
+  sort again handles thousands grouping characters in single-byte locales
+  where the grouping character is greater than CHAR_MAX.  For e.g. signed
+  character platforms with a 0xA0 (aka &nbsp) grouping character.
+  [bug introduced in coreutils-9.1]
+
   tail no longer mishandles input from files in /proc and /sys file systems,
   on systems with a page size larger than the stdio BUFSIZ.
   [This bug was present in "the beginning".]
diff --git a/gl/lib/strnumcmp-in.h b/gl/lib/strnumcmp-in.h
index 39b5caf77..3a69db732 100644
--- a/gl/lib/strnumcmp-in.h
+++ b/gl/lib/strnumcmp-in.h
@@ -114,8 +114,8 @@ static inline int _GL_ATTRIBUTE_PURE
 numcompare (char const *a, char const *b,
             int decimal_point, int thousands_sep)
 {
-  unsigned char tmpa = *a;
-  unsigned char tmpb = *b;
+  char tmpa = *a;
+  char tmpb = *b;
   int tmp;
   size_t log_a;
   size_t log_b;
-- 
2.43.0

Reply via email to