On 27/12/2023 20:52, Jim Meyering wrote:
Thanks. Nice improvement.
I compared --help and --version output, which showed no difference
other than the intended ones.
Thanks for the review. Pushed.
make check passed, modulo this unrelated failure on Fedora 38:
numfmt.pl: test lcl-dbl-to-human-1: stdout mismatch, comparing
lcl-dbl-to-human-1.1 (expected) and lcl-dbl-to-human-1.O (actual)
*** lcl-dbl-to-human-1.1 Wed Dec 27 12:47:48 2023
--- lcl-dbl-to-human-1.O Wed Dec 27 12:47:48 2023
***************
*** 1 ****
! 1,1K
--- 1 ----
! 1,1k
Oh interesting.
I suspect that's due to glibc-all-langpacks not being installed on your system.
That would mean a regular space was used as the thousands grouping char,
and thus the test was run for you, and not for me.
I see a similar skipping issue in tests/sort/sort-h-thousands-sep.sh,
and in fact that was hiding a sort bug introduced in v9.1 !
Hopefully the 3 patches attached addresses these issues.
Paul are you Ok with the change to sort in the last patch?
thanks,
Pádraig.
From 4a29be29401b7e10c8d628eb6469f41249b0c771 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Wed, 27 Dec 2023 22:47:48 +0000
Subject: [PATCH 1/3] tests: run locale tests on more systems
* tests/misc/numfmt.pl: Determine the thousands grouping character
in use, rather than skipping locale tests when it's not a space.
For example fr_FR.UTF-8 uses "NARROW NO-BREAK SPACE" as the grouping
char on modern glibc systems at least.
* tests/sort/sort-h-thousands-sep.sh: Likewise.
---
tests/misc/numfmt.pl | 28 +++++++++++++++++++---------
tests/sort/sort-h-thousands-sep.sh | 23 ++++++++++++-----------
2 files changed, 31 insertions(+), 20 deletions(-)
diff --git a/tests/misc/numfmt.pl b/tests/misc/numfmt.pl
index 3047f806d..b54f827c3 100755
--- a/tests/misc/numfmt.pl
+++ b/tests/misc/numfmt.pl
@@ -1011,6 +1011,15 @@ my @Limit_Tests =
(system "$prog ---debug 1 2>&1|grep 'MAX_UNSCALED_DIGITS: 18' > /dev/null") == 0
and push @Tests, @Limit_Tests;
+my $lg = ' ';
+if ($locale ne 'C')
+ {
+ open(LOC_GRP, "env LC_ALL=$locale printf \"%'d\" 1111|tr -d 1|")
+ or die "Can't fork command: $!";
+ $lg = <LOC_GRP>;
+ close(LOC_GRP) || die "Failed to read locale grouping from printf";
+ }
+
my @Locale_Tests =
(
# Locale that supports grouping, but without '--grouping' parameter
@@ -1018,11 +1027,12 @@ my @Locale_Tests =
{ENV=>"LC_ALL=$locale"}],
# Locale with grouping
- ['lcl-grp-2', '--from=si --grouping 7M', {OUT=>"7 000 000"},
+ ['lcl-grp-2', '--from=si --grouping 7M', {OUT=>"7${lg}000${lg}000"},
{ENV=>"LC_ALL=$locale"}],
# Locale with grouping and debug - no debug warning message
- ['lcl-grp-3', '--from=si --debug --grouping 7M', {OUT=>"7 000 000"},
+ ['lcl-grp-3', '--from=si --debug --grouping 7M',
+ {OUT=>"7${lg}000${lg}000"},
{ENV=>"LC_ALL=$locale"}],
# Input with locale'd decimal-point
@@ -1033,21 +1043,21 @@ my @Locale_Tests =
{ENV=>"LC_ALL=$locale"}],
# Format + Grouping
- ['lcl-fmt-1', '--format "%\'f" 50000',{OUT=>"50 000"},
+ ['lcl-fmt-1', '--format "%\'f" 50000',{OUT=>"50${lg}000"},
{ENV=>"LC_ALL=$locale"}],
- ['lcl-fmt-2', '--format "--%\'10f--" 50000', {OUT=>"-- 50 000--"},
+ ['lcl-fmt-2', '--format "--%\'10f--" 50000', {OUT=>"-- 50${lg}000--"},
{ENV=>"LC_ALL=$locale"}],
- ['lcl-fmt-3', '--format "--%\'-10f--" 50000',{OUT=>"--50 000 --"},
+ ['lcl-fmt-3', '--format "--%\'-10f--" 50000',{OUT=>"--50${lg}000 --"},
{ENV=>"LC_ALL=$locale"}],
['lcl-fmt-4', '--format "--%-10f--" --to=si 5000000',
{OUT=>"--5,0M --"},
{ENV=>"LC_ALL=$locale"}],
# handle zero/grouping in combination
- ['lcl-fmt-5', '--format="%\'06f" 1234',{OUT=>"01 234"},
+ ['lcl-fmt-5', '--format="%\'06f" 1234',{OUT=>"01${lg}234"},
{ENV=>"LC_ALL=$locale"}],
- ['lcl-fmt-6', '--format="%0\'6f" 1234',{OUT=>"01 234"},
+ ['lcl-fmt-6', '--format="%0\'6f" 1234',{OUT=>"01${lg}234"},
{ENV=>"LC_ALL=$locale"}],
- ['lcl-fmt-7', '--format="%0\'\'6f" 1234',{OUT=>"01 234"},
+ ['lcl-fmt-7', '--format="%0\'\'6f" 1234',{OUT=>"01${lg}234"},
{ENV=>"LC_ALL=$locale"}],
);
@@ -1059,7 +1069,7 @@ if ($locale ne 'C')
or die "Can't fork command: $!";
my $loc_num = <LOC_NUM>;
close(LOC_NUM) || die "Failed to read grouped number from printf";
- if ($loc_num ne '1 234')
+ if ($loc_num ne "1${lg}234")
{
warn "skipping locale grouping tests as 1234 groups like $loc_num\n";
$locale = 'C';
diff --git a/tests/sort/sort-h-thousands-sep.sh b/tests/sort/sort-h-thousands-sep.sh
index 5d1ff4f33..447f99f5c 100755
--- a/tests/sort/sort-h-thousands-sep.sh
+++ b/tests/sort/sort-h-thousands-sep.sh
@@ -21,25 +21,26 @@ print_ver_ sort
TEST_LOCALE='sv_SE'
-test "$(LC_ALL="$TEST_LOCALE" locale thousands_sep)" = ' ' ||
+lg="$(LC_ALL="$TEST_LOCALE" locale thousands_sep)"
+test "$lg" ||
skip_ 'The Swedish locale with blank thousands separator is unavailable.'
tee exp1 exp3 > in << _EOF_
-1 1k 1 M 4 003 1M
-2k 2M 2 k 4 002 2
-3M 3 3 G 4 001 3k
+1 1k 1 M 4${lg}003 1M
+2k 2M 2 k 4${lg}002 2
+3M 3 3 G 4${lg}001 3k
_EOF_
-
+cp in /tmp/pb.in
cat > exp2 << _EOF_
-3M 3 3 G 4 001 3k
-1 1k 1 M 4 003 1M
-2k 2M 2 k 4 002 2
+3M 3 3 G 4${lg}001 3k
+1 1k 1 M 4${lg}003 1M
+2k 2M 2 k 4${lg}002 2
_EOF_
cat > exp5 << _EOF_
-3M 3 3 G 4 001 3k
-2k 2M 2 k 4 002 2
-1 1k 1 M 4 003 1M
+3M 3 3 G 4${lg}001 3k
+2k 2M 2 k 4${lg}002 2
+1 1k 1 M 4${lg}003 1M
_EOF_
for i in 1 2 3 5; do
--
2.43.0
From 5e857871c6302116412cd7f3b761e57df89753f8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Wed, 27 Dec 2023 23:37:17 +0000
Subject: [PATCH 2/3] tests: numfmt: fix test related to lower case 'k' SI unit
* tests/misc/numfmt.pl: Following on from v9.4-86-g615167cc4,
adjust this test accordingly. This test was being skipped
on some systems, and so only noticed now.
Reported by Jim Meyering.
---
tests/misc/numfmt.pl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tests/misc/numfmt.pl b/tests/misc/numfmt.pl
index b54f827c3..03a94d101 100755
--- a/tests/misc/numfmt.pl
+++ b/tests/misc/numfmt.pl
@@ -1039,7 +1039,7 @@ my @Locale_Tests =
['lcl-stdtod-1', '--from=si 12,2K', {OUT=>"12200"},
{ENV=>"LC_ALL=$locale"}],
- ['lcl-dbl-to-human-1', '--to=si 1100', {OUT=>"1,1K"},
+ ['lcl-dbl-to-human-1', '--to=si 1100', {OUT=>"1,1k"},
{ENV=>"LC_ALL=$locale"}],
# Format + Grouping
--
2.43.0
From 1e4bc09dfde518d09333c8d453ad2b9e191cdab4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Thu, 28 Dec 2023 00:02:42 +0000
Subject: [PATCH 3/3] sort: fix thousands grouping handling on single byte
locales
* gl/lib/strnumcmp-in.h (numcompare): After commit v9.0-8-g6cafb122f,
we need to treat characters as signed to avoid invalid comparisons
between negative integers and unsigned characters.
* NEWS: Mention the bug fix.
---
NEWS | 5 +++++
gl/lib/strnumcmp-in.h | 4 ++--
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/NEWS b/NEWS
index 26da9993e..61ba3cf19 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,11 @@ GNU coreutils NEWS -*- outline -*-
numfmt options like --suffix no longer have an arbitrary 127-byte limit.
[bug introduced with numfmt in coreutils-8.21]
+ sort again handles thousands grouping characters in single-byte locales
+ where the grouping character is greater than CHAR_MAX. For e.g. signed
+ character platforms with a 0xA0 (aka  ) grouping character.
+ [bug introduced in coreutils-9.1]
+
tail no longer mishandles input from files in /proc and /sys file systems,
on systems with a page size larger than the stdio BUFSIZ.
[This bug was present in "the beginning".]
diff --git a/gl/lib/strnumcmp-in.h b/gl/lib/strnumcmp-in.h
index 39b5caf77..3a69db732 100644
--- a/gl/lib/strnumcmp-in.h
+++ b/gl/lib/strnumcmp-in.h
@@ -114,8 +114,8 @@ static inline int _GL_ATTRIBUTE_PURE
numcompare (char const *a, char const *b,
int decimal_point, int thousands_sep)
{
- unsigned char tmpa = *a;
- unsigned char tmpb = *b;
+ char tmpa = *a;
+ char tmpb = *b;
int tmp;
size_t log_a;
size_t log_b;
--
2.43.0