On 23/08/2025 06:28, Collin Funk wrote:
Pádraig Brady <[email protected]> writes:
On the subject of testing, this is one place where the i18n patch lacked,
though it does have some adjustments for testing fold.
It would be good to incorporate its tests/.../fold.pl adjustments.
I've just done this in the attached 2 patches,
which I'll push in a bit.
Also it would be good to add tests for invalid multi-byte characters
to see that they're handled appropriately.
It looks like we'll need to adjust the code to handle invalid characters
appropriately (and add tests). The following shows how the upstream and
i18n-patched fold each treat the invalid UTF-8 byte \xC3:
$ for fold in src/fold /bin/fold; do
    for locale in C en_US.UTF-8; do
      echo "LC_ALL=$locale $fold"
      printf '\xC3' | LC_ALL=$locale $fold -w1 | od -Ax -tx1z -v | head -n1
    done
  done
LC_ALL=C src/fold
000000
LC_ALL=en_US.UTF-8 src/fold
000000
LC_ALL=C /bin/fold
000000 c3 >.<
LC_ALL=en_US.UTF-8 /bin/fold
000000 c3 >.<
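
For comparing more than one invalid sequence, the check above could be wrapped in a small helper. This is only a sketch: `dump_hex` is a hypothetical name (not part of coreutils), the byte sequences are illustrative choices, and it assumes an en_US.UTF-8 locale is installed. Octal escapes are used because `\x` in a printf format is not portable across shells.

```shell
#!/bin/sh
# dump_hex CMD [ARGS...]: feed stdin to CMD and print CMD's output as
# space-separated hex bytes on one line (an empty line if CMD wrote nothing).
# Hypothetical helper for illustration only.
dump_hex() {
  # Unquoted command substitution: word splitting collapses od's
  # padding and line breaks into separate positional parameters.
  set -- $("$@" | od -An -tx1 -v)
  echo "$*"
}

# Illustrative inputs: a lone lead byte (\xC3), a lone continuation
# byte (\xA9), and a truncated 3-byte sequence (\xEF\xBD).
for seq in '\303' '\251' '\357\275'; do
  for locale in C en_US.UTF-8; do
    printf '%s LC_ALL=%s: ' "$seq" "$locale"
    # env sets the locale for fold only, not for the helper itself.
    printf "$seq" | dump_hex env LC_ALL="$locale" fold -w1
  done
done
```

Replacing `fold` with `src/fold` or `/bin/fold` in the inner call would reproduce the comparison above; a line with nothing after the colon means the input bytes were dropped.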
cheers,
Padraig.

From 0001bbc3e287ea76685adb021f9ef4819929f194 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Wed, 27 Aug 2025 13:08:57 +0100
Subject: [PATCH 1/2] tests: fold: copy i18n patch tests
* tests/fold/fold.pl: Copy tests from Fedora,
removing copy & pasted logic that was
extraneous to either the i18n patch or upstream.
---
tests/fold/fold.pl | 34 +++++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)
diff --git a/tests/fold/fold.pl b/tests/fold/fold.pl
index 877322e0a..de34177fd 100755
--- a/tests/fold/fold.pl
+++ b/tests/fold/fold.pl
@@ -20,9 +20,15 @@ use strict;
(my $program_name = $0) =~ s|.*/||;
+my $prog = 'fold';
+
# Turn off localization of executable's output.
@ENV{qw(LANGUAGE LANG LC_ALL)} = ('C') x 3;
-my $prog = 'fold';
+
+# uncommented to enable multibyte paths
+my $mb_locale = $ENV{LOCALE_FR_UTF8};
+! defined $mb_locale || $mb_locale eq 'none'
+ and $mb_locale = 'C';
my @Tests =
(
@@ -44,6 +50,32 @@ my @Tests =
{OUT=>"123456\n7890\nabcdef\nghij\n123456\n7890"}],
);
+if ($mb_locale ne 'C')
+ {
+ # Duplicate each test vector, appending "-mb" to the test name and
+ # inserting {ENV => "LC_ALL=$mb_locale"} in the copy, so that we
+ # provide coverage for multi-byte code paths.
+ my @new;
+ foreach my $t (@Tests)
+ {
+ my @new_t = @$t;
+ my $test_name = shift @new_t;
+
+ push @new, ["$test_name-mb", @new_t, {ENV => "LC_ALL=$mb_locale"}];
+ }
+ push @Tests, @new;
+ }
+
+@Tests = triple_test \@Tests;
+
+# Remember that triple_test creates from each test with exactly one "IN"
+# file two more tests (.p and .r suffix on name) corresponding to reading
+# input from a file and from a pipe. The pipe-reading test would fail
+# due to a race condition about 1 in 20 times.
+# Remove the IN_PIPE version of the "output-is-input" test above.
+# The others aren't susceptible because they have three inputs each.
+@Tests = grep {$_->[0] ne 'output-is-input.p'} @Tests;
+
my $save_temps = $ENV{DEBUG};
my $verbose = $ENV{VERBOSE};
--
2.50.1
From 17dd8e049abc12fe6321ab7d33805d04a1523127 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Wed, 27 Aug 2025 14:34:46 +0100
Subject: [PATCH 2/2] tests: fold: add tests for width of multi-byte characters
* tests/fold/fold.pl: The i18n patch didn't actually test folding
of multi-byte characters, so add tests for various multi-byte forms.
---
tests/fold/fold.pl | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/tests/fold/fold.pl b/tests/fold/fold.pl
index de34177fd..ea9bc5c6a 100755
--- a/tests/fold/fold.pl
+++ b/tests/fold/fold.pl
@@ -50,12 +50,25 @@ my @Tests =
{OUT=>"123456\n7890\nabcdef\nghij\n123456\n7890"}],
);
+# define UTF-8 encoded multi-byte characters for tests
+my $eaC = "\xC3\xA9"; # e acute NFC form
+my $eaD = "\x65\xCC\x81"; # e acute NFD form (zero width combining)
+my $eFW = "\xEF\xBD\x85"; # e fullwidth
+
+my @mbTests =
+ (
+ ['smb1', '-w2', {IN=>$eaC x 3}, {OUT=>$eaC x 2 . "\n" . $eaC}],
+ ['smb2', '-w2', {IN=>$eaD x 3}, {OUT=>$eaD x 2 . "\n" . $eaD}],
+ ['smb3', '-w2', {IN=>$eFW x 2}, {OUT=>$eFW . "\n" . $eFW}],
+ );
+
if ($mb_locale ne 'C')
{
# Duplicate each test vector, appending "-mb" to the test name and
# inserting {ENV => "LC_ALL=$mb_locale"} in the copy, so that we
# provide coverage for multi-byte code paths.
my @new;
+
foreach my $t (@Tests)
{
my @new_t = @$t;
@@ -64,6 +77,15 @@ if ($mb_locale ne 'C')
push @new, ["$test_name-mb", @new_t, {ENV => "LC_ALL=$mb_locale"}];
}
push @Tests, @new;
+
+ @new = ();
+ foreach my $t (@mbTests)
+ {
+ my @new_t = @$t;
+
+ push @new, [@new_t, {ENV => "LC_ALL=$mb_locale"}];
+ }
+ push @Tests, @new;
}
@Tests = triple_test \@Tests;
--
2.50.1