Pádraig Brady <[email protected]> writes:

> On 28/11/2025 20:05, Collin Funk wrote:
>> Pádraig Brady <[email protected]> writes:
>> 
>>> On 28/11/2025 00:57, Collin Funk wrote:
>>>> +print_ver_ tac
>>>
>>> I'd also rely on printf
>> Right. Don't we also need to use 'env' to make sure a shell builtin
>> doesn't get used?
>> 
>>>> +  tac --separator=$(printf "$1") inp > out && printf '\n' >> out \
>>>> +    || framework_failure_
>>>
>>> Since we're testing tac, I'd explicitly check its return value.
>>> Also this looks under quoted, so I'd change to:
>>>
>>>    tac --separator="$(printf -- "$1")" inp > out || fail=1
>>>    printf '\n' >> out || framework_failure_
>> Good catch.
>> I'll probably add another test to make sure invalid UTF-8 is treated
>> as bytes.
>> Okay to move bad_unicode() from tests/fold/fold-characters.sh to
>> init.cfg? I'm sure it will be useful for other tests as well, since it
>> checks a few different ways UTF-8 can be bad.
>
> All sounds good.

Actually, that idea doesn't work as-is, since bad_unicode emits a NUL,
which cannot be passed in a command-line argument.

Using it with the NUL removed, i.e.,
'\xC3|\xED\xBA\xAD|\u0089|\xED\xA6\xBF\xED\xBF\xBF\n', doesn't seem to
work though, so I'll have to look into that.
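
To illustrate the NUL problem, here is a minimal sketch. It assumes the
failure comes from the command substitution used to build the
--separator argument; the 'a\000b' input is a made-up stand-in, not the
actual bad_unicode output. POSIX leaves the result unspecified when the
output contains a NUL; bash and dash, as far as I know, drop the byte.

```shell
#!/bin/sh
# Sketch: a NUL byte does not survive command substitution.
# 'a\000b' is an illustrative input, not taken from bad_unicode.

# The braces let us silence the shell's own warning (bash prints one).
{ with_nul=$(printf 'a\000b'); } 2>/dev/null

# Count the bytes that actually reached the variable.
nul_len=$(printf '%s' "$with_nul" | wc -c | tr -d ' ')

# Three bytes were written, but fewer arrive.
echo "bytes kept: $nul_len of 3"
```

Even if the shell kept the byte, an argv string ends at the first NUL,
so tac would never see anything after it.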

I pushed the attached without the bad unicode check regardless, since
these cases are more likely to be used.
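
For anyone reading check_separator in the attached test, the format
strings may look odd at first: in sh, "$12" is "$1" followed by a
literal 2, since positional parameters above 9 need braces. A rough
sketch of the layout, using an ASCII ':' separator and the shell's own
printf so it does not depend on any locale (show_layout is a
hypothetical helper, not part of the test):

```shell
#!/bin/sh
# Sketch: "$12" expands as "${1}2", not as a twelfth parameter.
# show_layout is a hypothetical helper used only for illustration.
show_layout ()
{
  # Same format strings as check_separator, written to stdout.
  printf "1$12$13$1\n"    # what the test writes to 'inp' (plus a newline here)
  printf "3$12$11$1\n"    # what the test writes to 'exp'
}

layout=$(show_layout ':')
printf '%s\n' "$layout"
# prints:
# 1:2:3:
# 3:2:1:
```

The first line is the input with a trailing separator after each
record, and the second is the record order tac is expected to produce.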

Collin

From 7d94684f2cc0e09aefeb505bfb171f7b7a21b4d5 Mon Sep 17 00:00:00 2001
Message-ID: <7d94684f2cc0e09aefeb505bfb171f7b7a21b4d5.1764365406.git.collin.fu...@gmail.com>
From: Collin Funk <[email protected]>
Date: Thu, 27 Nov 2025 16:55:18 -0800
Subject: [PATCH v2] test: tac: test with non-ASCII values for --separator

* tests/tac/tac-locale.sh: New test.
* tests/local.mk (all_tests): Add it.
---
 tests/local.mk          |  1 +
 tests/tac/tac-locale.sh | 43 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)
 create mode 100755 tests/tac/tac-locale.sh

diff --git a/tests/local.mk b/tests/local.mk
index 26d140dcc..4ae003719 100644
--- a/tests/local.mk
+++ b/tests/local.mk
@@ -458,6 +458,7 @@ all_tests =					\
   tests/misc/sync.sh				\
   tests/tac/tac.pl				\
   tests/tac/tac-continue.sh			\
+  tests/tac/tac-locale.sh			\
   tests/tac/tac-2-nonseekable.sh		\
   tests/tail/tail.pl				\
   tests/misc/tee.sh				\
diff --git a/tests/tac/tac-locale.sh b/tests/tac/tac-locale.sh
new file mode 100755
index 000000000..2bb6e404c
--- /dev/null
+++ b/tests/tac/tac-locale.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+# Test that tac --separator=SEP works if SEP is not ASCII.
+
+# Copyright (C) 2025 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
+print_ver_ tac printf
+
+check_separator ()
+{
+  env printf "1$12$13$1" > inp || framework_failure_
+  env printf "3$12$11$1\n" > exp || framework_failure_
+  tac --separator="$(env printf -- "$1")" inp > out || fail=1
+  env printf '\n' >> out || framework_failure_
+  compare exp out || fail=1
+}
+
+export LC_ALL=en_US.iso8859-1  # only lowercase form works on macOS 10.15.7
+if test "$(locale charmap 2>/dev/null | sed 's/iso/ISO-/')" = ISO-8859-1; then
+  check_separator '\xe9'  # é
+  check_separator '\xe9\xea'  # éê
+fi
+
+export LC_ALL=$LOCALE_FR_UTF8
+if test "$(locale charmap 2>/dev/null)" = UTF-8; then
+  check_separator '\u0434'  # д
+  check_separator '\u0434\u0436'  # дж
+fi
+
+Exit $fail
-- 
2.52.0
