On 13/01/2026 02:52, Collin Funk wrote:
Pádraig Brady writes:
I often need to count characters, so wanted to make ruler helper like:
$ ruler() { yes 123456789 | head -n100 | src/paste -s -d¹²³⁴⁵⁶⁷⁸⁹⁰ | cut -c-$COLUMNS; }
So I could do things like:
$ yes foo | head -n10 | paste -s -d.; ruler
foo.foo.foo.foo.foo.foo.foo.foo.foo.foo
123456789¹123456789²123456789³123456789⁴123456789⁵123456789⁶123456789⁷
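In the helper, paste -s joins every input line into one and takes the separators from the -d list in turn, starting over when the list runs out. That cycling is what puts a different superscript after each run of 123456789. A minimal illustration with ASCII delimiters, which any POSIX paste(1) handles without multi-byte support:

```shell
# Join four lines into one; the -d list ",;" is reused cyclically,
# so the separators alternate "," then ";".
printf '%s\n' a b c d | paste -s -d ',;'
```

This prints a,b;c,d. With multi-byte support, each superscript in -d¹²³⁴⁵⁶⁷⁸⁹⁰ is meant to act as one such list entry.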
That is a useful function.
One small improvement would be adjusting cut -c-$COLUMNS so that it does
not split the multi-byte delimiter. When I first invoked it, I saw a
REPLACEMENT CHARACTER (U+FFFD) at the end of the line:
$ echo $COLUMNS
80
$ yes foo | head -n10 | paste -s -d.; ruler \
| od -An -t x1 | tail -n 2
foo.foo.foo.foo.foo.foo.foo.foo.foo.foo
38 39 e2 81 b6 31 32 33 34 35 36 37 38 39 e2 81
0a
Where ⁷ is 0xE2 0x81 0xB7. That probably confused me longer than it
should have.
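The marks do not all have the same UTF-8 width: ¹, ² and ³ (U+00B9, U+00B2, U+00B3) take 2 bytes, while ⁰ and ⁴ through ⁹ (U+2070, U+2074 to U+2079) take 3. A byte-based cut -c therefore lands at no fixed offset within a cell. Checking two of the encodings directly:

```shell
# Dump the UTF-8 bytes of two ruler marks:
# ¹ (U+00B9) is c2 b9, ⁴ (U+2074) is e2 81 b4.
printf '%s' '¹⁴' | od -An -t x1
```

Applied to the dump above: three 11-byte cells (¹²³) plus three 12-byte cells (⁴⁵⁶) make 69 bytes, the next 123456789 fills bytes 70 to 78, so an 80-byte cut keeps only e2 81 of ⁷.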
Oh right, I was using Fedora's cut(1) that has the i18n patch applied.
We'll get to that in coreutils soon.
But paste(1) needs multi-byte support for that,
which the attached implements.
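Until both tools handle multi-byte characters, the ruler could also be built without either of them. A minimal sketch in POSIX sh, assuming a UTF-8 terminal; the name ruler_sh is hypothetical, and the width defaults to $COLUMNS, else 80 (COLUMNS is often unset in non-interactive shells). It counts whole 10-character cells instead of cutting bytes, so no mark is ever split:

```shell
# Hypothetical ruler() variant needing no multi-byte paste(1) or cut(1).
# Emits WIDTH characters built from whole "123456789<mark>" cells plus
# a partial ASCII cell. The ( ) body keeps the variables local.
ruler_sh() (
  width=${1:-${COLUMNS:-80}}
  cells=$((width / 10))  # complete 10-character cells
  rest=$((width % 10))   # digits in the trailing partial cell
  out=''
  i=0
  while [ "$i" -lt "$cells" ]; do
    # Marks cycle through ¹..⁹ then ⁰, like paste's delimiter list.
    case $((i % 10)) in
      0) m='¹' ;; 1) m='²' ;; 2) m='³' ;; 3) m='⁴' ;; 4) m='⁵' ;;
      5) m='⁶' ;; 6) m='⁷' ;; 7) m='⁸' ;; 8) m='⁹' ;; *) m='⁰' ;;
    esac
    out="${out}123456789$m"
    i=$((i + 1))
  done
  # The partial cell is the ASCII digits 1..rest, safe to count bytewise.
  j=1
  while [ "$j" -le "$rest" ]; do
    out="$out$j"
    j=$((j + 1))
  done
  printf '%s\n' "$out"
)
```

For example, ruler_sh 25 prints 123456789¹123456789²12345. Building the line arithmetically trades the neat pipeline for portability; once paste and cut are multi-byte aware, the original one-liner is simpler.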
Patch looks good from my quick look. I only did a few tests with 3-byte
Chinese UTF-8 characters beyond the ones in your test, though.
+# Test UTF-8 multi-byte delimiters
+export LC_ALL=en_US.UTF-8
+
+# Skip if UTF-8 is not supported
+test "$(locale charmap 2>/dev/null)" = UTF-8 ||
+ skip_ 'UTF-8 locale not available'
Shouldn't $LOCALE_FR_UTF8 work fine here?
The rest of the test looks good. Testing invalid Unicode and character
sets other than ASCII and UTF-8 is nice.
Right, I used $LOCALE_FR_UTF8 instead.
cheers,
Padraig