On 13/01/2026 02:52, Collin Funk wrote:
Pádraig Brady <[email protected]> writes:

I often need to count characters, so I wanted to make a ruler helper like:

   $ ruler() { yes 123456789 | head -n100 | src/paste -s -d¹²³⁴⁵⁶⁷⁸⁹⁰ | cut -c-$COLUMNS; }

So I could do things like:

   $ yes foo | head -n10 | paste -s -d.; ruler
   foo.foo.foo.foo.foo.foo.foo.foo.foo.foo
   123456789¹123456789²123456789³123456789⁴123456789⁵123456789⁶123456789⁷

That is a useful function.

One small improvement would be adjusting cut -c-$COLUMNS so that it does
not split the multi-byte delimiter. When I first invoked it I saw a
REPLACEMENT CHARACTER (U+FFFD) at the end of the line:

     $ echo $COLUMNS
     80
     $ yes foo | head -n10 | paste -s -d.; ruler \
         | od -An -t x1 | tail -n 2
     foo.foo.foo.foo.foo.foo.foo.foo.foo.foo
      38 39 e2 81 b6 31 32 33 34 35 36 37 38 39 e2 81
      0a

Where ⁷ is 0xE2 0x81 0xB7. That probably confused me longer than it
should have.
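
The truncation point can be checked with a little arithmetic. The following is only a sketch: `nbytes` is a hypothetical helper name, and the script is assumed to be saved as UTF-8. It prints the encoded length of each superscript delimiter, then the number of bytes that precede ⁷ in the ruler line.

```shell
# nbytes: hypothetical helper, prints the UTF-8 encoded length of its argument.
# wc -c counts bytes regardless of the current locale.
nbytes() { printf '%s' "$1" | wc -c | tr -d ' '; }

for d in ¹ ² ³ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ⁰; do
  printf '%s: %s bytes\n' "$d" "$(nbytes "$d")"
done

# ¹ ² ³ take 2 bytes each; ⁴ ⁵ ⁶ take 3 bytes each.
# Bytes before ⁷: three 9+2 groups, three 9+3 groups, then 9 more digits.
before=$(( 3 * (9 + 2) + 3 * (9 + 3) + 9 ))
echo "$before"   # 78
```

With 78 bytes ahead of it, only the first two bytes of ⁷ (e2 81) fit within an 80-byte limit, which matches the od output above.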

Oh right, I was using Fedora's cut(1) that has the i18n patch applied.
We'll get to that in coreutils soon.
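
For a cut(1) without multi-byte support, one possible workaround is to truncate in a fixed-width encoding. This is a minimal sketch, not part of the patch: `trunc_chars` is a hypothetical helper, and it assumes iconv(1) supports UTF-32LE and that head(1) accepts -c. It counts code points, not display columns, which is enough for this ruler because every character is one column wide.

```shell
# trunc_chars N: hypothetical helper, keeps the first N characters of a
# single UTF-8 line read from stdin, independent of the current locale.
trunc_chars() {
  # Drop the newline, widen every character to exactly 4 bytes,
  # cut at a 4-byte boundary, then convert back to UTF-8.
  tr -d '\n' |
    iconv -f UTF-8 -t UTF-32LE |
    head -c "$(( $1 * 4 ))" |
    iconv -f UTF-32LE -t UTF-8
  echo
}

printf '123456789¹123456789²123456789³\n' | trunc_chars 15
# prints: 123456789¹12345
```

In the ruler function, `trunc_chars "$COLUMNS"` would take the place of `cut -c-$COLUMNS`.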

But paste(1) needs multi-byte support for that,
which the attached implements.

Patch looks good from my quick look. Outside of your test, though, I only
did a few tests with 3-byte Chinese UTF-8 characters.

+# Test UTF-8 multi-byte delimiters
+export LC_ALL=en_US.UTF-8
+
+# Skip if UTF-8 is not supported
+test "$(locale charmap 2>/dev/null)" = UTF-8 ||
+  skip_ 'UTF-8 locale not available'

Shouldn't $LOCALE_FR_UTF8 work fine here?

The rest of the test looks good. Testing invalid Unicode and character
sets other than ASCII and UTF-8 is nice.

Right, I used $LOCALE_FR_UTF8 instead.

cheers,
Padraig
