Pádraig Brady <[email protected]> writes:
> I often need to count characters, so wanted to make ruler helper like:
>
> $ ruler() { yes 123456789 | head -n100 | src/paste -s -d¹²³⁴⁵⁶⁷⁸⁹⁰ | cut -c-$COLUMNS; }
>
> So I could do things like:
>
> $ yes foo | head -n10 | paste -s -d.; ruler
> foo.foo.foo.foo.foo.foo.foo.foo.foo.foo
> 123456789¹123456789²123456789³123456789⁴123456789⁵123456789⁶123456789⁷
That is a useful function.
One small improvement would be adjusting cut -c-$COLUMNS so that it does
not split a multi-byte delimiter. Here it cut at a byte offset that
landed in the middle of one, so when I first invoked it I saw a
REPLACEMENT CHARACTER (U+FFFD) at the end of the line:
$ echo $COLUMNS
80
$ yes foo | head -n10 | paste -s -d.; ruler \
| od -An -t x1 | tail -n 2
foo.foo.foo.foo.foo.foo.foo.foo.foo.foo
38 39 e2 81 b6 31 32 33 34 35 36 37 38 39 e2 81
0a
The trailing e2 81 is the first two bytes of ⁷, which is 0xE2 0x81 0xB7
in UTF-8; the final 0xB7 was cut off. That probably confused me longer
than it should have.
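For what it's worth, here is a rough sketch of one way to sidestep the
split without relying on cut: build the ruler one column at a time
instead of truncating it afterwards. This is not what your function
does, just an illustration. The optional width argument and the variable
names are my own additions, and it uses plain POSIX shell arithmetic
rather than paste.

```shell
# Sketch only: build the ruler column by column, so nothing is ever
# truncated at a byte offset. The optional width argument is an
# addition for illustration; the original ruler() relies on $COLUMNS.
ruler() {
  cols=${1:-${COLUMNS:-80}}
  # Each tens marker is a separate word, so it is always emitted whole.
  set -- ¹ ² ³ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ⁰
  out= i=1
  while [ "$i" -le "$cols" ]; do
    if [ $((i % 10)) -ne 0 ]; then
      # Ordinary column: append the digit 1..9.
      out=$out$((i % 10))
    else
      # Column 10 gets ¹, 20 gets ², ..., 100 gets ⁰, then it repeats.
      eval "out=\$out\${$(( (i / 10 - 1) % 10 + 1 ))}"
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}
```

With this, "ruler 12" prints 123456789¹12, and with no argument it falls
back to $COLUMNS, or to 80 if that is unset.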
> But paste(1) needs multi-byte support for that,
> which the attached implements.
The patch looks good from a quick look. Beyond your test, though, I only
tried a few cases with 3-byte Chinese characters in UTF-8.
> +# Test UTF-8 multi-byte delimiters
> +export LC_ALL=en_US.UTF-8
> +
> +# Skip if UTF-8 is not supported
> +test "$(locale charmap 2>/dev/null)" = UTF-8 ||
> + skip_ 'UTF-8 locale not available'
Shouldn't $LOCALE_FR_UTF8 work fine here?
The rest of the test looks good. Testing invalid Unicode and character
sets other than ASCII and UTF-8 is nice.
Collin