Pádraig Brady <[email protected]> writes:

> I often need to count characters, so wanted to make ruler helper like:
>
>   $ ruler() { yes 123456789 | head -n100 | src/paste -s -d¹²³⁴⁵⁶⁷⁸⁹⁰ | cut -c-$COLUMNS; }
>
> So I could do things like:
>
>   $ yes foo | head -n10 | paste -s -d.; ruler
>   foo.foo.foo.foo.foo.foo.foo.foo.foo.foo
>   123456789¹123456789²123456789³123456789⁴123456789⁵123456789⁶123456789⁷

That is a useful function.

One small improvement would be adjusting cut -c-$COLUMNS so that it does
not split the multi-byte delimiter. When I first invoked it I saw a
REPLACEMENT CHARACTER (U+FFFD) at the end of the line:

    $ echo $COLUMNS 
    80
    $ yes foo | head -n10 | paste -s -d.; ruler \
        | od -An -t x1 | tail -n 2
    foo.foo.foo.foo.foo.foo.foo.foo.foo.foo
     38 39 e2 81 b6 31 32 33 34 35 36 37 38 39 e2 81
     0a

Where ⁷ is 0xE2 0x81 0xB7. That probably confused me longer than it
should have.
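One way around it, as an untested sketch that assumes bash and a UTF-8 locale: trim with bash's substring expansion, which counts characters rather than bytes in a multi-byte locale. trim_cols is a hypothetical helper name, and ruler still needs a paste with multi-byte delimiter support (i.e. your patch).

```shell
# trim_cols N: print at most N characters of each input line.
# Hypothetical helper; relies on bash substring expansion, which
# counts characters (not bytes) when the locale is UTF-8.
trim_cols() {
  local n=${1:-80} line
  # '|| [ -n "$line" ]' keeps a final line that lacks a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line:0:n}"
  done
}

# Your ruler, with cut -c-$COLUMNS replaced by trim_cols.
# Needs a paste that accepts multi-byte delimiters (the patch under review).
ruler() {
  yes 123456789 | head -n100 | src/paste -s -d¹²³⁴⁵⁶⁷⁸⁹⁰ | trim_cols "$COLUMNS"
}
```

With COLUMNS=80 that should end the ruler on a complete ⁸ (eight groups of ten characters) instead of on half of a ⁷.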

> But paste(1) needs multi-byte support for that,
> which the attached implements.

Patch looks good from my quick look. Outside of your test, though, I only
tried it with a few 3-byte Chinese UTF-8 characters.

> +# Test UTF-8 multi-byte delimiters
> +export LC_ALL=en_US.UTF-8
> +
> +# Skip if UTF-8 is not supported
> +test "$(locale charmap 2>/dev/null)" = UTF-8 ||
> +  skip_ 'UTF-8 locale not available'

Shouldn't $LOCALE_FR_UTF8 work fine here instead of hard-coding
en_US.UTF-8?
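Roughly what I had in mind, as an untested sketch. It assumes the usual test harness, where skip_ comes from init.sh and LOCALE_FR_UTF8 is passed in by the build, set to "none" when configure finds no French UTF-8 locale. If so, the separate `locale charmap` check should no longer be needed.

```shell
# Assumes the coreutils test harness: skip_ is provided by init.sh and
# LOCALE_FR_UTF8 is "none" when no French UTF-8 locale was detected.
test "$LOCALE_FR_UTF8" != none || skip_ 'French UTF-8 locale not available'
LC_ALL=$LOCALE_FR_UTF8
export LC_ALL
```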

The rest of the test looks good. Testing invalid Unicode and character
sets other than ASCII and UTF-8 is nice.

Collin
