Hi Ralph,

On 9/5/22 12:02, Ralph Corderoy wrote:
> Hi Alejandro,
>
> > If you know a (hopefully trivial) filter that transforms any
> > multi-byte sequences in exactly the number of bytes that will be
> > visible (and hopefully those bytes should be similar to the original
> > UTF-8 content), that would greatly help.
> ...
> > tbl man1/memusage.1 \
> > | eqn -Tutf8 \
> > | troff -man -t -M ./etc/groff/tmac -m checkstyle -rCHECKSTYLE=3 \
> >     -ww -Tutf8 -rLL=78n \
> > | grotty -c \
> > | col -b -x \
> > | toplaintext \
> > | (! grep -n '.\{80\}.' >&2)
>
> I'm unclear on the problem trying to be solved.  grep(1) in a UTF-8
> locale already treats a multi-byte UTF-8 sequence for one rune as
> matched by ‘.’ which leaves the terminal's escape sequences, but
> they've been disabled by grotty's ‘-c’, and over-striking for
> underlining, dealt with by col(1).
>
> In other words, what's wrong with
>
>     zcat man7/groff_char.7.gz |
>     eqn -Tutf8 |
>     troff -man -t -ww -Tutf8 -rLL=78n |
>     grotty -c |
>     col -pbx |
>     (! grep -n '.\{80\}.' >&2)
>
> Does it miss overlong lines or wrongly report a short line as too
> long?  If so, an example would help target further suggestions.
Hmmm, I found it to behave differently from what you say here, but I can't reproduce it now. I guess I had some other issue, and that it was my mistake. Since now it seems to work correctly, I'll change it to use -Tutf8 again, and assume that my problems were just EBCAK.
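A minimal sketch of the behaviour discussed above, for anyone who wants to check it locally. It is an illustration only: the names utf8_locale and overlong are hypothetical, and it assumes that at least one UTF-8 locale is installed and that grep(1) supports -m. In such a locale ‘.’ matches one character, so a line of 80 two-byte characters is not reported, while 81 characters are.

```shell
#!/bin/sh
# Assumption: "locale -a" lists at least one UTF-8 locale (e.g. C.utf8).
utf8_locale=$(locale -a 2>/dev/null | grep -i -m 1 'utf-\{0,1\}8$')

# Hypothetical helper: print (with line numbers) every stdin line that
# is longer than $1 characters (default 80), counting characters, not
# bytes.  Exit status is grep's: 0 if any overlong line was found.
overlong() {
    max=${1:-80}
    LC_ALL=$utf8_locale grep -n ".\{$max\}."
}

# 80 copies of a two-byte character: 160 bytes, but only 80 characters.
line=$(printf '%80s' '' | sed 's/ /é/g')

# Exactly 80 characters: nothing is reported.
printf '%s\n' "$line" | overlong 80 || echo "no overlong lines"

# 81 characters: the line is reported, prefixed with its line number.
printf '%sx\n' "$line" | overlong 80
```

If the locale is forced to C instead, grep counts bytes, which would explain a short line being wrongly reported as too long.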
Thanks,

Alex

-- 
Alejandro Colomar
<http://www.alejandro-colomar.es/>