2013/1/22 Peter A. Shevtsov <petr.shevt...@gmail.com>:
> On 22/01/13 at 02:32pm, Peter A. Shevtsov wrote:
>
>> It seems that it counts every cyrillic letter as two, i. e. it ain't count letters
>> (or runes) but bytes.
>
> Indeed,
>
> echo latin кириллица | /usr/local/plan9/bin/awk '{printf("%d %d\n", length($1), length($2))}'
>
> 5 18

Also, awk can't know beforehand if the input string is UTF-8 encoded or not, so the only thing it can do is to count bytes....
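
To make the 5 vs. 18 concrete, here is a rough sketch in Python (not awk, and not how any particular awk is implemented) that counts the same two words both ways. The helper name byte_and_rune_len is made up for illustration. The last line shows the ambiguity mentioned above: the same 18 bytes count as 9 characters if you assume UTF-8, but as 18 if you assume a single-byte encoding such as Latin-1.

```python
# Sketch only: compare the UTF-8 byte length of a string with its
# code-point (rune) count. byte_and_rune_len is a hypothetical helper.

def byte_and_rune_len(s):
    """Return (number of UTF-8 bytes, number of code points) for s."""
    return len(s.encode("utf-8")), len(s)

for word in ["latin", "кириллица"]:
    nbytes, nrunes = byte_and_rune_len(word)
    print(word, nbytes, nrunes)

# The raw bytes carry no label saying which encoding they are in.
# Decoding the very same bytes under two assumptions gives two counts.
raw = "кириллица".encode("utf-8")
print(len(raw.decode("utf-8")), len(raw.decode("latin-1")))
```

A tool that only sees the bytes has to pick one of these interpretations, and counting bytes is the one that never misreads the input.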