Package: gawk
Version: 1:3.1.4-2
Severity: important
gawk's printf does not handle UTF-8 multibyte characters properly when
padding a string to a field width. Here's an example:
$ cat example.txt
A Only_a_singlebyte_character_here_(UTF-8:_41)
Ö A_letter_which_takes_two_bytes_(UTF-8:_c3_96)
€ A_currency_symbol_which_takes_three_bytes_(UTF-8:_e2_82_ac)
$ cat example.txt | awk '{ printf "%-5s%s\n",$1, $2 }'
A    Only_a_singlebyte_character_here_(UTF-8:_41)
Ö   A_letter_which_takes_two_bytes_(UTF-8:_c3_96)
€  A_currency_symbol_which_takes_three_bytes_(UTF-8:_e2_82_ac)
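As we can see, the format specifier %-5s does not calculate field widths
correctly when the string contains multibyte characters: the padding
appears to be computed from the number of bytes rather than the number
of characters. Unfortunately, this makes gawk's field widths mostly
unusable in a UTF-8 locale.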
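
Until printf counts characters, the padding can be done by hand. The
following is only a rough sketch of a workaround, not a fix for gawk
itself. The names pad_utf8 and chars are made up for this illustration.
It assumes the input is valid UTF-8, forces the C locale so that
length() counts bytes, and then subtracts the UTF-8 continuation bytes
(0x80-0xBF) to get the character count. It does not account for
combining characters or double-width characters.

```shell
#!/bin/sh
# pad_utf8 WIDTH: left-justify field 1 to WIDTH characters, then print
# field 2. Hypothetical helper, not part of gawk.
pad_utf8() {
    # LC_ALL=C makes awk treat the input as plain bytes, so the result
    # does not depend on how the awk in use handles multibyte text.
    LC_ALL=C awk -v width="$1" '
    # Count characters: every byte except UTF-8 continuation bytes
    # (octal 200-277) starts a new character.
    function chars(s,    t) {
        t = s
        gsub(/[\200-\277]/, "", t)
        return length(t)
    }
    {
        pad = width - chars($1)
        out = $1
        while (pad-- > 0)
            out = out " "
        print out $2
    }'
}

# Same three first fields as in example.txt, written as octal escapes
# so the script itself stays ASCII.
printf 'A one\n\303\226 two\n\342\202\254 three\n' | pad_utf8 5
```

With this, each first field is followed by the same number of spaces,
regardless of how many bytes the character occupies.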
-- System Information:
Debian Release: 3.1
APT prefers testing
APT policy: (850, 'testing'), (800, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.8-2-k7
Locale: LANG=fi_FI.UTF-8, LC_CTYPE=fi_FI.UTF-8 (charmap=UTF-8)
Versions of packages gawk depends on:
ii libc6 2.3.2.ds1-22 GNU C Library: Shared libraries an
-- no debconf information