Hi,

Of dd(1), POSIX says
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html

    lcase
        Map uppercase characters specified by the LC_CTYPE keyword
        tolower to the corresponding lowercase character.  Characters
        for which no mapping is specified shall not be modified by
        this conversion.

and similarly for `ucase'.

But dd in coreutils 8.29-1 on Arch Linux just has a simple 256-byte
translation table that's mapped through tolower(3) or toupper(3).
http://pubs.opengroup.org/onlinepubs/9699919799/functions/tolower.html
describes tolower(3) as handling only `unsigned char' or EOF, and being
the identity function on all values where there isn't a lowercase
letter for the uppercase value.

This deviation isn't documented AFAICS.  It means ASCII and ISO-8859-1
are re-cased just fine.  UTF-8 has its ASCII subset altered, and other
bytes left alone, so the end result is valid UTF-8, but not fully
re-cased.  But charmaps like /usr/share/i18n/charmaps/CP949.gz,
https://en.wikipedia.org/wiki/Unified_Hangul_Code, have variable-length
byte sequences where 0x41, for example, isn't always an ASCII `A' and
thus shouldn't become 0x61, `a'.

Aside from improving the documentation, actually fixing dd to match
POSIX will need to handle the re-cased character being a different
number of bytes; particularly noticeable if the output file is the
input file with `conv=notrunc'.

    $ locale | grep LC_CTYPE
    LC_CTYPE="en_GB.utf8"
    $
    $ sed 'l; s/./\u&/; l' <<<ȿ
    \310\277$
    \342\261\276$
    Ȿ
    $ sed 'l; s/./\l&/; l' <<<Ȿ
    \342\261\276$
    \310\277$
    ȿ
    $

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
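
The byte-wise behaviour described above can be checked with a short
sketch like the following.  It assumes a dd that re-cases one byte at
a time, as described for coreutils 8.29, and it assumes that 0x81 0x41
is a valid two-byte sequence in CP949/UHC; the variable names are only
for illustration.

```shell
# U+023F (ȿ) is \310\277 in UTF-8; put an ASCII `a' in front of it.
# A byte-wise ucase should change only the `a'.
utf8_out=$(printf 'a\310\277\n' | dd conv=ucase 2>/dev/null |
    od -An -tx1 | tr -d ' \n')
echo "ucase on UTF-8 bytes: $utf8_out"

# Assumed two-byte UHC sequence 0x81 0x41: the trail byte looks like
# an ASCII `A', so a byte-wise lcase rewrites it to 0x61.
uhc_out=$(printf '\201A\n' | dd conv=lcase 2>/dev/null |
    od -An -tx1 | tr -d ' \n')
echo "lcase on UHC bytes:   $uhc_out"
```

If dd behaves as described, the first line should show the multi-byte
character's bytes untouched after the re-cased `a', and the second
should show the trail byte changed from 41 to 61, i.e. a different
two-byte character rather than a re-cased one.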