On Mon, Jun 25, 2012 at 5:23 PM, Roland Mainz <[email protected]> wrote:
> Hi!
>
> ----
>
> I've been testing ksh93's GB18030 (GB18030 is relevant since this
> standard requires support for characters _outside_ the BMP, e.g. all
> GB18030-conforming applications must support Unicode code points > 0xFFFF
> without problems) and Unicode support two weeks ago and hit a
> very bad issue with printf '%q\n'.
>
> In theory (the remainder of the text assumes a *.UTF-8 locale) if a
> character is not printable (i.e. |iswprint()| returns |0|) then
> ksh93's printf '%q' quoting support should use "\u<hex unicode code
> point>".
> The problem with that is...
> 1. ... characters beyond Unicode code point 0xffff do not work because
> the implementation somehow doesn't get it right on SuSE 12.1
> Linux/AMD64. For example
> $ LC_ALL=en_US.UTF-8 ksh -c 'printf "\u[1F640]\n"'
> # prints garbage instead of a valid Unicode character encoded in UTF-8
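For reference, a minimal sketch of what a correct implementation should emit for such code points. The helper utf8_hex is hypothetical (it is not part of ksh93 or any AST tool); it only applies the standard UTF-8 bit layout in plain POSIX shell arithmetic, so the output can be compared byte by byte (e.g. via od -An -tx1) against what printf "\u[...]" actually produces. It does no validation of surrogates or of values above 0x10FFFF.

```shell
# Hypothetical helper, NOT part of ksh93: print the UTF-8 bytes (in hex)
# expected for one Unicode code point, using the standard UTF-8 bit layout.
utf8_hex() {
    cp=$(( $1 ))
    if [ "$cp" -lt 128 ]; then
        # 1 byte: 0xxxxxxx
        printf '%02x\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then
        # 2 bytes: 110xxxxx 10xxxxxx
        printf '%02x %02x\n' \
            $(( 0xC0 | (cp >> 6) )) \
            $(( 0x80 | (cp & 0x3F) ))
    elif [ "$cp" -lt 65536 ]; then
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (the whole BMP fits here)
        printf '%02x %02x %02x\n' \
            $(( 0xE0 | (cp >> 12) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    else
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (outside the BMP)
        printf '%02x %02x %02x %02x\n' \
            $(( 0xF0 | (cp >> 18) )) \
            $(( 0x80 | ((cp >> 12) & 0x3F) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    fi
}

utf8_hex 0x1F640   # prints: f0 9f 99 80
utf8_hex 0x1F093   # prints: f0 9f 82 93
```

Anything other than these four-byte sequences in the shell's output for the two code points would point at the encoder rather than at the terminal.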
Mhhh... even entering such characters in the gmacs editor mode causes
the characters to be mangled to death (tested with ast-ksh.2012-06-20).

Warning... Unicode outside the BMP in the "-- snip --" section below:
-- snip --
$ cat uc.sh
printf '%x\n' $(( '🂓 ))
-- snip --
... should print:
-- snip --
$ ~/bin/ksh -x uc.sh
+ printf '%x\n' 127123
1f093
-- snip --
(the character is a Mahjong tile... see
http://en.wikipedia.org/wiki/Mahjong#Unicode).

The issue is that entering the character in gmacs mode causes weird '?'
characters to appear (an easy way to reproduce this may be to use a
browser in the KDE desktop and paste the character into a terminal via
the clipboard or selection).

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [email protected]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
_______________________________________________
ast-developers mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-developers
