Hi Chet,

Can you please remove the following from the unicode.c file:

    localconv = iconv_open (charset, "ASCII");
This is an invalid fallback. The line creates a conversion descriptor.
The primary attempt is UTF-8 to the destination codeset. If that fails,
the code tries ASCII to the destination codeset instead. But the code
still passes UTF-8 as input to iconv, which means the fallback is less
likely to encode successfully than a simple assignment.

Consider U+80. It becomes UTF-8 "\xc2\x80", which, because we tell iconv
the input is ASCII, is treated as the ASCII bytes "\xc2\x80". So this
line takes a U+80 and turns it into a U+C2 and a U+80.

The way I rewrote the iconv code made it cleaner, safer and quicker;
please consider using it. I avoided the need for the strcpy, among other
things.

On 02/21/2012 03:42 AM, Chet Ramey wrote:
> On 2/18/12 5:39 AM, John Kearney wrote:
>
>> Bash Version: 4.2 Patch Level: 10 Release Status: release
>>
>> Description: Current u32toutf8 only encode values below 0xffff
>> correctly. wchar_t can be ambiguous size better in my opinion to
>> use unsigned long, or uint32_t, or something clearer.
>
> Thanks for the patch. It's good to have a complete
> implementation, though as a practical matter you won't see UTF-8
> characters longer than four bytes. I agree with you about the
> unsigned 32-bit int type; wchar_t is signed, even if it's 32 bits,
> on several systems I use.
>
> Chet
>
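
A rough sketch in C of the U+80 example from the top of this message.
It does not call iconv and is not code from unicode.c: encode_utf8 and
bytes_as_code_points are hypothetical helpers named here only for
illustration, and the second one merely models what it means to label
UTF-8 bytes as a single-byte charset (one byte read as one character).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 (1 to 4 bytes).
   Returns the number of bytes written, or 0 if cp is above U+10FFFF.
   Surrogates are not rejected, to keep the sketch short. */
static size_t encode_utf8(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical model of the mislabeling: every input byte is taken to
   be a complete character of its own, so n bytes yield n code points. */
static size_t bytes_as_code_points(const unsigned char *in, size_t n,
                                   uint32_t *out)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = in[i];
    return n;
}
```

Calling encode_utf8 (0x80, buf) writes the two bytes 0xC2 0x80. Passing
those two bytes to bytes_as_code_points gives two characters, 0xC2 and
0x80, instead of the single U+80 that went in.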