-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

It has come to my attention that some LC_CTYPE source files for UTF-8, namely UTF-8.src and zh_TW.UTF-8.src, are inconsistent with all the other LC_CTYPE source files. The literals in every other LC_CTYPE source file, including am_ET.UTF-8.src, are written in the native byte sequence of that specific locale, whereas the literals in UTF-8.src and zh_TW.UTF-8.src are written as Unicode code points. (Note that UTF-8 is NOT the same as Unicode: UTF-8 is one byte encoding of Unicode code points.)

This creates headaches for locale-aware applications that support UTF-8. For example, the is*() and isw*() functions, such as iswspace(), have to be used differently and behave differently depending on the locale. Under every other locale, including am_ET.UTF-8, the argument to the isw*() functions is the wide character literal in that locale's own encoding. Under the remaining UTF-8 locales, the application must first convert the wide character from UTF-8 to Unicode before passing it to the isw*() functions.

Is there any good reason for this inconsistency? Should we change UTF-8.src and zh_TW.UTF-8.src so that their behavior is consistent with the other locales?

Sincerely,
Li-Lun Wang
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (FreeBSD)

iD8DBQFEb1D7CQM7t5B2mhARAgMEAJ9FMpNx1IaUGIn0NNBaaHLj3DFQqACbBSJg
tWnXCT2N15U+SntjmuTrGjI=
=JNXG
-----END PGP SIGNATURE-----
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"