I would like to wrap up my argument.

I recommend using UTF-8 as the sole string encoding.
If we end up with multiple encodings, this argument
becomes pointless.

The benefits of UTF-8 are that it is more compact, needs
fewer encoding conversions, and is friendlier to C APIs.
UTF-16 is a variable-length encoding too, once surrogates
are considered. UTF-32 is way too big.
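
To make the size comparison concrete, here is a small sketch in
Python (chosen only for illustration; it says nothing about how
Perl would store strings). The helper name encoded_sizes is made
up for this example. It counts the bytes each encoding needs,
without a byte order mark.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

# Plain ASCII: UTF-8 is half the size of UTF-16, a quarter of UTF-32.
print(encoded_sizes("hello"))        # {'utf-8': 5, 'utf-16': 10, 'utf-32': 20}

# Two CJK characters in the BMP: here UTF-16 is the smaller one.
print(encoded_sizes("\u4e2d\u6587"))  # {'utf-8': 6, 'utf-16': 4, 'utf-32': 8}

# One character outside the BMP (U+1D11E): UTF-16 needs a surrogate
# pair, so a single character occupies two 16-bit units.
print(encoded_sizes("\U0001D11E"))   # {'utf-8': 4, 'utf-16': 4, 'utf-32': 4}
```

The last case is the one that matters for the argument: once
surrogates appear, UTF-16 no longer has one unit per character.
The CJK case shows that "more compact" depends on the text; UTF-8
wins clearly for mostly-ASCII data.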

The main disadvantage of UTF-8 is O(n) random access, which I
personally believe is not very important, since most text
processing requires a linear scan of the text anyway. Multi-byte
encodings have been widely used in Asian countries for years,
and this does not seem to have been a significant problem.
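
For reference, this is roughly what O(n) random access means in
practice. The sketch below is Python, and the function names
codepoint_offset and codepoint_at are hypothetical. It assumes the
input is well-formed UTF-8 and finds the Nth character by walking
the bytes from the start, skipping continuation bytes (those of
the form 10xxxxxx).

```python
def codepoint_offset(data: bytes, index: int) -> int:
    """Return the byte offset where code point number `index` starts.

    Assumes `data` is well-formed UTF-8. Scans from the beginning,
    so the cost grows linearly with `index`.
    """
    if index < 0:
        raise IndexError("negative index")
    seen = 0
    for offset, byte in enumerate(data):
        # Any byte that is not 10xxxxxx starts a new code point.
        if byte & 0xC0 != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError("code point index out of range")


def codepoint_at(data: bytes, index: int) -> str:
    """Return code point number `index` of UTF-8 `data` as a string."""
    start = codepoint_offset(data, index)
    end = start + 1
    # Extend over the continuation bytes that belong to this code point.
    while end < len(data) and data[end] & 0xC0 == 0x80:
        end += 1
    return data[start:end].decode("utf-8")


# Characters of 1, 2, 3, 4 and 1 bytes respectively.
data = "a\u00e9\u4e2d\U0001D11Eb".encode("utf-8")
print(codepoint_offset(data, 3))  # 6
print(codepoint_at(data, 4))      # b
```

A sequential scan never pays this cost more than once, since it
can carry the byte offset along as it goes. Only code that jumps
to arbitrary character positions has to rescan.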

If Perl intends to have superior support for Unicode, i18n and
l10n, the benefits of UTF-16 will fade away pretty quickly.

Overall, both UTF-8 and UTF-16 are acceptable. But I believe
UTF-8 is a slightly better choice.

Hong
