On 11/30/2017 9:56 AM, Jonathan M Davis wrote:
I'm sure that we could come up with a better encoding than UTF-8 (e.g. getting rid of Unicode normalization as being a thing and never having multiple encodings for the same character), but _that_'s never going to happen.
UTF-8 is not the cause of that particular problem, it's caused by the Unicode committee being a committee. Other Unicode problems are caused by the committee trying to add semantic information to code points, which causes nothing but problems. I.e. the committee forgot that Unicode is a character set, and nothing more.