> Konovalov, Vadim wrote:
> > > However, this is *the* unfixable UTF-8 bug in Perl 5: the fact that
> > > 1 bit is used as a flag that both signals "buffer is encoded as UTF-8"
> > > and "string should use Unicode rather than bytes semantics".
> >
> > But maybe those two concepts should be considered synonyms in this
> > context?
>
> You can convert from bytes to Unicode. The problem is that perl silently
> assumes latin-1 encoding when doing so, for backwards compatibility
> reasons.
>
> The module encoding::warnings, which is core in bleadperl, is actually
> quite helpful in those cases, as it will detect potential bugs.
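The silent Latin-1 assumption described in the quote can be sketched outside Perl. The Python below is only an analogy: Python keeps bytes and text as separate types and has no SV flag, and the helper names `implicit_upgrade` and `explicit_decode` are hypothetical. They stand in for what perl does implicitly when a byte string meets a character string versus what the programmer does by declaring the input encoding.

```python
# Analogy only: Python has no SV UTF-8 flag. We model Perl 5's implicit
# bytes-to-characters upgrade as an explicit Latin-1 decode.

def implicit_upgrade(raw: bytes) -> str:
    """Hypothetical stand-in for perl's silent upgrade: every byte becomes
    the code point with the same value (i.e. it is read as Latin-1)."""
    return raw.decode("latin-1")


def explicit_decode(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical stand-in for the fix: the programmer states the encoding."""
    return raw.decode(encoding)


# Bytes as they might arrive from a file or socket: "café" encoded as UTF-8.
raw = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'
wide = "\u263a"                      # a character above 0xFF forces Unicode semantics

silent = implicit_upgrade(raw) + wide   # UTF-8 bytes misread as Latin-1: double encoding
correct = explicit_decode(raw) + wide   # encoding declared up front

# ascii() escapes non-ASCII so the output is terminal-independent.
print(ascii(silent))    # 'caf\xc3\xa9\u263a'  (two garbage characters instead of one)
print(ascii(correct))   # 'caf\xe9\u263a'
```

The mixed string is where the damage appears: each UTF-8 byte of the undecoded input becomes its own character, so `silent` is one character longer than `correct`.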
I agree, and I dare to repeat that it was the user's fault for not specifying the encoding; it was not a lack of bits in the SV.

Also, it is a pity that the semantics of "\x{A0}" and "\xA0" changed between 5.6.x and 5.8.x with respect to UTF-8-ness, as the difference used to be an easy way to signal to Perl that a string was Unicode.

Vadim.