> Konovalov, Vadim wrote:
> > > However, this is *the* unfixable UTF-8 bug in Perl 5 - the fact that
> > > 1 bit is used as a flag that both signals "buffer is encoded as
> > > UTF-8" and "string should use Unicode rather than bytes semantics"
> >
> > But may be those two concepts should be considered synonyms in this
> > context?
>
> You can convert from bytes to Unicode. The problem is that perl
> silently assumes latin-1 encoding when doing so, for backwards
> compatibility reasons.
> 
> The module encoding::warnings, which is core in bleadperl, is actually
> quite helpful in those cases, as it will detect potential bugs.

I agree, and I dare to repeat that it was the user's fault for not
specifying an encoding; it was not a lack of bits in the SV.
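
To illustrate the difference, here is a minimal sketch. The sample bytes and
variable names are only assumptions made for the example; it compares an
explicit decode with the silent Latin-1 upgrade described above:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the two bytes that encode U+00E9 in UTF-8.
my $bytes = "\xC3\xA9";

# Implicit upgrade: concatenating with a character above 0xFF makes
# Perl upgrade $bytes, treating each byte as one Latin-1 character.
my $mixed = $bytes . "\x{263A}";

# Explicit decode: the user states the encoding, so the two bytes
# become the single character U+00E9.
my $decoded = decode('UTF-8', $bytes);

printf "implicit: %d characters\n", length $mixed;    # 3
printf "explicit: %d characters\n", length $decoded;  # 1
```

In the implicit case the result holds three characters (two mis-read Latin-1
characters plus U+263A), which is the kind of mistake encoding::warnings is
meant to report.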

Also, it is a pity that the semantics of "\x{A0}" and "\xA0" changed between
5.6.x and 5.8.x WRT UTF8-ness, as it was an easy way to give Perl a high sign
of the Unicode-ness of a string.

Vadim.
