Re: encoding vs charset

Moritz Lenz Tue, 15 Jul 2008 15:32:04 -0700

NotFound wrote:
> To open another can of worms, I think that we can live without
> character set specification. We can establish that the character set is
> always Unicode, and deal only with encodings.


We had that discussion already, and the answer was "no" for several reasons:
* Strings might contain binary data, and it doesn't make sense to view
that as Unicode.
* Unicode isn't necessarily universal, or might stop being so in the
future. If a character is not representable in Unicode and you chose to
use Unicode for everything, you're screwed.
* Related to the previous point, some other character encodings might
not have a lossless round-trip conversion through Unicode.
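
To make the first and third points concrete, here is a minimal sketch in
Python (not Parrot code; Python's bytes and str merely stand in for a
"binary string" and a "Unicode string"). The helper name survives_unicode
is made up for this illustration. It pushes a byte string through Unicode
and back and reports whether the original bytes come out again.

```python
def survives_unicode(data: bytes, encoding: str) -> bool:
    """Return True if data -> Unicode text -> data yields the original bytes.

    Hypothetical helper for illustration only. Undecodable bytes are
    replaced (errors="replace") rather than raising, which is what a
    "treat everything as Unicode" policy would have to do somewhere.
    """
    text = data.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace") == data


# Arbitrary binary data with no intended character interpretation.
blob = bytes([0xFF, 0xFE, 0x00, 0x80])

if __name__ == "__main__":
    # Every byte value has a Latin-1 character, so nothing is lost.
    print(survives_unicode(bytes(range(256)), "latin-1"))
    # The same kind of data is not valid UTF-8; forcing it through
    # Unicode substitutes replacement characters and loses the bytes.
    print(survives_unicode(blob, "utf-8"))
```

The blob is not valid UTF-8, so viewing it as Unicode either fails or
silently changes it. That is the sense in which binary data does not fit
a Unicode-only model.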

> ASCII is an encoding
> that maps directly to codepoints and only allows 0-127 values.
> iso-8859-1 is the same with 0-255 range. Any other 8 bit encoding just
> needs a translation table. The only point to solve is that we need some
> special way to work with fixed-8 with no intended character
> representation.

Introducing the "no character set" character set is just a special case
of arbitrary character sets. I see no point in using the special case
over the generic one.
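
For reference, the mapping NotFound describes can be sketched in Python
(again only an illustration, not Parrot's string API; code_points is a
hypothetical helper name). The same two bytes give different code points
depending on which 8-bit character set is assumed, and plain ASCII
rejects the second byte outright.

```python
def code_points(data: bytes, encoding: str) -> list:
    """Decode data with the given encoding and return its code points."""
    return [ord(ch) for ch in data.decode(encoding)]


# Two bytes with no stated character set: 0x41 and 0xC1.
raw = bytes([0x41, 0xC1])

if __name__ == "__main__":
    # iso-8859-1: each byte value is its own code point.
    print([hex(cp) for cp in code_points(raw, "latin-1")])
    # KOI8-R: 0xC1 goes through a translation table to U+0430.
    print([hex(cp) for cp in code_points(raw, "koi8_r")])
    # ASCII: only 0-127 is allowed, so 0xC1 cannot be decoded.
    try:
        code_points(raw, "ascii")
    except UnicodeDecodeError as err:
        print("ascii:", err.reason)
```

Which of these interpretations applies is exactly the information a
character set tag carries, and "none of them" is just one more possible
value of that tag.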

Here's the discussion we had on this subject:
http://irclog.perlgeek.de/parrot/2008-06-23#i_362697

Cheers,
Moritz

-- 
Moritz Lenz
http://moritz.faui2k3.org/ |  http://perl-6.de/
