On Fri, Mar 23, 2001 at 06:16:58PM -0500, Dan Sugalski wrote:
> At 11:09 PM 3/23/2001 +0000, Simon Cozens wrote:
> >For instance, chr() will produce Unicode codepoints. But you can pretend that
> >they're ASCII codepoints, it's only the EBCDIC folk that'll get hurt. I hope
> >and suspect there'll be an equivalent of "use bytes" which makes chr(256)
> >either blow up or wrap around.
>
> Actually no it won't. If the string you're doing a chr on is tagged as
> EBCDIC, you'll get the EBCDIC value. Yes, it does mean that this:
>
> chr($foo) == chr($bar);
>
> could evaluate to false if one of the strings is EBCDIC and the other
> isn't. Odd but I don't see a good reason not to. Otherwise we'd want to
> force everything to Unicode, and then what do we do if one of the strings
> is plain binary data?

Are you thinking of ord rather than chr? I can't seem to make the
above make sense otherwise. chr takes a number, not a string, as its
argument...
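
To make the distinction concrete, here is a minimal sketch of how the
two behave in Perl 5 today. It says nothing about how a future design
would treat tagged strings, and the literal values assume an
ASCII/Unicode platform:

```perl
use strict;
use warnings;

# chr maps a number (a code point) to a one-character string.
my $char = chr(65);     # "A" on an ASCII/Unicode platform

# ord maps the first character of a string back to its number.
my $code = ord("A");    # 65 on an ASCII/Unicode platform

# chr returns a string, so its results are compared with eq;
# == would compare the two strings numerically instead.
print "round trip ok\n" if chr(ord("A")) eq "A";
```

Only ord has a string argument whose encoding could matter; chr
just has a number to work from.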

Your initial description of character set handling didn't mention
that different strings can be tagged as having different encodings,
and didn't cover the implications of this. Could you give a list
of the specific occasions when the encoding of a string would be
visible to a programmer?

- Damien