Re: utf8::valid and \x14_000 - \x1F_0000

Juerd Waalboer Tue, 11 Mar 2008 12:28:21 -0700

Chris Hall skribis 2008-03-11 18:48 (+0000):
> I'm comfortable with the notion that perl characters are unsigned
> integers that overlap UCS, and happen to be held internally as a
> superset of UTF-8.
> I wonder if perl is completely comfortable.


It isn't. There are some very unfortunate "features".

> chr(n) throws various runtime warnings where 'n' isn't kosher UCS, and
> "\x{h...h}" throws the same ones at compile time.
> (...)I'm not sure I see the point of picking on a few values to warn
> about.

I don't see the point either, but Perl's warnings are arbitrary in
several ways. Abigail has a lightning talk about the "interpreted as
function" warning that illustrates this.
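
To see which values get picked on, something like the following rough
sketch could be used. The list of candidate code points is only my
choice for illustration, and which of them actually warn differs
between perl versions, so no output is claimed here.

```perl
use strict;
use warnings;

# Illustrative candidates: a plain letter, a surrogate, two
# noncharacters, and a value just past the Unicode range.
my @candidates = (0x41, 0xD800, 0xFFFE, 0xFFFF, 0x110000);

my %warned;
for my $n (@candidates) {
    # Record any warning instead of printing it to STDERR.
    local $SIG{__WARN__} = sub { $warned{$n}++ };
    my $c = chr($n);
}

printf "U+%04X: %s\n", $_, $warned{$_} ? "warned" : "silent"
    for @candidates;
```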

> In any case, is chr(n) supposed to be utf8 or UTF-8 ?  AFAIKS, it's
> neither.

It's supposed to be neither on the outside. Internally, it's utf8.
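
A small sketch of what "neither on the outside" means in practice:
chr(n) gives you one character, and you only get UTF-8 bytes once you
encode explicitly. The code point 0x263A is just an example I picked.

```perl
use strict;
use warnings;

my $s = chr(0x263A);             # a character string, one character long
my $len_chars = length $s;

my $bytes = $s;                  # work on a copy
utf8::encode($bytes);            # in place: now a string of UTF-8 bytes
my $len_bytes = length $bytes;

print "$len_chars character, $len_bytes bytes\n";   # 1 character, 3 bytes
```

How $s is stored before the encode step is an internal matter; code
should only rely on the character semantics.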

>      If chr(-1) doesn't exist, then undef looks like a reasonable
>      return value -- returning "\x{FFFD}" makes chr(-1)
>      indistinguishable from chr(0xFFFD) -- where the first is
>      nonsense and the second is entirely proper.

0xFFFD is the Unicode equivalent of undef. I think it makes sense in
this case.
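
If you do need to tell the two cases apart, the check has to happen
before chr() is called. A minimal sketch follows; strict_chr is a
hypothetical helper made up for this example, not something perl
provides. It relies on the documented behaviour that chr() returns
the replacement character for negative values outside the bytes
pragma.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns undef for a missing or negative
# argument instead of letting chr() substitute U+FFFD.
sub strict_chr {
    my ($n) = @_;
    return if !defined $n || $n < 0;
    return chr($n);
}

my ($bad, $good);
{
    no warnings;          # chr(-1) may warn, depending on the perl version
    $bad = chr(-1);
}
$good = chr(0xFFFD);

print "plain chr: ", ($bad eq $good ? "same" : "different"), "\n";
print "strict_chr(-1): ", (defined strict_chr(-1) ? "defined" : "undef"), "\n";
```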

> >Could you please report this bug with perlbug?
> Done.

Cheers.
-- 
Met vriendelijke groet,  Kind regards,  Korajn salutojn,

  Juerd Waalboer:  Perl hacker  <[EMAIL PROTECTED]>  <http://juerd.nl/sig>
  Convolution:     ICT solutions and consultancy <[EMAIL PROTECTED]>
