Re: List of unsupported unicode characters?

Oliver Block Wed, 10 Jan 2007 13:46:55 -0800

Hello

U+00A0 is a Unicode code point, not a UTF-8 byte sequence. The UTF-8 encoding
of U+00A0 is the two bytes C2 A0. What's interesting here is that A0 is the
second byte of that sequence. So if the file really is UTF-8, perl is seeing
A0 without the C2 that should precede it. Otherwise the file might not be
UTF-8 at all.
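
To make the byte-level point concrete, here is a minimal sketch using the
Encode module. It assumes a reasonably recent Encode in which 'UTF-8' names
the strict encoding; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Encode the code point U+00A0 (no-break space) as UTF-8 bytes.
my $bytes = encode('UTF-8', "\x{A0}");
my $hex   = join ' ', map { sprintf '%02X', ord } split //, $bytes;
print "U+00A0 as UTF-8: $hex\n";    # C2 A0

# A lone A0 byte is a continuation byte without its lead byte C2,
# so a strict decode croaks instead of returning a character.
my $lone = "\xA0";
my $ok   = eval { decode('UTF-8', $lone, Encode::FB_CROAK); 1 };
print "lone A0 is ", ($ok ? "valid" : "not valid"), " UTF-8\n";

# The same single byte read as ISO-8859-1 is the no-break space.
my $char = decode('ISO-8859-1', "\xA0");
printf "A0 as ISO-8859-1: U+%04X\n", ord $char;
```

If the second step fails on your data in the same way, the input most likely
contains single A0 bytes, which points towards ISO-8859-1 rather than UTF-8.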


Regards,

Oliver

On Wednesday, 10 January 2007 19:59, John Costello wrote:
> On Wed, 10 Jan 2007, Paul Bijnens wrote:
> > On 2007-01-10 08:10, John Costello wrote:
> > > Is there a list of utf8 characters that perl cannot map, for example
> > > "\xA0"?  This is with Perl 5.8.3.
> >
> > AFAIK there is no problem with "\xA0" if you mean the "\xA0" in
> > latin1 (iso8819-1) or similar encodings.  That is just the "no-break
> > space".
>
> Yes, that is the character I mean, though it is ISO-8859 (I seem to recall
> that one is a subset of the other).
>
> > What exactly is your problem with that character?
>
> perl 5.8.3 complains
>
>       utf8 "\xA0" does not map to Unicode
>
> when the file is read.  I'm specifying open(INFILE,
> "<:encoding($this->{'encoding'})", $this->{filename}), where
> $this->{'encoding'} is set to utf8 (confirmed that).
>
> The file originally was generated by perl 5.6.1 with utf encoding
> specified via binmode.  The file then was tarred, gzipped, scp'd, and
> ungzipped and untarred and fed to perl 5.8.3.
>
> Thanks to Darren for the pointer to perldelta and the Unicode versions.  I
> see that Unicode 4.0.0 does support \xA0, as well as the 110 other
> characters that perl 5.8.3 complains about.
>
> If I drop the encoding statement and change the open command to
>       open(INFILE, "<$this->{'filename'}")
>
> the errors disappear.
>
> ..
>
> This leads me to think that perl 5.6.1 isn't encoding the output into
> utf8, but that's a bit of a wild guess.
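
One way to test that guess is to read the file as raw bytes and attempt a
strict decode. This is only a sketch: file_is_utf8 is a hypothetical helper
name, not something from Encode, and it assumes the file is small enough to
read into memory at once.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if the whole file decodes as strict
# UTF-8, and 0 if any byte sequence in it is malformed.
sub file_is_utf8 {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $octets = do { local $/; <$fh> };    # slurp the raw bytes
    close($fh);
    $octets = '' unless defined $octets;
    # FB_CROAK makes decode() die on the first malformed sequence.
    return eval { decode('UTF-8', $octets, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Usage: perl check.pl somefile
if (@ARGV) {
    print "$ARGV[0]: ",
        (file_is_utf8($ARGV[0]) ? "valid UTF-8" : "not valid UTF-8"), "\n";
}
```

If this reports "not valid UTF-8" for the file written by perl 5.6.1, the
output was probably never encoded as UTF-8 in the first place.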
