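Hello,

U+00A0 is a Unicode code point, not a UTF-8 byte sequence. The UTF-8 encoding of U+00A0 is the two bytes C2 A0. What's interesting here is that A0 is the second (continuation) byte of that sequence, and a lone A0 byte is not valid UTF-8. So if that file really is UTF-8, perl is missing the other byte of the sequence. Otherwise the file might not be UTF-8 at all (in Latin-1, for example, a single A0 byte is the no-break space).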
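To make the byte-level point concrete, here is a minimal sketch, assuming a perl that ships the core Encode module (5.8 and later). The helper name is_valid_utf8 is made up for this illustration. It shows that U+00A0 encodes to C2 A0 and that a strict decode rejects a lone A0 byte while accepting the full sequence.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+00A0 (NO-BREAK SPACE) encoded as UTF-8 is the two bytes C2 A0.
my $utf8_bytes = encode('UTF-8', "\x{A0}");
my $hex = join ' ', map { sprintf '%02X', $_ } unpack('C*', $utf8_bytes);
print "UTF-8 bytes: $hex\n";    # prints "UTF-8 bytes: C2 A0"

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, else 0.
# FB_CROAK makes decode() die on malformed input instead of substituting.
sub is_valid_utf8 {
    my ($bytes) = @_;           # work on a copy; decode() may modify its input
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

print "lone A0 valid? ", is_valid_utf8("\xA0"), "\n";        # 0
print "C2 A0 valid?   ", is_valid_utf8("\xC2\xA0"), "\n";    # 1
```

Running the same check over the raw bytes of the file (opened without an :encoding layer) should tell you which of the two cases you are in.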
Regards,
Oliver

On Wednesday, 10 January 2007 19:59, John Costello wrote:
> On Wed, 10 Jan 2007, Paul Bijnens wrote:
> > On 2007-01-10 08:10, John Costello wrote:
> > > Is there a list of utf8 characters that perl cannot map, for example
> > > "\xA0"? This is with Perl 5.8.3.
> >
> > AFAIK there is no problem with "\xA0" if you mean the "\xA0" in
> > latin1 (iso8819-1) or similar encodings. That is just the "no-break
> > space".
>
> Yes, that is the character I mean, though it is ISO-8859 (I seem to recall
> that one is a subset of the other).
>
> > What exactly is your problem with that character?
>
> perl 5.8.3 complains
>
> utf8 "\xA0" does not map to Unicode
>
> when the file is read. I'm specifying open(INFILE,
> "<:encoding($this->{'encoding'})", $this->{filename}), where
> $this->{'encoding'} is set to utf8 (confirmed that).
>
> The file originally was generated by perl 5.6.1 with utf encoding
> specified via binmode. The file then was tarred, gzipped, scp'd, and
> ungzipped and untarred and fed to perl 5.8.3.
>
> Thanks to Darren for the pointer to perldelta and the Unicode versions. I
> see that Unicode 4.0.0 does support \xA0, as well as the 110 other
> characters that perl 5.8.3 complains about.
>
> If I drop the encoding statement and change the open command to
> open(INFILE, "<$this->{'filename'}"
>
> the errors disappear.
>
> ..
>
> This leads me to think that perl 5.6.1 isn't encoding the output into
> utf8, but that's a bit of a wild guess.