On Wed, 10 Jan 2007, Paul Bijnens wrote:

> On 2007-01-10 08:10, John Costello wrote:
> > Is there a list of utf8 characters that perl cannot map, for example
> > "\xA0"? This is with Perl 5.8.3.
>
> AFAIK there is no problem with "\xA0" if you mean the "\xA0" in
> latin1 (iso8859-1) or similar encodings. That is just the "no-break
> space".

Yes, that is the character I mean, though it is ISO-8859 (I seem to recall that one is a subset of the other).

> What exactly is your problem with that character?

perl 5.8.3 complains

    utf8 "\xA0" does not map to Unicode

when the file is read. I'm opening the file with

    open(INFILE, "<:encoding($this->{'encoding'})", $this->{filename})

where $this->{'encoding'} is set to utf8 (I confirmed that).

The file was originally generated by perl 5.6.1 with utf8 encoding specified via binmode. It was then tarred, gzipped, scp'd, ungzipped, untarred, and fed to perl 5.8.3.

Thanks to Darren for the pointer to perldelta and the Unicode versions. I see that Unicode 4.0.0 does support \xA0, as well as the 110 other characters that perl 5.8.3 complains about.

If I drop the encoding layer and change the open command to

    open(INFILE, "<$this->{'filename'}")

the errors disappear. ...

This leads me to think that perl 5.6.1 isn't actually encoding the output as utf8, but that's a bit of a wild guess.
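
One way to test that guess, as a rough sketch rather than a definitive diagnosis: in UTF-8 the no-break space U+00A0 is the two bytes C2 A0, so a lone A0 byte is not valid UTF-8 and is what you would find in a Latin-1 file. The helper name is_valid_utf8 below is made up for illustration; decode, encode and FB_CROAK come from the standard Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;    # work on a copy; decode() may modify its argument
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK); 1 } ? 1 : 0;
}

my $latin1_nbsp = "\xA0";                       # no-break space as one Latin-1 byte
my $utf8_nbsp   = encode('UTF-8', chr(0xA0));   # same character as UTF-8 bytes

printf "latin1 byte valid as UTF-8? %s\n", is_valid_utf8($latin1_nbsp) ? "yes" : "no";
printf "utf8 bytes valid as UTF-8?  %s\n", is_valid_utf8($utf8_nbsp)   ? "yes" : "no";
printf "utf8 bytes in hex: %s\n", uc unpack('H*', $utf8_nbsp);
```

If the bytes in the file fail this kind of check but decode cleanly as iso-8859-1, that would suggest the file never was utf8, and reading it with "<:encoding(iso-8859-1)" may be the simpler fix.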