> > use Encode 'from_to';
> >
> > my $orjan = 'ÖRJAN';
> > my $lundstrom = 'LUNDSTRÖM';
> >
> > print $orjan . ' ' . $lundstrom . "\n";
> >
> > from_to $orjan,'latin1','utf-8';
> > from_to $lundstrom,'latin1','utf-8';
>
> It is my understanding that from_to is the wrong thing to use here. The
Your understanding is correct.

> - you obtain some character data, for example by putting it literally in
> your script. If the script itself is in utf-8, it should contain
> "use utf8;". If not (like your script), perl will assume ISO-8859-1.

Or "use encoding 'whatever';", and Perl actually assumes whatever is your
native encoding, be it ISO 8859-1, or -2, or CP1252, or EBCDIC, or whatever.

> A different source of data would be reading from a file, which is
> opened with the correct encoding specified (see Andreas' reply).
>
> A third source would be by reading a file or a socket and obtaining raw
> bytes which can be interpreted as characters using decode().

In this case, e.g.:

    $lundstrom = decode("latin-1", $lundstrom);

> - Manipulate the data using perl string operations
>
> - Output the data to a filehandle which is opened using the correct
> encoding.
>
> The from_to function looks enticing, particularly because everyone has
> heard about perl and utf8 strings, when it's almost always the wrong
> thing to use. And perl does not use utf8, but supports unicode character
> semantics.

At least in the current Encode doc there is a section:

    B<CAVEAT>: The following operations look the same but are not quite so;

      from_to($data, "iso-8859-1", "utf8"); #1
      $data = decode("iso-8859-1", $data);  #2

    Both #1 and #2 make $data consist of a completely valid UTF-8 string
    but only #2 turns utf8 flag on.  #1 is equivalent to

      $data = encode("utf8", decode("iso-8859-1", $data));

    See L</"The UTF-8 flag"> below.

> --
> Bart.

--
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen
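A minimal Perl sketch of the CAVEAT quoted from the Encode doc, assuming the input arrives as raw ISO-8859-1 bytes, as it would from a file or socket read without an encoding layer. The variable names ($raw, $bytes, $chars, $roundtrip) are illustrative and do not come from the thread. It compares what from_to and decode leave behind, then prints the decoded characters through an :encoding(UTF-8) layer, following Bart's "output to a filehandle opened with the correct encoding" step.

```perl
use strict;
use warnings;
use Encode qw(from_to decode encode);

# Assumed input: raw ISO-8859-1 bytes (illustrative only).
# "\xD6" is the Latin-1 byte for the letter O-umlaut.
my $raw = "\xD6RJAN";

# #1: from_to converts in place; the result is still a byte string,
# now holding the UTF-8 encoding of the text (utf8 flag off).
my $bytes = $raw;
from_to($bytes, 'iso-8859-1', 'utf8');

# #2: decode yields a character string (utf8 flag on).
my $chars = decode('iso-8859-1', $raw);

# #1 is equivalent to encoding the decoded characters back to UTF-8.
my $roundtrip = encode('utf8', decode('iso-8859-1', $raw));

# Let an output layer do the encoding, so only characters are printed.
binmode STDOUT, ':encoding(UTF-8)';

printf "from_to: %d bytes, utf8 flag %s\n",
    length($bytes), utf8::is_utf8($bytes) ? 'on' : 'off';
printf "decode:  %d characters, utf8 flag %s\n",
    length($chars), utf8::is_utf8($chars) ? 'on' : 'off';
print 'same as encode(decode(...)): ',
    ($bytes eq $roundtrip ? 'yes' : 'no'), "\n";

# Print the character string; the layer encodes it to UTF-8 exactly once.
print $chars, "\n";
```

Only the decoded string belongs on the encoding layer: printing the from_to result there would encode its UTF-8 bytes a second time.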