Octavian Rasnita <[EMAIL PROTECTED]> writes:
> Oh, sorry, but I've made a mistake when writing the message.
>The Romanian language uses ISO-8859-2 and not ISO-8859-1
>So the question remains. Is it possible to decode a text written in more
>languages that use more charsets?
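(A minimal Perl sketch of decoding such mixed text. It assumes the markers have already been parsed into (encoding, bytes) pairs. The `@segments` list and its layout are hypothetical, because the thread does not say what the markers look like. Each segment is decoded with its own encoding through `Encode::decode`, and the resulting character strings are then joined.)

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: (encoding, raw bytes) pairs, as if already split
# at the encoding markers. The bytes use \x escapes so that this script
# itself stays pure ASCII, whatever editor it is opened in.
my @segments = (
    [ 'iso-8859-1', "ma\xF1ana" ],    # Spanish "mañana" in Latin-1
    [ 'iso-8859-2', "\xFEar\xE3" ],   # Romanian "ţară" in Latin-2
);

# Decode each segment with its own encoding, then join the resulting
# character strings into a single Unicode string.
my $combined = join ' ', map { decode($_->[0], $_->[1]) } @segments;

# The combined string mixes characters from both 8-bit sets,
# so write it out as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "$combined\n";

# Encoding back to one 8-bit charset fails: ISO-8859-1 has no byte
# for "ţ". FB_CROAK makes encode() die instead of substituting '?'.
my $copy = $combined;    # encode() with a CHECK value may modify its argument
my $ok = eval { encode('iso-8859-1', $copy, Encode::FB_CROAK()); 1 };
print $ok ? "fits in iso-8859-1\n" : "does not fit in iso-8859-1\n";
```

Decoding the same bytes with the wrong table would produce different characters. For example, 0xFE is "þ" in ISO-8859-1 but "ţ" in ISO-8859-2. That is why the markers are essential.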
Yes. But perhaps not as easily as you would like: you need markers which show where the encoding changes.

For Perl's purposes the language is not important; it is the "charset" (encoding) that matters. The encoding determines what the 8-bit bytes (also called octets) in a file mean as characters. So one "file" can normally be in only one encoding, and this includes the Perl script itself.

Unicode and UTF-8 are designed to avoid this problem: UTF-8 can represent any Unicode code point, and there are Unicode code points for (almost) all characters used by any language. Older 8-bit encodings like iso-8859-1 and iso-8859-2, however, each pick a different 256-character subset (if I recall correctly). So you cannot just put 8-bit string literals in both encodings into one Perl script and have Perl know directly what they are. But you can have:

    use Encode;

    my $spanish  = "...";
    my $romanian = "...";
    # Note that only one of those can "look right" in an iso-8859-* editor
    my $combined = Encode::decode('iso-8859-1', $spanish)
                 . Encode::decode('iso-8859-2', $romanian);

You can then "print" the combined string as UTF-8 (or another Unicode encoding). But you will then need some way of viewing the Unicode file. An editor which can view the UTF-8 file will probably also allow you to enter UTF-8 strings directly, so you could write your script in UTF-8 (with "use utf8;" so that Perl treats the literals as characters) and avoid the problem.

Note that you cannot (in general) "print" the combined string as either 8859-1 or 8859-2, since neither encoding can represent all the characters it contains.

>
>Thank you.
>
>
>----- Original Message -----
>From: "Nick Ing-Simmons" <[EMAIL PROTECTED]>
>To: <[EMAIL PROTECTED]>
>Sent: Tuesday, April 13, 2004 11:13 AM
>Subject: Re: Decoding more languages
>
>
>> Octavian Rasnita <[EMAIL PROTECTED]> writes:
>> >Hello all,
>> >
>> >I want to transform a text that contains words in more languages (it is a
>> >course for learning a foreign language) in UTF-8.
>> >I have 2 texts, one that contains Romanian and French words, and another one
>> >that contains Romanian and Spanish words.
>> >I have seen that I can Encode::decode('ISO-8859-1', $text) the romanian