Re: Translating a Latin-1 string to a UTF8 string in Perl 5.6.1

Jarkko Hietaniemi Tue, 11 Dec 2001 17:44:13 -0800

On Tue, Dec 11, 2001 at 07:00:20PM -0600, Michael A. Grady wrote:
> Now that Perl 5.6.1 has removed support for tr///CU, is there still
> an easy way to take a latin-1 character string and convert it to
> a UTF8 string? I need to do that for generating LDIF files to load
> into an LDAP server.


No need for fancyisms.  I think the below might work even in perl4...

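#!/usr/bin/perl -sp
# Filter: converts Latin-1 to UTF-8 by default; run with -r for the reverse.

if ($r) {
    # UTF-8 to Latin-1.  Only lead bytes C2 and C3 encode U+0080..U+00FF;
    # anything above that has no Latin-1 equivalent and is left alone.
    s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
} else {
    # Latin-1 to UTF-8.  Each byte 0x80..0xFF becomes a two-byte sequence:
    # top two bits go into the lead byte, low six into the continuation.
    s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
}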
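
To check the bit arithmetic by hand, here is a small sketch that wraps the
two substitutions in subs and runs them on 'Áine'.  The sub names are made
up for illustration, and the reverse direction accepts only the lead bytes
C2 and C3, since those are the only ones that can encode a Latin-1 character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: Latin-1 bytes in, UTF-8 bytes out.
sub latin1_to_utf8 {
    my ($s) = @_;
    $s =~ s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
    return $s;
}

# Hypothetical helper: UTF-8 bytes in, Latin-1 bytes out.
# Lead bytes are restricted to C2/C3 (code points U+0080..U+00FF).
sub utf8_to_latin1 {
    my ($s) = @_;
    $s =~ s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
    return $s;
}

# Render a byte string as space-separated hex for inspection.
sub hexdump {
    my ($s) = @_;
    return join " ", map { sprintf "%02X", ord } split //, $s;
}

my $latin1 = "\xC1ine";                  # 'Áine' in Latin-1
my $utf8   = latin1_to_utf8($latin1);

print hexdump($utf8), "\n";                   # C3 81 69 6E 65
print hexdump(utf8_to_latin1($utf8)), "\n";   # C1 69 6E 65
```

0xC1 is 11000001 in binary: the top two bits (11) land in the lead byte
0xC0|0x03 = 0xC3, and the low six (000001) in the continuation byte
0x80|0x01 = 0x81.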
 
> I saw mention of using pack('U0',...), but I can't figure out how that
> actually works. E.g. Given a variable $string with a value of 'Áine', I'd
> like to get the corresponding string in utf8.

pack("U0U*", unpack("C*", $latin1here))
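
How that works, roughly: unpack("C*") gives the byte values, and since a
Latin-1 byte value is the same number as its Unicode code point, feeding
those numbers to pack("U*") builds the same text as a Unicode string.  The
sketch below spells out the steps.  The explicit utf8::encode call is an
assumption about running this on a later perl (5.8 or newer) that keeps
character strings and encoded bytes apart; it is not part of the one-liner
above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $latin1 = "\xC1ine";               # 'Áine' as Latin-1 bytes

# Step 1: one number per byte; for Latin-1 these are also the code points.
my @codes = unpack("C*", $latin1);    # 0xC1, 0x69, 0x6E, 0x65

# Step 2: build a Unicode string from those code points.
my $chars = pack("U0U*", @codes);

# Step 3 (assumption: perl 5.8 or later): get the UTF-8 encoding as
# plain bytes, e.g. for writing to an LDIF file.
my $utf8_bytes = $chars;
utf8::encode($utf8_bytes);

print join(" ", map { sprintf "%02X", ord } split //, $utf8_bytes), "\n";
```

For 'Áine' the first character should come out as the two bytes C3 81,
matching what the substitution-based filter produces.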

> --
> Michael A. Grady                             [EMAIL PROTECTED]
> Senior Research Programmer                   http://ljordal.cso.uiuc.edu 
> Computing & Communications Services Office   (217) 244-1253  phone
> University of Illinois at Urbana-Champaign   (217) 265-5635  fax
> Rm. 103, MC 680, 2212 Fox Drive, Suite C     Champaign, IL 61820

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
