On Tue, Dec 11, 2001 at 07:00:20PM -0600, Michael A. Grady wrote:
> Now that Perl 5.6.1 has removed support for tr///CU, is there still
> an easy way to take a latin-1 character string and convert it to
> a UTF8 string? I need to do that for generating LDIF files to load
> into an LDAP server.

No need for fancyisms.  I think the below might work even in perl4...

#!/usr/bin/perl -sp
# Pass -r to convert UTF-8 to Latin-1; the default is Latin-1 to UTF-8.
if ($r) { # UTF-8 to Latin-1
    # Only lead bytes C2 and C3 encode code points that fit in Latin-1.
    s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
} else {  # Latin-1 to UTF-8
    s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
}

> I saw mention of using pack('U0',...), but I can't figure out how that
> actually works. E.g. Given a variable $string with a value of 'Áine', I'd
> like to get the corresponding string in utf8.

pack("U0U*", unpack("C*", $latin1here))

> -- 
> Michael A. Grady                            [EMAIL PROTECTED]
> Senior Research Programmer                  http://ljordal.cso.uiuc.edu
> Computing & Communications Services Office  (217) 244-1253 phone
> University of Illinois at Urbana-Champaign  (217) 265-5635 fax
> Rm. 103, MC 680, 2212 Fox Drive, Suite C    Champaign, IL 61820

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
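
For reference, the two substitutions in the reply can be wrapped in subs and
checked against the 'Áine' example from the question. This is only a sketch:
the sub names latin1_to_utf8 and utf8_to_latin1 are made up for illustration,
the input is assumed to be a plain byte string, and the reverse direction here
matches only the lead bytes C2 and C3, since those are the only ones whose
code points fit in Latin-1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: Latin-1 bytes to UTF-8 bytes.
# Each byte 0x80-0xFF becomes a two-byte sequence (lead byte C2 or C3).
sub latin1_to_utf8 {
    my ($s) = @_;
    $s =~ s/([\x80-\xFF])/chr(0xC0 | ord($1) >> 6) . chr(0x80 | ord($1) & 0x3F)/eg;
    return $s;
}

# Hypothetical helper: UTF-8 bytes to Latin-1 bytes.
# Two-byte sequences with any other lead byte are left untouched.
sub utf8_to_latin1 {
    my ($s) = @_;
    $s =~ s/([\xC2\xC3])([\x80-\xBF])/chr((ord($1) & 0x03) << 6 | ord($2) & 0x3F)/eg;
    return $s;
}

my $latin1 = "\xC1ine";                  # 'Áine' as Latin-1 bytes
my $utf8   = latin1_to_utf8($latin1);

# Show the converted string as hex bytes.
print join(' ', map { sprintf '%02X', ord($_) } split //, $utf8), "\n";
# prints: C3 81 69 6E 65
```

The single byte C1 for 'Á' comes out as the pair C3 81, and the ASCII bytes
pass through unchanged. Feeding the result to utf8_to_latin1 should give back
the original four bytes.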