On Thu, Apr 24, 2008 at 11:40 AM, R (Chandra) Chandrasekhar <[EMAIL PROTECTED]> wrote:
> Chas. Owens wrote:
> >
> > The easiest way I can think of is to build a (UTF-8) file named
> > itrans2unicode.table that looks like this
> >
> > a => a
> > aa => ā
> > ~N => ṅ
> >
>
> I have successfully created the file lookup.table containing lines as
> suggested above with ASCII and Unicode characters separated by ' => '.
>
> > Then read that file into a hash at startup
>
> Is there an easy way to do this directly?
>
> When I read the file into a hash, I used ' => ' as a separator pattern for
> split and key value assignments as shown below:
>
> -----------
> #!/usr/bin/perl -C24
> use warnings;
> use diagnostics;
> use strict;
> use utf8;
>
> open my $fh, "<:utf8", "lookup.table";
> my @lookup = <$fh>;
> close $fh;
> binmode STDOUT, ':utf8';
>
> my %lookup = ();
> foreach my $line (@lookup)
> {
>     my ($key, $value) = split / => /, $line;
>     $lookup{$key} = $value;
>     print "$key => $lookup{$key}\n";
> }
> -----------
>
> Is there another, easier way to load the file into a hash, using the
> already existing => symbol in the file?
>
> Otherwise, inserting the ' => ' seems a wasted effort. One could just as
> well have used the original two column space or tab separated file and read
> it in using the -a option and @F array to assign the ASCII symbol in column
> one to the key and the Unicode symbol in column two to the value.
>
> Thank you.
>
> Chandra
>
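On the loading step: a minimal sketch, assuming the ' => ' separated format quoted above. It chomps each line so the trailing newline does not end up in the value, and limits split to two fields. The name load_table is made up for illustration, and the three sample lines stand in for lookup.table through an in-memory filehandle; with the real file you would open it by name with the same layer.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Read "key => value" lines from an open filehandle into a hash.
# load_table is a hypothetical helper name, not part of the thread.
sub load_table {
    my ($fh) = @_;
    my %table;
    while (my $line = <$fh>) {
        chomp $line;                     # keep the newline out of the value
        next unless $line =~ /\S/;       # skip blank lines
        my ($key, $value) = split / => /, $line, 2;
        next unless defined $value;      # skip lines without the separator
        $table{$key} = $value;
    }
    return %table;
}

# Stand-in for lookup.table: the three sample lines, encoded as UTF-8 bytes
# (U+0101 is a-macron, U+1E45 is n-dot-above).
my $bytes = encode('UTF-8', "a => a\naa => \x{101}\n~N => \x{1E45}\n");
open my $fh, '<:encoding(UTF-8)', \$bytes
    or die "cannot open in-memory table: $!";
my %lookup = load_table($fh);
close $fh;

binmode STDOUT, ':encoding(UTF-8)';
print "$_ => $lookup{$_}\n" for sort keys %lookup;
```

The same sub would work on a tab separated file if you change the split pattern, which is why the choice of separator matters so little.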
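There is no great benefit to using => as the separator. I used it
because of its implied meaning in Perl (key on the left, value on the
right).

Also, the substitution I mentioned in my email won't work for you. Your
patterns are between one and three characters long, and that regex dealt
with one character at a time. You will probably need something more like

    # longest keys first, so that "aa" is tried before "a";
    # quotemeta protects keys that contain regex metacharacters
    my $pattern = join "|",
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) }
        keys %lookup;
    $pattern = qr/$pattern/;

    while (<>) {
        s/($pattern)/$lookup{$1}/ge;
        print;
    }

Remember to chomp each line before the split when you load the table;
otherwise every value keeps its trailing newline and the substitution
will insert it.

--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.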
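P.S. A small self-contained sketch of the alternation approach, in case it helps to see it run. The three-entry hash is a stand-in for the real table, and transliterate is a name invented for this example. The keys are sorted longest first because Perl tries alternatives left to right and takes the first one that matches, so "a" listed before "aa" would always win.

```perl
use strict;
use warnings;

# Hypothetical three-entry table; the real one would be read from the file.
my %lookup = (
    'a'  => 'a',
    'aa' => "\x{101}",     # a with macron
    '~N' => "\x{1E45}",    # n with dot above
);

# Longest keys first so "aa" is tried before "a"; quotemeta guards
# against keys that contain regex metacharacters.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) or $a cmp $b }
    keys %lookup;
my $pattern = qr/$alternation/;

# Replace every table key found in the text with its Unicode value.
# Text that matches no key is left untouched.
sub transliterate {
    my ($text) = @_;
    $text =~ s/($pattern)/$lookup{$1}/g;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print transliterate("aa~Na"), "\n";
```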