Jan Eden wrote:
> 
> Hi,

Hello,

> sorry for the lengthy post.
> 
> I recently wrote a Perl script to convert 8-bit characters to LaTeX
> commands. The first version (which works just fine) looks like this
> (the ... indicates more lines to follow):

Your regular expressions look like they are longer than 8 bits.


> >#!/usr/bin/perl -pw
> >
> >s/„/{\\glqq}/g;
> >s/“/{\\grqq}/g;
> >s/á/\\'{a}/g;
> >s/à/\\`{a}/g;
> >s/â/\\^{a}/g;
> >s/ä/\\"{a}/g;
> >....
> 
> Now I tried to use a hash instead of consecutive replacement commands.
> The second version looked like this:
> 
> >#!/usr/bin/perl -w
> >
> >%enctabelle = ("„"=>"{\\glqq}",
> >"“"=>"{\\grqq}",
> >"á"=>"\\'{a}",
> >"à"=>"\\`{a}",
> >"â"=>"\\^{a}",
> >....
> >
> >while (<>) {
> >    $zeile = $_;
> >    foreach $char (keys %enctabelle) {
> >        $zeile =~ s/$char/$enctabelle{$char}/g;
> >    }
> >    print $zeile;
> >}
> 
> This worked, too, but it was extremely slow, obviously since the variables
> were compiled over and over again.
> 
> I gave it a third try like this (code taken from someone else's script):
> 
> >%enctabelle = ("„"=>"{\\glqq}",
> >"“"=>"{\\grqq}",
> >"á"=>"\\'{a}",
> >"à"=>"\\`{a}",
> >"â"=>"\\^{a}",
> >....
> >
> >while (<>) {
> >   s/(.)/exists $enctabelle{$1} ? $enctabelle{$1} : $1/geo;
> >   print;
> >}
> 
> This did not change the text at all. When I removed the ternary operator
> 
> >s/(.)/exists $enctabelle{$1}/g;
> 
> I got an error message like this:
> 
> >Line 208:  Use of uninitialized value in substitution iterator <> line 1.
> 
> Obviously, Perl cannot interpolate variable names like $enctabelle{ä}.
> Both the script and the file to convert are UTF-8 encoded. What's the problem here?

The problem is probably that (.) is matching a single byte, not a whole
UTF-8 character, so $1 never equals one of your multi-byte hash keys.

perldoc perlunicode
perldoc utf8
perldoc bytes
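
As a rough, untested-against-your-data sketch of what those documents
describe: if the line is decoded from UTF-8 bytes into characters first,
(.) captures one whole character and the lookup can succeed. The table
below is abbreviated, and the name convert_line is made up for this
example; it is not from your script.

```perl
use strict;
use warnings;
use utf8;                          # literals in this source file are UTF-8 text
use Encode qw(decode encode);

# Abbreviated table, for illustration only.
my %enctabelle = (
    "á" => "\\'{a}",
    "à" => "\\`{a}",
    "â" => "\\^{a}",
    "ä" => "\\\"{a}",
);

# Hypothetical helper: takes one line of raw UTF-8 bytes and returns the
# converted line, again as UTF-8 bytes.
sub convert_line {
    my ($bytes) = @_;
    my $zeile = decode('UTF-8', $bytes);   # bytes -> characters
    # (.) now captures one whole character, so the hash lookup can match.
    $zeile =~ s/(.)/exists $enctabelle{$1} ? $enctabelle{$1} : $1/ge;
    return encode('UTF-8', $zeile);        # characters -> bytes
}
```

In a filter you would call it along the lines of
"print convert_line($_) while <>;". Note that the /o flag from your third
version is dropped here, since the pattern contains no variables.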


> On another list, I got a rather complicated snippet I did not fully understand:
> 
> >#!perl
> >
> >%enctabelle = (...);
> >
> >my $re = '(' . join('|', map quotemeta($_), keys %enctabelle) . ')';
> >$re = qr/$re/;
> >
> >while (<>) {
> >  s/$re/$enctabelle{$1}/g;
> >  print;
> >}
> 
> Maybe the quotemeta part is what helps identifying the corresponding value?
> 
> Any hints are greatly appreciated,
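
On the quotemeta question: it is not what finds the value. The
parentheses capture the matched key into $1, and $enctabelle{$1} does the
lookup. quotemeta only escapes characters that are special in a regex, so
every key is matched literally. A small sketch of the same idea, assuming
the strings are already decoded to characters; the "|" entry and the name
convert are made up to show why the escaping matters.

```perl
use strict;
use warnings;
use utf8;                          # literals in this source file are UTF-8 text

# Abbreviated table. The "|" key is a hypothetical entry, chosen because
# "|" is also a regex metacharacter.
my %enctabelle = (
    "ä" => "\\\"{a}",
    "á" => "\\'{a}",
    "|" => "{\\textbar}",
);

# quotemeta backslash-escapes regex metacharacters, so the "|" key becomes
# "\|" and is matched literally instead of acting as an alternation.
my $alternatives = join '|', map { quotemeta } keys %enctabelle;

# Compile the alternation once; the parentheses capture the key into $1.
my $re = qr/($alternatives)/;

# Hypothetical helper: replaces every key found in the line with its value.
sub convert {
    my ($zeile) = @_;
    $zeile =~ s/$re/$enctabelle{$1}/g;
    return $zeile;
}
```

Because the pattern is built and compiled once, outside the loop, this
avoids the repeated compilation of your second version.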

Do you want the fastest code?  The shortest code?  The most maintainable
code?  What are you trying to accomplish?


John
-- 
use Perl;
program
fulfillment
