Jan Eden wrote:
>
> Hi,
Hello,
> sorry for the lengthy post.
>
> I recently wrote a Perl script to convert 8-bit characters to LaTeX
> commands. The first version (which works just fine) looks like this
> (the ... indicates more lines to follow):
The characters in your regular expressions look like they are longer than 8 bits.
> >#!/usr/bin/perl -pw
> >
> >s/„/{\\glqq}/g;
> >s/“/{\\grqq}/g;
> >s/á/\\'{a}/g;
> >s/à/\\`{a}/g;
> >s/â/\\^{a}/g;
> >s/ä/\\"{a}/g;
> >....
>
> Now I tried to use a hash instead of consecutive replacement commands.
> The second version looked like this:
>
> >#!/usr/bin/perl -w
> >
> >%enctabelle = ("„"=>"{\\glqq}",
> >"“"=>"{\\grqq}",
> >"á"=>"\\'{a}",
> >"à"=>"\\`{a}",
> >"â"=>"\\^{a}",
> >....
> >
> >while (<>) {
> > $zeile = $_;
> > foreach $char (keys %enctabelle) {
> > $zeile =~ s/$char/$enctabelle{$char}/g;
> > }
> > print $zeile;
> >}
>
> This worked, too, but it was extremely slow, obviously because the variables
> were compiled over and over again.
>
> I gave it a third try like this (code taken from someone else's script):
>
> >%enctabelle = ("„"=>"{\\glqq}",
> >"“"=>"{\\grqq}",
> >"á"=>"\\'{a}",
> >"à"=>"\\`{a}",
> >"â"=>"\\^{a}",
> >....
> >
> >while (<>) {
> > s/(.)/exists $enctabelle{$1} ? $enctabelle{$1} : $1/geo;
> > print;
> >}
>
> This did not change the text at all. When I removed the ternary operator
>
> >s/(.)/exists $enctabelle{$1}/g;
>
> I got an error message like this:
>
> >Line 208: Use of uninitialized value in substitution iterator <> line 1.
>
> Obviously, Perl cannot interpolate variable names like $enctabelle{ä}.
> Both the script and the file to convert are UTF-8 encoded. What's the problem here?
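The problem is probably that you are searching for a single byte (.), not
a UTF-8 character. Unless the script says "use utf8" and the input is
decoded, . matches one byte at a time, while your hash keys are multi-byte
sequences, so the lookup never finds anything.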
perldoc perlunicode
perldoc utf8
perldoc bytes
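To make that concrete, here is a minimal sketch, assuming a reasonably recent Perl 5 and that the script really is saved as UTF-8. The sub name to_latex and the sample string are my own inventions, and the table is abbreviated:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # literals in this file are UTF-8 characters
use open qw(:std :encoding(UTF-8));    # standard streams read/write UTF-8

# Abbreviated table; the real one would list every character to convert.
my %enctabelle = (
    "„" => "{\\glqq}",
    "“" => "{\\grqq}",
    "á" => "\\'{a}",
    "à" => "\\`{a}",
    "â" => "\\^{a}",
    "ä" => "\\\"{a}",
);

# Hypothetical helper: replace every character that has a table entry.
# With character semantics, (.) captures a whole character, not a byte.
sub to_latex {
    my ($zeile) = @_;
    $zeile =~ s/(.)/exists $enctabelle{$1} ? $enctabelle{$1} : $1/ge;
    return $zeile;
}

# Sample input, made up for illustration.
print to_latex("„Voilà“\n");
```

I dropped the /o flag, since the pattern contains no variables and it buys nothing here.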
> On another list, I got a rather complicated snippet I did not fully understand:
>
> >#!perl
> >
> >%enctabelle = (...);
> >
> >my $re = '(' . join('|', map quotemeta($_), keys %enctabelle) . ')';
> >$re = qr/$re/;
> >
> >while (<>) {
> > s/$re/$enctabelle{$1}/g;
> > print;
> >}
>
> Maybe the quotemeta part is what helps identifying the corresponding value?
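Not quite. quotemeta only escapes characters that have a special meaning in a regular expression, so that each key is matched literally. The lookup works because the parentheses capture whichever key matched into $1. A small sketch with a made-up table (the keys "a.b" and "x|y" and the name convert are only for illustration) shows the difference:

```perl
use strict;
use warnings;
use utf8;

# Hypothetical table; two keys contain regex metacharacters on purpose.
my %table = (
    "a.b" => "[dot]",
    "x|y" => "[bar]",
    "ä"   => "\\\"{a}",
);

# Escape each key, longest first so a longer key wins over its own prefix,
# then build one alternation and compile it a single time.
my $alternation = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) }
    keys %table;
my $re = qr/($alternation)/;

# $1 holds exactly the key that matched, so the hash lookup always succeeds.
sub convert {
    my ($text) = @_;
    $text =~ s/$re/$table{$1}/g;
    return $text;
}

print convert("a.b axb x|y ä"), "\n";    # prints: [dot] axb [bar] \"{a}
```

Without quotemeta, the key "a.b" would also match "axb", and $table{"axb"} does not exist. Compiling the pattern once with qr// is also why this version avoids the slowness of your second attempt.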
>
> Any hints are greatly appreciated,
Do you want the fastest code? The shortest code? The most maintainable
code? What are you trying to accomplish?
John
--
use Perl;
program
fulfillment