Lange writes:
> for your understanding:
> ô is equal to hexcode xf4
> õ is equal to hexcode xf5
>
>
> the following problem appeared:
>
> $var="ôinit1õDresden,ômõ Gewerbeg. Kesselsdorf";
>
> After a
> $var=~s/(\xf4.*?\xf5)//g;
> it's supposed to keep the following:
> "Dresden, Gewerbeg. Kesselsdorf"
>
> With my old version (v5.005_3 build 517) it worked without any complications.
> with the new perl-version (v5.6.0 build 616) it doesn't work.
>
> I've tried a lot and found out, that if i replace
> xf4 with e.g. xxxxxx and
> xf5 with e.g. yyyyyy and let the RegExpr go over it then, everything will
> work as i want it to.
> But i have absolutely no clue why it won't work the other way.
> I've converted the signs into HEX too, just to see, whether they're really
> xf4 and xf5...they are indeed.
>
> Thanx andreas. ;)))
Hmmm... it seems to be related to the non-greedy operator and
characters with their top bit set. The following:
use strict;
my $var = "\xf4some stuff\xf5";
if ($var =~ /\xf4.*\xf5/) {
    print "greedy match succeeded\n";
}
else {
    print "greedy match failed\n";
}
if ($var =~ /\xf4.*?\xf5/) {
    print "non-greedy match succeeded\n";
}
else {
    print "non-greedy match failed\n";
}
produces on 5.6.1
greedy match succeeded
non-greedy match failed
and on 5.004_04
greedy match succeeded
non-greedy match succeeded
If the delimiters do not have their top bit set (that is, they are
below \x80), both matches are successful. Smells a bit like a bug.
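To see which delimiters are affected on a given perl, something along these lines could be used. It is only a sketch: check_delims is a name made up for illustration, and the two low-bit pairs are arbitrary choices. What the \xf4/\xf5 line prints depends on the perl version.

```perl
use strict;
use warnings;

# check_delims is a hypothetical helper, not part of the original test.
# It builds "<open>some stuff<close>" and reports whether the greedy
# and the non-greedy pattern each match that string.
sub check_delims {
    my ($open, $close) = @_;
    my $var    = $open . "some stuff" . $close;
    my $greedy = ($var =~ /\Q$open\E.*\Q$close\E/)  ? 1 : 0;
    my $lazy   = ($var =~ /\Q$open\E.*?\Q$close\E/) ? 1 : 0;
    return ($greedy, $lazy);
}

# Two pairs without the top bit set, then the pair from the report.
for my $pair (["<", ">"], ["\x74", "\x75"], ["\xf4", "\xf5"]) {
    my ($greedy, $lazy) = check_delims(@$pair);
    printf "%02x/%02x: greedy %s, non-greedy %s\n",
        ord($pair->[0]), ord($pair->[1]),
        $greedy ? "succeeded" : "failed",
        $lazy   ? "succeeded" : "failed";
}
```

The \Q...\E quoting is there only so that a delimiter such as "<" or "." would be taken literally.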
A work-around (this is Perl after all) is:
s/\xf4[^\xf5]*\xf5//g
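Applied to the string from the original report, the negated character class looks roughly like this. A sketch only: $cleaned is a name introduced here, and the class is quantified with * so that an empty \xf4\xf5 pair would be removed as well, which is what .*? would have done.

```perl
use strict;
use warnings;

# The string from the report, with the delimiters written as escapes.
my $var = "\xf4init1\xf5Dresden,\xf4m\xf5 Gewerbeg. Kesselsdorf";

# Negated character class instead of the non-greedy quantifier.
# $cleaned is an assumed name; the original substitutes in place on $var.
(my $cleaned = $var) =~ s/\xf4[^\xf5]*\xf5//g;

print "$cleaned\n";    # prints: Dresden, Gewerbeg. Kesselsdorf
```

One difference from .*? to keep in mind: [^\xf5] also matches a newline, so a delimited span may run across lines here.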
HTH
--
Brian Raven
It's, uh, pseudo code. Yeah, that's the ticket...
[...]
And "unicode" is pseudo code for $encoding. :-)
-- Larry Wall in
<[EMAIL PROTECTED]>