Lange writes:
> For your understanding:
> ô is equal to hex code xf4
> õ is equal to hex code xf5
>
> The following problem appeared:
> $var="ôinit1õDresden,ômõ Gewerbeg. Kesselsdorf";

> After running
> $var=~s/(\xf4.*?\xf5)//g;
> it's supposed to keep the following:
> "Dresden, Gewerbeg. Kesselsdorf"
>
> With my old version (v5.005_3 build 517) it worked without any complications.
> With the new Perl version (v5.6.0 build 616) it doesn't work.

> I've tried a lot and found out that if I replace
> xf4 with e.g. xxxxxx and
> xf5 with e.g. yyyyyy and then let the regex run over it, everything
> works as I want it to.
> But I have absolutely no clue why it won't work the other way.
> I've converted the characters to hex too, just to see whether they're really
> xf4 and xf5... they are indeed.
>
> Thanx andreas. ;)))
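
The placeholder trick described in the question can be sketched roughly as follows. The tokens xxxxxx and yyyyyy come from the description; the exact order of the steps is an assumption, and the trick is only safe if those tokens never occur in the real data:

```perl
use strict;
use warnings;

# Sample record from the question; \xf4 and \xf5 are the 8-bit delimiters.
my $var = "\xf4init1\xf5Dresden,\xf4m\xf5 Gewerbeg. Kesselsdorf";

# Step 1: swap each 8-bit delimiter for a plain 7-bit placeholder token.
$var =~ s/\xf4/xxxxxx/g;
$var =~ s/\xf5/yyyyyy/g;

# Step 2: the same non-greedy substitution, now with ASCII-only delimiters.
$var =~ s/xxxxxx.*?yyyyyy//g;

print "$var\n";    # prints "Dresden, Gewerbeg. Kesselsdorf"
```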


Hmmm... it seems to be related to the non-greedy quantifier (.*?) and
characters with their top bit set. The following script:

use strict;

# \xf4 and \xf5 both have the top (8th) bit set
my $var="\xf4some stuff\xf5";
if ($var =~ /\xf4.*\xf5/) {
    print "greedy match succeeded\n";
}
else {
    print "greedy match failed\n";
}
if ($var =~ /\xf4.*?\xf5/) {
    print "non-greedy match succeeded\n";
}
else {
    print "non-greedy match failed\n";
}

produces on 5.6.1:

    greedy match succeeded
    non-greedy match failed

and on 5.004_04:

    greedy match succeeded
    non-greedy match succeeded

If the delimiters do not have their top bit set, both matches
succeed. Smells a bit like a bug.
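The control case with 7-bit delimiters can be sketched like this. The choice of "<" and ">" is an assumption made for illustration, since the post doesn't say which delimiters were tried; per the observation above, both matches should report success:

```perl
use strict;
use warnings;

# Control case: same string shape, but with the 7-bit delimiters "<" and ">"
# (an arbitrary choice) instead of \xf4 and \xf5.
my $var = "<some stuff>";

my $greedy     = ($var =~ /<.*>/)  ? "succeeded" : "failed";
my $non_greedy = ($var =~ /<.*?>/) ? "succeeded" : "failed";

print "greedy match $greedy\n";
print "non-greedy match $non_greedy\n";
```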

A work-around (this is Perl, after all) is to use a negated character
class instead of the non-greedy quantifier:

$var =~ s/\xf4[^\xf5]+\xf5//g;
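Applied to the string from the original question, the negated-class version can be sketched as a small self-contained script (the variable name and sample data are taken from the question):

```perl
use strict;
use warnings;

# The record from the original question, with the 8-bit delimiters written as escapes.
my $var = "\xf4init1\xf5Dresden,\xf4m\xf5 Gewerbeg. Kesselsdorf";

# [^\xf5]+ can only consume characters up to the next \xf5, so each
# \xf4...\xf5 pair is removed without relying on the non-greedy .*?
$var =~ s/\xf4[^\xf5]+\xf5//g;

print "$var\n";    # prints "Dresden, Gewerbeg. Kesselsdorf"
```

Note that + requires at least one character between the delimiters, so an empty \xf4\xf5 pair would be left in place; [^\xf5]* would remove it too, as the original .*? does. The negated class also matches newlines, which . does not without the /s modifier.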

HTH

-- 
Brian Raven
It's, uh, pseudo code.  Yeah, that's the ticket...
[...]
And "unicode" is pseudo code for $encoding.  :-)
             -- Larry Wall in 
<[EMAIL PROTECTED]>