Hi Peter,

Peter Cordes wrote:

> This mawk bug boils down to mawk not respecting ^ when the pattern
> matches the empty string.  (I haven't dug into mawk's regex code, or
> tried other permutations, so while this new title is correct, it might
> not be where the bug is really coming from.)
> 
> 
>  mawk 'BEGIN{s=" foo bar";gsub(/^ */, "_", s);print s;}'</dev/null
> _f_o_o_b_a_r_

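Thanks for the analysis.  Looking at the code, the problem is very
simple: to handle repeat matches, gsub runs recursively on the rest of
the string.  In that recursive call, ^ matches at the start of the
remainder as if it were the start of the original string, so whenever
the remainder begins with text matching the anchored expression, gsub
makes spurious matches.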

The problem experienced is not restricted to empty matches.  For
example,

 echo aa | mawk '{ str = $0; gsub(/^a/, "MATCH", str); print str; }'

should exhibit the same trouble.
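
To make the mechanism concrete, here is a toy model of that recursion.
It is written in portable awk rather than taken from mawk's C source;
the helper name naive_gsub and its arguments are made up for
illustration, and it only models non-empty matches.

```shell
#!/bin/sh
# Toy model of a gsub that recurses on the rest of the string.
# naive_gsub is a hypothetical helper, not mawk's real code.
naive_gsub() {   # usage: naive_gsub REGEX REPLACEMENT STRING
    awk -v re="$1" -v repl="$2" -v s="$3" '
    function rec(str,    pos, len) {
        # No match, or an empty match (not modelled here): stop.
        if (!match(str, re) || RLENGTH < 1)
            return str
        # Save the globals before the recursive call overwrites them.
        pos = RSTART; len = RLENGTH
        # Bug: the remainder is matched as if it were a fresh string,
        # so "^" matches again at its first character.
        return substr(str, 1, pos - 1) repl rec(substr(str, pos + len))
    }
    BEGIN { print rec(s) }'
}

naive_gsub '^a' MATCH aa    # prints MATCHMATCH
```

A conforming awk gives MATCHa for the same input, since only the
first "a" is at the start of the string.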

A closely related problem could occur if an _unanchored_ regexp
matches the empty string: after a legitimate match, gsub would run
recursively on the rest of the string and match the empty string
again at its start.  Luckily, Mike Brennan already handles this and
has skipped those matches since version 0.97 (maybe even earlier). ;-)

The solution: after a legitimate match, reject anchored matches just
as if they were empty.  Thomas Dickey implemented this fix in
mawk 1.3.3-20090727.  The relevant change is very small: just add

    } else if (isAnchored(re)) {
        repl_destroy(&xrepl);
        return new_STRING1(target, target_len);

after the /* target is empty string */ case in gsub() from bi_funct.c,
where isAnchored(re) is a new macro expanding to
(((RE_DATA *)(re))->anchored).
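
The effect of that check can be sketched with a small model in
portable awk.  This is only an illustration under assumptions:
fixed_gsub and the at_start flag are hypothetical names, testing
whether the pattern text begins with ^ merely stands in for the
isAnchored() macro, and empty matches are not modelled.

```shell
#!/bin/sh
# Toy model of the fix; fixed_gsub is a hypothetical helper and the
# anchored test below merely stands in for the isAnchored() macro.
fixed_gsub() {   # usage: fixed_gsub REGEX REPLACEMENT STRING
    awk -v re="$1" -v repl="$2" -v s="$3" '
    function rec(str, at_start,    pos, len) {
        # After the first legitimate match, an anchored pattern
        # cannot match again: hand back the remainder untouched.
        if (!at_start && substr(re, 1, 1) == "^")
            return str
        # No match, or an empty match (not modelled here): stop.
        if (!match(str, re) || RLENGTH < 1)
            return str
        # Save the globals before the recursive call overwrites them.
        pos = RSTART; len = RLENGTH
        return substr(str, 1, pos - 1) repl rec(substr(str, pos + len), 0)
    }
    BEGIN { print rec(s, 1) }'
}

fixed_gsub '^a' MATCH aa    # prints MATCHa
fixed_gsub 'a' MATCH aa     # prints MATCHMATCH
```

Unanchored patterns still go through the recursion as before; only
the anchored case returns early.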

http://git.debian.org/?p=collab-maint/mawk.git;a=commitdiff;h=cb2ac5fd#patch3
has the unsplit patch.  If you’d like a patch against Debian sid,
just let me know.

Regards,
Jonathan


