Hi Peter,

Peter Cordes wrote:
> This mawk bug boils down to mawk not respecting ^ when the pattern
> matches the empty string. (I haven't dug into mawk's regex code, or
> tried other permutations, so while this new title is correct, it might
> not be where the bug is really coming from.)
>
> mawk 'BEGIN{s=" foo bar";gsub(/^ */, "_", s);print s;}'</dev/null
> _f_o_o_b_a_r_

Thanks for the analysis.

Looking at the code, the problem is very simple: to handle repeat
matches, gsub runs recursively on the rest of the string. Each
recursive call treats that remainder as a fresh string, so the ^
anchor matches at its start and gsub makes spurious matches.

The problem is not restricted to empty matches. For example,

	echo aa | mawk '{ str = $0; gsub(/^a/, "MATCH", str); print str; }'

should exhibit the same trouble.

A similar problem could occur if an _unanchored_ regexp matches the
empty string: after a legitimate match, gsub would run recursively on
the rest of the string and match again. Luckily, Mike Brennan's code
already handles this and has skipped those matches since version 0.97
(maybe even earlier). ;-)

The solution: after a legitimate match, reject anchored matches just
as if they were empty. Thomas Dickey implemented this fix in mawk
1.3.3-20090727. The relevant change is very small: just add

	} else if (isAnchored(re)) {
	    repl_destroy(&xrepl);
	    return new_STRING1(target, target_len);

after the /* target is empty string */ case in gsub() from
bi_funct.c, where isAnchored(ptr) is a new macro expanding to
(((RE_DATA *)(ptr))->anchored).

http://git.debian.org/?p=collab-maint/mawk.git;a=commitdiff;h=cb2ac5fd#patch3
has the unsplit patch. If you’d like a patch against Debian sid, just
let me know.

Regards,
Jonathan
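
For anyone who wants to check whether a given awk build has the fix, the two test cases from this thread can be wrapped in a small script. This is only a sketch: the AWK variable and the function names are made up for illustration and are not part of mawk or its test suite. The expected values in the comments are what an anchored gsub should produce (one replacement at the start of the string, nothing more).

```shell
#!/bin/sh
# AWK is a hypothetical knob for this sketch: point it at the
# interpreter under test, e.g. AWK=mawk ./check.sh
AWK=${AWK:-awk}

# Empty-match case from the original report.
# A correct awk prints "_foo bar".
anchored_empty() {
    "$AWK" 'BEGIN { s = " foo bar"; gsub(/^ */, "_", s); print s }' </dev/null
}

# Non-empty case: ^a may only match the first "a".
# A correct awk prints "MATCHa".
anchored_nonempty() {
    echo aa | "$AWK" '{ str = $0; gsub(/^a/, "MATCH", str); print str }'
}

anchored_empty
anchored_nonempty
```

On an affected mawk, the first line would come out as the _f_o_o_b_a_r_ shown in the report, and I would expect the second to contain more than one MATCH.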