On Thu, Dec 20, 2018 at 2:49 PM Jan Palus <at...@pld-linux.org> wrote:
> I've just happened to notice a difference in behavior between sed 4.5 and 4.6
> when building VirtualBox. It seems to be locale dependent:
>
> $ echo 'foo(bar '|LC_ALL=C sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> foo(bar
>
> $ echo 'foo(bar '|LC_ALL=C.UTF-8 sed -e 's/\([^*] *\)\bbar\b/\1foo */g'
> foo(foo *
>
> In 4.5 both results are the same -- same as the second output with
> LC_ALL=C.UTF-8.

Thanks a lot for that report. This is indeed a regression.
It also affects the just-released grep-3.2, since the source of the
problem is in a file used by both: gnulib's dfa.c.

I tracked it down to this gnulib/lib/dfa.c commit:
  v0.1-2213-gae4b73e28
To back that out, I must first revert part of this fix-up patch:
  v0.1-2281-g95cd86dd7

Here's a demonstrator with grep (it should match, but with 3.2, does not):

  $ echo 123-x|LC_ALL=C grep '.\bx'
  $

To avoid the failure, one can:
  - specify -P (for PCRE, a different matcher), or
  - avoid the C locale and use a multi-byte locale like the one you
    chose instead. That inhibits use of the DFA matcher, because \b's
    definition requires multi-byte aware machinery not present in the
    DFA matcher.

I expect to revert the gnulib commits mentioned above, and then
to make new releases of both grep and sed.
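
For anyone wanting to check whether an installed grep is affected, the
demonstrator above can be wrapped in a small script. This is only a sketch,
assuming a POSIX shell; the function name check_grep_wb is made up for
illustration and is not part of grep, sed, or gnulib. It relies only on grep's
documented exit statuses (0 for a match, 1 for no match, anything else for an
error).

```shell
#!/bin/sh
# Hypothetical helper, not from grep or gnulib: run the '.\bx' demonstrator
# in the C locale and report what happened.
# $1 (optional): the grep command to test; defaults to "grep".
check_grep_wb() {
    grep_cmd=${1:-grep}
    # The input should match: '-' then a word boundary, then 'x'.
    printf '123-x\n' | LC_ALL=C "$grep_cmd" '.\bx' >/dev/null 2>&1
    case $? in
        0) echo "ok" ;;        # matched, as expected
        1) echo "affected" ;;  # no match: the behavior reported for grep 3.2
        *) echo "error" ;;     # grep itself failed (bad path, etc.)
    esac
}

check_grep_wb "$@"
```

Running it with no argument tests the grep found on PATH; passing a path such
as a freshly built ./src/grep (a hypothetical location) tests that binary
instead.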