On Mon, Aug 11, 2014 at 05:15:21PM +0200, Harald Becker wrote:
> >IMO there is still something very strange with sed and unicode
>
> YES! I did not stop looking into this. It looks like this is a problem
> in the regular expression parser.
>
>   s/./x/g
>
> should match every character and replace it with a single x, but in fact
> it matches every byte of UTF-8 characters too (which is wrong). But
> this doesn't seem to depend on the setting of LANG (which confused me).
> Is it possible it only worked when BB is linked with glibc in a
> fully functional environment? Maybe then a UTF-8 aware regex
> scanner is used. We need to look further into this!
I think this is the result of using uClibc with a broken regex
implementation, either as a result of a build-time option (omitting
locale support? omitting full regex?) or just a deficiency in uClibc.
Using glibc or musl would solve it.

Rich
_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox