One of the advantages of UTF-8 encoding is that it is easy to re-sync after an invalid sequence.
It's a bit of a waste not to take advantage of that. Minus points for musl. Can you not run sed with LANG=C or LANG=POSIX?

Sam

On 4 May 2014 15:57, "Rich Felker" <dal...@libc.org> wrote:
> On Sun, May 04, 2014 at 04:44:10PM +0200, Denys Vlasenko wrote:
> > On Sat, May 3, 2014 at 5:07 PM, Rich Felker <dal...@libc.org> wrote:
> > >> Let's refuse to find end of line if there is a non-UTF-8 sequence
> > >> inside that line?
> > >> Sounds wrong to me...
> > >
> > > sed (also regcomp and regexec) requires text input. Byte streams with
> > > illegal sequences are not text. Actually, since the regex is not trying
> > > to match the illegal sequence, just the end-of-line, it would
> > > theoretically be possible to make this work (and it will once we
> > > overhaul the regex implementation to work with byte-based DFAs rather
> > > than character-based ones), but that doesn't change the fact that it's
> > > an invalid test.
> >
> > Language lawyering is less important than real-world usage.
>
> Indeed it's nice to support additional real-world usage when doing so
> does not harm any other usage. But we're not talking about real-world
> usage here. We're talking about a buggy configure test.
>
> I'd love to improve or even rewrite the regex engine, but that's a lot
> of work and lower priority than a number of other things on the musl
> roadmap.
>
> Rich
> _______________________________________________
> busybox mailing list
> busybox@busybox.net
> http://lists.busybox.net/mailman/listinfo/busybox
>