Vincent Lefevre wrote:

> There's no reason that '.' matches something that doesn't belong to
> the charset in C locale, but doesn't match in a UTF-8 locale.

In the C locale on GNU/Linux, all byte values are members of the charset, so every byte is a character there. That is why it's OK for '.' to accept a byte in the C locale but reject the same byte in a UTF-8 locale, where it does not form a valid character.
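
As a rough illustration only (Python rather than grep's own code, with hypothetical helper names), the sketch below contrasts the two views of a single byte. A byte-oriented '.' stands in for the C locale, and a strict UTF-8 decode stands in for charset membership in a UTF-8 locale.

```python
import re

# Byte-oriented '.', standing in for the C locale where every byte is a character.
BYTE_DOT = re.compile(rb".", re.DOTALL)

def is_char_bytewise(b: int) -> bool:
    """True if the single byte b is matched by a byte-oriented '.'."""
    return BYTE_DOT.fullmatch(bytes([b])) is not None

def is_char_utf8(b: int) -> bool:
    """True if the single byte b is, on its own, a valid UTF-8 character."""
    try:
        bytes([b]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 0xE9 is 'é' in ISO-8859-1, but on its own it is an incomplete UTF-8 sequence.
print(is_char_bytewise(0xE9), is_char_utf8(0xE9))
print(sum(is_char_bytewise(b) for b in range(256)),
      sum(is_char_utf8(b) for b in range(256)))
```

Only the 128 ASCII byte values are characters by themselves under the UTF-8 view, whereas all 256 are under the byte-oriented view.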

> It's annoying that now in UTF-8, one can no longer match ISO-8859-1 text

This has been true for quite some time in 'grep', at least with the standard matchers. It may not have been true for -P, but that relied on undefined behavior that could crash grep, and we can't have that.

It would make sense to add a notation to mean "match any character or invalid byte", as an extension. That'd take some work, though.
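
To make the idea concrete, here is a minimal sketch in Python, not a proposal for grep's actual syntax or implementation. It assumes invalid bytes are carried through decoding with Python's "surrogateescape" error handler, which maps each undecodable byte 0x80..0xFF to U+DC80..U+DCFF. The names CHAR, CHAR_OR_BYTE and search are hypothetical.

```python
import re

# Hypothetical stand-ins, not grep notation:
# CHAR behaves like '.' in a UTF-8 locale (valid characters only);
# CHAR_OR_BYTE behaves like the proposed "any character or invalid byte".
CHAR = r"[^\udc80-\udcff]"
CHAR_OR_BYTE = r"(?s:.)"

def search(pattern: str, line: bytes) -> bool:
    """Search a raw line, keeping invalid bytes as escaped code points."""
    text = line.decode("utf-8", errors="surrogateescape")
    return re.search(pattern, text) is not None

latin1_line = "café".encode("iso-8859-1")  # b"caf\xe9", invalid as UTF-8
utf8_line = "café".encode("utf-8")         # valid UTF-8

print(search("caf" + CHAR + "$", latin1_line))          # False
print(search("caf" + CHAR_OR_BYTE + "$", latin1_line))  # True
print(search("caf" + CHAR + "$", utf8_line))            # True
```

With a notation like the second one, ISO-8859-1 text could be matched from a UTF-8 locale without changing what '.' means.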

