> This is a remark made to me by a developer who can use, when
> needed, regexp (ed(1) or sed(1)) on a Unix where they still deal
> with "char" (bytes) to search for a string of bytes in a binary.

i have never needed to do this.  could you provide some motivation
for grepping for a weird byte in an executable?  surely the debugger
is better suited for this.

> And after some thought, I don't see an obvious reason why the regexp
> could not be used with byte strings (so UTF-8 is OK) without trying to
> match runes (since not every byte string is a correct UTF-8 sequence).

because it makes things more complicated and probably worse for the
common case, while not providing any new functionality not already in
other tools.
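
as a rough illustration of the distinction under discussion, here is a
minimal python sketch of byte-level matching versus utf-8 validity. it
is not plan 9's regexp library; the helper names find_bytes and
is_valid_utf8 and the sample blob are made up for this example.

```python
import re


def find_bytes(pattern: bytes, data: bytes) -> list[int]:
    """Return the start offset of every match of a bytes pattern in data.

    With a bytes pattern, Python's re module matches byte by byte and
    never tries to decode the input as text.
    """
    return [m.start() for m in re.finditer(pattern, data)]


def is_valid_utf8(data: bytes) -> bool:
    """Report whether data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical "binary": an ELF-like magic, two bytes that can never
# appear in valid UTF-8 (0xff, 0xfe), then ordinary UTF-8 text.
blob = b"\x7fELF\x00\xff\xfehello\xc3\xa9"

offsets = find_bytes(rb"\xff\xfe", blob)
blob_is_text = is_valid_utf8(blob)
text_is_text = is_valid_utf8("h\u00e9llo".encode("utf-8"))
```

the byte-level search finds the 0xff 0xfe pair even though the blob as a
whole is not valid utf-8, which is the case a rune-based matcher has to
decide how to handle.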

> Corollary: I don't know if there is a UTF-8 sequence that can say:
> stop interpreting as UTF-8, take what follows "as is" (except every
> incorrect sequence). The problem is coming back from there: if
> everything is OK "as is", what can be interpreted as "stop raw, restart
> UTF-8"? Solution: this is at user level, not low level, and it is done
> in the shell by explicitly delimiting chunks, as when "'" is the only
> delimiter and every embedded "'" has to be "escaped" by doubling it.

i think you've missed the point of making utf-8 *the* character set.
it's not sometimes the character set.  or only on tuesday.  it's always
the character set.

- erik
