> This is a reflexion made to me by a developer who can use, when
> needed, regexp (ed(1) or sed(1)) on an Unix where they still deal
> with "char" (bytes) to search for a string of bytes in a binary.

i have never needed to do this. could you provide some motivation
for grepping for a weird byte in an executable? surely the debugger
is better suited for this.

> And after some thought, I don't see an obvious reason why the regexp
> could not be used with bytes strings (so UTF-8 is OK) without trying to
> match runes (since not every bytes string is a correct UTF-8 sequence).

because it makes things more complicated and probably worse for the
common case, while not providing any new functionality not already in
other tools.

> Corollary: I don't know if there is an UTF-8 sequence that can tell:
> stop interpreting as UTF-8, takes "as is" (except every incorrect
> sequence, problem being to come back from there: if everything is OK "as
> is", what can be interpreted as: "stops raw, restart
> UTF-8"---solution: this is on user level, not low level, and this is in
> the shell explicitely delimiting chunks, like "'" is the only delimiter,
> and every embedded "'" has to be "escaped" by doubling it).

i think you've missed the point of making utf-8 *the* character set.
it's not sometimes the character set. or only on tuesday. it's
always the character set.

- erik
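
for anyone following the thread, here is a rough sketch of the distinction being argued about, not anything from the tools mentioned above. it uses python only because its re module happens to accept byte patterns. the sample bytes and the helper names (blob, is_valid_utf8, find_bytes) are made up for illustration. it shows that an arbitrary byte string need not decode as utf-8, while a byte-oriented regexp can still locate a byte sequence in it.

```python
import re

# hypothetical stand-in for a chunk of an executable; not real program data
blob = b"\x7fELF\x02\x01\x00\xff\xfehello\x00"


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_bytes(pattern, data):
    """Return the start offsets of every match of a bytes pattern in data."""
    return [m.start() for m in re.finditer(pattern, data)]


# 0xff can never appear in well-formed UTF-8, so the blob is not valid text
print(is_valid_utf8(blob))

# a bytes pattern is matched byte by byte, with no rune decoding involved
print(find_bytes(rb"\xff\xfe", blob))
```

this only illustrates the byte-versus-rune question; it says nothing about whether a rune-based regexp library should grow such a mode.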