@Araq 19:17:16: That indeed makes the distinction between rule 9 ('underscores are removed') and part of rule 11 ('en-dashes are not removed but ignored') irrelevant. However, most of my rules are based on the behaviour of the executables or on whether the source compiles.
@Araq 19:26:26: Those four rules don't explain everything. I'm not quite sure what you mean by your last paragraph. In general, having different representations of essentially the same character doesn't make life easier when you want to search the source for occurrences of it, but that topic has been discussed elsewhere. I was just overwhelmed by the complexity of the whole thing, that's all. Why are `{`}!_:!{`}` and `{`}!:_!{`}` okay, but `{`}!_:_!{`}` not? It makes me curious about the underlying mechanisms, in case I want to play with them. In most cases, letters and digits would suffice for me, and I would certainly avoid pathological identifiers like the ones in this example.