The bottom line of this argument is that we should only
support ASCII (read: English), or the security code
will be harder to write.
The article basically says that Unicode is more complex than
ASCII, and therefore security code cannot easily validate input strings.
Here is the last bit of the article:
( http://www.counterpane.com/crypto-gram-0007.html#9)
> With Unicode, we probably won't be able to get
> a consistent definition of what to accept, what
> is a delimiter under what circumstance, or how
> to handle arbitrary streams safely. It's just
> a matter of time before simple validators pass
> data and upper layer software, trying to be
> helpful, attach magic-character semantics, and
> we have a brand-new variety of security holes.
>
> Unicode is just too complex to ever be secure.
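To make the quoted concern concrete, here is a minimal Python sketch of the failure mode it describes. The two functions and the input string are hypothetical, invented for illustration and not taken from the article. A lower layer checks for an ASCII slash, and an upper layer then "helpfully" applies NFKC compatibility normalization, which folds U+FF0F FULLWIDTH SOLIDUS into "/".

```python
import unicodedata


def simple_validator(name: str) -> bool:
    """Hypothetical lower-layer check: accept only names without an ASCII '/'."""
    return "/" not in name


def helpful_upper_layer(name: str) -> str:
    """Hypothetical upper layer that folds compatibility characters (NFKC)."""
    return unicodedata.normalize("NFKC", name)


# U+FF0F is FULLWIDTH SOLIDUS: it looks like a slash but is not '/'.
user_input = "..\uff0fetc\uff0fpasswd"

passed = simple_validator(user_input)       # validator sees no '/'
resolved = helpful_upper_layer(user_input)  # normalization introduces '/'

print("validator passed:", passed)
print("after upper layer:", resolved)
print("would pass if re-checked:", simple_validator(resolved))
```

The hole here comes from validating before normalizing. Running the check on the normalized form closes this particular case.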
It would be easy to make a similar (perhaps stronger)
argument that handling all the existing encodings makes security
much more difficult. The multi-byte encodings (e.g. SJIS,
EUC-JP, GB 2312) each cover a large range of characters.
So shall we give up on the rest of the world so that
security coding will be easier?
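As a rough illustration of why the legacy multi-byte encodings are at least as awkward for validators, here is a small Python sketch. The helper name is hypothetical, and the example assumes a validator that scans raw bytes for an ASCII backslash. In Shift_JIS the second byte of some two-byte characters falls in the ASCII range, so the scan reports a backslash that is not in the text at all.

```python
def contains_backslash_byte(data: bytes) -> bool:
    """Hypothetical byte-level check: look for 0x5C (ASCII backslash)."""
    return b"\x5c" in data


text = "\u8868"  # the kanji 表; the string contains no backslash

sjis = text.encode("shift_jis")  # two bytes; the trail byte is 0x5C
utf8 = text.encode("utf-8")      # three bytes, all of them >= 0x80

print("backslash in text:", "\\" in text)
print("Shift_JIS bytes:", sjis.hex(), "flagged:", contains_backslash_byte(sjis))
print("UTF-8 bytes:", utf8.hex(), "flagged:", contains_backslash_byte(utf8))
```

UTF-8 never reuses ASCII byte values inside a multi-byte sequence, so this particular false match does not occur there.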