Re: Substituting malformed UTF-8 sequences in a decoder

Edmund GRIMLEY EVANS Fri, 28 Jul 2000 08:41:58 -0700

Markus Kuhn <[EMAIL PROTECTED]>:

> I see valuable binary data (PDF & ZIP files, etc.) being destroyed
> almost every day by accidentally applied stupid lossy CRLF -> LF -> CRLF
> data conversion that supposedly smart software tries to perform on the
> fly. I foresee similar non-recoverable data conversion accidents if we
> try to establish software that wipes out malformed UTF-8 sequence
> without mercy and destructs all information that they might have
> contained.

Here the problem is that the program is misconverting on the fly and
not giving an error. If the program stopped with an error halfway
through, the user would know there was a problem and be able to do
something about it.

So, I don't think a UTF-8 decoder, as implemented in a library, should
do anything other than give an error if it encounters malformed UTF-8.
The user should be told that something has gone wrong. Clever
reversible conversion of malformed sequences is more likely to hide a
real problem, causing a bigger problem later, than to be helpful, I
suspect.
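
As a rough illustration of the "just give an error" behaviour, here is
a minimal Python sketch. The name decode_strict is made up for this
example; it only wraps the standard bytes.decode() call with the
"strict" error handler, so it is one possible shape for such a decoder
and not how any particular library does it.

```python
def decode_strict(data: bytes) -> str:
    """Decode UTF-8 and fail loudly on the first malformed sequence.

    decode_strict is a hypothetical helper name used for illustration.
    """
    try:
        # "strict" makes the codec raise instead of substituting anything.
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # Tell the user where the problem is rather than guessing at a repair.
        bad = data[exc.start:exc.end]
        raise ValueError(
            f"malformed UTF-8 at byte offset {exc.start}: {bad!r}"
        ) from exc


if __name__ == "__main__":
    print(decode_strict(b"caf\xc3\xa9"))  # well-formed input decodes normally
    try:
        decode_strict(b"abc\xffdef")      # 0xFF can never appear in UTF-8
    except ValueError as err:
        print("error:", err)
```

For contrast, the same input decoded with errors="replace" silently
turns the bad byte into U+FFFD and carries on, which is the kind of
quiet substitution that can hide the original problem.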

Edmund
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
