On Fri, Apr 27, 2007 at 05:15:16PM +0600, Christopher Fynn wrote:
> N3266 was discussed and rejected by WG2 yesterday. As you pointed out
> there are all sorts of problems with this proposal, and accepting it
> would break many existing implementations.

That's good to hear. As a follow-up, I think the whole idea of trying
to standardize error handling is flawed. What you should do when
encountering invalid data varies a lot depending on the application.

For filenames or text file contents, you probably want to avoid
corrupting them at all costs, even if they contain illegal sequences,
to avoid catastrophic data loss or vulnerabilities.

On the other hand, when presenting or converting data, there are many
approaches that are all acceptable. These include dropping the corrupt
data, replacing it with U+FFFD, or even interpreting the individual
bytes according to a likely legacy codepage. This last option is
popular in IRC clients, for example, and works well for dealing with
the stragglers who refuse to upgrade their clients to use UTF-8.

Also, some applications may wish to give fatal errors and refuse to
process data at all unless it's valid to begin with.

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
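
For illustration only, here is a rough sketch of how the policies
mentioned in the message above (refuse, drop, replace with U+FFFD,
reinterpret as a legacy codepage, preserve unchanged) might look in
Python. The sample bytes, the function name decode_with_policies, the
handler name "legacy-latin1", and the choice of Latin-1 as the "likely
legacy codepage" are all assumptions made for this example; they are
not taken from the original post or from N3266.

```python
import codecs


def _legacy_latin1(err):
    """Error handler: reinterpret undecodable bytes as Latin-1.

    Latin-1 is an assumed stand-in for "a likely legacy codepage".
    """
    if isinstance(err, UnicodeDecodeError):
        bad = err.object[err.start:err.end]
        # Latin-1 maps every byte to a code point, so this never fails.
        return bad.decode("latin-1"), err.end
    raise err


# Hypothetical handler name, registered once at import time.
codecs.register_error("legacy-latin1", _legacy_latin1)


def decode_with_policies(raw: bytes) -> dict:
    """Decode raw as UTF-8 under several error-handling policies."""
    results = {
        # Drop the corrupt bytes.
        "drop": raw.decode("utf-8", errors="ignore"),
        # Replace each invalid sequence with U+FFFD.
        "replace": raw.decode("utf-8", errors="replace"),
        # Interpret invalid bytes according to a legacy codepage.
        "legacy": raw.decode("utf-8", errors="legacy-latin1"),
        # Keep invalid bytes as lone surrogates so the original bytes
        # can be recovered later (useful for filenames).
        "preserve": raw.decode("utf-8", errors="surrogateescape"),
    }
    try:
        # Fatal error: refuse to process invalid input at all.
        results["strict"] = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        results["strict"] = None
    return results


# Assumed sample: "caf", a stray Latin-1 byte 0xE9, then " au lait".
SAMPLE = b"caf\xe9 au lait"
```

With the "preserve" policy, encoding the result back with
errors="surrogateescape" yields the original bytes, which is the
property you want for filenames and file contents. The other policies
lose or guess information, which is fine for display but not for
storage.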