[Python-3000] Offtopic: declaring encoding

Paul Prescod Sat, 09 Sep 2006 10:41:57 -0700

On 9/9/06, Marcin 'Qrczak' Kowalczyk wrote:

> > Note that there are plenty of other characters that should be
> > treated as ignorable, so the applications that are broken for BOMs
> > are broken more generally.
>
> I disagree. UTF-8 BOM should not be used on Unix. It's not a reliable
> method of encoding detection in general (applies only to Unicode),
> and it breaks the simplicity of text streams.

We're off-topic, but: treating these decisions as operating-system-specific is a big part of what caused the current mess, e.g. Japanese Windows users and Japanese Unix users using different encodings. The Unicode Consortium should address the issue of encoding auto-detection and recommend how the encoding of "raw" text files can be detected. A combination of BOM, coding declaration, and fallback to UTF-8 would cover the vast majority of the world's languages and accommodate many national encodings.
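
To make the BOM / coding declaration / UTF-8 fallback order concrete, here is a rough Python sketch. It is only an illustration under assumptions, not a proposal from any standard: the declaration syntax is borrowed from PEP 263 source-file comments and limited to the first two lines, and the names guess_encoding and decode_text are made up for this example.

```python
import codecs
import re

# Known BOMs. The UTF-32 LE BOM starts with the UTF-16 LE BOM bytes,
# so the UTF-32 entries are checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Assumed declaration syntax (PEP 263 style), e.g. "# coding: shift_jis".
_DECLARATION = re.compile(rb"coding[:=]\s*([-\w.]+)")


def guess_encoding(data):
    """Return (encoding name, number of leading BOM bytes to skip)."""
    # Step 1: a BOM, if present, decides the encoding.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)

    # Step 2: look for a coding declaration in the first two lines.
    for line in data.split(b"\n", 2)[:2]:
        match = _DECLARATION.search(line)
        if match:
            name = match.group(1).decode("ascii")
            try:
                codecs.lookup(name)  # accept only codecs Python knows
            except LookupError:
                continue
            return name, 0

    # Step 3: no BOM and no usable declaration, so fall back to UTF-8.
    return "utf-8", 0


def decode_text(data):
    """Decode raw bytes using the guessed encoding, dropping any BOM."""
    encoding, skip = guess_encoding(data)
    return data[skip:].decode(encoding)
```

For example, decode_text(b"# coding: shift_jis\n" + body) decodes body as Shift_JIS, while undeclared bytes without a BOM are treated as UTF-8. A declaration can only be found this way if the encoding is ASCII-compatible in its first lines, which is one reason the BOM check comes first.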

Are you defending the status quo wherein text data cannot even be reliably processed on the desktop on which it was created (yes, even on Unix: look back in this thread)? Do you have a positive prescription?

Paul Prescod

