Re: recycling internationalized garbage

Fredrik Lundh Wed, 15 Mar 2006 01:06:14 -0800

Martin wrote:

> > The point is that you can tell UTF-8 reliably.


RFC 3629 says "fairly reliably" rather than "reliably", but they mean
the same thing...

> > If the data decodes
> > as UTF-8, it *is* UTF-8, because no other encoding in the world
> > produces the same byte sequences (except for ASCII, which is
> > an UTF-8 subset).

or as the RFC puts it,

    "the probability that a string of characters in any other encoding
    appears as valid UTF-8 is low, diminishing with increasing string
    length".
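
A minimal sketch of the check being discussed, assuming Python 3 and its built-in strict "utf-8" codec; the helper name looks_like_utf8 is made up for illustration and does not come from the thread or from any library:

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")  # "strict" error handling is the default
    except UnicodeDecodeError:
        return False
    return True


text = "na\u00efve caf\u00e9"  # "naive cafe" with i-diaeresis and e-acute

print(looks_like_utf8(text.encode("utf-8")))    # genuine UTF-8
print(looks_like_utf8(text.encode("latin-1")))  # lone high bytes, no continuation bytes
print(looks_like_utf8(b"plain ASCII"))          # ASCII is a UTF-8 subset
```

Typical Latin-1 text fails here because a byte such as 0xEF or 0xE9 must be followed by continuation bytes in UTF-8. Short inputs can still pass by coincidence (the two bytes C3 A9 are both "Ã©" in Latin-1 and "é" in UTF-8), which is why the RFC speaks of a probability that shrinks as the string gets longer.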

:::

Ross Ridge wrote:

> It should be obvious that any 8-bit single-byte character set can
> produce byte sequences that are valid in UTF-8.

it should be fairly obvious that you don't know much about UTF-8...

</F>



-- 
http://mail.python.org/mailman/listinfo/python-list
