On Wed, 2006-08-09 at 21:51 +0200, gabor wrote:
[...]
> phew... the immortal 
> how-tolerant-we-should-be-when-doing-unicode-conversion problems :-)

Agreed. This is much easier on my side of the fence (lobbing problems)
than on yours (solving them).

> i generally prefer to do as little guesswork as possible, but in the 
> case of the environ-variables it seems we cannot avoid it.. after all, 
> it cannot crash when parsing the environ variables, because there's no 
> way from the programmer's side to affect them.
> 
> so what do you think about the following approach:
> 
> try ascii-decoding
> if fails, try utf8-decoding
> if fails do iso-8859-1-decoding (this cannot fail).

I was thinking you could use the locale module to help somewhat:
locale.getdefaultlocale() and locale.getpreferredencoding() might both
be useful, although some experimentation is needed. For example, on my
(Linux) system getdefaultlocale() returns ('en_AU', 'utf'), and I'm
pretty sure 'utf' isn't an encoding name (utf-8 is, and so is utf-16,
but not plain old "utf" :-( ).
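
A rough sketch of the locale probing, assuming Python 3 (the helper name
locale_encoding is hypothetical, not an existing Django or stdlib
function). It asks locale.getpreferredencoding(False) for the current
encoding and runs the answer through codecs.lookup(), which normalises
aliases and raises LookupError for names Python does not know, so an odd
value like the 'utf' above is either resolved to a real codec or
replaced by a fallback:

```python
import codecs
import locale


def locale_encoding(fallback: str = "utf-8") -> str:
    """Return a canonical codec name for the current locale.

    Hypothetical helper: falls back to `fallback` when the locale reports
    an empty or unrecognised encoding name.
    """
    name = locale.getpreferredencoding(False)
    if name:
        try:
            # codecs.lookup() resolves aliases (e.g. 'UTF-8', 'utf8')
            # to one canonical name, or raises LookupError.
            return codecs.lookup(name).name
        except LookupError:
            pass
    return codecs.lookup(fallback).name
```

What this returns still depends on how the process's locale is set up,
so it is a hint to feed into the decoding order, not a guarantee.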

I completely agree this is painful, and normally I would punt. But my
crystal ball tells me that you will then get bug reports from Mr
Sagalaev, who is generally very diligent in his debugging and likes to
use a language with a funny alphabet. If whatever you come up with
works naturally for setups like Ivan's, and perhaps for someone in
Hong Kong, Japan or another East Asian locale, you could consider this
"solved" to some extent.

All that being said, you could start by implementing your list and go
from there (although UTF-8 decoding will also handle ASCII strings,
since ASCII is a subset of UTF-8, so you could skip the first step).
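
The shortened list might look something like this, assuming Python 3
and raw byte values (decode_environ_value is a hypothetical name). UTF-8
is tried first and covers plain ASCII as well; ISO-8859-1 maps all 256
byte values to code points, so the last step cannot fail:

```python
def decode_environ_value(raw: bytes) -> str:
    """Decode a raw environment value: UTF-8 first, then ISO-8859-1.

    No separate ASCII step: every ASCII byte string is valid UTF-8.
    ISO-8859-1 assigns a character to every byte, so the fallback
    decode never raises.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")
```

On a POSIX system running Python 3, os.environb exposes the raw bytes
this would be applied to. The order matters: an ISO-8859-1 value that
happens to be valid UTF-8 (b"\xc3\xa9", for instance) will be read as
UTF-8, which is the usual trade-off of guessing.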

> but imho this should happen only in "special" cases like 
> environ-variables.. for example in get/post params i would prefer to 
> raise an exception when the data cannot be en/de-coded using the 
> configured charset.

That's fine, *provided* what we send in the headers is equally
restrictive. A server can state in its headers which character
encodings it will accept, and the client can pick any one of those to
send back. So keep that on your list of things to check (this is
HTTP-level stuff).
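
One way the strict path could look, as a sketch only: it assumes the
server has advertised a fixed set of encodings (ACCEPTED_CHARSETS and
decode_param are hypothetical names, not Django API) and uses the
stdlib email.message.Message to pull the charset parameter out of the
request's Content-Type. Anything unknown, unadvertised, or undecodable
raises rather than being guessed at:

```python
import codecs
from email.message import Message

# Hypothetical: the encodings the server advertised as acceptable.
ACCEPTED_CHARSETS = ("utf-8", "iso-8859-1")


def _canonical(name: str) -> str:
    """Normalise an encoding name via the codec registry (raises LookupError)."""
    return codecs.lookup(name).name


_ACCEPTED = {_canonical(name) for name in ACCEPTED_CHARSETS}


def decode_param(raw: bytes, content_type: str, default: str = "utf-8") -> str:
    """Strictly decode request data using the charset the client declared.

    Raises ValueError if the charset is unknown or was not advertised, and
    UnicodeDecodeError (a ValueError subclass) if the bytes do not match it.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None.
    charset = msg.get_content_charset() or default
    try:
        canonical = _canonical(charset)
    except LookupError:
        raise ValueError(f"unknown charset {charset!r}") from None
    if canonical not in _ACCEPTED:
        raise ValueError(f"charset {charset!r} was not offered by the server")
    # Strict decoding (the default): bad bytes raise instead of being replaced.
    return raw.decode(canonical)
```

Canonicalising both sides through codecs.lookup() means "UTF-8",
"utf8" and "utf-8" all compare equal, so the check is against the
codec rather than the spelling the client happened to use.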

Regards,
Malcolm


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers
-~----------~----~----~----~------~----~------~--~---