On Wed, 2006-08-09 at 21:51 +0200, gabor wrote:
[...]
> phew... the immortal
> how-tolerant-we-should-be-when-doing-unicode-conversion problems :-)

Agreed. This is much easier on my side of the fence (lobbing problems),
than your side (solving them).

> i generally prefer to do as little guesswork as possible, but in the
> case of the environ-variables it seems we cannot avoid it.. after all,
> it cannot crash when parsing the environ variables, because there's no
> way from the programmer's side to affect them.
>
> so what do you think about the following approach:
>
> try ascii-decoding
> if fails, try utf8-decoding
> if fails do iso-8859-1-decoding (this cannot fail).

I was thinking you could use the locale module to help you somewhat:
locale.getdefaultlocale() and locale.getpreferredencoding() might both
be useful, although experimentation is needed. For example, on my
(Linux) system, getdefaultlocale() returns ('en_AU', 'utf') and I'm
pretty sure 'utf' isn't an encoding (utf-8 is, utf-16 also, but not
plain old utf.. :-( ).

I completely agree this is painful and normally I would punt. But my
crystal ball tells me that you will then get bug reports from Mr
Sagalaev, who is generally both very diligent in his debugging and likes
to use some language with a funny alphabet. If whatever you come up with
works naturally in places like Ivan's setup and maybe somebody who lives
in Hong Kong or Japan or some other East Asian locale, you could
consider this "solved" to some extent.

All that being said, you could start off implementing your list and go
from there (although surely utf-8 decoding will also handle ASCII
strings, so you could skip the first step).

> but imho this should happen only in "special" cases like
> environ-variables.. for example in get/post params i would prefer to
> raise an exception when the data cannot be en/de-coded using the
> configured charset.

*Providing* what we send in the headers is that restrictive. A server
can send what character set encodings it will accept in the header. The
client can pick any one of those to send back.
So keep that on your list of things to check (this is HTTP-level stuff).

Regards,
Malcolm
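
For illustration only, here is a minimal sketch of the fallback chain
discussed in this thread, written for a current Python where environment
values arrive as bytes. The function name decode_environ_value and its
candidates parameter are hypothetical and not part of Django. Putting the
locale's preferred encoding first is an assumption that still needs the
experimentation mentioned above. The separate ASCII step is skipped
because UTF-8 decoding already accepts ASCII.

```python
import locale


def decode_environ_value(raw, candidates=None):
    """Decode bytes from the environment, guessing as little as possible.

    Hypothetical helper: tries each candidate encoding in order and
    falls back to ISO-8859-1, which accepts every byte value.
    """
    if candidates is None:
        # Assumption: the locale's preferred encoding is a sensible
        # first guess, followed by UTF-8 (which also covers ASCII).
        candidates = [locale.getpreferredencoding(False), "utf-8"]
    for name in candidates:
        try:
            return raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            # Either the bytes are not valid in this encoding, or the
            # name is not an encoding Python recognises; try the next.
            continue
    # ISO-8859-1 maps each byte to one code point, so this cannot fail.
    return raw.decode("iso-8859-1")
```

One caveat with this ordering: if the preferred encoding is itself a
single-byte encoding, it may accept the bytes before UTF-8 is ever tried,
so the order of the candidates matters. For data such as GET/POST
parameters, the thread suggests raising an error instead of falling back
like this.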