Antoine> If an stdlib function returns an 8-bit string containing
Antoine> non-ascii data, then this string used in unicode context incurs
Antoine> an implicit conversion, which fails.
Such strings should be converted to Unicode at the point where they enter the application. That's likely the only place where you have a good chance of knowing the data encoding. Files generally have no encoding information associated with them. Some databases don't handle Unicode transparently. If you hang onto the input from such devices as plain strings until you need them as Unicode, you will almost certainly not know how the string was encoded.

The state of the outside Unicode world being as miserable as it is (think web input forms), you often don't know the encoding at the interface and have to guess anyway. Even so, isolating that guesswork to the interface is better than recovering somewhere further downstream.

Skip
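
As a rough illustration of converting at the boundary, here is a minimal sketch in Python 3 terms, where the 8-bit string is `bytes` and Unicode is `str`. The helper name `to_text` and the list of candidate encodings are illustrative assumptions, not a stdlib API; a real application would pick candidates based on what it knows about its inputs.

```python
def to_text(raw, encodings=("utf-8", "cp1252")):
    """Decode input at the application boundary (hypothetical helper).

    Tries each candidate encoding in order and returns the first
    successful decoding. The candidate list is a guess, which is the
    point: the guessing happens here and nowhere else.
    """
    if isinstance(raw, str):
        # Already text; nothing to decode.
        return raw
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: latin-1 maps every byte to a code point, so it
    # cannot fail, though the result may not be what the sender meant.
    return raw.decode("latin-1")


# Decode once, where the data enters; everything downstream sees text.
name = to_text(b"caf\xc3\xa9")
```

Everything past the call to `to_text` then works with text only, so no implicit conversion is left to fail later.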