On Tue, Dec 9, 2008 at 11:31 AM, Ulrich Eckhardt <[EMAIL PROTECTED]> wrote:
> On Monday 08 December 2008, Adam Olsen wrote:
>> At this point someone suggests we have a type that can store an
>> arbitrary mix of unicode and bytes, so the undecodable portions stay
>> in their original form. :P
>
> Well, not an arbitrary mix, but a type that just stores whatever comes from
> the system without further specifying it as either bytes or Unicode:
>
> * If you want a string for displaying it, you first have to extract a string
> from that thing and there you optionally specify the encoding and error
> behaviour.
> * If you want to append a string to it, it is automatically encoded in the
> default encoding, which obviously can fail.
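The type described in the quoted proposal can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the class name `SystemString`, its methods, and the fixed `latin-1` default encoding are all assumptions, chosen so that the "which obviously can fail" case is reproducible.

```python
from typing import Optional


class SystemString:
    """Hypothetical type holding whatever the OS returned, without
    committing to it being either bytes or text."""

    # Assumption: a fixed default encoding, so the failure is reproducible.
    # A real design would more likely consult the locale.
    default_encoding = "latin-1"

    def __init__(self, raw: bytes) -> None:
        self._raw = bytes(raw)

    def to_str(self, encoding: Optional[str] = None, errors: str = "strict") -> str:
        # Extracting a display string: the caller may choose the encoding
        # and the error behaviour ("strict", "replace", ...).
        return self._raw.decode(encoding or self.default_encoding, errors)

    def append(self, text: str) -> "SystemString":
        # Appending text encodes it implicitly with the default encoding;
        # this raises UnicodeEncodeError for characters it cannot represent.
        return SystemString(self._raw + text.encode(self.default_encoding))

    def __bytes__(self) -> bytes:
        return self._raw


name = SystemString(b"report")
print(name.append(".txt").to_str())  # report.txt

try:
    name.append("\u20ac")  # the euro sign has no latin-1 encoding
except UnicodeEncodeError as exc:
    print("append failed:", exc.reason)
```

Appending ".txt" works, while appending the euro sign raises `UnicodeEncodeError`: whether the operation succeeds depends on the data rather than on the code path, which is the property the reply below objects to.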
So the 2.x str, but with a more interesting default encoding than ASCII. It'll work fine on the developer's system, but one day a user will present it with strange input, and boom.

You have to be pessimistic here. The default operations should either always work or never work. Using unicode internally and skipping garbage input means the operations always work. Using a bytes API means mixing with unicode never works, unless the programmer explicitly converts, in which case the onus is on them to use proper error handling.

The only thing separating this from a bikeshed discussion is that a bikeshed has many equally good solutions, while we have no good solutions. Instead we're trying to find the least-bad one. The unicode/bytes separation is pretty close to that. Adding a warning gets even closer. Adding magic makes it worse.

-- 
Adam Olsen, aka Rhamphoryncus
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
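The "always work or never work" contrast drawn in the reply can be illustrated with Python 3's own str/bytes split. This is a minimal sketch, not anything proposed in the thread: `decode_names` and `explicit_decode` are hypothetical helpers, and UTF-8, the `"backslashreplace"` error handler, and emitting a warning per skipped entry are assumptions.

```python
import warnings
from typing import Iterable, List


def decode_names(raw_names: Iterable[bytes], encoding: str = "utf-8") -> List[str]:
    """Hypothetical helper: decode to str, skipping undecodable entries.

    This never raises UnicodeDecodeError; each skipped entry triggers a
    warning instead of disappearing silently.
    """
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            warnings.warn(f"skipping undecodable name {raw!r}")
    return names


def explicit_decode(raw: bytes, encoding: str = "utf-8") -> str:
    # The programmer converts explicitly and picks the error handling;
    # "backslashreplace" turns each bad byte into an escape like \xff.
    return raw.decode(encoding, errors="backslashreplace")


raw_names = [b"notes.txt", b"todo.md", b"bad\xff"]

# 1. Mixing bytes and str fails the same way whatever the data contains.
try:
    joined = raw_names[0] + "/extra"
except TypeError as exc:
    print("mixing failed:", exc)

# 2. Unicode internally, garbage skipped with a warning: always succeeds.
print(decode_names(raw_names))  # ['notes.txt', 'todo.md']

# 3. Explicit conversion: the error handling is the caller's choice.
print(explicit_decode(raw_names[2]))  # bad\xff
```

Case 1 fails loudly for every input, case 2 always succeeds but reports what it dropped, and case 3 puts the error handling in the programmer's hands; none of them depends on a hidden default encoding.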