On Tue, Dec 9, 2008 at 11:31 AM, Ulrich Eckhardt
<[EMAIL PROTECTED]> wrote:
> On Monday 08 December 2008, Adam Olsen wrote:
>> At this point someone suggests we have a type that can store an
>> arbitrary mix of unicode and bytes, so the undecodable portions stay
>> in their original form. :P
>
> Well, not an arbitrary mix, but a type that just stores whatever comes from
> the system without further specifying it as either bytes or Unicode:
>
> * If you want a string for displaying it, you first have to extract a string
> from that thing and there you optionally specify the encoding and error
> behaviour.
> * If you want to append a string to it, it is automatically encoded in the
> default encoding, which obviously can fail.

So the 2.x str, but with a more interesting default encoding than
ASCII.  It'll work fine on the developer's system, but one day a user
will present it with strange input, and boom.

You have to be pessimistic here.  The default operations should either
always work or never work.  Using unicode internally and skipping
garbage input means the operations always work.  Using a bytes API
means mixing with unicode never works, unless the programmer
explicitly converts, in which case the onus is on them to use proper
error handling.

The only thing separating this from a bikeshed discussion is that a
bikeshed has many equally good solutions, while we have no good
solutions.  Instead we're trying to find the least-bad one.  The
unicode/bytes separation is pretty close to that.  Adding a warning
gets even closer.  Adding magic makes it worse.


-- 
Adam Olsen, aka Rhamphoryncus
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to