Re: [Python-Dev] email package status in 3.X

Barry Warsaw Mon, 21 Jun 2010 13:33:42 -0700

On Jun 21, 2010, at 01:17 PM, P.J. Eby wrote:

>I'm not really sure how much use the encoding is on a unicode object - what
>would it actually mean?
>
>Hm. I suppose it would effectively mean "this string can be represented in
>this encoding" -- which is useful, in that you could fail operations when
>combining with bytes of a different encoding.


That's basically what I was thinking.

>Hm... no, in that case you should just encode the string to the bytes'
>encoding, and let that throw an error if it fails.  So, really, there's no
>reason for a string to know its encoding.  All you need is the bytes type to
>have an encoding attribute, and when doing mixed-type operations between
>bytes and strings, coerce to *bytes of the same encoding*.

If ebytes were a separate type that did the encoding check at constructor
time and cached the result of the decoding, then I think you would not need
the equivalent of an estr type.  If you had a string and knew which encoding
it could be encoded to, you could just coerce it to an ebytes and use the
cached decoded value wherever you needed it.

E.g.

    >>> mystring = 'some unicode string'
    >>> myencoding = 'iso-9999-foo'
    >>> myebytes = ebytes(mystring, myencoding)
    >>> myebytes.encoding == myencoding
    True
    >>> myebytes.string == mystring
    True

So ebytes() could accept a str or bytes as its first argument.

    >>> mybytes = b'some encoded string'
    >>> myebytes = ebytes(mybytes, myencoding)
    >>> mybytes == myebytes
    True
    >>> myebytes.encoding == myencoding
    True

In the first example ebytes() encodes mystring to set the internal bytes
representation.  In the second example, ebytes() decodes the bytes to get the
.string attribute value.  In both cases, an exception is raised if the
encoding/decoding fails.
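
A minimal sketch of how such a type might behave, assuming it is written as a
bytes subclass. The ebytes name and the .encoding and .string attributes come
from the examples above; everything else (subclassing bytes, the argument
handling, the 'latin-1' codec standing in for the placeholder encoding name)
is an assumption made for illustration, not a proposed implementation.

```python
class ebytes(bytes):
    """Hypothetical bytes subclass that remembers its encoding."""

    def __new__(cls, value, encoding):
        if isinstance(value, str):
            # str input: encode it to obtain the internal bytes.
            # Raises UnicodeEncodeError if the text does not fit the encoding.
            raw = value.encode(encoding)
            text = value
        else:
            # bytes input: decode it once and cache the result.
            # Raises UnicodeDecodeError if the bytes are not valid.
            raw = bytes(value)
            text = raw.decode(encoding)
        self = super().__new__(cls, raw)
        self.encoding = encoding  # name of the codec used for the check
        self.string = text        # cached decoded value
        return self


from_text = ebytes('caf\u00e9', 'latin-1')
from_raw = ebytes(b'caf\xe9', 'latin-1')
```

Because the check happens in the constructor, a failed encode or decode
surfaces when the object is created rather than at some later use.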

>However, if .encoding is None, then coercion would follow the same rules as
>now -- i.e., convert the bytes to unicode, assuming an ascii encoding.  (This
>would be different than setting an encoding of 'ascii', because in that case,
>it means you want cross-type operations to result in ascii bytes, rather than
>a unicode string, and to fail if the unicode part can't be encoded
>appropriately.  The 'None' setting is effectively a nod to compatibility with
>prior 3.x versions, since I assume we can't just throw out the old coercion
>behavior.)
>
>Then, a few more changes to the bytes type would round out the implementation:
>
>* Allow .decode() to not specify an encoding, unless .encoding is None
>
>* Add back in the missing string methods (e.g. .encode()), since you can
>transparently upgrade to a string
>
>* Smart __str__, as shown in your proposal.

If my example above isn't nonsense, then __str__() would just return the
.string attribute.
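
The coercion rule and the method changes discussed above could be sketched
roughly as follows. This is only an illustration under stated assumptions: the
ebytes class is hypothetical, it always carries an encoding (the .encoding is
None compatibility case is not modelled), and only ebytes + str is handled,
not the reflected str + ebytes form.

```python
class ebytes(bytes):
    """Hypothetical encoding-aware bytes type (illustrative only)."""

    def __new__(cls, value, encoding):
        raw = value.encode(encoding) if isinstance(value, str) else bytes(value)
        self = super().__new__(cls, raw)
        self.encoding = encoding
        self.string = raw.decode(encoding)  # validates and caches the text
        return self

    def __str__(self):
        # "Smart" __str__: just return the cached decoded value.
        return self.string

    def __add__(self, other):
        # Mixed-type operation: coerce a str operand to bytes of the
        # same encoding, failing if it cannot be encoded.
        if isinstance(other, str):
            other = other.encode(self.encoding)
        return ebytes(bytes.__add__(self, other), self.encoding)

    def decode(self, encoding=None, errors='strict'):
        # The encoding argument may be omitted; the stored one is used.
        return bytes.decode(self, encoding or self.encoding, errors)


greeting = ebytes('abc', 'ascii') + 'def'
```

Here the result of the mixed operation is again an ebytes in the original
encoding, and combining it with text that cannot be encoded raises
UnicodeEncodeError instead of silently producing a str.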

>In short, +1.  (I wish it were possible to go back and make bytes non-strings
>and have only this ebytes or bstr or whatever type have string methods, but
>I'm pretty sure that ship has already sailed.)

Maybe it's PEP time?  No, I'm not volunteering. ;)

-Barry
