One recommendation: for starters, I'd much rather see the bytes type
standardized without a literal notation. There should be lots of
ways to create bytes objects from string objects, with specific
explicit encodings, and those should suffice, at least initially.
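
A rough sketch of what "specific explicit encodings" can look like in
practice. This assumes the bytes type as it later shipped in Python 3,
which postdates this thread; none of these spellings were settled in
2006, so read it as an illustration rather than the proposal itself.

```python
# Sketch using the Python 3 bytes type; not the API under discussion in 2006.
text = "caf\u00e9"  # four characters, the last one is non-ASCII

as_utf8 = text.encode("utf-8")      # method form, encoding named explicitly
as_latin1 = bytes(text, "latin-1")  # constructor form, encoding is required

# The same four characters give different byte sequences per encoding.
lengths = (len(as_utf8), len(as_latin1))

# Decoding names the encoding too, so the round trip is unambiguous.
round_trip_utf8 = as_utf8.decode("utf-8")
round_trip_latin1 = as_latin1.decode("latin-1")
```

In both forms the encoding is stated at the call site, so no literal
notation is needed to get from a string to bytes.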

I also wonder if having a b"..." literal would just add more confusion
-- bytes are not characters, but b"..." makes it appear as if they
are.
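
To make that distinction concrete, here is a small sketch that assumes
the Python 3 semantics adopted later (not anything proposed in this
thread): a bytes object is written with characters but behaves as a
sequence of integers.

```python
# Assumes Python 3 behaviour; the literal looks like text but holds integers.
data = b"abc"

first = data[0]              # an int, not a one-character string
as_ints = list(data)         # the byte values as plain integers
same = (data == "abc")       # bytes and str never compare equal
text = data.decode("ascii")  # getting characters requires an explicit decode
```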

--Guido

On 2/11/06, Bengt Richter <[EMAIL PROTECTED]> wrote:
> On Fri, 10 Feb 2006 21:35:26 -0800, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>
> >> On Sat, 11 Feb 2006 05:08:09 +0000 (UTC), Neil Schemenauer <[EMAIL PROTECTED]>
> >> >The backwards compatibility problems *seem* to be relatively minor.
> >> >I only found one instance of breakage in the standard library.  Note
> >> >that my patch does not change PyObject_Str(); that would break
> >> >massive amounts of code.  Instead, I introduce a new function:
> >> >PyString_New().  I'm not crazy about the name but I couldn't think
> >> >of anything better.
> >
> >On 2/10/06, Bengt Richter <[EMAIL PROTECTED]> wrote:
> >> Should this not be coordinated with PEP 332?
> >
> >Probably. But that PEP is rather incomplete. Wanna work on fixing that?
> >
> I'd be glad to add my thoughts, but first of course it's Skip's PEP,
> and Martin casts a long shadow when it comes to character coding issues
> that I suspect will have to be considered.
>
> (E.g., if there is a b'...' literal for bytes, the actual characters of
> the source code itself that the literal is being expressed in could be ascii
> or latin-1 or utf-8 or utf16le a la Microsoft, etc. UIAM, I read that the
> source is at least temporarily normalized to Unicode, and then re-encoded
> (except now for string literals?) per coding cookie or other encoding
> inference. I may be out of date, gotta catch up.)
>
> If one way or the other a string literal is in Unicode, then presumably so is
> a byte string b'...' literal -- i.e. internally u"b'...'" just before
> being turned into bytes.
>
> Should that then be an internal straight u"b'...'".encode('byte') with
> default ascii + escapes for non-ascii and non-printables, to define the
> full 8 bits without encoding error?
> Should unicode be encodable into byte via a specific encoding? E.g.,
> u'abc'.encode('byte','latin1'), to distinguish producing a mutable byte
> string vs an immutable str type as with u'abc'.encode('latin1').
> (but how does this play with str being able to produce unicode? And when
> do these changes happen?)
> I guess I'm getting ahead of myself ;-)
>
> So I would first ask Skip what he'd like to do, and Martin for some hints
> on reading, to avoid going down paths he already knows lead to brick
> walls ;-) And I need to think more about PEP 349.
>
> I would propose to do the reading they suggest, and edit up a new version
> of pep-0332.txt that anyone could then improve further. I don't know about
> an early deadline. I don't want to over-commit, as time and energies vary.
> OTOH, as you've noticed, I could be spending my time more effectively ;-)
>
> I changed the thread title, and will wait for some signs from you, Skip,
> Martin, Neil, and I don't know who else might be interested...
>
> Regards,
> Bengt Richter
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> http://mail.python.org/mailman/options/python-dev/guido%40python.org
>


--
--Guido van Rossum (home page: http://www.python.org/~guido/)