On Tue, Jun 22, 2010 at 6:16 AM, P.J. Eby <p...@telecommunity.com> wrote:
> True, but making it a separate type with a required encoding gets rid of the
> magical "I don't know" - the "I don't know" encoding is just a plain old
> bytes object.

So, to boil down the ebytes idea, it is basically a request for a
second string type that holds an octet stream plus an encoding name,
rather than a Unicode character stream. Calling it "ebytes" seems to
emphasise the wrong parallel in that case (you have a 'str' object
with a different internal structure, not any kind of bytes object).
For now I'll call it an "altstr". Then the idea can be described as follows:

- altstr would expose the same API as str, NOT the same API as bytes
- explicit conversion via "str" would use the altstr's __str__ method
- explicit conversion via "bytes" would use the altstr's __bytes__ method
- implicit interaction with str would convert the str to an altstr
object according to the altstr's rules. This may be best handled via a
coercion method on altstr, rather than str actually needing to know
the details (i.e. an altstr.__coerce_str__() method). For the
'ebytes' model, this would do something like
"type(self)(other.encode(self.encoding), self.encoding)". The
operation would then be handled by the corresponding method on the
coerced object. A new type could then override operations such as
__contains__, __mod__, format() and join().
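
To make the list above a bit more concrete, here is a minimal pure-Python sketch of the 'ebytes' flavour of altstr. Everything in it is hypothetical: no such type or __coerce_str__ hook exists in Python, and the policy for mixing two different encodings is just one assumed choice. Since str knows nothing about __coerce_str__, the sketch gets the "str doesn't need to know the details" effect through the reflected __radd__ method, which Python tries for "str + ebytes".

```python
class ebytes:
    """Hypothetical altstr: an octet stream tagged with its encoding."""

    def __init__(self, data: bytes, encoding: str) -> None:
        if not isinstance(data, bytes):
            raise TypeError("data must be bytes")
        self.data = data
        self.encoding = encoding

    def __coerce_str__(self, other: str) -> "ebytes":
        # Hypothetical hook: convert a plain str using *our* encoding.
        return type(self)(other.encode(self.encoding), self.encoding)

    def _coerce(self, other):
        """Return other as an ebytes in our encoding, or None if unsupported."""
        if isinstance(other, str):
            return self.__coerce_str__(other)
        if isinstance(other, ebytes):
            if other.encoding == self.encoding:
                return other
            # Mixed encodings: one possible (assumed) policy is to
            # round-trip through text.
            return self.__coerce_str__(str(other))
        return None

    def __str__(self) -> str:  # explicit str(x) decodes
        return self.data.decode(self.encoding)

    def __bytes__(self) -> bytes:  # explicit bytes(x) returns the octets
        return self.data

    def __repr__(self) -> str:
        return f"ebytes({self.data!r}, {self.encoding!r})"

    def __eq__(self, other):
        other = self._coerce(other)
        if other is None:
            return NotImplemented
        return self.data == other.data

    def __add__(self, other):
        other = self._coerce(other)
        if other is None:
            return NotImplemented
        return type(self)(self.data + other.data, self.encoding)

    def __radd__(self, other):
        # Handles str + ebytes: str does not know this type, so Python
        # falls back to the reflected method on the right operand.
        other = self._coerce(other)
        if other is None:
            return NotImplemented
        return type(self)(other.data + self.data, self.encoding)

    def __contains__(self, item) -> bool:
        coerced = self._coerce(item)
        if coerced is None:
            raise TypeError(f"cannot search for {type(item).__name__}")
        return coerced.data in self.data

    def join(self, items) -> "ebytes":
        parts = []
        for item in items:
            coerced = self._coerce(item)
            if coerced is None:
                raise TypeError(f"cannot join {type(item).__name__}")
            parts.append(coerced.data)
        return type(self)(self.data.join(parts), self.encoding)


s = ebytes("caf\u00e9".encode("latin-1"), "latin-1")
print(repr(s + "!"))  # ebytes(b'caf\xe9!', 'latin-1')
print(">" + s)        # the str is coerced, then __str__ decodes for print
```

Keeping the conversion rules in a single hook on the new type means str itself stays unchanged; which encoding should win when two ebytes objects disagree is left as an open policy question.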

This is still smelling an awful lot like the 2.x str type to me, but
supporting a __coerce_str__ method may allow some useful
experimentation in this space (as PJE suggested). There's a chance it
would be abused, but it offers a greater chance of success than trying
to come up with a concrete altstr type without providing a means for
experimentation first.

> (In principle, you could then drop *all* the stringlike methods from
> plain-old-bytes objects.  If it's really text-in-bytes you want, you should
> use an ebytes with the encoding specified.)

Except that a lot of those string-like methods are just plain useful,
even when you *know* you're dealing with an octet stream rather than
latin-1 encoded text.
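
As a small illustration of that point, a minimal sketch: splitting an HTTP-style header block straight out of raw bytes with partition(), split(), strip() and lower(), without ever decoding anything. The function name parse_response and the sample message are made up for the example.

```python
def parse_response(raw: bytes):
    """Split an HTTP-style response into (status line, headers, body).

    Everything stays as bytes: no text encoding is ever assumed.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.split(b"\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


raw = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\n\x00\x01\x02\x03\x04"
status, headers, body = parse_response(raw)
print(status.startswith(b"HTTP/"))                   # True
print(int(headers[b"content-length"]) == len(body))  # True
```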

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev