On Feb 13, 2006, at 7:29 PM, Guido van Rossum wrote:

> There's one property that bytes, str and unicode all share: type(x[0])
> == type(x), at least as long as len(x) >= 1. This is perhaps the
> ultimate test for string-ness.

But not perfect, since of course other containers can contain objects of
their own type too. But it leads to an interesting issue...

> Or should b[0] be an int, if b is a bytes object? That would change
> things dramatically.

This makes me think I want an unsigned byte type, which b[0] would
return. In another thread I think someone mentioned something about
fixed width integral types, such that you could have an object that was
guaranteed to be 8-bits wide, 16-bits wide, etc. Maybe you also want
signed and unsigned versions of each. This may seem like YAGNI to many
people, but as I've been working on a tightly embedded/extended
application for the last few years, I've definitely had occasions where
I wish I could more closely and more directly model my C values as
Python objects (without using the standard workarounds or writing my own
C extension types).

But anyway, without hyper-generalizing, it's still worth asking whether
a bytes type is just a container of byte objects, where the contained
objects would be distinct, fixed 8-bit unsigned integral types.

> There's also the consideration for APIs that, informally, accept
> either a string or a sequence of objects. Many of these exist, and
> they are probably all being converted to support unicode as well as
> str (if it makes sense at all). Should a bytes object be considered as
> a sequence of things, or as a single thing, from the POV of these
> types of APIs? Should we try to standardize how code tests for the
> difference? (Currently all sorts of shortcuts are being taken, from
> isinstance(x, (list, tuple)) to isinstance(x, basestring).)

I think bytes objects are very much like string objects today -- they're
the photons of Python since they can act like either sequences or
scalars, depending on the context. For example, we have code that needs
to deal with situations where an API can return either a scalar or a
sequence of those scalars.
So we have a utility function like this:

    def thingiter(obj):
        try:
            it = iter(obj)
        except TypeError:
            yield obj
        else:
            for item in it:
                yield item

Maybe there's a better way to do this, but the most obvious problem is
that (for our use cases), this fails for strings because in this context
we want strings to act like scalars. So we add a little test just before
the "try:" like "if isinstance(obj, basestring): yield obj". But that's
yucky.

I don't know what the solution is -- if there /is/ a solution short of
special case tests like above, but I think the key observation is that
sometimes you want your string to act like a sequence and sometimes you
want it to act like a scalar. I suspect bytes objects will be the same
way.

-Barry

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
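
For reference, the special-cased variant described in the message might look roughly like the sketch below. It is written for Python 3 as an assumption: basestring does not exist there, so (str, bytes) stands in for it, and the name thingiter_scalar_strings is hypothetical. Note the explicit return after the first yield; without it the generator would carry on and iterate over the string as well.

```python
def thingiter_scalar_strings(obj):
    """Yield obj itself if it is string-like or not iterable,
    otherwise yield each of its items."""
    # Assumption: (str, bytes) plays the role basestring had in Python 2.
    if isinstance(obj, (str, bytes)):
        yield obj
        return  # stop here so the string is not also iterated
    try:
        it = iter(obj)
    except TypeError:
        # Not iterable at all: treat it as a scalar.
        yield obj
    else:
        for item in it:
            yield item
```

Under these assumptions a single string or bytes object comes back as one item, a list or tuple comes back element by element, and a non-iterable value such as an int comes back as itself.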