On 22 August 2016 at 19:47, Stephen J. Turnbull
wrote:
> Nick Coghlan writes:
>
> > However, the real problem with this proposal (and the reason why the
> > switch from 8-bit str to "bytes are effectively a tuple of ints" in
> > Python 3 was such a pain), is that there are a lot of bytes and text
> > processing operations that *really do* operate code point by code
> > point.
>
> Sure, but code points aren't strings in any language I use except
> Python. And AFAIK strings are the only case in Python where a
> singleton *is* an element, and an element *is* a singleton.
Sure, but the main concern at hand ("list(strobj)" giving a broken out
list of individual code points rather than TypeError) isn't actually
related to the fact those individual items are themselves length-1
strings, it's related to the fact that Python normally considers
strings to be a sequence type rather than a scalar value type.
str is far from the only builtin container type that NumPy gives the
scalar treatment when sticking it into an array:
>>> np.array("abc")
array('abc', dtype='>> np.array(b"abc")
array(b'abc', dtype='|S3')
>>> np.array({1, 2, 3})
array({1, 2, 3}, dtype=object)
>>> np.array({1:1, 2:2, 3:3})
array({1: 1, 2: 2, 3: 3}, dtype=object)
(Interestingly, both bytearray and memoryview get interpreted as
"uint8" arrays, unlike the bytes literal - presumably the latter
discrepancy is a requirement for compatibility with NumPy's
str/unicode handling in Python 2)
That's why I suggested that a scalar proxy based on wrapt.ObjectProxy
that masked all container related protocols could be an interesting
future addition to the standard library (especially if it has been
battle-tested on PyPI first). "I want to take this container instance,
and make it behave like it wasn't a container, even if other code
tries to use it as a container" is usually what people are after when
they find str iteration inconvenient, but "treat this container as a
scalar value, but otherwise expose all of its methods" is an operation
with applications beyond strings.
Not-so-coincidentally, that approach would also give us a de facto
"code point" type: it would be the result of applying the scalar proxy
to a length 1 str instance.
Cheers,
Nick.
--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/