Do you mean we should fix *all* of CPython's Unicode handling, not only str.isascii()?
At least, the equality test doesn't take a wrong kind into account.

https://github.com/python/cpython/blob/master/Objects/stringlib/eq.h
https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756e805de/Objects/unicodeobject.c#L10871-L10873
https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756e805de/Objects/unicodeobject.c#L10998-L10999

There may be many others, but I'm not sure.

On Fri, Jan 26, 2018 at 10:02 PM, M.-A. Lemburg <m...@egenix.com> wrote:
> On 26.01.2018 12:17, INADA Naoki wrote:
>>> No, because you can pass in maxchar to PyUnicode_New() and
>>> the implementation will take this as hint to the max code point
>>> used in the string. There is no check done whether maxchar
>>> is indeed the minimum upper bound to the code point ordinals.
>>
>> API doc says:
>>
>> """
>> maxchar should be the true maximum code point to be placed in the string.
>> As an approximation, it can be rounded up to the nearest value in the
>> sequence 127, 255, 65535, 1114111.
>> """
>> https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_New
>>
>> Since doc says *should*, strings created with wrong maxchar
>> are considered invalid objects.
>
> Not really: "should" means should, not must :-) Objects created
> with PyUnicode_New() are valid and ready (this only has a meaning
> for legacy strings).
>
> You can set maxchar to 64k and still just use ASCII as content.
> In some cases, you may want the internal string representation
> to be wchar_t compatible or work with Py_UCS2/4, so both 64k
> and sys.maxunicode are reasonable and valid values.
>
> Overall, I'm starting to believe that a str.maxchar() function
> would be a better choice than to only go for ASCII.
>
> This could have an optional parameter "exact" to force scanning
> the string and returning the actual max code point ordinal
> when set to True (default), or return the approximation based
> on the used kind if not set (which in many cases will give
> you a good hint).
>
> For checking ASCII, you'd then write:
>
> def isascii(s):
>     if s.maxchar(exact=False) < 128:
>         return True
>     if s.maxchar() < 128:
>         return True
>     return False
>
> --
> Marc-Andre Lemburg
> eGenix.com

--
INADA Naoki  <songofaca...@gmail.com>
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
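
For illustration only, here is a rough pure-Python model of the str.maxchar() idea quoted above. No such method exists on str today, so maxchar() below is a hypothetical free function. Pure Python cannot see the kind stored in the object, so the `declared` parameter is an assumption of this sketch: it stands in for whatever maxchar the creator passed to PyUnicode_New(). The thresholds are the ones listed in the C-API doc.

```python
# Hypothetical model of the proposed str.maxchar(); not a real str method.
# Thresholds are the sequence given in the PyUnicode_New() documentation.
KIND_LIMITS = (127, 255, 65535, 1114111)


def maxchar(s, exact=True, declared=None):
    """Return the true max code point of s, or a kind-based upper bound.

    `declared` is an assumption of this sketch: it stands in for the
    maxchar hint given to PyUnicode_New(), which may be larger than needed.
    """
    true_max = max(map(ord, s), default=0)
    if exact:
        # Exact mode always scans the whole string.
        return true_max
    hint = true_max if declared is None else max(declared, true_max)
    if hint > KIND_LIMITS[-1]:
        raise ValueError("declared maxchar is beyond the Unicode range")
    # Round up to the nearest limit, as the doc allows for an approximation.
    return next(limit for limit in KIND_LIMITS if limit >= hint)


def isascii(s, declared=None):
    # Fast path: a narrow kind already proves the content is ASCII.
    if maxchar(s, exact=False, declared=declared) < 128:
        return True
    # Slow path: the kind may be wider than the content needs, so scan.
    return maxchar(s) < 128
```

With this model, isascii("abc", declared=65535) still returns True, because the approximate bound only serves as a shortcut and the exact scan decides the wide-kind case.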