On Wed, Oct 23, 2019 at 6:04 PM David Mertz <me...@gnosis.cx> wrote:

> Is this the same code points identified by `str.isspace`?

I haven't checked -- so I will: and the answer appears to be yes:

$ python weird_spaces.py
x x x xx x x x x x x x x x x xx x x xx
['x', 'x', 'x', 'x\u180ex', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x',
 'x', 'x\u200bx', 'x', 'x', 'x\ufeffx']
41
18
[False, True, False, True, False, True, False, False, False, True, False,
 True, False, True, False, True, False, True, False, True, False, True,
 False, True, False, True, False, True, False, True, False, False, False,
 True, False, True, False, True, False, False, False]

There are only three that didn't split (\u180e, \u200b and \ufeff), and
those same three are the only candidates that fail .isspace. The last list
has one entry per character, so every other entry is the False for an 'x';
the spaces sit at the odd indexes, and only indexes 7, 31 and 39 are False
there.

> Thanks for doing that. I would have soon otherwise. Still, "most of them"
> isn't actually a precise answer for an uncertain string. :-)

Nope. But it could be defined somewhere, and presumably is, though maybe
not consistently.

-CHB

> On Wed, Oct 23, 2019, 8:57 PM Christopher Barker <python...@gmail.com> wrote:
>
>> On Wed, Oct 23, 2019 at 5:53 PM Andrew Barnert via Python-ideas
>> <python-ideas@python.org> wrote:
>>
>>>> To be fair, I also don't know which of those split on str.split() with
>>>> no arguments to the method either.
>>
>> I couldn't resist -- the answer is most of them:
>>
>> #!/usr/bin/env python
>> weird_spaces = ("x\u0020x\u00A0x\u1680x\u180Ex\u2000x\u2001x\u2002"
>>                 "x\u2003x\u2004x\u2005x\u2006x\u2007x\u2008x\u2009"
>>                 "x\u200Ax\u200Bx\u202Fx\u205Fx\u3000x\uFEFFx")
>> print(weird_spaces)
>> splitted = weird_spaces.split()
>> print(splitted)
>>
>> print(len(weird_spaces))
>> print(len(splitted))
>>
>> $ python weird_spaces.py
>> x x x xx x x x x x x x x x x xx x x xx
>> ['x', 'x', 'x', 'x\u180ex', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x',
>> 'x', 'x\u200bx', 'x', 'x', 'x\ufeffx']
>> 41
>> 18
>>
>> -CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
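#!/usr/bin/env python
# weird_spaces.py: which space-like code points count as whitespace?

# 'x' characters separated by 20 space-like code points
weird_spaces = ("x\u0020x\u00A0x\u1680x\u180Ex\u2000x\u2001x\u2002"
                "x\u2003x\u2004x\u2005x\u2006x\u2007x\u2008x\u2009"
                "x\u200Ax\u200Bx\u202Fx\u205Fx\u3000x\uFEFFx")
print(weird_spaces)

# split() with no arguments splits on runs of whitespace
splitted = weird_spaces.split()
print(splitted)

print(len(weird_spaces))
print(len(splitted))

# one entry per character, including each 'x' (which is always False)
isspace = [c.isspace() for c in weird_spaces]
print(isspace)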
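The 41-entry list printed by the script is hard to read because it also
contains a False for every 'x'. A per-code-point comparison may be easier
to follow. The sketch below is not part of the attached script: the names
CANDIDATES and report are made up for illustration, and the extra columns
rest on the assumption (taken from the str.isspace documentation) that a
character counts as whitespace when its general category is Zs or its
bidirectional class is WS, B, or S.

```python
#!/usr/bin/env python
import unicodedata

# The same 20 candidate code points used in weird_spaces.py, one per entry.
# (CANDIDATES and report are illustrative names, not from the original script.)
CANDIDATES = [
    0x0020, 0x00A0, 0x1680, 0x180E, 0x2000, 0x2001, 0x2002,
    0x2003, 0x2004, 0x2005, 0x2006, 0x2007, 0x2008, 0x2009,
    0x200A, 0x200B, 0x202F, 0x205F, 0x3000, 0xFEFF,
]


def report(code_points):
    """Return (code point, splits, isspace, category, bidi class) per candidate."""
    rows = []
    for cp in code_points:
        ch = chr(cp)
        # "x" + ch + "x" yields two pieces only if split() treats ch as whitespace.
        splits = len(("x" + ch + "x").split()) == 2
        rows.append((cp, splits, ch.isspace(),
                     unicodedata.category(ch), unicodedata.bidirectional(ch)))
    return rows


rows = report(CANDIDATES)
for cp, splits, is_space, category, bidi in rows:
    print(f"U+{cp:04X}  split={splits!s:5}  isspace={is_space!s:5}  {category}  {bidi}")

# Code points where str.split() and str.isspace() give different answers.
disagree = [cp for cp, splits, is_space, _, _ in rows if splits != is_space]
print("disagreements:", [f"U+{cp:04X}" for cp in disagree])
```

Testing each code point on its own keeps the 'x' separators out of the
isspace column, so any row where the two columns differ would be a real
disagreement between str.split() and str.isspace().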
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ICM2RIS7EA3RXCRVRYTSDALFUQUEDM35/
Code of Conduct: http://python.org/psf/codeofconduct/