On Wed, 20 Apr 2005 10:55:18 +0200, David Fraser <[EMAIL PROTECTED]> wrote:
>Greg Ewing wrote:
>> Will McGugan wrote:
>>
>>> Hi,
>>>
>>> I'm curious about the behaviour of the str.split() when applied to
>>> empty strings.
>>>
>>> "".split() returns an empty list, however..
>>>
>>> "".split("*") returns a list containing one empty string.
>>
>>
>> Both of these make sense as limiting cases.
>>
>> Consider
>>
>>  >>> "a b c".split()
>>  ['a', 'b', 'c']
>>  >>> "a b".split()
>>  ['a', 'b']
>>  >>> "a".split()
>>  ['a']
>>  >>> "".split()
>>  []
>>
>> and
>>
>>  >>> "**".split("*")
>>  ['', '', '']
>>  >>> "*".split("*")
>>  ['', '']
>>  >>> "".split("*")
>>  ['']
>>
>> The split() method is really doing two somewhat different things
>> depending on whether it is given an argument, and the end-cases
>> come out differently.
>>
>You don't really explain *why* they make sense as limiting cases, as
>your examples are quite different.
>
>Consider
> >>> "a*b*c".split("*")
>['a', 'b', 'c']
> >>> "a*b".split("*")
>['a', 'b']
> >>> "a".split("*")
>['a']
> >>> "".split("*")
>['']
>
>Now how is this logical when compared with split() above?

The trouble is that s.split(arg) and s.split() are two different functions.

The first is 1:1 and reversible like arg.join(s.split(arg))==s
The second is not 1:1 nor reversible:
'<<various whitespace>>'.join(s.split()) == s ?? Not usually.

I think you can do it with the equivalent whitespace regex, preserving the
splitout whitespace substrings and ''.joining those back with the others,
but not with split(). I.e.,

 >>> def splitjoin(s, splitter=None):
 ...     return (splitter is None and '<<whitespace>>' or splitter).join(s.split(splitter))
 ...
 >>> splitjoin('a*b*c', '*')
 'a*b*c'
 >>> splitjoin('a*b', '*')
 'a*b'
 >>> splitjoin('a', '*')
 'a'
 >>> splitjoin('', '*')
 ''
 >>> splitjoin('a b c')
 'a<<whitespace>>b<<whitespace>>c'
 >>> splitjoin('a b ')
 'a<<whitespace>>b'
 >>> splitjoin(' b ')
 'b'
 >>> splitjoin('')
 ''
 >>> splitjoin('*****','*')
 '*****'

Note why that works:

 >>> '*****'.split('*')
 ['', '', '', '', '', '']
 >>> '*a'.split('*')
 ['', 'a']
 >>> 'a*'.split('*')
 ['a', '']

 >>> splitjoin('*a','*')
 '*a'
 >>> splitjoin('a*','*')
 'a*'

Regards,
Bengt Richter
-- 
http://mail.python.org/mailman/listinfo/python-list
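
P.S. The whitespace-regex idea mentioned above (keep the split-out
whitespace substrings and ''.join everything back together) might be
sketched roughly as follows. This is only an illustration: the names
split_keep_ws and roundtrip are made up here, and it assumes the pattern
r'(\s+)' is a fair stand-in for the whitespace that s.split() throws away.

```python
import re

def split_keep_ws(s):
    # The capturing group makes re.split() return the whitespace runs
    # as list items instead of discarding them.
    return re.split(r'(\s+)', s)

def roundtrip(s):
    # Words and whitespace runs alternate in the list, so a plain
    # ''.join() rebuilds the original string.
    return ''.join(split_keep_ws(s))
```

Unlike s.split(), this leaves empty strings at the ends when s starts or
ends with whitespace, e.g. split_keep_ws(' b ') gives
['', ' ', 'b', ' ', ''], which is what makes the join reversible.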