Bengt Richter wrote:
On Wed, 20 Apr 2005 10:55:18 +0200, David Fraser <[EMAIL PROTECTED]> wrote:


Greg Ewing wrote:

Will McGugan wrote:


Hi,

I'm curious about the behaviour of str.split() when applied to empty strings.

"".split() returns an empty list, however..

"".split("*") returns a list containing one empty string.


Both of these make sense as limiting cases.

Consider

>>> "a b c".split()
['a', 'b', 'c']
>>> "a b".split()
['a', 'b']
>>> "a".split()
['a']
>>> "".split()
[]

and

>>> "**".split("*")
['', '', '']
>>> "*".split("*")
['', '']
>>> "".split("*")
['']

The split() method is really doing two somewhat different things
depending on whether it is given an argument, and the end-cases
come out differently.
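
One way to see the two behaviours side by side is to count the pieces each form returns. This is only a rough sketch: count_fields and count_words are made-up helper names, and the \S+ regex is an approximation of what split() treats as non-whitespace.

```python
import re

def count_fields(s, sep):
    # Hypothetical helper: with an explicit separator, every occurrence of
    # sep is a boundary, so there is always one more field than separators.
    return s.count(sep) + 1

def count_words(s):
    # Hypothetical helper: with no argument, split() returns the runs of
    # non-whitespace characters, so an empty string contains none.
    return len(re.findall(r"\S+", s))

for s in ["a*b*c", "**", "*", ""]:
    assert len(s.split("*")) == count_fields(s, "*")

for s in ["a b c", "a  b", " a ", ""]:
    assert len(s.split()) == count_words(s)
```

With a separator, zero separators still means one (possibly empty) field; with no argument, zero words means an empty list.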


You don't really explain *why* they make sense as limiting cases, as your examples are quite different.


Consider

>>> "a*b*c".split("*")
['a', 'b', 'c']
>>> "a*b".split("*")
['a', 'b']
>>> "a".split("*")
['a']
>>> "".split("*")
['']

Now how is this logical when compared with split() above?


The trouble is that s.split(arg) and s.split() are two different functions.

The first is 1:1 and reversible: arg.join(s.split(arg)) == s.
The second is neither 1:1 nor reversible: '<<various whitespace>>'.join(s.split()) == s? Not usually.

I think you can do it with the equivalent whitespace regex, preserving the split-out whitespace substrings and ''.joining those back with the others, but not with split(). I.e.,

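 >>> def splitjoin(s, splitter=None):
 ...     # Rejoin with the splitter itself, or with a visible marker for whitespace splits.
 ...     joiner = '<<whitespace>>' if splitter is None else splitter
 ...     return joiner.join(s.split(splitter))
 ...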
 >>> splitjoin('a*b*c', '*')
 'a*b*c'
 >>> splitjoin('a*b', '*')
 'a*b'
 >>> splitjoin('a', '*')
 'a'
 >>> splitjoin('', '*')
 ''
 >>> splitjoin('a b    c')
 'a<<whitespace>>b<<whitespace>>c'
 >>> splitjoin('a b    ')
 'a<<whitespace>>b'
 >>> splitjoin('  b    ')
 'b'
 >>> splitjoin('')
 ''

 >>> splitjoin('*****','*')
 '*****'
Note why that works:

 >>> '*****'.split('*')
 ['', '', '', '', '', '']
 >>> '*a'.split('*')
 ['', 'a']
 >>> 'a*'.split('*')
 ['a', '']

 >>> splitjoin('*a','*')
 '*a'
 >>> splitjoin('a*','*')
 'a*'
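
The regex variant mentioned above might look something like the following. It is a sketch only: the function names are invented here, and it assumes that \s+ is an acceptable stand-in for the whitespace that split() skips.

```python
import re

def split_keeping_whitespace(s):
    # Hypothetical helper: the capturing group makes re.split() return
    # the whitespace runs as well as the pieces between them.
    return re.split(r"(\s+)", s)

def roundtrip(s):
    # Joining everything back with '' restores the original string.
    return "".join(split_keeping_whitespace(s))

print(split_keeping_whitespace("a b    c"))  # ['a', ' ', 'b', '    ', 'c']
print(roundtrip("  b    ") == "  b    ")     # True
```

Note that the non-whitespace pieces (every second item) can include empty strings at the ends, e.g. for '  b    ', so this is not a drop-in replacement for split().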

Thanks, this makes sense.
So ideally, if we weren't dealing with backward compatibility, these functions might have different names: "split" (with arg) and "spacesplit" (without arg).
In fact it would be nice to allow an argument to "spacesplit" specifying the characters regarded as 'space'.
But it's not worth breaking current code :-)
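
For what it's worth, such a function is easy enough to sketch on its own. "spacesplit" here is purely hypothetical, not an existing str method, and the "spaces" parameter is my own naming:

```python
def spacesplit(s, spaces=None):
    # Hypothetical function: behaves like s.split() when spaces is None,
    # otherwise treats every character in `spaces` as 'space'.
    if spaces is None:
        return s.split()
    pieces = []
    current = []
    for ch in s:
        if ch in spaces:
            # A run of 'space' characters ends the current piece, if any.
            if current:
                pieces.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        pieces.append("".join(current))
    return pieces

print(spacesplit("a*-b**c", "*-"))  # ['a', 'b', 'c']
print(spacesplit("", "*"))          # []
```

Like split() without an argument, it collapses runs of 'space' characters and never returns empty strings.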


David
--
http://mail.python.org/mailman/listinfo/python-list
