[issue28937] str.split(): allow removing empty strings (when sep is not None)

Andrei Kulakov Sat, 05 Jun 2021 18:23:18 -0700

Andrei Kulakov <[email protected]> added the comment:

> I imagine that the discussion focussed on this since this is precisely what 
> happens when sep=None. For example, 'a     b   c     '.split() == ['a', 'b', 
> 'c']. I guess that the point was to provide users with explicit, manual 
> control over whether the behaviour of split should drop all empty strings or 
> retain all empty strings instead of this decision just being made on whether 
> sep is None or not.


That's true on some level but it seems to me that it's somewhat more nuanced 
than that.

The intent of sep=None is not to remove empties but to collapse invisible 
whitespace of mixed types into a single separator. ' \t  ' probably means a 
single separator because it looks like one visually. Yes, the effect is the 
same as removing empties but it's a relevant distinction when designing (and 
naming) a flag to make split() consistent with this behaviour when sep is ',', 
';', etc.
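
To make the distinction concrete, here is a small sketch that uses only the current behaviour of str.split(); it does not assume the proposed flag, and the variable names are just for illustration:

```python
# sep=None: a run of mixed whitespace acts as one separator, and
# leading/trailing whitespace does not produce empty strings.
whitespace_split = 'a \t  b\n c  '.split()

# Explicit sep: every occurrence is its own separator, so adjacent
# separators produce empty strings in the result.
comma_split = 'a,,,'.split(',')

print(whitespace_split)  # ['a', 'b', 'c']
print(comma_split)       # ['a', '', '', '']
```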

Because when you have 'a,,,', the most likely intent is to have 3 empty 
values, NOT to collapse 3 commas into a single sep. You might then have 
additional processing that gets rid of the empties, as part of the split() 
operation. So it's quite a different operation, even though the end effect is 
the same. So is this change really making the behaviour consistent? To me, 
consistency implies that the intent is roughly the same and the outcome is 
also roughly the same.

You might say: but practicality beats purity?

However, there are some real issues here:

- it is harder to explain, remember, and document
- there is a naming issue
- it does not completely solve the initial issue (and it would most likely 
leave no practical way to patch up that corner case if this PR is accepted)

Re: naming, for example, using keep_empty=False for sep=None is confusing. It 
would seem that most (or even all) users think of that operation as 
collapsing contiguous mixed whitespace into a single separator rather than 
splitting everything up and then purging empties. So this name could cause a 
fair bit of confusion for this case.

What if we call it `collapse_contiguous_separators`? I can live with an awkward 
name, but even then it doesn't work for a case like 'a,,,,': it (mostly) 
doesn't make sense to collapse 4 commas into one separator. Here you are 
actually purging empty values.

So the consistency seems labored in that any name you pick would be confusing 
for some cases.

And is the consistency for this case really needed? Is it common to have 
something like 'a,,,,' and say "I wish to get rid of those empty values but I 
don't want to use filter(None, values)"?
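
For reference, a minimal sketch of the filter(None, values) approach mentioned above, using only current built-ins (the variable names are illustrative):

```python
values = 'a,,,,'.split(',')
print(values)  # ['a', '', '', '', '']

# filter(None, ...) drops every falsy item, which for a list of
# strings means the empty strings.
non_empty = list(filter(None, values))
print(non_empty)  # ['a']

# An equivalent list comprehension, for comparison.
non_empty_comprehension = [v for v in values if v]
```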

In regard to the workaround you suggested, that seems fine. If this PR is 
accepted, any of the workarounds that people now use for ''.split(',') or 
similar would still work just as before.
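
The specific workaround is not quoted in this message, so the following is only an assumed example of the kind of pattern people use today. The helper name split_or_empty is hypothetical, not part of any library:

```python
# Current behaviour: an empty input still yields one empty string
# when an explicit separator is given, but nothing when sep is None.
print(''.split(','))  # ['']
print(''.split())     # []

def split_or_empty(text, sep=','):
    """Hypothetical helper: return [] for empty input, else split on sep."""
    return text.split(sep) if text else []

print(split_or_empty(''))     # []
print(split_or_empty('a,b'))  # ['a', 'b']
```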

----------

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue28937>
_______________________________________