Andrei Kulakov <[email protected]> added the comment:
> I imagine that the discussion focussed on this since this is precisely what
> happens when sep=None. For example, 'a b c '.split() == ['a', 'b',
> 'c']. I guess that the point was to provide users with explicit, manual
> control over whether the behaviour of split should drop all empty strings or
> retain all empty strings instead of this decision just being made on whether
> sep is None or not.
That's true on some level, but it seems to me that it's somewhat more nuanced
than that.
The intent of sep=None is not to remove empties but to collapse runs of mixed
whitespace into a single separator: ' \t ' probably means a single separator
because visually it looks like one. Yes, the effect is the same as removing
empties, but the distinction matters when designing (and naming) a flag that
is meant to make split() consistent with this behaviour when sep is ',', ';',
etc.
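A small illustration using only the built-in `str.split` shows the distinction. With `sep=None`, a run of mixed whitespace acts as one separator. Splitting on a single explicit space leaves the tab behind as a field of its own, and that field is not empty, so "drop empties" alone would not reproduce the `sep=None` result here.

```python
# Splitting on a run of mixed whitespace vs. on one explicit character.
s = 'a \t b'

# sep=None treats the whole ' \t ' run as a single separator.
by_whitespace = s.split()

# An explicit single-space separator splits at each space, so the tab
# survives as its own non-empty field.
by_space = s.split(' ')

print(by_whitespace)  # ['a', 'b']
print(by_space)       # ['a', '\t', 'b']
```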
When you have 'a,,,', the most likely intent is to get 3 empty values, NOT to
collapse 3 commas into a single separator. Getting rid of those empties is then
an additional processing step, here folded into the split() operation itself.
So it's quite a different operation, even though the end effect is the same.
Is this change really making the behaviour consistent, then?
To me, consistency implies that intent is roughly the same, and outcome is also
roughly the same.
You might say: but doesn't practicality beat purity?
However, there are some real issues here:
- harder to explain, remember, document.
- naming issue
- not completely solving the initial issue (and it would most likely leave no
practical way to patch up that corner case if this PR is accepted)
Re: naming, for example, using keep_empty=False for sep=None is confusing: most
(or even all) users would think of that operation as collapsing contiguous
mixed whitespace into a single separator, rather than as splitting everything
up and then purging empties. So this name could cause a fair bit of confusion
in this case.
What if we call it `collapse_contiguous_separators`? I can live with an awkward
name, but even then it doesn't fit a case like 'a,,,,': it (mostly) doesn't
make sense to collapse 4 commas into one separator. Here you are actually
purging empty values.
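A quick check with the standard `re` module shows why collapsing and purging differ for commas. Here `re.split` is only a stand-in for a hypothetical "collapse contiguous separators" mode, not a proposed implementation. Collapsing the run of commas into one separator still leaves a trailing empty string, so it does not produce the purged result ['a'].

```python
import re

s = 'a,,,,'

# Plain split: each comma delimits a field, so four empty values follow 'a'.
fields = s.split(',')

# Stand-in for "collapse contiguous separators": treat any run of commas
# as a single separator.
collapsed = re.split(r',+', s)

print(fields)     # ['a', '', '', '', '']
print(collapsed)  # ['a', '']
```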
So the consistency seems labored in that any name you pick would be confusing
for some cases.
And is the consistency for this case really needed? Is it common to have
something like 'a,,,,' and say "I wish to get rid of those empty values but I
don't want to use filter(None, values)"?
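For reference, the existing idiom mentioned above is already a one-liner. `filter(None, ...)` keeps only truthy items, which for strings means it drops the empty ones.

```python
values = 'a,,,,'.split(',')

# filter(None, ...) keeps only truthy items, i.e. drops empty strings.
# It returns a lazy iterator, so wrap it in list() to materialize it.
non_empty = list(filter(None, values))

print(non_empty)  # ['a']
```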
In regard to the workaround you suggested, that seems fine. If this PR is
accepted, any of the workarounds that people now use for ''.split(',') or
similar would still work just as before.
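The corner case in question: `''.split(',')` returns `['']`, while `''.split()` returns `[]`. One common workaround special-cases the empty string. `split_fields` below is a hypothetical helper, not necessarily the workaround suggested earlier in the thread. It relies only on the current `str.split`, so it would behave the same whether or not a new flag is added.

```python
def split_fields(s, sep=','):
    """Split s on sep, but return [] for an empty string.

    Hypothetical helper illustrating one common workaround;
    it is not part of the standard library.
    """
    return s.split(sep) if s else []

print(''.split(','))         # ['']
print(''.split())            # []
print(split_fields(''))      # []
print(split_fields('a,,b'))  # ['a', '', 'b']
```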
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue28937>
_______________________________________