[issue28937] str.split(): remove empty strings when sep is not None

2019-03-19 Thread Emanuel Barry


Emanuel Barry  added the comment:

Unfortunately not. I no longer have the time or means to work on this, sorry. I 
hope someone else can pick it up.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28937] str.split(): remove empty strings when sep is not None

2019-03-19 Thread Emanuel Barry


Change by Emanuel Barry :


--
nosy:  -ebarry




[issue28937] str.split(): remove empty strings when sep is not None

2019-03-19 Thread Barry A. Warsaw


Barry A. Warsaw  added the comment:

@veky - Thank you for pointing out splitlines(keepends=True).  If we wanted 
consistency, then we'd change the sense and use something like 
.split(keepempty=True), however:

* I don't like run-on names, so I would suggest keep_empty
* Maybe just `keep` is enough
* Either way, this should be a keyword only argument
* The default would still be None (i.e. current behavior), but keep_empty=True 
would be equivalent to prune=False and keep_empty=False would be equivalent to 
prune=True in the previous discussion.
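To make the naming concrete, here is a rough pure-Python model of how a keyword-only `keep_empty` flag might behave. The helper name `split_keep` is made up for illustration; nothing like it exists on str, and the sep=None, keep_empty=True branch assumes that every whitespace character then acts as its own delimiter.

```python
import re


def split_keep(s, sep=None, *, keep_empty=None):
    """Hypothetical model of the proposed keep_empty flag (not a real str API)."""
    if keep_empty is None:
        # Default: today's behaviour, which depends on whether sep is None.
        return s.split(sep)
    if sep is None:
        # Assumption: with keep_empty=True each whitespace character is
        # treated as a separate delimiter, so empty strings survive.
        return re.split(r"\s", s) if keep_empty else s.split()
    parts = s.split(sep)
    return parts if keep_empty else [p for p in parts if p]


print(split_keep("", ","))                        # ['']
print(split_keep("", ",", keep_empty=False))      # []
print(split_keep("a,,b", ",", keep_empty=False))  # ['a', 'b']
print(split_keep("x  x", keep_empty=True))        # ['x', '', 'x']
```

Because of the bare `*` in the signature, `split_keep("a", ",", True)` raises TypeError, so the flag always has to be spelled out at the call site.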

--




[issue28937] str.split(): remove empty strings when sep is not None

2019-03-19 Thread Cheryl Sabella


Cheryl Sabella  added the comment:

@ebarry, any interest in converting your patch to a GitHub pull request?  
Thanks!

--
nosy: +cheryl.sabella
versions: +Python 3.8 -Python 3.7




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-12 Thread Vedran Čačić

Vedran Čačić added the comment:

I think Guido's mistake is relevant here. It tripped me up too. Too many 
negatives, and "prune" is not really a well-known verb. Besides, we already have 
str.splitlines' keepends, which works the opposite way.

--




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-12 Thread Guido van Rossum

Guido van Rossum added the comment:

> except the other way around

Whoops. Indeed. So all's well here.

> x.split(tuple(string.whitespace))

Yes, that's what I was after. (But it can be a separate PR.)

--




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-12 Thread Emanuel Barry

Emanuel Barry added the comment:

Barry: Sure, the docs example was just a quick write-up, you can word it 
however you want!

Guido: Pretty much, except the other way around (when prune is False, i.e. 
"don't remove empty strings").

The attached patch exposes the behaviour (it's identical to last night's, but 
I'm re-uploading it as an unrelated file went in), except that the `prune` 
argument isn't keyword-only (I didn't know how to do this, and didn't bother 
searching for just a proof-of-concept).

--
Added file: http://bugs.python.org/file45863/split_prune_1.patch




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-12 Thread Emanuel Barry

Changes by Emanuel Barry :


Removed file: http://bugs.python.org/file45853/split_prune_1.patch




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-12 Thread Barry A. Warsaw

Barry A. Warsaw added the comment:

On Dec 12, 2016, at 04:16 PM, Guido van Rossum wrote:

>So the proposal would be: prune=False -> empty strings stay, prune=True,
>empty strings are dropped, prune=None (default) use True if sep is None,
>False otherwise. Right?

Yep!

>Some end cases:
>
>- ''.split(None, prune=True) -> ['']
>- 'x  x'.split(None, prune=True) -> ['x', '', 'x']
>
>Right?

Isn't that what you'd expect if prune=False instead?  (i.e. prune=True always
drops empty strings from the results)

>While we're here I wish there was a specific argument we could translate
>.split(None) into, e.g. x.split() == x.split((' ', '\t', '\n', '\r', '\f')) #
>or whatever set of strings

Is that the sep= idea that @syeberman suggested earlier?  If so,
then you could do:

>>> x.split(tuple(string.whitespace))

Would that suffice?

--




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-12 Thread Guido van Rossum

Guido van Rossum added the comment:

I like the proposal. I agree that filter(None, ...) is not discoverable (and 
has its own magic).

So the proposal would be: prune=False -> empty strings stay, prune=True, empty 
strings are dropped, prune=None (default) use True if sep is None, False 
otherwise. Right?

Some end cases:

- ''.split(None, prune=True) -> ['']
- 'x  x'.split(None, prune=True) -> ['x', '', 'x']

Right?

While we're here I wish there was a specific argument we could translate 
.split(None) into, e.g. x.split() == x.split((' ', '\t', '\n', '\r', '\f')) # 
or whatever set of strings
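
One wrinkle with "whatever set of strings": `string.whitespace` only lists the ASCII whitespace characters, while `split()` with no separator also splits on non-ASCII whitespace. A small check, using EM SPACE as an arbitrary example character:

```python
import string

# string.whitespace holds just the ASCII whitespace characters.
ascii_ws = tuple(string.whitespace)

em_space = "\u2003"  # EM SPACE, a non-ASCII whitespace character
text = "a" + em_space + "b"

print(em_space in string.whitespace)  # False
print(em_space.isspace())             # True
print(text.split())                   # ['a', 'b']
```

So a tuple built from `string.whitespace` would be close to, but not the same as, what `.split(None)` uses today.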

--




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-12 Thread Barry A. Warsaw

Barry A. Warsaw added the comment:

I really appreciate all the feedback.  Here are some thoughts.

I'm well aware of the filter(), re, and other options, and certainly those can
be made to work, but they're non-obvious.  The reason I suggested an
enhancement to str.split() is because I've seen the replace().split() being used
far too often, and what I think is happening is that people take the most
natural path to accomplish their goals: they know they just want to do a
simple string split on a token (usually one character) so they start out with
the obvious str.split(',') or whatever.  Then they notice that it doesn't work
consistently with their mental model in some corner cases.

The next common step isn't from there to filter() or re.  The former isn't a
well-known API and the latter is viewed as "too complex".  Their next mental
step is "oh, so providing a sep has different behavior that I don't want, so
I'll just replace the comma with a space and now don't have to provide sep".
And now str.split() does what they want.  Done.  Move along.
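
A short illustration of that path, and of one side effect of the replace-then-split idiom: it also breaks up fields that themselves contain whitespace. The helper name `split_fields` is hypothetical and just wraps the filtering alternative:

```python
def split_fields(s, sep=","):
    """Split on sep and drop empty fields, without rewriting the string."""
    return [field for field in s.split(sep) if field]


line = "New York,,Los Angeles"

# The replace-then-split idiom drops the empty field, but it also
# splits on the spaces that were already inside the fields.
print(line.replace(",", " ").split())  # ['New', 'York', 'Los', 'Angeles']

# Filtering after a plain split keeps the fields intact.
print(split_fields(line))              # ['New York', 'Los Angeles']
print(split_fields(""))                # []
```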

I do wish the str.split() API was consistent w.r.t. sep=None, but it's what
we have and is a very well known API.

@rhettinger: I'm of mixed opinion on it too!  I really wanted to get this in
the tracker and see if we could come up with something better, but so far I
still like `prune` the best.

@ebarry: Thanks for the draft docs, but that's not how I think about this.
I'd be utilitarian and get right to the point, e.g.:

"""
The value of `prune` controls whether empty strings are removed from the
resulting list.  The default value (None) says to use the default behavior,
which for backward compatibility reasons is different whether sep is None or
not (see above).  Regardless of the value of sep, when prune is True empty
strings are removed and when prune is False they are not.
"""

So @mrabarnett, +1 on the suggested defaults.

Lastly, as for Guido's admonition against boolean arguments, I would make
prune a keyword-only argument, so that forces the code to be readable and
should alleviate those concerns.  The trade-off is the extra typing, but
that's actually beside the point.  The win here is that the solution is
easily discoverable and avoids the intermediate string object.

--




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Guido, do you have an opinion on this?  IIRC, this was an API you created.

Nick's thought (posted on twitter) is that 'filter(None, input.split(sep))' 
already covers the "drop the empty values" case. 

My feelings are mixed.  Though I've never needed it in practice, it would be nice 
if the whitespace removal algorithm could be customized to just a space or just 
a tab.   On the other hand, I think the new parameter would make the API more 
confusing and harder to learn.  It might be better to just document either the 
filter(None) approach or a simple regex for the less common cases.
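
For reference, the two documented-workaround styles mentioned above might look like this. The sample strings are made up, and the regexes are only one possible spelling:

```python
import re

s = ",a,,b,"

# Drop empty fields after splitting on an explicit separator.
print(list(filter(None, s.split(","))))  # ['a', 'b']

# The same result with a regex that collects runs of non-separator characters.
print(re.findall(r"[^,]+", s))           # ['a', 'b']

# Whitespace-style splitting restricted to spaces and tabs only;
# the newline is left alone.
row = "x \t y\nz"
print(re.split(r"[ \t]+", row))          # ['x', 'y\nz']
```

Note that `re.split` still yields empty strings for leading or trailing delimiters, so the last form is not a full replacement for `split()` on unstripped input.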

--
nosy: +gvanrossum




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Vedran Čačić

Vedran Čačić added the comment:

The problem with .split is its split (pun intended) personality: it really is 
two functions that have separate use cases, and have different algorithms; and 
people think about them as separate functions. In that view, we have just 
fallen afoul of Guido's rule against passing literal bool arguments. The true 
solution would probably be to bite the bullet and have two separate methods. 
After all, .splitlines is a separate method for precisely such a reason.

--
nosy: +veky




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Emanuel Barry

Emanuel Barry added the comment:

Yes, I agree that being able to pass in a tuple would be really useful. As far 
as rolling out a custom function goes, I'd sooner reach for re.split than do 
that, so I don't really have a strong argument for either side. Feel free to 
play with the patch or make an entirely new one, though! I mainly submitted the 
patch to keep the discussion going, and eventually come to a consensus, but I 
don't have a strong opinion either way :)

--




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Sye van der Veen

Sye van der Veen added the comment:

In the sep!=None case, there are existing alternatives to prune=True that 
aren't many more keystrokes:

>>> ''.split(' ', prune=True)
[]
>>> [x for x in ''.split(' ') if x]
[]
>>> list(filter(bool, ''.split(' '))) # or drop list() and use the iterator directly
[]

This becomes even fewer keystrokes for users that create a prune() or 
split_prune() function.

For the sep==None case, I agree there are no alternatives to prune=False (aside 
from rolling your own split function).  However, instead of prune, what if sep 
accepted a tuple of strings, similar to startswith?  In this case, each string 
would be considered one possible, yet distinct, delimiter:

>>> ''.split(prune=False)
['']
>>> ''.split((' ', '\t')) # typical whitespace
['']
>>> ''.split(tuple(string.whitespace)) # ASCII whitespace
['']

Once again, this becomes even easier for users that create a split_no_prune() 
function, or that assign tuple(string.whitespace) to a variable.  It would also 
nicely handle strings with non-homogeneous delimiters:

>>> '1?2,,3;'.split((',', ';', '?'))
['1', '2', '', '3', '']

I personally find the 0-argument str.split() one of the great joys of Python.  
It's common to have to split out words from a sentence, and having that 
functionality just 8 characters away at all times has been very useful.
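
Until something like a tuple-valued sep exists, the behaviour can be approximated with re.split. This is only a sketch: `split_any` is a hypothetical helper name, and it assumes each string in the tuple is a distinct, non-empty delimiter as described above.

```python
import re


def split_any(s, seps):
    """Hypothetical helper: split s on any one of the strings in seps."""
    if not seps or "" in seps:
        raise ValueError("empty separator")
    # Try longer separators first so that none is shadowed by its own prefix.
    ordered = sorted(seps, key=len, reverse=True)
    pattern = "|".join(re.escape(sep) for sep in ordered)
    return re.split(pattern, s)


print(split_any("1?2,,3;", (",", ";", "?")))  # ['1', '2', '', '3', '']
print(split_any("", (" ", "\t")))             # ['']
```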

--
nosy: +syeberman




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Emanuel Barry

Emanuel Barry added the comment:

Here's an initial patch. It works exactly as discussed earlier, doesn't break 
any tests, and retains full backwards compatibility. No doc changes (except for 
the docstrings of str.[r]split) and no tests, as this is just a preliminary 
patch to see if there's any merit to the idea.

--
keywords: +patch
stage:  -> test needed
Added file: http://bugs.python.org/file45853/split_prune_1.patch




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Emanuel Barry

Emanuel Barry added the comment:

Matthew: Yes, that's exactly the way I was going about it.

Thank you Raymond for your comments (and your informative answer on that SO 
question).

I think that part of the problem is that no delimiter (or None) behaves 
differently than with a delimiter. If we wanted proper consistency, we would 
have needed to make passing None (or nothing) the same as passing whitespace, 
but alas, we have to work with what we have.

As you said, API complexity is a concern that needs to be addressed. I think 
that the most important part is how it's documented, and, if phrased correctly 
(which is non-trivial), could actually simplify the explanation.

Consider this draft:

***

The value of the `prune` keyword argument determines whether or not consecutive 
delimiters should be grouped together. If `prune` is not given or None, it 
defaults to True if sep is None or not given, and False otherwise.

If `prune` is True, consecutive delimiters (all whitespace if None or not 
given) are regarded as a single separator, and the result will not contain any 
empty strings. The resulting list may be empty.

Otherwise, if `prune` is False, consecutive delimiters are not grouped 
together, and the result may contain one or more empty strings. The resulting 
list will never be empty.

***

I may be oversimplifying this, but it seems to me that this might help some 
people by hopefully streamlining the explanation.

This still doesn't solve the issue where a user can't say "split on a space or 
a tab, but not newlines", which IMO is lacking in the design, but that may be 
for another issue ;)

I've wrapped up a basic patch which probably doesn't work at all; I'll put it 
up when it's at least partly working for everyone to look at.

--




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Raymond Hettinger

Raymond Hettinger added the comment:

A few randomly ordered thoughts about splitting:

* The best general purpose text splitter I've ever seen is in MS Excel and is 
called "Text to Columns".  It has a boolean flag, "treat consecutive delimiters 
as one" which is off by default.

* There is a nice discussion on the complexities of the current design on 
StackOverflow:  http://stackoverflow.com/questions/16645083  In addition, there 
are many other SO questions about the behavior of str.split().

* The learning curve for str.split() is already high.  The doc entry for it has 
been revised many times to try and explain what it does.  I'm concerned that 
adding another algorithmic option to it may make it more difficult to learn and 
use in the common cases (API design principle:  giving users more options can 
impair usability).  Usually in Python courses, I recommend using str.split() 
for the simple, common cases and using regex when you need more control.

* What I do like about the proposal is that there is currently no clean way to 
take the default whitespace splitting algorithm and customize it to a particular 
subset of whitespace (i.e. tabs only).

* A tangential issue is that it was a mistake to expose the maxsplit=-1 
implementation detail.   In Python 2.7, the help was "S.split([sep 
[,maxsplit]])".  But folks implementing the argument clinic have no way of 
coping with optional arguments that don't have a default value (like dict.pop), 
so they changed the API so that the implementation detail was exposed, 
"S.split(sep=None, maxsplit=-1)".   IMO, this is an API regression.  We really 
don't want people passing in -1 to indicate that there are no limits.  The 
Python way would have been to use None as a default or to stick with the 
existing API where the number of arguments supplied is part of the API (much 
like type() has two different meanings depending on whether it has an arity of 
1 or 3).
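
To illustrate the maxsplit point from the last bullet: today -1 is accepted and means "no limit". A wrapper using None as the default, as suggested, could look roughly like this (`split_limit` is a made-up name, not an existing API):

```python
def split_limit(s, sep=None, maxsplit=None):
    """Hypothetical wrapper where None, rather than -1, means 'no limit'."""
    if maxsplit is None:
        return s.split(sep)
    return s.split(sep, maxsplit)


s = "a,b,c"
print(s.split(",", -1) == s.split(","))  # True: -1 is the exposed default
print(split_limit(s, ","))               # ['a', 'b', 'c']
print(split_limit(s, ",", 1))            # ['a', 'b,c']
```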

Overall, I'm +0 on the proposal but there should be good consideration given to 
1) whether there is a sufficient need to warrant increasing API complexity and 
making split() more difficult to learn and remember, 2) whether "prune" is the 
right word (can someone who didn't write the code read it clearly 
afterwards?), and 3) whether to address this through documentation instead 
(i.e. showing the simple regexes needed for cases not covered by str.split).

--
nosy: +rhettinger




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Matthew Barnett

Matthew Barnett added the comment:

So prune would default to None?

None means current behaviour (prune if sep is None else don't prune)
True means prune empty strings
False means don't prune empty strings
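
Spelled out as code, the three-state default could be resolved like this. It is only a sketch of the rule as stated here; `resolve_prune` is an illustrative name and not something from the patch:

```python
def resolve_prune(sep, prune=None):
    """Return the effective pruning flag under the three-state proposal."""
    if prune is None:
        # Current behaviour: prune only when no separator is given.
        return sep is None
    return bool(prune)


for sep in (None, ","):
    for prune in (None, True, False):
        print(repr(sep), prune, "->", resolve_prune(sep, prune))
```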

--
nosy: +mrabarnett




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Emanuel Barry

Emanuel Barry added the comment:

Actually, there might be a way. We could make prune default to True if sep is 
None, and default to False if sep is not None. That way, we get to keep the 
existing behaviour for either case, while satisfying both of our use cases :)

If that's a bad idea (and it quite probably is), I'll retract it. But it's an 
interesting possibility to at least consider.

--




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Barry A. Warsaw

Barry A. Warsaw added the comment:

On Dec 11, 2016, at 03:57 PM, Serhiy Storchaka wrote:

>I meant adding boolean argument that changes the behavior when sep is None,
>not when it is not None.

Ah, I understand now, thanks.  However, I'm not sure that addresses my
particular use case.  It's actually kind of handy to filter out the empty
strings.  But I'm open to counter arguments.

--




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Emanuel Barry

Emanuel Barry added the comment:

That would work for my case, but it wouldn't for Barry's (unless I missed 
something). He wants a non-None argument to not leave empty strings, but I want 
a None argument to leave empty strings... I don't think there's a 
one-size-fits-all solution in this case, but feel free to prove me wrong :)

--




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I meant adding boolean argument that changes the behavior when sep is None, not 
when it is not None.

--




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Emanuel Barry

Emanuel Barry added the comment:

Changing the behaviour when sep is None is a big backwards-compatibility break, 
and I'm not sure we'd even want that. It's logical to allow passing None to 
mean the same thing as NULL (i.e. no arguments), and the behaviour in that case 
has been like that for... well, long enough that changing it isn't really 
feasible.

I agree with Barry here, especially since this is a completely opt-in feature, 
and existing behaviour isn't changed without the user's knowledge.

--




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Barry A. Warsaw

Barry A. Warsaw added the comment:

On Dec 11, 2016, at 03:32 PM, Serhiy Storchaka wrote:

>Current behavior is consistent with str.count():
>
>len(string.split(sep)) == string.count(sep) + 1
>
>and re.split():
>
>re.split(re.escape(sep), string) == string.split(sep)

Yep.  My suggestion is a straight up 'practicality beats purity' request.

>Maybe the behavior when sep is None should be changed for consistency with
>the behavior when sep is not None?

I'm very strongly -1 on changing any existing behavior.

--




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Current behavior is consistent with str.count():

len(string.split(sep)) == string.count(sep) + 1

and re.split():

re.split(re.escape(sep), string) == string.split(sep)

Maybe the behavior when sep is None should be changed for consistency with the 
behavior when sep is not None?
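
Both identities are easy to check for a non-empty sep, and the first one visibly fails for sep=None. The sample strings below are arbitrary, and `check_invariants` is just a throwaway helper name:

```python
import re


def check_invariants(string, sep):
    """Check both identities above for a non-empty separator."""
    parts = string.split(sep)
    return (
        len(parts) == string.count(sep) + 1
        and re.split(re.escape(sep), string) == parts
    )


samples = ["", ",", "a,b", ",a,,b,", "aaa"]
print(all(check_invariants(s, ",") for s in samples))  # True
print(check_invariants("aaaa", "aa"))                  # True

# With sep=None, runs of whitespace collapse, so the count identity breaks.
print(len("a  b".split()) == "a  b".count(" ") + 1)    # False
```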

--
nosy: +serhiy.storchaka




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Emanuel Barry

Emanuel Barry added the comment:

I understand the feeling. However, in a project I maintain, we want the other 
way around - to be able to never have an empty list, even if the string is 
empty (we resorted to using re.split in the end, which has this behaviour). 
Consider:

rest = re.split(" +", rest)[0].strip()

This gives us None-like behaviour in splitting, at the cost of not actually 
using str.split.
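
A quick comparison of the two on an empty string shows the difference being relied on here. The wrapper name `first_word` is only for the example:

```python
import re


def first_word(rest):
    """First space-delimited token; safe on empty input."""
    # re.split always returns at least one element, so [0] cannot fail.
    return re.split(" +", rest)[0].strip()


print("".split())                     # []
print(re.split(" +", ""))             # ['']
print(repr(first_word("")))           # ''
print(repr(first_word("cmd   arg")))  # 'cmd'
```

With plain `"".split()[0]` the empty input would raise IndexError instead.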

I'm +1 on the idea, but I'd like some way to better generalize str.split use 
(not everyone knows you can pass None and/or an integer).

(At the same time, the counterarguments that str has too many methods, or 
that methods shouldn't do too much, also apply here.)

But I don't like bikeshedding too much, so let's just count me as +1 for your 
way, if there's no strong momentum for mine :)

--
nosy: +ebarry
type:  -> enhancement




[issue28937] str.split(): remove empty strings when sep is not None

2016-12-11 Thread Barry A. Warsaw

New submission from Barry A. Warsaw:

This has finally bugged me enough to file an issue, although I wouldn't be able 
to use it until Python 3.7.  There's a subtle but documented difference in 
str.split() when sep=None:

>>> help(''.split)
Help on built-in function split:

split(...) method of builtins.str instance
    S.split(sep=None, maxsplit=-1) -> list of strings

    Return a list of the words in S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are
    removed from the result.

I.e., that empty strings are removed from the result.  This does not happen 
when sep is given, leading to this type of unfortunate code:

>>> 'foo,bar,baz'.split(',')
['foo', 'bar', 'baz']
>>> 'foo,bar,baz'.replace(',', ' ').split()
['foo', 'bar', 'baz']
>>> ''.split(',')
['']
>>> ''.replace(',', ' ').split()
[]

Specifically, code that wants to split on say commas, but has to handle the 
case where the source string is empty, shouldn't have to also filter out the 
single empty string item.

Obviously we can't change existing behavior, so I propose to add a keyword 
argument `prune` that would make these two bits of code identical:

>>> ''.split()
[]
>>> ''.split(' ', prune=True)
[]

and would handle the case of ''.split(',') without having to resort to creating 
an ephemeral intermediate string.

`prune` should be a keyword-only argument, defaulting to False.

--
components: Library (Lib)
messages: 282923
nosy: barry
priority: normal
severity: normal
status: open
title: str.split(): remove empty strings when sep is not None
versions: Python 3.7
