On Sat, Nov 12, 2016 at 05:01:00PM +0000, Gary Godfrey wrote:
> I do a fair amount of work with pandas and data munging. This means that
> I'm often doing things like:
>
> mydf = df[ ['field1', 'field2', 'field3' ] ]
>
> This is a little ugly, so if the list is long enough, I do:
>
> mydf=df[ 'field1 field2 field3'.split() ]
I consider the need for that to indicate a possibly poor design of
pandas. Unless there is a good reason not to, I believe that any
function that requires a list of strings should also accept a single
space-delimited string instead. Especially if the strings are intended
as names or labels. So that:
func(['fe', 'fi', 'fo', 'fum'])
and
func('fe fi fo fum')
should be treated the same way.
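A minimal sketch of that convention, assuming a hypothetical helper `as_labels` (not a pandas API) that any label-taking function could call to normalise its argument:

```python
from typing import Iterable, List, Union


def as_labels(names: Union[str, Iterable[str]]) -> List[str]:
    """Normalise a names/labels argument (hypothetical helper).

    A single string is treated as whitespace-delimited names; any other
    iterable is assumed to already hold the individual names.
    """
    # Check for str first: a str is itself iterable, and list('fe fi')
    # would wrongly produce individual characters.
    if isinstance(names, str):
        return names.split()
    return list(names)


def func(names: Union[str, Iterable[str]]) -> List[str]:
    # Stand-in for any label-taking API; it just returns the normalised list.
    return as_labels(names)


# Both call styles give the same result.
assert func(['fe', 'fi', 'fo', 'fum']) == func('fe fi fo fum')
```

The order of the checks is the main design point: because a str is iterable, it has to be special-cased before falling back to `list(names)`.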
Of course, it may be that pandas has a good reason for not supporting
that. But in general, we don't change the language to make up for
deficiencies in third-party library functionality.
> This is a little more readable, but still a bit ugly.
I don't agree that it's ugly. I think that 'fe fi fo fum'.split() is
nicely explicit about what it is doing. It's also a candidate for
compile-time optimization since the argument is a literal.
> What I'm proposing here is:
>
> mydf = df[ w'field1 field2 field3' ]
>
> This would be identical in all ways (compile-time) to:
>
> mydf = df[ ('field1', 'field2', 'field3') ]
Are your field names usually constants known when you write the script?
I would have thought they'd more often be variables that you read from
your data.
> This should work with all the python quote variations (w''', w""", etc).
> The only internal escapes are \\ indicating a \ and <backslash><space>
> indicating a non-splitting space:
So not only do we have to learn yet another special kind of string:
- unicode strings
- byte strings
- raw strings (either unicode or bytes)
- f-strings
- and now w-strings
but this one has different escaping rules from the others.
I expect that there will be a huge number of confused questions about
why people cannot use standard escapes in their "word" strings.
> songs = w'My\ Bloody\ Valentine Blue\ Suede\ Shoes'
I think that escaping spaces like that will be an attractive nuisance. I
had to read your example three times before I noticed that the space
between Valentine and Blue was not escaped.
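To make the proposed rules concrete, here is a rough model of how the body of a w'...' literal might be tokenised. The function name `parse_w_string` is hypothetical, and treating any other backslash as a literal character is an assumption, since the proposal only defines \\ and <backslash><space>:

```python
def parse_w_string(body: str) -> tuple:
    """Split the body of a hypothetical w'...' literal into a tuple of words.

    Models the rules as proposed: whitespace separates words, a backslash
    followed by a space yields a literal (non-splitting) space, and a doubled
    backslash yields one backslash. Any other backslash is kept as-is
    (an assumption; the proposal does not say).
    """
    words = []
    current = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '\\' and i + 1 < len(body) and body[i + 1] in '\\ ':
            # Escaped space or escaped backslash: keep the second character.
            current.append(body[i + 1])
            i += 2
            continue
        if ch.isspace():
            # Unescaped whitespace ends the current word, if any.
            if current:
                words.append(''.join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if current:
        words.append(''.join(current))
    return tuple(words)


songs = parse_w_string(r'My\ Bloody\ Valentine Blue\ Suede\ Shoes')
print(songs)  # ('My Bloody Valentine', 'Blue Suede Shoes')
```

The only thing separating the two titles is the one space without a backslash in front of it, which is exactly the detail that is easy to miss when reading the literal.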
I would prefer a simple, straight-forward rule: it unconditionally
splits on whitespace. If you need to include non-splitting spaces, use a
proper non-breaking space \u00A0, or split the words into a tuple by
hand, like you're doing now. I don't think it is worth complicating the
feature to support non-splitting spaces.
(Hmmm... I see that str.split() currently splits on non-breaking spaces.
That feels wrong to me: although the NBSP character is considered
whitespace, its whole purpose is to avoid splitting.)
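The current str.split() behaviour can be checked directly. Below it is a sketch of one way to split on whitespace while leaving NBSP inside words; the helper `split_keep_nbsp` and its regex are illustrative assumptions, not an existing library function:

```python
import re
from typing import List

text = 'fe fi\u00a0fo fum'

# str.split() with no argument splits on any Unicode whitespace, and
# U+00A0 NO-BREAK SPACE is classified as whitespace.
print('\u00a0'.isspace())  # True
print(text.split())        # ['fe', 'fi', 'fo', 'fum']

# Hypothetical alternative: split on runs of whitespace other than NBSP.
# [^\S\u00a0] means "not a non-whitespace character and not NBSP".
_SPLIT_EXCEPT_NBSP = re.compile(r'[^\S\u00a0]+')


def split_keep_nbsp(s: str) -> List[str]:
    # Drop the empty strings re.split leaves for leading/trailing whitespace.
    return [word for word in _SPLIT_EXCEPT_NBSP.split(s) if word]


# Three words; the NBSP stays inside the middle one, 'fi\u00a0fo'.
print(split_keep_nbsp(text))
```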
> Other Languages:
>
> perl has the qw operator:
>
> @a = qw(field1 field2 field3);
>
> ruby has %w
>
> a=%w{field1 field2}
The fact that other languages do something like this is a (weak) point
in its favour. But I see that there are a few questions on Stack Overflow
asking what %w means, how it is different from %W, etc. For example:
http://stackoverflow.com/questions/1274675/what-does-warray-mean
http://stackoverflow.com/questions/690794/ruby-arrays-w-vs-w
and I notice this comment from the second link:
"%w" is my usual retort to people who get a little too cocky about
the readability of Ruby. Works every time.
That's a point against this proposal: the feature seems to be a bit
puzzling to users in languages that implement it (at least Ruby).
I'm rather lukewarm on this proposal, although I might be convinced to
support it if:
- w'...' unconditionally split on any whitespace (possibly
excluding NBSP);
- and normal escapes worked.
Even then I'm not really convinced this needs to be a language feature.
--
Steve
_______________________________________________
Python-ideas mailing list
python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/