On Sat, Nov 12, 2016 at 05:01:00PM +0000, Gary Godfrey wrote:

> I do a fair amount of work with pandas and data munging.  This means that
> I'm often doing things like:
> 
> mydf = df[ ['field1', 'field2', 'field3' ] ]
> 
> This is a little ugly, so if the list is long enough, I do:
> 
> mydf=df[ 'field1 field2 field3'.split() ]

I consider the need for that to indicate a possibly poor design of 
pandas. Unless there is a good reason not to, I believe that any 
function that requires a list of strings should also accept a single 
space-delimited string instead. Especially if the strings are intended 
as names or labels. So that:

func(['fe', 'fi', 'fo', 'fum']) 

and 

func('fe fi fo fum') 

should be treated the same way.
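
A minimal sketch of that convention, which is the same one collections.namedtuple follows for its field names. The names `as_names` and `func` are purely illustrative and are not part of pandas or any other library:

```python
def as_names(names):
    """Return a list of names from a sequence or a whitespace-delimited string."""
    if isinstance(names, str):
        # A single string is treated as space-delimited labels.
        return names.split()
    # Anything else is assumed to be an iterable of strings already.
    return list(names)


def func(names):
    # Normalise once at the top, then work with a plain list.
    names = as_names(names)
    return names
```

The obvious cost is that a label containing a space can no longer be passed as a bare string, which may be one reason a library would choose not to do this.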

Of course, it may be that pandas has a good reason for not supporting 
that. But in general, we don't change the language to make up for 
deficiencies in third-party library functionality.


> This is a little more readable, but still a bit ugly.

I don't agree that it's ugly. I think that 'fe fi fo fum'.split() is 
nicely explicit about what it is doing. It's also a candidate for 
compile-time optimization since the argument is a literal.


> What I'm proposing here is:
> 
> mydf = df[ w'field1 field2 field3' ]
> 
> This would be identical in all ways (compile-time) to:
> 
> mydf = df[ ('field1', 'field2', 'field3') ]

Are your field names usually constants known when you write the script?

I would have thought they'd more often be variables that you read from 
your data.


> This should work with all the python quote variations (w''', w""", etc).
> The only internal escapes are \\ indicating a \ and <backslash><space>
> indicating a non-splitting space:

So not only do we have to learn yet another special kind of string:

- unicode strings
- byte strings
- raw strings (either unicode or bytes)
- f-strings
- and now w-strings

but this one has different escaping rules from the others.

I expect that there will be a huge number of confused questions about 
why people cannot use standard escapes in their "word" strings.



> songs = w'My\ Bloody\ Valentine Blue\ Suede\ Shoes'

I think that escaping spaces like that will be an attractive nuisance. I 
had to read your example three times before I noticed that the space 
between Valentine and Blue was not escaped.

I would prefer a simple, straightforward rule: it unconditionally 
splits on whitespace. If you need to include non-splitting spaces, use a 
proper non-breaking space \u00A0, or split the words into a tuple by 
hand, like you're doing now. I don't think it is worth complicating the 
feature to support non-splitting spaces.

(Hmmm... I see that str.split() currently splits on non-breaking spaces. 
That feels wrong to me: although the NBSP character is considered 
whitespace, its whole purpose is to avoid splitting.)
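
To illustrate the point about NBSP, here is a rough sketch in Python 3. The helper `split_keep_nbsp` is hypothetical and is only one possible way to split on whitespace while leaving U+00A0 alone:

```python
import re

text = 'My\u00a0Bloody\u00a0Valentine Blue\u00a0Suede\u00a0Shoes'

# str.split() with no argument treats NBSP as whitespace and splits on it.
plain = text.split()


def split_keep_nbsp(s):
    """Split on runs of whitespace, but never on NBSP (U+00A0)."""
    # [^\S\u00a0] matches any whitespace character except NBSP.
    return [word for word in re.split(r'[^\S\u00a0]+', s) if word]


kept = split_keep_nbsp(text)
```

With this input, `plain` contains six separate words, while `kept` contains two items whose internal NBSP characters are preserved.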


> Other Languages:
> 
> perl has the qw operator:
> 
> @a = qw(field1 field2 field3);
> 
> ruby has %w
> 
> a=%w{field1 field2}

The fact that other languages do something like this is a (weak) point 
in its favour. But I see that there are a few questions on Stack Overflow 
asking what %w means, how it is different from %W, etc. For example:

http://stackoverflow.com/questions/1274675/what-does-warray-mean

http://stackoverflow.com/questions/690794/ruby-arrays-w-vs-w

and I notice this comment from the second link:

    "%w" is my usual retort to people who get a little too cocky about 
    the readability of Ruby. Works every time.


That's a point against this proposal: the feature seems to be a bit 
puzzling to users in languages that implement it (at least Ruby).

I'm rather lukewarm on this proposal, although I might be convinced to 
support it if:

- w'...' unconditionally split on any whitespace (possibly 
  excluding NBSP);

- and normal escapes worked.

Even then I'm not really convinced this needs to be a language feature.
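
Under those two conditions, the behaviour can be approximated today with an ordinary helper function. A sketch, where `w` is a hypothetical name and not an existing builtin; note that it also splits on NBSP, because str.split() does:

```python
def w(text):
    """Split an ordinary string on whitespace and return a tuple of words."""
    return tuple(text.split())


# Normal escapes work because the argument is a regular string literal.
fields = w('field1 field2 field3')
mixed = w('fe\tfi\nfo fum')
```

The difference from the proposed literal is that the split happens at run time on each call, not at compile time.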


-- 
Steve
_______________________________________________
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
