[Python-ideas] Re: Proposal: Complex comprehensions containing statements

Antoine Rozo Fri, 21 Feb 2020 00:35:22 -0800

Hi,

I find this syntax very hard to read: the yielded expression can be
anywhere in the block, and nothing marks it.
Even in your examples I can't tell which expression will be used. What
if I call a function somewhere in the block?
Can't you just use generator functions with the list/set/dict
constructors when you need complex statements?
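
For instance, the `break` example from the proposal can already be written as a generator function passed to `list()`. This is only a minimal sketch: `cond`, `f`, and `doubled_until` are placeholder names and behaviours picked for illustration, not anything defined in the proposal.

```python
def cond(x):
    # Hypothetical stop condition: stop at the first negative value.
    return x < 0


def f(x):
    # Hypothetical transformation applied to each element.
    return x + 1


def doubled_until(y):
    """Yield f(x) * 2 for each x in y, stopping once cond(x) is true."""
    for x in y:
        if cond(x):
            break
        z = f(x)
        yield z * 2  # the yielded value is marked explicitly


lst = list(doubled_until([1, 2, -1, 3]))
print(lst)  # [4, 6]
```

The same generator can feed `set()` directly, or `dict()` if it yields key/value pairs.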


Regards

On Fri, 21 Feb 2020 at 08:58, Alex Hall <alex.moj...@gmail.com> wrote:
>
> This is a proposal for a new syntax where a comprehension is written as the 
> appropriate brackets containing a loop which can contain arbitrary statements.
>
> Here are some simple examples. Instead of:
>
>     [
>         f(x)
>         for y in z
>         for x in y
>         if g(x)
>     ]
>
> one may write:
>
>     [
>         for y in z:
>             for x in y:
>                 if g(x):
>                     f(x)
>     ]
>
> Instead of:
>
>     lst = []
>     for x in y:
>         if cond(x):
>             break
>         z = f(x)
>         lst.append(z * 2)
>
> one may write:
>
>     lst = [
>         for x in y:
>             if cond(x):
>                 break
>             z = f(x)
>             yield z * 2
>     ]
>
> Instead of:
>
>     [
>         {k: v for k, v in foo}
>         for foo in bar
>     ]
>
> one may write:
>
>     [
>         for foo in bar:
>             {for k, v in foo: k: v}
>     ]
>
> ## Specification
>
> A list/set/dict comprehension or generator expression is written as the 
> appropriate brackets containing a `for` or `while` loop.
>
> In the general case some expressions have `yield` in front and they become 
> the values of the comprehension, like a generator function.
>
> If the comprehension contains exactly one expression statement at any level 
> of nesting, i.e. if there is only one place where a `yield` can be placed at 
> the start of a statement, then `yield` is not required and the expression is 
> implicitly yielded. In particular this means that any existing comprehension 
> translated into the new style doesn't require `yield`.
>
> If the comprehension doesn't contain exactly one expression statement and 
> doesn't contain a `yield`, it's a SyntaxError.
>
> ### Dictionary comprehensions
>
> For dictionary comprehensions, a `key: value` pair is allowed as its own 
> pseudo-statement or in a yield. It's not a real expression and cannot appear 
> inside other expressions.
>
> This can potentially be confused with variable type annotations with no 
> assigned value, e.g. `x: int`. But we can essentially apply the same rule as 
> other comprehensions: either use `yield`, or only have one place where a 
> `yield` could be added in front of a statement. So if there is only one pair 
> `x: y` we try to implicitly yield that. The only way this could be 
> misinterpreted is if a user declared the type of exactly one expression and 
> completely forgot to give their comprehension elements, and the program would 
> almost certainly fail spectacularly.
>
> ### Whitespace
>
> If placing the loop on a single line would be valid syntax outside a 
> comprehension (i.e. it just contains a simple statement) then we call this an 
> *inline* comprehension. It can be inserted in the same line(s) as other code 
> and formatted however the writer likes - there are no concerns about 
> whitespace.
>
> For a more complex comprehension, the loop must start and end with a newline, 
> i.e. the lines containing the loop cannot contain any tokens from outside, 
> including the enclosing brackets. For example, this is allowed:
>
>     foo = [
>         for x in y:
>             if x > 0:
>                 f(x)
>     ]
>
> but this is not:
>
>     foo = [for x in y:
>                if x > 0:
>                    f(x)]
>
> This ensures that code is readable even at a quick glance. The eyes can 
> quickly find where the loop starts and distinguish the embedded statements 
> from the rest of the enclosing expression.
>
> Furthermore, it's easy to copy-paste entire lines to move them around, 
> whereas refactoring the invalid example above without specific tools would be 
> annoying and error-prone. It also makes it easy to adjust code outside the 
> comprehension (e.g. rename `foo` to something longer) without messing up 
> indentation and alignment.
>
> Inside the loop, the rules for indentation and such are the same as anywhere 
> else. The syntax of the loop is valid only if it's also valid as a normal 
> loop outside any expression. The body of the loop must be more indented than 
> the for/while keyword that starts the loop.
>
> ### Variable scope
>
> Since comprehensions look like normal loops they should maybe behave like 
> them again, including executing in the same scope and 'leaking' the iteration 
> variable(s). Assignments via the walrus operator already affect the outer 
> scope, only the iteration variable currently behaves differently. My 
> understanding is that this is influenced by the fact that there is little 
> reason to use the value of the iteration variable after a list comprehension 
> completes since it will always be the last value in the iterable. But since 
> the new syntax allows `break`, the value may become useful again.
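
For reference, the current scoping behaviour described in that paragraph can be checked directly (Python 3.8+ for the walrus operator). The variable names are illustrative only.

```python
values = [1, 2, 3]

# A normal loop runs in the enclosing scope, so the iteration variable leaks.
for item in values:
    pass
print(item)  # 3

# A comprehension has its own scope: `elem` is not visible afterwards.
squares = [elem * elem for elem in values]
try:
    elem
except NameError:
    elem_leaked = False
else:
    elem_leaked = True
print(elem_leaked)  # False

# An assignment expression inside a comprehension binds in the enclosing scope.
doubled = [(last := elem * 2) for elem in values]
print(last)  # 6
```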
>
> I don't know what the right approach is here and I imagine it can generate 
> plenty of debate. Given that this whole proposal is already controversial and 
> likely to be rejected this may not be the best place to start discussion. But 
> maybe it is, I don't know.
>
> ## Benefits/comparison to current methods
>
> ### Uniform syntax
>
> The new comprehensions just look like normal loops in brackets, or generator 
> functions. This should make them easier for beginners to learn than the old 
> comprehensions.
>
> A particular concept that's easier to learn is comprehensions that contain 
> multiple loops. Consider this comprehension over a nested list:
>
>     [
>         f(cell)
>         for row in matrix
>         for cell in row
>     ]
>
> For beginners this can easily be confusing, [and sometimes for experienced coders too](https://mail.python.org/archives/list/python-ideas@python.org/message/BX7LWUS57M52EPJMIR6A3SDQYSN7UCEX/).
> Yes, there's a rule that one can learn, but putting it in reverse also 
> seems logical, perhaps even more so:
>
>     [
>         f(cell)
>         for cell in row
>         for row in matrix
>     ]
>
> Now the comprehension is 'consistently backwards', it reads more like 
> English, and the usage of `cell` is right next to its definition. But of 
> course that order is wrong... unless we want a nested list comprehension that 
> produces a new nested list:
>
>     [
>         [
>             f(cell)
>             for cell in row
>         ]
>         for row in matrix
>     ]
>
> Again, it's not hard for an experienced coder to understand this, but for a 
> beginner grappling with new concepts this is not great. Now consider how the 
> same two comprehensions would be written in the new syntax:
>
>     [
>         for row in matrix:
>             for cell in row:
>                 f(cell)
>     ]
>
>     [
>         for row in matrix:
>             [
>                 for cell in row:
>                     f(cell)
>             ]
>     ]
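
In current Python, the rule for the existing syntax is that the `for` clauses of a comprehension run in the same order as the equivalent nested loops. A small runnable check, with `matrix` and `f` as made-up sample data and function:

```python
matrix = [[1, 2], [3, 4]]


def f(cell):
    # Hypothetical per-cell transformation.
    return cell * 10


# Flattening comprehension: clauses are written outermost loop first.
flat = [f(cell) for row in matrix for cell in row]

# The equivalent explicit loops, in the same order.
flat_loop = []
for row in matrix:
    for cell in row:
        flat_loop.append(f(cell))

# Nested comprehension: the inner list is built once per row.
nested = [[f(cell) for cell in row] for row in matrix]

print(flat)    # [10, 20, 30, 40]
print(nested)  # [[10, 20], [30, 40]]
```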
>
> ### Power and flexibility
>
> Comprehensions are great and I love using them. I want to be able to use them 
> more often. I know I can solve any problem with a loop, but it's obvious that 
> comprehensions are much nicer or we wouldn't need to have them at all. 
> Compare this code:
>
>     new_matrix = []
>     for row in matrix:
>         new_row = []
>         for cell in row:
>             try:
>                 new_row.append(f(cell))
>             except ValueError:
>                 new_row.append(0)
>         new_matrix.append(new_row)
>
> with the solution using the new syntax:
>
>     new_matrix = [
>         for row in matrix: [
>             for cell in row:
>                 try:
>                     yield f(cell)
>                 except ValueError:
>                     yield 0
>         ]
>     ]
>
> It's immediately visually obvious that it's building a new nested list, 
> there's much less syntax for me to parse, and the variable `new_row` has gone 
> from appearing 4 times to 0!
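
For comparison, the closest equivalent in current syntax moves the `try`/`except` into a helper function so that the comprehension body stays a single expression. This is a sketch under assumptions: `f`, `safe_f`, and the sample `matrix` are invented for illustration.

```python
def f(cell):
    # Hypothetical conversion that raises ValueError on bad input.
    return int(cell)


def safe_f(cell, default=0):
    """Return f(cell), or `default` if f raises ValueError."""
    try:
        return f(cell)
    except ValueError:
        return default


matrix = [["1", "x"], ["3", "4"]]
new_matrix = [[safe_f(cell) for cell in row] for row in matrix]
print(new_matrix)  # [[1, 0], [3, 4]]
```

The cost is an extra named function that has to be read separately from the comprehension that uses it.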
>
> There have been many requests to add some special syntax to comprehensions to 
> make them a bit more powerful:
>
> - [Is this PEP-able? "with" statement inside genexps / list comprehensions](https://mail.python.org/archives/list/python-ideas@python.org/thread/BUD46OEPBN6YW43HPPEG3P3IFDOG6KMV/#O3U3V4Q4I2GOGVFCFH67TZ355WE7XKTD)
> - [Allowing breaks in generator expressions by overloading the while keyword](https://mail.python.org/archives/list/python-ideas@python.org/thread/6PEOE5ZXHQHAINEPQ7PTKSWYFW5OFMPQ/#ETB6ISNSB4KWQQYNMTRVJMZF4AWYCXV5)
> - [while conditional in list comprehension ??](https://mail.python.org/archives/list/python-ideas@python.org/thread/RYBBHV3YBBEIBUZPZ4WNQGKI76VSBWI5/#A36BJCUAGUBZA7FIQ3LN6UMZUYCL2LJG)
>
> This would solve all such problems neatly.
>
> ### No trying to fit things in a single expression
>
> The current syntax can only contain one expression in the body. This 
> restriction makes it difficult to solve certain problems elegantly and 
> creates an uncomfortable grey area where it's hard to decide between 
> squeezing maybe a bit too much into an expression or doing things 'manually'. 
> This can lead to analysis paralysis and disagreements between coders and 
> reviewers. For example, which of the following is the best?
>
>     clean = [
>         line.strip()
>         for line in lines
>         if line.strip()
>     ]
>
>     stripped = [line.strip() for line in lines]
>     clean = [line for line in stripped if line]
>
>     clean = list(filter(None, map(str.strip, lines)))
>
>     clean = []
>     for line in lines:
>         line = line.strip()
>         if line:
>             clean.append(line)
>
>     def clean_lines():
>         for line in lines:
>             line = line.strip()
>             if line:
>                 yield line
>
>     clean = list(clean_lines())
>
> You probably have a favourite, but it's very subjective and this kind of 
> problem requires judgement depending on the situation. For example, I'd 
> choose the first version in this case, but a different version if I had to 
> worry about duplicating something more complex or expensive than `.strip()`. 
> And again, there's an awkward sweet spot where it's hard to decide whether I 
> care enough about the duplication.
>
> What about assignment expressions? We could do this:
>
>     clean = [
>         stripped
>         for line in lines
>         if (stripped := line.strip())
>     ]
>
> Like the nested loops, this is tricky to parse without experience. The 
> execution order can be confusing and the variable is used away from where 
> it's defined. Even if you like it, there are clearly many who don't. I think 
> the fact that assignment expressions were a desired feature despite being so 
> controversial is a symptom of this problem. It's the kind of thing that 
> happens when we're stuck with the limitations of a single expression.
>
> The solution with the new syntax is:
>
>     clean = [
>         for line in lines:
>             stripped = line.strip()
>             if stripped:
>                 stripped
>     ]
>
> or if you'd like to use an assignment expression:
>
>     clean = [
>         for line in lines:
>             if stripped := line.strip():
>                 stripped
>     ]
>
> I think both of these look great and are easily better than any of the other 
> options. And I think the new syntax would be the clear winner in any similar 
> situation - no careful judgement needed. This would become the one (and only one) obvious 
> way to do it. The new syntax has the elegance of list comprehensions and the 
> flexibility of multiple statements. It's completely scalable and works 
> equally well from the simplest comprehension to big complicated constructions.
>
> ### Easy to change
>
> I hate it when I've already written a list comprehension but a new requirement 
> forces me to change it to, say, the `.append` version. It's a tedious 
> refactoring involving brackets, colons, indentation, and moving things 
> around. It also leaves me with a very unhelpful `git diff`. With the new 
> syntax I can easily add logic as I please and get a nice simple diff.
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/5UIXE23B26XPIQGPYNI575XN3NNX6JRR/
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
Antoine Rozo
