[Python-ideas] Re: Generalized deferred computation in Python
On Thu, 23 Jun 2022 at 11:35, Joao S. O. Bueno wrote:
>
> Martin Di Paola wrote:
> > Three cases: Dask/PySpark, Django's ORM and selectq. All of them
> > implement deferred expressions but all of them "compute" them in very
> > specific ways (aka, they plan and execute the computation differently).
>
> So - I've been hit with the "transparency execution of deferred code" dilemma
> before.
>
> What happens is that: Python, at one point will have to "use" an object - and
> that use is through calling one of the dunder methods. Up to that time, like,
> just writing the object name in a no-operation line, does nothing. (unless the
> line is in a REPL, which will then call the __repr__ method in the object).

Why are dunder methods special? Does being passed to some other function also do nothing? What about a non-dunder attribute?

Especially, does being involved in an 'is' check count as using an object?

    dflt = fetch_cached_object("default")
    mine = later fetch_cached_object(user.keyword)
    ...
    if mine is dflt: ... # "using" mine? Or not?

Does it make a difference whether the object has previously been poked in some other way?

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/HUJ36AA34SZU7D5Q4G6N5UFFKYUOGOFT/
Code of Conduct: http://python.org/psf/codeofconduct/
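To make the 'is' question concrete, here is a minimal sketch of what a pure-Python proxy can and cannot intercept today. The `Deferred` class and the `fetch` helper are hypothetical names made up for this illustration (they are not from the PEP or from any library), and only a couple of dunders are forwarded. The point is that an identity check and a plain function call never reach any method of the proxy, so a proxy-based design cannot treat them as "use".

```python
class Deferred:
    """Toy proxy (hypothetical): resolves its factory on first forwarded use."""

    def __init__(self, factory):
        self._factory = factory
        self._resolved = False
        self._value = None

    def _resolve(self):
        if not self._resolved:
            self._value = self._factory()
            self._resolved = True
        return self._value

    # Only two dunders are forwarded here; a real proxy would need many more.
    def __eq__(self, other):
        return self._resolve() == other

    def __add__(self, other):
        return self._resolve() + other

    def __getattr__(self, name):
        # Reached only for names not found on the proxy itself, i.e. the
        # non-dunder attributes of the underlying object.
        return getattr(self._resolve(), name)


CACHE = {"default": object()}


def fetch(key):
    return CACHE[key]


def passthrough(obj):
    # Being passed to a function calls nothing on the object.
    return obj


dflt = fetch("default")
mine = Deferred(lambda: fetch("default"))

same = passthrough(mine) is dflt      # identity compares the proxy itself
resolved_after_is = mine._resolved    # still unresolved at this point
equal = mine == dflt                  # __eq__ is forwarded, so this resolves
resolved_after_eq = mine._resolved
```

With a proxy like this, `mine is dflt` is false even though both names would refer to the same cached object after resolution, which is why the identity case would need language-level support rather than a proxy class.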
[Python-ideas] Re: Generalized deferred computation in Python
On Wed, Jun 22, 2022 at 6:36 PM Joao S. O. Bueno wrote:
> implement "all possible" dunder methods, and proxy those to the underlying
> object, for a "future type" that was calculated off-process, and did not
> need any ".value()" or ".result()" methods to be called.

Here's a package on PyPI that seems to do that:

https://pypi.org/project/lazy-object-proxy/

It's written partly in C, so it may be fast. I haven't tested it.
___
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/CUEVSLDF5AAGN4MFIXU4TQ47Z35AXXAC/
[Python-ideas] Re: Generalized deferred computation in Python
> > On Wed, Jun 22, 2022 at 02:30:14PM -0400, David Mertz, Ph.D. wrote:
> >But basically, think about `x = (later expensive1() + later expensive2()) /
> >later expensive3()`. How can we make `x` itself be a zero argument
> >lambda? [... see below ...]

>     x = lambda: (expensive1() + expensive2()) / expensive3()

What am I missing?

I don't understand what you meant by saying that zero-argument lambdas are "isolated". It sounds like a way that you happen to think about lambdas, and not an objective property of lambdas.

This proposal is like lambda on the definition end, but on the invocation end the call happens implicitly. In effect you have to explicitly mark everywhere that you *don't* want it to be called instead of everywhere that you *do* want it to be called. It isn't clear to me that that's better, much less enough better to justify changing the semantics of what I suppose is the single most common operation in Python ("getting a value").

On Wed, Jun 22, 2022 at 2:53 PM Martin Di Paola wrote:
> # Using zero-argument nested lambdas
> x = lambda: ((lambda: expensive1())() + (lambda: expensive2())()) /
> (lambda: expensive3())()

Why not just expensive1() instead of (lambda: expensive1())()?
___
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/FXE5U2JBWZCBSHE5Z2DU3POMBW5K6JKM/
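For reference, a small runnable sketch of the zero-argument lambda spelling discussed above. The three `expensive*` functions are stand-ins invented for the example (cheap, and they only record that they ran). It shows that nothing runs at definition time, that every explicit call re-runs the whole expression, and that the nested-lambda spelling computes the same thing as the flat one.

```python
calls = []  # records which stand-in functions actually ran


def expensive1():
    calls.append("expensive1")
    return 6


def expensive2():
    calls.append("expensive2")
    return 4


def expensive3():
    calls.append("expensive3")
    return 5


# Flat spelling: one zero-argument lambda around the whole expression.
x = lambda: (expensive1() + expensive2()) / expensive3()
ran_at_definition = list(calls)   # nothing has been evaluated yet

first = x()                       # explicit call: evaluates now
second = x()                      # evaluates again, there is no caching
calls_after_two = len(calls)

# Nested spelling from the thread: each inner lambda is called immediately,
# so it behaves the same as calling the function directly.
y = lambda: ((lambda: expensive1())() + (lambda: expensive2())()) / (lambda: expensive3())()
nested = y()
```

The difference from the proposed `later` is only on the invocation side: here the caller writes `x()` wherever a value is wanted, and would have to add memoization by hand if a single evaluation is desired.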
[Python-ideas] dataclass field argument to allow converting value on init
The idea is to have a `default_factory`-like argument (either in the `field` function, or a new function entirely) that takes a function as an argument. That function is called with the value provided to `__init__`, and its return value is used as the value for the respective field.

For example:

```py
@dataclass
class Foo:
    x: str = field(init_fn=chr)

f = Foo(65)
f.x # "A"
```

The `chr` function is called, given the value `65`, and `x` is set to its return value of `"A"`.

I understand that there is both `__init__` and `__post_init__` which can be used for this purpose, but sometimes it isn't ideal to override them. If you overrode `__init__`, and were using `__post_init__`, you would need to call it manually. In my case, `__post_init__` is implemented on a base class, which all other classes inherit, and so overriding it would require re-implementing the logic from it (and that's ignoring the fact that you also need to type the field with `InitVar` to even have it passed to `__post_init__` in the first place).

I've created a proof of concept, shown below:

```py
from dataclasses import dataclass, field

def initfn(fn, default=None):
    class Inner:
        def __set_name__(_, owner_cls, owner_name):
            old_setattr = getattr(owner_cls, "__setattr__")

            def __setattr__(self, attr_name, value):
                if attr_name == owner_name:
                    # Bypass `__setattr__`
                    self.__dict__[attr_name] = fac(value)
                else:
                    old_setattr(self, attr_name, value)

            setattr(owner_cls, "__setattr__", __setattr__)

    def fac(value):
        if isinstance(value, Inner):
            return default
        return fn(value)

    return field(default=Inner())
```

It makes use of the fact that providing `default` as an argument to `field` means it checks the value for a `__set_name__` function, and calls it with the class and field name as arguments. Overriding `__setattr__` is just used to catch when a value is being assigned to a field, and if that field's name matches the name given to `__set_name__`, it calls the function on the value, and sets the field to that instead.

It can be used like so:

```py
@dataclass
class Foo:
    x: str = initfn(fn=chr, default="Z")

f = Foo(65)
f2 = Foo()
f.x # "A"
f2.x # "Z"
```

It adds a little overhead, especially with having to override `__setattr__`; however, I believe it would have very little overhead if directly implemented in the dataclass library.

Even in the case of being able to override one of the init functions, I still think it would be nice to have as a quality of life feature, as I feel calling a function is too simple to want to override the functions, if that makes sense.

Thanks.
Dexter
___
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/4SM5EVP6MMGGHQMZSJXBML74PWWDHEWV/
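For comparison, a similar effect seems achievable with today's dataclasses through a descriptor-typed field, without replacing the class's `__setattr__`. This is only a sketch: `Converted` is a hypothetical name chosen for the example, the converted value is stored under a private `_x`-style attribute, and, unlike the proof of concept above, the default is given as a raw value that is itself passed through the conversion function (so `90` rather than `"Z"`).

```python
from dataclasses import dataclass


class Converted:
    """Hypothetical descriptor: runs `fn` on every value assigned to the field."""

    def __init__(self, fn, *, default):
        self.fn = fn
        self.default = default  # raw default, converted on assignment like any value

    def __set_name__(self, owner, name):
        self.private = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Class-level access: dataclass reads this as the field's default.
            return self.default
        return getattr(obj, self.private)

    def __set__(self, obj, value):
        # The generated __init__ assigns self.x = x, which lands here.
        setattr(obj, self.private, self.fn(value))


@dataclass
class Foo:
    x: str = Converted(chr, default=90)


f = Foo(65)
f2 = Foo()
f.x   # "A"
f2.x  # "Z"
```

One consequence of this design is that later assignments such as `f.x = 66` are converted too, which may or may not be wanted for an "on init only" feature.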
[Python-ideas] Re: Ellipsis (...) to be roughly synonymous with * in destructuring but without capture.
> No need to have an object there - you could just define it as a syntactic
> construct instead. Assignment targets aren't themselves objects (although the
> same syntax can often be used on the RHS, when it would resolve to one).

Right. Thanks. That _should_ have been obvious. :)

> Having a way to say "allow additional elements without iterating over them"
> would be useful, but creating a new way to spell the non-assignment wouldn't
> be of sufficiently great value to justify the syntax IMO.

I mostly agree. I included that option for completeness. It would still have the benefit of avoiding the memory usage of creating a list and keeping references to the items until the list itself can be collected.

Come to think of it, can (or could) Python already optimize that using current syntax, noticing that the variable assigned to is never used after it is "assigned" to? If that optimization were implemented (I presume it is not implemented now) then there is actually no point to this proposal at all except to allow "..." in final positions in the expression to the left of "=" and to have that mean to not iterate.
___
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/AHUOIVOS4GXHAI3AT7O5M2MI4BJJER24/
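To illustrate the memory point with current syntax: the starred target is bound to a real list holding every skipped item, whereas the last item can already be fetched without that list by draining the iterable through a one-slot deque. The helper name `last_item` is made up for this sketch; it is a stand-in for what `..., last_one = iterable` would do, not part of the proposal.

```python
from collections import deque


def last_item(iterable):
    """Return the final item, keeping at most one item alive at a time."""
    tail = deque(iterable, maxlen=1)  # older items are discarded as new ones arrive
    if not tail:
        raise ValueError("need at least one item")
    return tail[0]


# Current spelling: `_` ends up as a list of all but the last item.
*_, last_a = range(100_000)
skipped_type = type(_)
skipped_count = len(_)

# Same result without materialising the skipped items.
last_b = last_item(range(100_000))
```

Both spellings still iterate over every item; only the storage differs.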
[Python-ideas] Re: Generalized deferred computation in Python
Thanks Rob, I recognize that I have so-far skirted the order-of-precedence concern. I believe I have used parens in my example everywhere there might be a question... But that's not a general description or rule. I have a bunch of issues that I know I need to flesh out, many coming as suggestions in this thread, which I appreciate. I just wanted to provide something concrete to start the conversation. FWIW, there is a bunch more at the link now than in my initial paste. But I want to clarify more before I copy a new version into the email thread. I haven't used Twisted in a while, but it is certainly an important library, and I don't want to cause confusion. Any specific recommendation on language to use? On Wed, Jun 22, 2022, 8:45 PM Rob Cliffe via Python-ideas < python-ideas@python.org> wrote: > Thank you for your proposal David. At last we have a counter-proposal to > talk about. A few points: > > (1) (As I pointed out in an earlier post) There is a flaw in using the > syntax of an expression PRECEDED by a SOFT keyword: > x =later-y > With your proposal, x is assigned a deferred-evaluation-object which will > be evaluated at some time later as "minus y", right? > Erm, no. This is already legal syntax for x being immediately assigned a > value of "later minus y". > If you put the soft keyword *after* the expression: > x = -y later > it may or may not read as well (subjective) but AFAICS would work. > Alternatively you could propose a hard keyword. Or a different syntax > altogether. > > (2) Delayed evaluation may be useful for many purposes. But for the > specific purpose of providing late-bound function argument defaults, having > to write the extra line ("n = n" in your example) removes much of the > appeal. Two lines of boilerplate (using a sentinel) replaced by one > obscure one plus one keyword is not much if any of a win, whereas PEP 671 > would remove the boilerplate altogether apart from one sigil. 
Under your > proposal, I for one would probably stick with the sentinel idiom which is > explicit. I think "n=n" is confusing to an inexperienced Python user. > You may not think this is important. My opinion is that late-bound > defaults are important. (We may have to agree to differ.) Apart from > anything else: Python fully supports early-bound defaults, why discriminate > against late-bound ones? > > (3) You talk about "deferred objects" and in one place you actually say > "Evaluate > the Deferred". A "deferred" is an important object but a different concept > in Twisted, I think calling it something else would be better to avoid > confusion. > > Best wishes > Rob Cliffe > > > On 21/06/2022 21:53, David Mertz, Ph.D. wrote: > > Here is a very rough draft of an idea I've floated often, but not with > much specification. Take this as "ideas" with little firm commitment to > details from me. PRs, or issues, or whatever, can go to > https://github.com/DavidMertz/peps/blob/master/pep-.rst as well as > mentioning them in this thread. > > PEP: > Title: Generalized deferred computation > Author: David Mertz > Discussions-To: > https://mail.python.org/archives/list/python-ideas@python.org/thread/ > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 21-Jun-2022 > Python-Version: 3.12 > Post-History: > > Abstract > > > This PEP proposes introducing the soft keyword ``later`` to express the > concept > of deferred computation. When an expression is preceded by the keyword, > the > expression is not evaluated but rather creates a "thunk" or "deferred > object." > Reference to the deferred object later in program flow causes the > expression to > be executed at that point, and for both the value and type of the object to > become the result of the evaluated expression. > > > Motivation > == > > "Lazy" or "deferred" evaluation is useful paradigm for expressing > relationships > among potentially expensive operations prior their actual computation. 
> Many > functional programming languages, such as Haskell, build laziness into the > heart of their language. Within the Python ecosystem, the popular > scientific > library `dask-delayed `_ provides a framework for lazy > evaluation > that is very similar to that proposed in this PEP. > > .. _dask-delayed: >https://docs.dask.org/en/stable/delayed.html > > > Examples of Use > === > > While the use of deferred computation is principally useful when > computations > are likely to be expensive, the simple examples shown do not necessarily > use > such expecially spendy computations. Most of these are directly inspired > by > examples used in the documentation of dask-delayed. > > In dask-delayed, ``Delayed`` objects are create by functions, and > operations > create a *directed acyclic graph* rather than perform actual > computations. For > example:: > > >>> import dask > >>> @dask.delayed > ... def later(x): > ... return x > ... >
[Python-ideas] Re: Generalized deferred computation in Python
On Wed, Jun 22, 2022 at 11:22:05PM +0100, Paul Moore wrote:
> Hang on, did the PEP change? The version I saw didn't have a compute()
> method, deferred objects were just evaluated when they were referenced.

You are right, the PEP does not mention a compute() method, although it does use that term. I just used it to make explicit when the evaluation takes place in the examples that I gave. My bad.

> There's a *huge* difference (in my opinion) between auto-executing
> deferred expressions, and a syntax for creating *objects* that can be
> asked to calculate their value. And yes, the latter is extremely close
> to being nothing more than "a shorter and more composable form of
> zero-arg lambda", so it needs to be justifiable in comparison to
> zero-arg lambda (which is why I'm more interested in the composability
> aspect, building an AST by combining delayed expressions into larger
> ones).

Agreed, the *huge* difference is what I tried to highlight, because that is where I see holes in the PEP.

Building an AST as you mentioned could fill one of those holes, but how the expressions are iterated and evaluated is still missing.

Of course, the exact details will depend on the library that theoretically could use deferred expressions (like PySpark), but I still see non-trivial details to fill in.

 - what would be the API for the objects of the AST that represent the deferred expression(s)?
 - how would the "evaluator" of the expressions iterate over them? Will the "evaluator" have to check that every one of the expressions is meaningful for it?
 - does the AST simplify the implementation of existing libs implementing deferred methods?
 - who is the "evaluator" in the case of expressions that don't share a common "implementation"?

Allow me to expand on the last item:

    # some Dask code
    df = later dask_df.filter(...)
    s = later df.sum()

    # some selectq code
    d = later sQ.select("div")
    c = later d.count()

    # now, mix and compute!
    (s + c).compute()

I can see how the deferred expressions are linked and how the AST is built, but "who" knows how to execute it... I'm not sure.

Will it be Dask that knows how to plan, optimize and execute the sum() over the partitions of the dataframe, or will it be selectq that knows how to build an xpath and talk with Selenium? Maybe it will be the Python VM? Maybe all three?

I know that those questions have an answer, but I still feel that there are more unknowns (especially about why the PEP would be useful for some lib).

Thanks,
Martin.
___
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ZII7WHFRD23SVVAWUIVERFZJNABGJOLD/
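The "who is the evaluator" question can be made concrete with a toy expression tree. Everything below is invented for illustration (the `Node` class, the `tagged` helper and the labels); it is neither the PEP's mechanism nor how Dask or selectq work. It shows the simplest possible answer, a generic recursive walk, which has no way to hand a subtree to a library-specific planner.

```python
import operator


class Node:
    """Toy deferred expression: a callable plus its (possibly deferred) arguments."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args

    def __add__(self, other):
        # Combining nodes only builds a bigger tree; nothing is evaluated.
        return Node(operator.add, self, other)

    def compute(self):
        # The "evaluator" is a plain depth-first walk that knows nothing
        # about where each leaf came from.
        values = [a.compute() if isinstance(a, Node) else a for a in self.args]
        return self.fn(*values)


log = []  # records the order in which leaves are actually evaluated


def tagged(label, value):
    """Build a leaf that records its label when it is evaluated."""
    def run():
        log.append(label)
        return value
    return Node(run)


s = tagged("dask-like sum", 40)
c = tagged("selectq-like count", 2)

total = s + c
built_without_running = (log == [])
result = total.compute()
```

A walker like this evaluates each leaf in isolation, so any planning step (partition-wise sums, a single combined xpath) would have to be expressed through some additional protocol that the tree exposes to the libraries involved.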
[Python-ideas] Re: Generalized deferred computation in Python
Martin Di Paola wrote:
> Three cases: Dask/PySpark, Django's ORM and selectq. All of them
> implement deferred expressions but all of them "compute" them in very
> specific ways (aka, they plan and execute the computation differently).

So - I've been hit with the "transparency execution of deferred code" dilemma before.

What happens is that Python, at some point, will have to "use" an object - and that use is through calling one of the dunder methods. Up to that time, just writing the object name in a no-operation line, for instance, does nothing (unless the line is in a REPL, which will then call the __repr__ method of the object).

A long time ago I implemented a toy project that would implement "all possible" dunder methods, and proxy those to the underlying object, for a "future type" that was calculated off-process, and did not need any ".value()" or ".result()" methods to be called.

Any such object, that has slots for all dunder methods, any of which, when called, would trigger the resolve, could work today, without any modification, to implement the proposed behavior.

And all that is needed to make it possible to manipulate the object before the evaluation takes place is a single reserved name, within the object namespace, that is not a proxy to evaluation. It could be special-cased within the object's class __getattribute__ itself: not even a new reserved dunder slot would be needed. That is, "getattr(myobj, "_is_deferred", False)" would not trigger the evaluation. (A special slot for it in object would, however, allow plain checking using "myobj.__is_deferred__" without the need to use getattr or hasattr.)

So, all that would be needed for such a feature would be keyword support to build this special proxy type.

That said, the usefulness or not of this proposal can be better thought through knowing that this "special attribute" mechanism can also be used to add further inspection/modification mechanisms to the delayed objects.
The act of "filling in all possible dunder methods" itself is quite hacky, but even if done in C, I don't think it could be avoided. Here is the code I referred to that implements the same proxy type that would be needed for this feature - (IRRC it is even pip installable): https://bitbucket.org/jsbueno/lelo/src/master/lelo/_lelo.py On Wed, Jun 22, 2022 at 11:46 AM Martin Di Paola wrote: > Hi David, I read the PEP and I think it would be useful to expand the > Motivation and Examples sections. > > While indeed Dask uses lazy evaluation to build a complex computation > without executing it, I don't think that it is the whole story. > > Dask takes this deferred complex computation and *plans* how to execute it > and then it *executes* it in non-obvious/direct ways. > > For example, the computation of the min() of a dataframe can be done > computing the min() of each partition of the dataframe and then > computing the min() of them. Here is where the plan and the execution > stages play. > > All of this is hidden from the developer. From his/her perspective the > min() is called once over the whole dataframe. > > Dask's deferred computations are "useless" without the > planning/execution plan. > > PySpark, like Dask, does exactly the same. > > But what about Django's ORM? Indeed Django allows you the build a SQL > query without executing it. You can then perform more subqueries, > joins and group by without executing them. > > Only when you need the real data the query is executed. > > This is another example of deferred execution similar to Dask/PySpark > however when we consider the planning/execution stages the similarities > ends there. > > Django's ORM writes a SQL query and send it to a SQL database. > > Another example of deferred execution would be my library to interact > with web pages programmatically: selectq. > > Very much like an ORM, you can select elements from a web page, perform > subselections and unions without really interacting with the web page. 
> > Only when you want to get the data from the page is when the deferred > computations are executed and like an ORM, the plan done by selectq is > to build a single xpath and then execute it using Selenium. > > So... > > Three cases: Dask/PySpark, Django's ORM and selectq. All of them > implement deferred expressions but all of them "compute" them in very > specific ways (aka, they plan and execute the computation differently). > > Would those libs (and probably others) do benefit from the PEP? How? > > Thanks, > Martin. > > On Tue, Jun 21, 2022 at 04:53:44PM -0400, David Mertz, Ph.D. wrote: > >Here is a very rough draft of an idea I've floated often, but not with > much > >specification. Take this as "ideas" with little firm commitment to > details > >from me. PRs, or issues, or whatever, can go to > >https://github.com/DavidMertz/peps/blob/master/pep-.rst as well as > >mentioning them in this thread. > > > >PEP: > >Title: Generalized deferred computation > >Author: David Mertz >
[Python-ideas] Re: Generalized deferred computation in Python
On Thu, 23 Jun 2022 at 10:44, Rob Cliffe via Python-ideas wrote:
>
> Thank you for your proposal David. At last we have a counter-proposal to
> talk about. A few points:
>
> (1) (As I pointed out in an earlier post) There is a flaw in using the syntax
> of an expression PRECEDED by a SOFT keyword:
>     x = later -y
> With your proposal, x is assigned a deferred-evaluation-object which will be
> evaluated at some time later as "minus y", right?
> Erm, no. This is already legal syntax for x being immediately assigned a
> value of "later minus y".
> If you put the soft keyword *after* the expression:
>     x = -y later
> it may or may not read as well (subjective) but AFAICS would work.
> Alternatively you could propose a hard keyword. Or a different syntax
> altogether.

Or just define that the soft keyword applies only if not followed by an operator. That way, "later -y" would be interpreted the same way it always has, and if you actually want a deferred of y's negation, you'd need to spell it some other way. Although I'm not entirely sure how, since the obvious choice, grouping parentheses, just makes it look like a function call instead, and "later 0-y" might not have the same semantics.

ChrisA
___
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/5L3AJ3B65ZXNEZWOWTWDTI36DCLETCDV/
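The ambiguity is easy to confirm with the current parser. The snippet below uses only the standard `ast` module; the variable values are arbitrary. It shows that `x = later -y` is ordinary subtraction today, which is what any soft-keyword rule would have to keep working.

```python
import ast

# With `later` as an ordinary variable, this is plain subtraction.
later = 10
y = 3
x = later -y

# The parser agrees: the right-hand side is a binary "minus" on two names.
tree = ast.parse("x = later -y")
value = tree.body[0].value
is_subtraction = isinstance(value, ast.BinOp) and isinstance(value.op, ast.Sub)
operand_names = (value.left.id, value.right.id)
```

The same check with `later (y)` would show a function call node, which is the problem with grouping parentheses mentioned above.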
[Python-ideas] Re: Generalized deferred computation in Python
Thank you for your proposal David. At last we have a counter-proposal to talk about. A few points: (1) (As I pointed out in an earlier post) There is a flaw in using the syntax of an expression PRECEDED by a SOFT keyword: x = later -y With your proposal, x is assigned a deferred-evaluation-object which will be evaluated at some time later as "minus y", right? Erm, no. This is already legal syntax for x being immediately assigned a value of "later minus y". If you put the soft keyword *after* the expression: x = -y later it may or may not read as well (subjective) but AFAICS would work. Alternatively you could propose a hard keyword. Or a different syntax altogether. (2) Delayed evaluation may be useful for many purposes. But for the specific purpose of providing late-bound function argument defaults, having to write the extra line ("n = n" in your example) removes much of the appeal. Two lines of boilerplate (using a sentinel) replaced by one obscure one plus one keyword is not much if any of a win, whereas PEP 671 would remove the boilerplate altogether apart from one sigil. Under your proposal, I for one would probably stick with the sentinel idiom which is explicit. I think "n=n" is confusing to an inexperienced Python user. You may not think this is important. My opinion is that late-bound defaults are important. (We may have to agree to differ.) Apart from anything else: Python fully supports early-bound defaults, why discriminate against late-bound ones? (3) You talk about "deferred objects" and in one place you actually say "Evaluate the Deferred". A "deferred" is an important object but a different concept in Twisted, I think calling it something else would be better to avoid confusion. Best wishes Rob Cliffe On 21/06/2022 21:53, David Mertz, Ph.D. wrote: Here is a very rough draft of an idea I've floated often, but not with much specification. Take this as "ideas" with little firm commitment to details from me. 
PRs, or issues, or whatever, can go to https://github.com/DavidMertz/peps/blob/master/pep-.rst as well as mentioning them in this thread. PEP: Title: Generalized deferred computation Author: David Mertz Discussions-To: https://mail.python.org/archives/list/python-ideas@python.org/thread/ Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 21-Jun-2022 Python-Version: 3.12 Post-History: Abstract This PEP proposes introducing the soft keyword ``later`` to express the concept of deferred computation. When an expression is preceded by the keyword, the expression is not evaluated but rather creates a "thunk" or "deferred object." Reference to the deferred object later in program flow causes the expression to be executed at that point, and for both the value and type of the object to become the result of the evaluated expression. Motivation == "Lazy" or "deferred" evaluation is useful paradigm for expressing relationships among potentially expensive operations prior their actual computation. Many functional programming languages, such as Haskell, build laziness into the heart of their language. Within the Python ecosystem, the popular scientific library `dask-delayed `_ provides a framework for lazy evaluation that is very similar to that proposed in this PEP. .. _dask-delayed: https://docs.dask.org/en/stable/delayed.html Examples of Use === While the use of deferred computation is principally useful when computations are likely to be expensive, the simple examples shown do not necessarily use such expecially spendy computations. Most of these are directly inspired by examples used in the documentation of dask-delayed. In dask-delayed, ``Delayed`` objects are create by functions, and operations create a *directed acyclic graph* rather than perform actual computations. For example:: >>> import dask >>> @dask.delayed ... def later(x): ... return x ... >>> output = [] >>> data = [23, 45, 62] >>> for x in data: ... x = later(x) ... a = x * 3 ... b = 2**x ... 
c = a + b ... output.append(c) ... >>> total = sum(output) >>> total Delayed('add-8f4018dbf2d3c1d8e6349c3e0509d1a0') >>> total.compute() 4611721202807865734 >>> total.visualize() .. figure:: pep--dag.png :align: center :width: 50% :class: invert-in-dark-mode Figure 1. Dask DAG created from simple operations. Under this PEP, the soft keyword ``later`` would work in a similar manner to this dask.delayed code. But rather than requiring calling ``.compute()`` on a ``Delayed`` object to arrive at the result of a computation, every reference to a binding would perform the "compute" *unless* it was itself a deferred expression. So the equivalent code under this PEP would be:: >>> output = [] >>> data = [23, 45, 62] >>> for later x in data: ...
[Python-ideas] Re: Ellipsis (...) to be roughly synonymous with * in destructuring but without capture.
On Thu, 23 Jun 2022 at 08:56, Steve Jorgensen wrote:
>
> This is based on previous discussions of possible ways of matching all
> remaining items during destructuring but without iterating over remaining final
> items. This is not exactly a direct replacement for that idea though, and
> skipping iteration of final items might or might not be part of the goal.
>
> In this proposal, the ellipsis (...) can be used in the expression on the
> left side of the equals sign in destructuring anywhere that `*` can
> appear and has approximately the same meaning. The difference is that when
> the ellipsis is used, the matched items are not stored in variables. This can
> be useful when the matched data might be very large.
>
>     ..., last_one =
>     a, ..., z =
>     first_one, ... =
>
> Additionally, when the ellipsis comes last and the data is being retrieved by
> iterating, stop retrieving items since that might be expensive and we know
> that we will not use them.
>
>
> Alternative A:
>
> Still iterate over items when the ellipsis comes last (for side effects) but
> introduce a new `final_ellipsis` object that is used to stop iteration. The
> negation of `ellipsis` (e.g. `-...`) could return `final_ellipsis` in that
> case.
>

No need to have an object there - you could just define it as a syntactic construct instead. Assignment targets aren't themselves objects (although the same syntax can often be used on the RHS, when it would resolve to one).

> Alternative B:
>
> Still iterate over items when the ellipsis comes last (for side effects) and
> don't provide any new means of skipping iteration over final items. The
> programmer can use islice to achieve that.
>

This is exactly equivalent to using star-underscore, minus the final step of assigning. Not really very advantageous.

Having a way to say "allow additional elements without iterating over them" would be useful, but creating a new way to spell the non-assignment wouldn't be of sufficiently great value to justify the syntax IMO.

ChrisA
___
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/JJPVSXCDKGW6TPFFDF46G7CZB43DIMFO/
[Python-ideas] Re: Ellipsis (...) to be roughly synonymous with * in destructuring but without capture.
Steve Jorgensen wrote:
> This is based on previous discussions of possible ways of matching all
> remaining items during destructuring but without iterating over remaining final
> items. This is not exactly a direct replacement for that idea though, and
> skipping iteration of final items might or might not be part of the goal.
> In this proposal, the ellipsis (...) can be used in the expression on the
> left side of the equals sign in destructuring anywhere that `*` can
> appear and has approximately the same meaning. The difference is that when
> the ellipsis is used, the matched items are not stored in variables. This can
> be useful when the matched data might be very large.
>     ..., last_one =
>     a, ..., z =
>     first_one, ... =
> Additionally, when the ellipsis comes last and the data is being retrieved by
> iterating, stop retrieving items since that might be expensive and we know
> that we will not use them.
> Alternative A:
> Still iterate over items when the ellipsis comes last (for side effects) but
> introduce a new `final_ellipsis` object that is used to stop iteration. The
> negation of `ellipsis` (e.g. `-...`) could return `final_ellipsis` in that
> case.
> Alternative B:
> Still iterate over items when the ellipsis comes last (for side effects) and
> don't provide any new means of skipping iteration over final items. The
> programmer can use islice to achieve that.

Correction: "are not stored in variables" should say "are not stored in a variable"
___
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/CCQYEZH465W4ARBMBIUWK6YN4J5HNA5B/
[Python-ideas] Ellipsis (...) to be roughly synonymous with * in destructuring but without capture.
This is based on previous discussions of possible ways of matching all remaining items during destructuring without iterating over the remaining final items. It is not exactly a direct replacement for that idea, though, and skipping iteration of the final items might or might not be part of the goal.

In this proposal, the ellipsis (...) can be used in the expression on the left side of the equals sign in destructuring anywhere that `*` can appear, and it has approximately the same meaning. The difference is that when the ellipsis is used, the matched items are not stored in variables. This can be useful when the matched data might be very large.

..., last_one =
a, ..., z =
first_one, ... =

Additionally, when the ellipsis comes last and the data is being retrieved by iterating, stop retrieving items, since that might be expensive and we know that we will not use them.

Alternative A:
Still iterate over items when the ellipsis comes last (for side effects), but introduce a new `final_ellipsis` object that is used to stop iteration. The negation of `ellipsis` (e.g. `-...`) could return `final_ellipsis` in that case.

Alternative B:
Still iterate over items when the ellipsis comes last (for side effects) and don't provide any new means of skipping iteration over final items. The programmer can use islice to achieve that.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/QPMFXOOHKQJ6YFM35SJXZMANBQTRZ3FY/
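Alternative B relies on tools that already exist. As a rough sketch (the helper names `first_only` and `last_only` and the `numbers` generator are made up for this illustration and are not part of the proposal), `itertools.islice` can take a leading item without draining the rest of an iterator, and `collections.deque(maxlen=1)` can keep only the last item without storing the others:

```python
from collections import deque
from itertools import islice


def first_only(iterable):
    """Roughly `first_one, ... = iterable`: take one item, leave the rest unread."""
    (first,) = islice(iterable, 1)  # ValueError if the iterable is empty
    return first


def last_only(iterable):
    """Roughly `..., last_one = iterable`: iterate everything, keep only the last."""
    (last,) = deque(iterable, maxlen=1)  # ValueError if the iterable is empty
    return last


def numbers():
    # A generator standing in for a large or expensive data source.
    yield from range(1000)


source = numbers()
first = first_only(source)
next_item = next(source)      # the generator was not drained by first_only
last = last_only(numbers())   # consumes the whole generator, stores one item
```

Note that `last_only` still iterates over every item; it only avoids keeping them in memory, which matches the "not stored" part of the proposal rather than the "skip iteration" part.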
[Python-ideas] Re: Generalized deferred computation in Python
On Wed, 22 Jun 2022 at 22:52, Martin Di Paola wrote: > Perhaps this is the real focus to analyze. The PEP suggests that > x.compute() does something *fundamentally* different from just executing > the intermediate expressions. Hang on, did the PEP change? The version I saw didn't have a compute() method, deferred objects were just evaluated when they were referenced. There's a *huge* difference (in my opinion) between auto-executing deferred expressions, and a syntax for creating *objects* that can be asked to calculate their value. And yes, the latter is extremely close to being nothing more than "a shorter and more composable form of zero-arg lambda", so it needs to be justifiable in comparison to zero-arg lambda (which is why I'm more interested in the composability aspect, building an AST by combining delayed expressions into larger ones). Paul ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/P2LAQMQUM3KBXNH6EE5XBON4BLNTS7UR/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Generalized deferred computation in Python
On Wed, Jun 22, 2022 at 02:30:14PM -0400, David Mertz, Ph.D. wrote:
> The difference is in composition of operations. I can write a dozen zero-argument lambdas easily enough. But those are all isolated.
>
> But basically, think about `x = (later expensive1() + later expensive2()) / later expensive3()`. How can we make `x` itself be a zero argument lambda? [... see below ...]

The following three constructions are roughly equivalent:

# Using proposed PEP
x = (later expensive1() + later expensive2()) / later expensive3()
x.compute()

# Using zero-argument nested lambdas
x = lambda: ((lambda: expensive1())() + (lambda: expensive2())()) / (lambda: expensive3())()
x()  # compute

# Using the good old function
def x():
    return (expensive1() + expensive2()) / expensive3()
x()  # compute

> [... cont ...] And not just with those exact operations on the Deferreds, but arbitrary combinations of Deferreds to create more complex Deferreds, without performing the intermediate computations?

Perhaps this is the real focus to analyze. The PEP suggests that x.compute() does something *fundamentally* different from just executing the intermediate expressions.

The *key* difference between the PEP proposal and the use of zero-argument lambdas or good old functions is that no intermediate expression is computed (expensive1, expensive2 and expensive3 separately); instead, the whole combined expression is computed by *xxx*, allowing it to perform *yyy*.

Having said that, I want to point out that the "xxx" and "yyy" above are missing from the PEP. The PEP does not explain who executes these deferred computations, or how (the xxx and yyy).

As I mentioned in my previous email, the PEP gives an example of how Dask uses deferred expressions to build and distribute complex arithmetic/statistics operations over a partitioned dataframe, but this example cannot be used to assume that Python will do it in the same way.

In my previous email I mentioned other libs that implement deferred expressions and "should" benefit from the PEP; however, I cannot see how.

Perhaps a concrete example borrowed from selectq could highlight the missing pieces (xxx and yyy):

# from selectq
red_divs = sQ.select("div", class_="red")    # this is deferred
blue_divs = sQ.select("div", class_="blue")  # this is deferred
both = red_divs | blue_divs                  # this is deferred
count = both.count()  # the count() forces the real computation

Now, I imagine rewriting it using the PEP:

# using PEP
red_divs = later sQ.select("div", class_="red")
blue_divs = later sQ.select("div", class_="blue")
both = later (red_divs | blue_divs)
count = later both.count()
count = count.compute()

...but how does the Python VM know what "compute()" means or how to do it (the yyy)? Is it the Python VM that should do it (the xxx)?

I would assume that it is not the Python VM but the library...

- how will the library know what compute() means?
- how will it know about the intermediate red_divs and blue_divs?
- what benefits would this PEP bring to libs like PySpark, selectq and Django's ORM (to mention a few)?

So far the PEP *does not* explain that; it only talks about how to delay plain Python code.

In this sense, then, I would *agree* with Eric Smith. David may have a different intention, but in its current form the PEP is equivalent to zero-argument lambdas.

Thanks,
Martin.

On Wed, Jun 22, 2022 at 02:30:14PM -0400, David Mertz, Ph.D. wrote:
> On Wed, Jun 22, 2022 at 2:17 PM Eric V. Smith wrote:
>> Every time I’ve looked at this, I come back to: other than the clunky syntax, how is explicit evaluation different from a zero-argument lambda?
>
> I've enhanced the PEP, so maybe look at the link for some of my updates; but I need to add a bunch more, so don't want to repost each small draft change.
>
> But basically, think about `x = (later expensive1() + later expensive2()) / later expensive3()`. How can we make `x` itself be a zero argument lambda?
>
> And not just with those exact operations on the Deferreds, but arbitrary combinations of Deferreds to create more complex Deferreds, without performing the intermediate computations?
>
> --
> Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
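To make the comparison in this message concrete, here is a small runnable version of the two constructions that work in today's Python. The bodies of `expensive1`, `expensive2` and `expensive3` are stand-ins invented for this sketch; each records its call so it is visible that nothing runs until `x` is called. The `later` form is left out because that syntax does not exist.

```python
calls = []


def expensive1():
    calls.append("expensive1")
    return 6


def expensive2():
    calls.append("expensive2")
    return 4


def expensive3():
    calls.append("expensive3")
    return 5


# Zero-argument nested lambdas: defining x_lambda runs nothing.
x_lambda = lambda: ((lambda: expensive1())() + (lambda: expensive2())()) / (lambda: expensive3())()


# Plain function: defining x_func runs nothing either.
def x_func():
    return (expensive1() + expensive2()) / expensive3()


calls_before = list(calls)   # still empty at this point
result_lambda = x_lambda()   # all three functions run now
result_func = x_func()       # and run again here; nothing is cached
```

In both cases the three calls happen eagerly, one after the other, inside a single call to `x`; neither form gives a library a chance to see the combined expression first, which is the gap the "xxx" and "yyy" questions point at.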
[Python-ideas] Re: Generalized deferred computation in Python
> On 22 Jun 2022, at 19:09, Paul Moore wrote:
>
> I suspect that you consider evaluation-on-reference as an important feature of your proposal, but could you consider explicit evaluation as an alternative? Or at the very least address in the PEP the fact that this would close the door on future explicit evaluation models?

I can think of ways to implement evaluation-on-reference, but they all have the effect of making Python slower.

The simple

a = b

will need to slow down so that the object in b can be checked to see if it needs evaluating.

How will you avoid making Python slower with this feature?

Barry

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/QBL6JFDKDKFBURKEXOHEOBCK6WULEEQR/
[Python-ideas] Re: Generalized deferred computation in Python
Could this idea be used to specify the default factory of a dataclass field? For example,

@dataclass
class X:
    x: list[int] = deferred []

instead of

@dataclass
class X:
    x: list[int] = field(default_factory=list)

If so, it might be worth adding to your proposal as another common motivating example?

Best,
Neil

On Wednesday, June 22, 2022 at 2:23:13 PM UTC-4 David Mertz, Ph.D. wrote:
> On Wed, Jun 22, 2022 at 10:47 AM Martin Di Paola wrote:
>
>> Hi David, I read the PEP and I think it would be useful to expand the Motivation and Examples sections.
>> While indeed Dask uses lazy evaluation to build a complex computation without executing it, I don't think that it is the whole story.
>> Dask takes this deferred complex computation and *plans* how to execute it and then it *executes* it in non-obvious/direct ways.
>
> Dask is very clever about execution planning. Ray is possibly even more clever in that regard.
>
> However, I think that that should be an explicit non-goal of the PEP. DeferredObjects should create a DAG, yes. But I think Python itself should not think about being *clever* in evaluating that DAG, nor in itself think about parallelism. If my PEP were adopted, that would be something other libraries like Dask or Django could build on top of with more elaborate evaluation plans.
>
> But just the DAG itself gets you more than just "wait until needed to do the final computation." It allows for intermediate computation of nodes of the DAG lower down than the final result. For example, imagine dependencies like this (where all the computation steps are expensive):
>
> A -> B
>      B -> Z
>      B -> Y
>      B -> X
> A -> C
>      C -> X
>           X -> F
>           X -> G
>      C -> W
>      C -> V
> A -> D
>      D -> V
>      D -> U
>      D -> T
>
> Hopefully you either see my ASCII art in fixed font, or it's at least intelligible. If I want to force evaluation of A, I need to do everything. But if it turns out all I need within my program is C, then I have to do computations C, X, F, G, W, V. Which is maybe still expensive, but at least I don't worry about B, Z, Y, U, T, or A.
>
> Yes, I should add something like this to the PEP.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/US5MCILKVRPUQBUERK352JKTH7RNJMTJ/
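For reference, this is how the existing spelling behaves today; the `deferred []` form in the message above is hypothetical syntax. A small sketch (the class `Y` exists only to trigger the error) showing that `default_factory` gives each instance its own list, and that `dataclasses` rejects a bare mutable default:

```python
from dataclasses import dataclass, field


@dataclass
class X:
    # The factory is called once per instance, so lists are not shared.
    x: list[int] = field(default_factory=list)


a = X()
b = X()
a.x.append(1)
shared = a.x is b.x   # each instance got a fresh list

# A plain mutable default is refused at class-definition time.
error_message = None
try:
    @dataclass
    class Y:
        y: list[int] = []
except ValueError as exc:
    error_message = str(exc)
```

Whatever a deferred default would look like, it would need the same once-per-instance behaviour rather than evaluating the expression once and sharing the result.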
[Python-ideas] Re: Generalized deferred computation in Python
On Wed, Jun 22, 2022 at 2:17 PM Eric V. Smith wrote:
> Every time I’ve looked at this, I come back to: other than the clunky syntax, how is explicit evaluation different from a zero-argument lambda?

The difference is in composition of operations. I can write a dozen zero-argument lambdas easily enough. But those are all isolated.

I've enhanced the PEP, so maybe look at the link for some of my updates; but I need to add a bunch more, so don't want to repost each small draft change.

But basically, think about `x = (later expensive1() + later expensive2()) / later expensive3()`. How can we make `x` itself be a zero argument lambda?

And not just with those exact operations on the Deferreds, but arbitrary combinations of Deferreds to create more complex Deferreds, without performing the intermediate computations?

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/Y6LVDCFYU23XC6A7ZO2R7VUNTG344GC7/
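One way to picture the composition described in this message, without any new syntax, is a small wrapper class whose operators build larger deferred objects instead of computing anything. This is only a sketch: the `Deferred` class, its explicit `compute()` method and the `expensive` helper are assumptions made for this illustration, not the PEP's design (the draft evaluates on reference rather than through a method).

```python
import operator


class Deferred:
    """Hypothetical deferred value: a zero-argument callable plus operators."""

    def __init__(self, func):
        self._func = func

    def _combine(self, op, other):
        # Build a new Deferred; neither operand is computed here.
        def run():
            left = self.compute()
            right = other.compute() if isinstance(other, Deferred) else other
            return op(left, right)
        return Deferred(run)

    def __add__(self, other):
        return self._combine(operator.add, other)

    def __truediv__(self, other):
        return self._combine(operator.truediv, other)

    def compute(self):
        return self._func()


log = []


def expensive(name, value):
    # Stand-in for an expensive call; records when it actually runs.
    log.append(name)
    return value


e1 = Deferred(lambda: expensive("e1", 6))
e2 = Deferred(lambda: expensive("e2", 4))
e3 = Deferred(lambda: expensive("e3", 5))

x = (e1 + e2) / e3               # still a Deferred; nothing has run
ran_before_compute = list(log)
result = x.compute()             # runs e1, e2, e3 and combines them
```

Only `+` and `/` are covered here; a general version would need every operator, which is part of what dedicated syntax would avoid.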
[Python-ideas] Re: Generalized deferred computation in Python
On Wed, Jun 22, 2022 at 10:47 AM Martin Di Paola wrote:
> Hi David, I read the PEP and I think it would be useful to expand the Motivation and Examples sections.
> While indeed Dask uses lazy evaluation to build a complex computation without executing it, I don't think that it is the whole story.
> Dask takes this deferred complex computation and *plans* how to execute it and then it *executes* it in non-obvious/direct ways.

Dask is very clever about execution planning. Ray is possibly even more clever in that regard.

However, I think that that should be an explicit non-goal of the PEP. DeferredObjects should create a DAG, yes. But I think Python itself should not think about being *clever* in evaluating that DAG, nor in itself think about parallelism. If my PEP were adopted, that would be something other libraries like Dask or Django could build on top of with more elaborate evaluation plans.

But just the DAG itself gets you more than just "wait until needed to do the final computation." It allows for intermediate computation of nodes of the DAG lower down than the final result. For example, imagine dependencies like this (where all the computation steps are expensive):

A -> B
     B -> Z
     B -> Y
     B -> X
A -> C
     C -> X
          X -> F
          X -> G
     C -> W
     C -> V
A -> D
     D -> V
     D -> U
     D -> T

Hopefully you either see my ASCII art in fixed font, or it's at least intelligible. If I want to force evaluation of A, I need to do everything. But if it turns out all I need within my program is C, then I have to do computations C, X, F, G, W, V. Which is maybe still expensive, but at least I don't worry about B, Z, Y, U, T, or A.

Yes, I should add something like this to the PEP.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/4CN3PRNH4M7T5TF36IRDZ6GZMHQS26TW/
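The dependency example in this message can be checked with a few lines of plain Python. This is a sketch under the assumption that `A -> B` means "A depends on B"; the `DEPENDS_ON` table and the `force` function are names made up here, and appending to a list stands in for the expensive step itself.

```python
# Edges from the example: each key depends on the listed nodes.
DEPENDS_ON = {
    "A": ["B", "C", "D"],
    "B": ["Z", "Y", "X"],
    "C": ["X", "W", "V"],
    "D": ["V", "U", "T"],
    "X": ["F", "G"],
}


def force(node, computed=None):
    """Compute `node` after its dependencies, each at most once.

    Returns the evaluation order as a list of node names.
    """
    if computed is None:
        computed = []
    if node in computed:
        return computed
    for dep in DEPENDS_ON.get(node, []):
        force(dep, computed)
    computed.append(node)   # stand-in for the expensive step itself
    return computed


needed_for_c = force("C")   # only C and what it depends on
needed_for_a = force("A")   # every node in the graph
```

Forcing `C` touches exactly C, X, F, G, W and V, as stated in the message, while forcing `A` touches all thirteen nodes, with the shared nodes X and V computed once.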
[Python-ideas] Re: Generalized deferred computation in Python
> On Jun 22, 2022, at 2:12 PM, Paul Moore wrote: > > On Wed, 22 Jun 2022 at 18:35, David Mertz, Ph.D. > wrote: >> >> Hi Martin, >> >> Short answer: yes, I agree. >> Slightly longer: I would be eternally grateful if you wish to contribute to >> the PEP with any such expansion of the Motivation and Expansion. > > One concern I have, triggered by Martin's Dask, PySpark and Django > examples, is that we've seen proposals in the past for "deferred > expression" objects that capture an unevaluated expression, and make > its AST available for user code to manipulate. The three examples here > could all use such a feature, as could other ORMs (and I'm sure there > are other use cases). This is in contrast to your proposal, which > doesn't seem to help those use cases (if it does, I'd like to > understand how). > > The key distinction seems to be that with your proposal, evaluation is > "on reference" and unavoidable, whereas in the other proposals I've > seen, evaluation happens on demand (and as a result, it's also > possible to work with the expression AST *before* evaluation). My > concern is that we're unlikely to be able to justify *two* forms of > "deferred expression" construct in Python, and your proposal, by > requiring transparent evaluation on reference, would preclude any > processing (such as optimisation, name injection, or other forms of > AST manipulation) of the expression before evaluation. > > I suspect that you consider evaluation-on-reference as an important > feature of your proposal, but could you consider explicit evaluation > as an alternative? Or at the very least address in the PEP the fact > that this would close the door on future explicit evaluation models? Every time I’ve looked at this, I come back to: other than the clunky syntax, how is explicit evaluation different from a zero-argument lambda? 
Eric ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/7OFY7ES6ON3I2QQEVAAATUX2OUHD5L2A/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Generalized deferred computation in Python
On Wed, 22 Jun 2022 at 18:35, David Mertz, Ph.D. wrote: > > Hi Martin, > > Short answer: yes, I agree. > Slightly longer: I would be eternally grateful if you wish to contribute to > the PEP with any such expansion of the Motivation and Expansion. One concern I have, triggered by Martin's Dask, PySpark and Django examples, is that we've seen proposals in the past for "deferred expression" objects that capture an unevaluated expression, and make its AST available for user code to manipulate. The three examples here could all use such a feature, as could other ORMs (and I'm sure there are other use cases). This is in contrast to your proposal, which doesn't seem to help those use cases (if it does, I'd like to understand how). The key distinction seems to be that with your proposal, evaluation is "on reference" and unavoidable, whereas in the other proposals I've seen, evaluation happens on demand (and as a result, it's also possible to work with the expression AST *before* evaluation). My concern is that we're unlikely to be able to justify *two* forms of "deferred expression" construct in Python, and your proposal, by requiring transparent evaluation on reference, would preclude any processing (such as optimisation, name injection, or other forms of AST manipulation) of the expression before evaluation. I suspect that you consider evaluation-on-reference as an important feature of your proposal, but could you consider explicit evaluation as an alternative? Or at the very least address in the PEP the fact that this would close the door on future explicit evaluation models? Paul ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/GDMURWK3M4WOQVGEMWFY33ZBRSKQMGYX/ Code of Conduct: http://python.org/psf/codeofconduct/
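The explicit model described in this message can be approximated today with the standard `ast` module, which gives a feel for what "working with the expression AST before evaluation" means. A minimal sketch, in which the expression string, the `names_used` helper and the injected values are invented for illustration:

```python
import ast

# Capture an expression without evaluating it.
tree = ast.parse("price * quantity + shipping", mode="eval")


def names_used(node):
    """Return the variable names an expression refers to, sorted."""
    return sorted({n.id for n in ast.walk(node) if isinstance(n, ast.Name)})


free_names = names_used(tree)   # inspect the AST before any evaluation

# Evaluate on demand, injecting the names only at that point.
code = compile(tree, filename="<deferred>", mode="eval")
value = eval(code, {"__builtins__": {}}, {"price": 3, "quantity": 4, "shipping": 5})
```

With evaluation on reference there is no point at which user code could run a step like `names_used` on the pending expression, which is the door that message worries about closing.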
[Python-ideas] Re: Generalized deferred computation in Python
Hi Martin, Short answer: yes, I agree. Slightly longer: I would be eternally grateful if you wish to contribute to the PEP with any such expansion of the Motivation and Expansion. On Wed, Jun 22, 2022 at 10:47 AM Martin Di Paola wrote: > Hi David, I read the PEP and I think it would be useful to expand the > Motivation and Examples sections. > > While indeed Dask uses lazy evaluation to build a complex computation > without executing it, I don't think that it is the whole story. > > Dask takes this deferred complex computation and *plans* how to execute it > and then it *executes* it in non-obvious/direct ways. > > For example, the computation of the min() of a dataframe can be done > computing the min() of each partition of the dataframe and then > computing the min() of them. Here is where the plan and the execution > stages play. > > All of this is hidden from the developer. From his/her perspective the > min() is called once over the whole dataframe. > > Dask's deferred computations are "useless" without the > planning/execution plan. > > PySpark, like Dask, does exactly the same. > > But what about Django's ORM? Indeed Django allows you the build a SQL > query without executing it. You can then perform more subqueries, > joins and group by without executing them. > > Only when you need the real data the query is executed. > > This is another example of deferred execution similar to Dask/PySpark > however when we consider the planning/execution stages the similarities > ends there. > > Django's ORM writes a SQL query and send it to a SQL database. > > Another example of deferred execution would be my library to interact > with web pages programmatically: selectq. > > Very much like an ORM, you can select elements from a web page, perform > subselections and unions without really interacting with the web page. 
[Python-ideas] Re: Generalized deferred computation in Python
I think I have an idea how to do something like what you're asking with less magic, and I think an example implementation of this could actually be done in pure Python code (though a more performant implementation would need support at the C level). What if a deferred object has 1 magic method ( __isdeferred__ ) that is invoked directly rather than causing a thunk, and invocation of any other method does cause a thunk. For the example implementation, a thunk would simply mean that the value is computed and stored within the instance, and method calls on the wrapper are now delegated to that. In the proper implementation, the object would change its identity to become its computed result. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/RNZXM55GFZ5DHOHP6QZZ744HUVNDB2BV/ Code of Conduct: http://python.org/psf/codeofconduct/
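A rough pure-Python sketch of the wrapper described in this message might look like the following. The class name `Deferred`, the `_force` helper and the `build` function are assumptions made for illustration; only `__isdeferred__` comes from the message. One caveat is visible in the code: implicit special-method lookup bypasses `__getattr__`, so every dunder that should trigger evaluation has to be written out, and the wrapper never actually changes identity.

```python
class Deferred:
    """Hypothetical wrapper that computes its value on first real use."""

    _UNSET = object()

    def __init__(self, func):
        self._func = func
        self._value = Deferred._UNSET

    def __isdeferred__(self):
        # The one method that answers without forcing the computation.
        return self._value is Deferred._UNSET

    def _force(self):
        if self._value is Deferred._UNSET:
            self._value = self._func()
        return self._value

    def __getattr__(self, name):
        # Only reached for attributes the wrapper itself does not have,
        # so these are delegated to the computed value.
        return getattr(self._force(), name)

    # Dunders are looked up on the type, not via __getattr__,
    # so each one that should force evaluation is spelled out.
    def __add__(self, other):
        return self._force() + other

    def __len__(self):
        return len(self._force())

    def __repr__(self):
        return repr(self._force())


calls = []


def build():
    calls.append("build")
    return [1, 2, 3]


d = Deferred(build)
pending_before = d.__isdeferred__()   # build() has not run yet
size = len(d)                         # forces the computation
count_of_2 = d.count(2)               # delegated to the list, no recompute
pending_after = d.__isdeferred__()
```

Checks such as `d is other` or `type(d)` still see the wrapper, which is where the C-level support mentioned in the message would be needed.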
[Python-ideas] Re: Generalized deferred computation in Python
Hi David, I read the PEP and I think it would be useful to expand the Motivation and Examples sections.

While Dask does indeed use lazy evaluation to build a complex computation without executing it, I don't think that is the whole story.

Dask takes this deferred complex computation and *plans* how to execute it, and then it *executes* it in non-obvious/direct ways.

For example, the min() of a dataframe can be computed by taking the min() of each partition of the dataframe and then taking the min() of those results. This is where the planning and execution stages come into play.

All of this is hidden from the developer. From his/her perspective, min() is called once over the whole dataframe.

Dask's deferred computations are "useless" without the planning/execution stages.

PySpark, like Dask, does exactly the same.

But what about Django's ORM? Django does allow you to build a SQL query without executing it. You can then add more subqueries, joins and group-bys without executing them.

Only when you need the real data is the query executed.

This is another example of deferred execution similar to Dask/PySpark; however, when we consider the planning/execution stages, the similarities end there.

Django's ORM writes a SQL query and sends it to a SQL database.

Another example of deferred execution would be my library for interacting with web pages programmatically: selectq.

Very much like an ORM, you can select elements from a web page and perform subselections and unions without really interacting with the web page.

Only when you want to get the data from the page are the deferred computations executed and, like an ORM, the plan made by selectq is to build a single xpath and then execute it using Selenium.

So...

Three cases: Dask/PySpark, Django's ORM and selectq. All of them implement deferred expressions, but all of them "compute" them in very specific ways (aka, they plan and execute the computation differently).
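The partitioned min() mentioned above can be illustrated in plain Python. This is not Dask's implementation, only a sketch of the arithmetic idea; `partitioned_min` and the sample partitions are invented for this example.

```python
def partitioned_min(partitions):
    """Min of a dataset split into partitions: min of the per-partition mins."""
    partial = [min(part) for part in partitions if part]   # one min per non-empty partition
    return min(partial)                                     # combine the partial results


partitions = [[7, 3, 9], [4, 8], [6, 2, 5]]
result = partitioned_min(partitions)
flat_min = min(x for part in partitions for x in part)     # same answer computed in one pass
```

The per-partition step is what a planner can distribute across workers; the caller still sees a single min() over the whole dataset.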
Would those libs (and probably others) benefit from the PEP? How?

Thanks,
Martin.

On Tue, Jun 21, 2022 at 04:53:44PM -0400, David Mertz, Ph.D. wrote:
> Here is a very rough draft of an idea I've floated often, but not with much specification. Take this as "ideas" with little firm commitment to details from me. PRs, or issues, or whatever, can go to https://github.com/DavidMertz/peps/blob/master/pep-.rst as well as mentioning them in this thread.
>
> PEP:
> Title: Generalized deferred computation
> Author: David Mertz
> Discussions-To: https://mail.python.org/archives/list/python-ideas@python.org/thread/
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 21-Jun-2022
> Python-Version: 3.12
> Post-History:
>
> Abstract
> ========
>
> This PEP proposes introducing the soft keyword ``later`` to express the concept of deferred computation. When an expression is preceded by the keyword, the expression is not evaluated but rather creates a "thunk" or "deferred object." Reference to the deferred object later in program flow causes the expression to be executed at that point, and for both the value and type of the object to become the result of the evaluated expression.
>
> Motivation
> ==========
>
> "Lazy" or "deferred" evaluation is a useful paradigm for expressing relationships among potentially expensive operations prior to their actual computation. Many functional programming languages, such as Haskell, build laziness into the heart of their language. Within the Python ecosystem, the popular scientific library `dask-delayed`_ provides a framework for lazy evaluation that is very similar to that proposed in this PEP.
>
> .. _dask-delayed:
>    https://docs.dask.org/en/stable/delayed.html
>
> Examples of Use
> ===============
>
> While the use of deferred computation is principally useful when computations are likely to be expensive, the simple examples shown do not necessarily use such especially expensive computations. Most of these are directly inspired by examples used in the documentation of dask-delayed.
>
> In dask-delayed, ``Delayed`` objects are created by functions, and operations create a *directed acyclic graph* rather than perform actual computations. For example::
>
>     >>> import dask
>     >>> @dask.delayed
>     ... def later(x):
>     ...     return x
>     ...
>     >>> output = []
>     >>> data = [23, 45, 62]
>     >>> for x in data:
>     ...     x = later(x)
>     ...     a = x * 3
>     ...     b = 2**x
>     ...     c = a + b
>     ...     output.append(c)
>     ...
>     >>> total = sum(output)
>     >>> total
>     Delayed('add-8f4018dbf2d3c1d8e6349c3e0509d1a0')
>     >>> total.compute()
>     4611721202807865734
>     >>> total.visualize()
>
> .. figure:: pep--dag.png
>    :align: center
>    :width: 50%
>    :class: invert-in-dark-mode
>
>    Figure 1. Dask DAG created from simple operations.
>
> Under this PEP, the soft keyword ``later`` would work in a similar manner to this dask.delayed code. But rather than requiring calling ``.compute()`` on a ``Delayed`` object to arrive at the result of a computation, every reference to a binding would