[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Chris Angelico
On Thu, 23 Jun 2022 at 11:35, Joao S. O. Bueno  wrote:
>
> Martin Di Paola wrote:
> > Three cases: Dask/PySpark, Django's ORM and selectq. All of them
> > implement deferred expressions but all of them "compute" them in very
> > specific ways (aka, they plan and execute the computation differently).
>
>
> So - I've been hit with the "transparent execution of deferred code"
> dilemma before.
>
> What happens is that Python, at some point, will have to "use" an object,
> and that use is through calling one of the dunder methods. Up to that
> time, just writing the object name on a no-operation line does nothing
> (unless the line is in a REPL, which will then call the object's
> __repr__ method).

Why are dunder methods special? Does being passed to some other
function also do nothing? What about a non-dunder attribute?

Especially, does being involved in an 'is' check count as using an object?

dflt = fetch_cached_object("default")
mine = later fetch_cached_object(user.keyword)
...
if mine is dflt: ... # "using" mine? Or not?

Does it make a difference whether the object has previously been poked
in some other way?
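A minimal sketch of the identity problem, assuming a simple proxy-based implementation. The `LazyProxy` class and this `fetch_cached_object` are hypothetical stand-ins, not part of the PEP draft. Python offers no dunder hook for `is`, so an identity check compares the proxy object itself: it neither triggers evaluation nor sees the target.

```python
# Hypothetical stand-ins: a cache-backed lookup and a minimal lazy proxy.
_cache = {}

def fetch_cached_object(key):
    return _cache.setdefault(key, object())

class LazyProxy:
    def __init__(self, thunk):
        self._thunk = thunk
        self._evaluated = False
        self._value = None

    def _resolve(self):
        # Evaluate the thunk once and cache the result.
        if not self._evaluated:
            self._value = self._thunk()
            self._evaluated = True
        return self._value

    def __getattr__(self, name):
        # Only reached for attributes the proxy itself does not define.
        return getattr(self._resolve(), name)

    def __eq__(self, other):
        return self._resolve() == other

    def __hash__(self):
        return hash(self._resolve())

dflt = fetch_cached_object("default")
mine = LazyProxy(lambda: fetch_cached_object("default"))

print(mine is dflt)             # False: `is` compares the proxy, not its target
print(mine._evaluated)          # False: the identity check forced nothing
print(mine == dflt)             # True: __eq__ forces evaluation
print(mine._resolve() is dflt)  # True
```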

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HUJ36AA34SZU7D5Q4G6N5UFFKYUOGOFT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Ben Rudiak-Gould
On Wed, Jun 22, 2022 at 6:36 PM Joao S. O. Bueno 
wrote:

> implement "all possible" dunder methods, and proxy those to the
> underlying object, for a "future type" that was calculated off-process,
> and did not need any ".value()" or ".result()" methods to be called.
>

Here's a package on PyPI that seems to do that:

https://pypi.org/project/lazy-object-proxy/

It's written partly in C, so it may be fast. I haven't tested it.
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CUEVSLDF5AAGN4MFIXU4TQ47Z35AXXAC/


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Ben Rudiak-Gould
>
> On Wed, Jun 22, 2022 at 02:30:14PM -0400, David Mertz, Ph.D. wrote:
> >But basically, think about `x = (later expensive1() + later expensive2()) /
> >later expensive3()`.  How can we make `x` itself be a zero argument
> >lambda? [... see below ...]
>

   x = lambda: (expensive1() + expensive2()) / expensive3()

What am I missing?
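To make the comparison concrete, here is a small sketch with hypothetical `expensive*` functions that log when they run. It shows the zero-argument lambda deferring all three calls, and composing into a larger deferred computation only through an explicit call.

```python
# Hypothetical stand-ins that log when they run.
calls = []

def expensive1():
    calls.append("expensive1")
    return 10

def expensive2():
    calls.append("expensive2")
    return 20

def expensive3():
    calls.append("expensive3")
    return 5

x = lambda: (expensive1() + expensive2()) / expensive3()
print(calls)   # []: defining x runs nothing

# Composing deferred pieces means wrapping them in another lambda and
# calling the inner one explicitly.
y = lambda: x() * 2
print(calls)   # []: still nothing has run

print(y())     # 12.0
print(calls)   # ['expensive1', 'expensive2', 'expensive3']
```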

I don't understand what you meant by saying that zero-argument lambdas are
"isolated". It sounds like a way that you happen to think about lambdas,
and not an objective property of lambdas.

This proposal is like lambda on the definition end, but on the invocation
end the call happens implicitly. In effect you have to explicitly mark
everywhere that you *don't* want it to be called instead of everywhere that
you *do* want it to be called. It isn't clear to me that that's better,
much less enough better to justify changing the semantics of what I suppose
is the single most common operation in Python ("getting a value").


On Wed, Jun 22, 2022 at 2:53 PM Martin Di Paola 
wrote:

> # Using zero-argument nested lambdas
> x = lambda: ((lambda: expensive1())() + (lambda: expensive2())()) / (lambda: expensive3())()
>

Why not just expensive1() instead of (lambda: expensive1())()?
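A quick check of that point, again with hypothetical logging `expensive*` functions: each inner lambda is invoked immediately inside the outer one, so both spellings make the same calls, in the same order, at the same moment.

```python
# Hypothetical stand-ins that log the order in which they are called.
log = []

def expensive1():
    log.append(1)
    return 10

def expensive2():
    log.append(2)
    return 20

def expensive3():
    log.append(3)
    return 5

# Martin's nested form: each inner lambda is called as soon as it is created.
nested = lambda: (
    ((lambda: expensive1())() + (lambda: expensive2())())
    / (lambda: expensive3())()
)
# The direct form Ben suggests.
direct = lambda: (expensive1() + expensive2()) / expensive3()

print(nested(), log)   # 6.0 [1, 2, 3]
log.clear()
print(direct(), log)   # 6.0 [1, 2, 3]
```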
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FXE5U2JBWZCBSHE5Z2DU3POMBW5K6JKM/


[Python-ideas] dataclass field argument to allow converting value on init

2022-06-22 Thread Dexter Hill
The idea is to have a `default_factory`-like argument (either in the `field`
function, or a new function entirely) that takes a function; that function is
called with the value passed to `__init__`, and its return value is used as the
value of the respective field. For example:
```py
@dataclass
class Foo:
    x: str = field(init_fn=chr)

f = Foo(65)
f.x # "A"
```
The `chr` function is called with the value `65`, and `x` is set to its return
value, `"A"`. I understand that there are both `__init__` and `__post_init__`,
which can be used for this purpose, but sometimes it isn't ideal to override
them. If you overrode `__init__` and were using `__post_init__`, you would
need to call it manually, and in my case `__post_init__` is implemented on a
base class which all other classes inherit from, so overriding it would
require re-implementing its logic (and that's ignoring the fact that you also
need to type the field as `InitVar` to even have it passed to
`__post_init__` in the first place).
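For comparison, the existing workaround the message alludes to might look like the sketch below. `Base`, its `validated` flag, and the `code` argument name are hypothetical. The point is that the raw value needs its own `InitVar` name, and the subclass must remember to chain to the base class's `__post_init__`.

```python
from dataclasses import dataclass, field, InitVar

@dataclass
class Base:
    # Hypothetical shared post-init logic living on a base class.
    def __post_init__(self):
        self.validated = True

@dataclass
class Foo(Base):
    # The raw argument needs its own InitVar name so that it is passed to
    # __post_init__ instead of being stored as a field.
    code: InitVar[int]
    x: str = field(init=False)

    def __post_init__(self, code):
        self.x = chr(code)
        super().__post_init__()  # must remember to chain to the base class

f = Foo(65)
print(f.x, f.validated)  # A True
```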

I've created a proof of concept, shown below:
```py
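from dataclasses import field

def initfn(fn, default=None):
    class Inner:
        # `field()` forwards `__set_name__` to its default value, so this runs
        # when the owning class is created and receives the field's name.
        def __set_name__(_, owner_cls, owner_name):
            old_setattr = getattr(owner_cls, "__setattr__")

            def __setattr__(self, attr_name, value):
                if attr_name == owner_name:
                    # Bypass `__setattr__`
                    self.__dict__[attr_name] = fac(value)
                else:
                    old_setattr(self, attr_name, value)

            setattr(owner_cls, "__setattr__", __setattr__)

    def fac(value):
        # An `Inner` instance is the field's default, i.e. no value was given.
        if isinstance(value, Inner):
            return default

        return fn(value)

    return field(default=Inner())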
```
It makes use of the fact that when `default` is passed to `field`, the
resulting field checks that value for a `__set_name__` method and calls it with
the class and field name as arguments. Overriding `__setattr__` is just used to
catch when a value is being assigned to a field; if that field's name matches
the name given to `__set_name__`, it calls the function on the value and sets
the field to the result instead.
It can be used like so:
```py
from dataclasses import dataclass

@dataclass
class Foo:
    x: str = initfn(fn=chr, default="Z")

f = Foo(65)
f2 = Foo()

f.x # "A"
f2.x # "Z"
```
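The proof of concept relies on `dataclasses.field()` passing `__set_name__` through to its `default` value. A small sketch to check that behaviour on its own; `Recorder` is a hypothetical class that only records the call it receives.

```python
from dataclasses import dataclass, field

class Recorder:
    # Hypothetical default value that records the __set_name__ call.
    def __set_name__(self, owner, name):
        self.owner = owner
        self.name = name

marker = Recorder()

@dataclass
class Foo:
    x: object = field(default=marker)

print(marker.owner is Foo, marker.name)  # True x
```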
It adds a little overhead, especially from having to override `__setattr__`;
however, I believe it would have very little overhead if implemented directly
in the dataclasses library.

Even when overriding one of the init methods is an option, I still think this
would be nice to have as a quality-of-life feature, since calling a single
function on a value feels too simple to justify overriding those methods, if
that makes sense.

Thanks.
Dexter
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4SM5EVP6MMGGHQMZSJXBML74PWWDHEWV/


[Python-ideas] Re: Ellipsis (...) to be roughly synonymous with * in destructuring but without capture.

2022-06-22 Thread Steve Jorgensen
> No need to have an object there - you could just define it as a syntactic 
> construct instead. Assignment targets aren't themselves objects (although the 
> same syntax can often be used on the RHS, when it would resolve to one).

Right. Thanks. That _should_ have been obvious. :)

> Having a way to say "allow additional elements without iterating over them" 
> would be useful, but creating a new way to spell the non-assignment wouldn't 
> be of sufficiently great value to justify the syntax IMO.

I mostly agree. I included that option for completeness. It would still have 
the benefit of avoiding the memory usage of creating a list and keeping 
references to the items until the list itself can be collected.

Come to think of it, can (or could) Python already optimize that using current
syntax, by noticing that the variable assigned to is never used after it is
"assigned" to? If that optimization were implemented (I presume it is not
implemented now), then there would actually be no point to this proposal at all
except to allow "..." in final positions of the expression to the left of "="
and to have that mean "do not iterate".
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/AHUOIVOS4GXHAI3AT7O5M2MI4BJJER24/


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread David Mertz, Ph.D.
Thanks Rob,

I recognize that I have so far skirted the order-of-precedence concern. I
believe I have used parens in my example everywhere there might be a
question... But that's not a general description or rule.

I have a bunch of issues that I know I need to flesh out, many coming as
suggestions in this thread, which I appreciate. I just wanted to provide
something concrete to start the conversation.

FWIW, there is a bunch more at the link now than in my initial paste. But I
want to clarify more before I copy a new version into the email thread.

I haven't used Twisted in a while, but it is certainly an important
library, and I don't want to cause confusion. Any specific recommendation
on language to use?

On Wed, Jun 22, 2022, 8:45 PM Rob Cliffe via Python-ideas <
python-ideas@python.org> wrote:

> Thank you for your proposal David.  At last we have a counter-proposal to
> talk about.  A few points:
>
> (1) (As I pointed out in an earlier post) There is a flaw in using the
> syntax of an expression PRECEDED by a SOFT keyword:
> x =later-y
> With your proposal, x is assigned a deferred-evaluation-object which will
> be evaluated at some time later as "minus y", right?
> Erm, no.  This is already legal syntax for x being immediately assigned a
> value of "later minus y".
> If you put the soft keyword *after* the expression:
> x  =  -y  later
> it may or may not read as well (subjective) but AFAICS would work.
> Alternatively you could propose a hard keyword.  Or a different syntax
> altogether.
>
> (2) Delayed evaluation may be useful for many purposes.  But for the
> specific purpose of providing late-bound function argument defaults, having
> to write the extra line ("n = n" in your example) removes much of the
> appeal.  Two lines of boilerplate (using a sentinel) replaced by one
> obscure one plus one keyword is not much if any of a win, whereas PEP 671
> would remove the boilerplate altogether apart from one sigil.  Under your
> proposal, I for one would probably stick with the sentinel idiom which is
> explicit.  I think "n=n" is confusing to an inexperienced Python user.
> You may not think this is important.  My opinion is that late-bound
> defaults are important. (We may have to agree to differ.)  Apart from
> anything else: Python fully supports early-bound defaults, why discriminate
> against late-bound ones?
>
> (3) You talk about "deferred objects" and in one place you actually say 
> "Evaluate
> the Deferred".  A "deferred" is an important object but a different concept
> in Twisted, I think calling it something else would be better to avoid
> confusion.
>
> Best wishes
> Rob Cliffe
>
>
> On 21/06/2022 21:53, David Mertz, Ph.D. wrote:
>
> Here is a very rough draft of an idea I've floated often, but not with
> much specification.  Take this as "ideas" with little firm commitment to
> details from me. PRs, or issues, or whatever, can go to
> https://github.com/DavidMertz/peps/blob/master/pep-.rst as well as
> mentioning them in this thread.
>
> PEP: 
> Title: Generalized deferred computation
> Author: David Mertz 
> Discussions-To:
> https://mail.python.org/archives/list/python-ideas@python.org/thread/
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 21-Jun-2022
> Python-Version: 3.12
> Post-History:
>
> Abstract
> ========
>
> This PEP proposes introducing the soft keyword ``later`` to express the
> concept of deferred computation.  When an expression is preceded by the
> keyword, the expression is not evaluated but rather creates a "thunk" or
> "deferred object."  Reference to the deferred object later in program flow
> causes the expression to be executed at that point, and for both the value
> and type of the object to become the result of the evaluated expression.
>
>
> Motivation
> ==========
>
> "Lazy" or "deferred" evaluation is a useful paradigm for expressing
> relationships among potentially expensive operations prior to their actual
> computation.  Many functional programming languages, such as Haskell, build
> laziness into the heart of their language.  Within the Python ecosystem,
> the popular scientific library `dask-delayed `_ provides a framework for
> lazy evaluation that is very similar to that proposed in this PEP.
>
> .. _dask-delayed:
>https://docs.dask.org/en/stable/delayed.html
>
>
> Examples of Use
> ===============
>
> While the use of deferred computation is principally useful when
> computations are likely to be expensive, the simple examples shown do not
> necessarily use such especially spendy computations.  Most of these are
> directly inspired by examples used in the documentation of dask-delayed.
>
> In dask-delayed, ``Delayed`` objects are created by functions, and
> operations create a *directed acyclic graph* rather than perform actual
> computations.  For example::
>
> >>> import dask
> >>> @dask.delayed
> ... def later(x):
> ... return x
> ...
> 

[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Martin Di Paola



On Wed, Jun 22, 2022 at 11:22:05PM +0100, Paul Moore wrote:

> Hang on, did the PEP change? The version I saw didn't have a compute()
> method, deferred objects were just evaluated when they were
> referenced.


You are right, the PEP does not mention a compute() method, although it does
use the term "compute". I just used it to make explicit when the evaluation
takes place in the examples that I gave. My bad.


> There's a *huge* difference (in my opinion) between auto-executing
> deferred expressions, and a syntax for creating *objects* that can be
> asked to calculate their value. And yes, the latter is extremely close
> to being nothing more than "a shorter and more composable form of
> zero-arg lambda", so it needs to be justifiable in comparison to
> zero-arg lambda (which is why I'm more interested in the composability
> aspect, building an AST by combining delayed expressions into larger
> ones).


Agreed, the *huge* difference is what I tried to highlight, because that is
where I see holes in the PEP.

Building an AST, as you mentioned, could fill one of those holes, but how the
expressions are iterated over and evaluated is still missing.

Of course, the exact details will depend on the library that could, in
theory, use deferred expressions (like PySpark), but I still see non-trivial
details to fill in:

 - what would be the API for the AST objects that represent the deferred
   expression(s)?
 - how would the "evaluator" of the expressions iterate over them? Will the
   "evaluator" have to check that each of the expressions is meaningful to it?
 - does the AST simplify the implementation of existing libs that implement
   deferred methods?
 - who is the "evaluator" in the case of expressions that don't share a
   common "implementation"?

Allow me to expand on the last item:

# some Dask code
df = later dask_df.filter(...)
s = later df.sum()

# some selectq code
d = later sQ.select("div")
c = later d.count()

# now, mix and compute!
(s + c).compute()

I can see how the deferred expressions are linked and how the AST is
built, but "who" knows how to execute it... I'm not sure. Will it be Dask,
which knows how to plan, optimize and execute the sum() over the partitions
of the dataframe, or selectq, which knows how to build an xpath and talk to
Selenium? Maybe the Python VM? Maybe all three?

I know that those questions have answers, but I still feel that there are
more unknowns (especially about why the PEP would be useful for a given lib).
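One possible (and deliberately naive) answer to the "who evaluates" question is a generic evaluator that only knows how to combine results and hands each leaf back to whatever produced it. The sketch assumes hypothetical `Deferred`, `Leaf`, `BinOp` and `evaluate` names; none of them come from the PEP draft, Dask or selectq. Note what it gives up: no library gets to plan across the whole expression, which is exactly the part Dask and selectq care about.

```python
import operator

class Deferred:
    # Arithmetic on a deferred node builds a bigger tree instead of computing.
    def __add__(self, other):
        return BinOp(operator.add, self, other)

    def __mul__(self, other):
        return BinOp(operator.mul, self, other)

class Leaf(Deferred):
    def __init__(self, label, compute):
        self.label = label
        self.compute = compute  # each "library" supplies its own evaluator

class BinOp(Deferred):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

def evaluate(node):
    # Generic evaluator: combines results, delegating leaves to their owners.
    if isinstance(node, Leaf):
        return node.compute()
    if isinstance(node, BinOp):
        return node.op(evaluate(node.left), evaluate(node.right))
    return node  # plain constants such as 2

s = Leaf("dask-like sum", lambda: 42)      # stand-in for df.sum()
c = Leaf("selectq-like count", lambda: 3)  # stand-in for d.count()
expr = s + c * 2
print(type(expr).__name__)  # BinOp
print(evaluate(expr))       # 48
```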

Thanks,
Martin.
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ZII7WHFRD23SVVAWUIVERFZJNABGJOLD/


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Joao S. O. Bueno
Martin Di Paola wrote:
> Three cases: Dask/PySpark, Django's ORM and selectq. All of them
> implement deferred expressions but all of them "compute" them in very
> specific ways (aka, they plan and execute the computation differently).


So - I've been hit with the "transparent execution of deferred code"
dilemma before.

What happens is that Python, at some point, will have to "use" an object,
and that use is through calling one of the dunder methods. Up to that time,
just writing the object name on a no-operation line does nothing (unless
the line is in a REPL, which will then call the object's __repr__ method).

A long time ago I implemented a toy project that would implement "all
possible" dunder methods and proxy them to the underlying object, for a
"future type" that was calculated off-process and did not need any
".value()" or ".result()" methods to be called.

Any such object, with slots for all dunder methods, any of which would
trigger the resolution when called, could implement the proposed behavior
today, without any modification to the language.

And all that is needed to make it possible to manipulate the object before
evaluation takes place is a single reserved name, within the object's
namespace, that does not proxy to evaluation. It could be special-cased
within the class's own __getattribute__; not even a new reserved dunder slot
would be needed. That is, "getattr(myobj, "_is_deferred", False)" would not
trigger the evaluation (although a special slot for it in object would allow
plain checking using "myobj.__is_deferred__", without the need for getattr
or hasattr).

So, all that would be needed for such a feature is keyword support to build
this special proxy type.

That said, the usefulness of this proposal can be better assessed knowing
that this "special attribute" mechanism could also be used to add further
inspection/modification mechanisms to the delayed objects.



The act of "filling in all possible dunder methods" itself is
quite hacky, but even if done in C, I don't think it could be avoided.

Here is the code I referred to, which implements the same proxy type
that would be needed for this feature (IIRC it is even pip installable):

https://bitbucket.org/jsbueno/lelo/src/master/lelo/_lelo.py
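A minimal sketch of the proxy idea described above, assuming a hand-picked subset of dunders rather than "all possible" ones; it is not the lelo code. Special methods used by operators are looked up on the type, bypassing `__getattribute__`, which is why they have to be defined on the class one by one. Ordinary attribute access can be forwarded from `__getattribute__`, which also reserves `_is_deferred` as the one name that does not force evaluation.

```python
_MISSING = object()

class DeferredProxy:
    __slots__ = ("_thunk", "_value")

    def __init__(self, thunk):
        object.__setattr__(self, "_thunk", thunk)
        object.__setattr__(self, "_value", _MISSING)

    def __getattribute__(self, name):
        # The one reserved name that can be inspected without evaluating.
        if name == "_is_deferred":
            return object.__getattribute__(self, "_value") is _MISSING
        # Every other ordinary attribute lookup forces evaluation.
        return getattr(_resolve(self), name)

def _resolve(proxy):
    # Evaluate the thunk on first use and cache the result.
    value = object.__getattribute__(proxy, "_value")
    if value is _MISSING:
        value = object.__getattribute__(proxy, "_thunk")()
        object.__setattr__(proxy, "_value", value)
    return value

def _forward(name):
    # Build a special method that resolves the proxy and delegates to the result.
    def method(self, *args):
        return getattr(_resolve(self), name)(*args)
    method.__name__ = name
    return method

# Operators look special methods up on the type, so each one must exist on
# the class. A real implementation would need many more of these.
for _name in ("__add__", "__radd__", "__mul__", "__rmul__", "__eq__",
              "__lt__", "__len__", "__iter__", "__repr__", "__str__"):
    setattr(DeferredProxy, _name, _forward(_name))

calls = []

def expensive():
    calls.append("run")
    return 20

x = DeferredProxy(expensive)
print(getattr(x, "_is_deferred", False), calls)   # True []
print(x + 1, x * 2)                               # 21 40
print(getattr(x, "_is_deferred", False), calls)   # False ['run']
```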


On Wed, Jun 22, 2022 at 11:46 AM Martin Di Paola 
wrote:

> Hi David, I read the PEP and I think it would be useful to expand the
> Motivation and Examples sections.
>
> While indeed Dask uses lazy evaluation to build a complex computation
> without executing it, I don't think that it is the whole story.
>
> Dask takes this deferred complex computation and *plans* how to execute it
> and then it *executes* it in non-obvious/direct ways.
>
> For example, the computation of the min() of a dataframe can be done
> computing the min() of each partition of the dataframe and then
> computing the min() of them. Here is where the plan and the execution
> stages play.
>
> All of this is hidden from the developer. From his/her perspective the
> min() is called once over the whole dataframe.
>
> Dask's deferred computations are "useless" without the
> planning/execution plan.
>
> PySpark, like Dask, does exactly the same.
>
> But what about Django's ORM? Indeed Django allows you to build a SQL
> query without executing it. You can then perform more subqueries,
> joins and group by without executing them.
>
> Only when you need the real data is the query executed.
>
> This is another example of deferred execution similar to Dask/PySpark;
> however, when we consider the planning/execution stages, the similarities
> end there.
>
> Django's ORM writes a SQL query and sends it to a SQL database.
>
> Another example of deferred execution would be my library to interact
> with web pages programmatically: selectq.
>
> Very much like an ORM, you can select elements from a web page, perform
> subselections and unions without really interacting with the web page.
>
> Only when you want to get the data from the page are the deferred
> computations executed, and, like an ORM, the plan done by selectq is
> to build a single xpath and then execute it using Selenium.
>
> So...
>
> Three cases: Dask/PySpark, Django's ORM and selectq. All of them
> implement deferred expressions but all of them "compute" them in very
> specific ways (aka, they plan and execute the computation differently).
>
> Would those libs (and probably others) benefit from the PEP? How?
>
> Thanks,
> Martin.
>
> On Tue, Jun 21, 2022 at 04:53:44PM -0400, David Mertz, Ph.D. wrote:
> >Here is a very rough draft of an idea I've floated often, but not with
> much
> >specification.  Take this as "ideas" with little firm commitment to
> details
> >from me. PRs, or issues, or whatever, can go to
> >https://github.com/DavidMertz/peps/blob/master/pep-.rst as well as
> >mentioning them in this thread.
> >
> >PEP: 
> >Title: Generalized deferred computation
> >Author: David Mertz 
> 
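Martin's min() example can be sketched in plain Python to show the "plan, then execute" step he describes: reduce each partition independently, then reduce the partial results. Lists stand in for dataframe partitions and a thread pool stands in for Dask's scheduler; this is an illustration, not how Dask implements it.

```python
from concurrent.futures import ThreadPoolExecutor

def partitioned_min(partitions):
    # "Execute" step: each partition is reduced independently, so the work
    # can be spread over a pool of workers.
    with ThreadPoolExecutor() as pool:
        partial_mins = list(pool.map(min, partitions))
    # Final step: the min of the per-partition mins is the global min.
    return min(partial_mins)

partitions = [[7, 3, 9], [4, 8], [6, 1, 5]]
print(partitioned_min(partitions))                    # 1
print(min(x for part in partitions for x in part))    # 1
```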

[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Chris Angelico
On Thu, 23 Jun 2022 at 10:44, Rob Cliffe via Python-ideas
 wrote:
>
> Thank you for your proposal David.  At last we have a counter-proposal to 
> talk about.  A few points:
>
> (1) (As I pointed out in an earlier post) There is a flaw in using the syntax 
> of an expression PRECEDED by a SOFT keyword:
> x =later-y
> With your proposal, x is assigned a deferred-evaluation-object which will be 
> evaluated at some time later as "minus y", right?
> Erm, no.  This is already legal syntax for x being immediately assigned a 
> value of "later minus y".
> If you put the soft keyword *after* the expression:
> x  =  -y  later
> it may or may not read as well (subjective) but AFAICS would work.
> Alternatively you could propose a hard keyword.  Or a different syntax 
> altogether.

Or just define that the soft keyword applies only if not followed by
an operator. That way, "later -y" would be interpreted the same way it
always has, and if you actually want a deferred of y's negation, you'd
need to spell it some other way. Although I'm not entirely sure how,
since the obvious choice, grouping parentheses, just makes it look
like a function call instead, and "later 0-y" might not have the same
semantics.
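Chris's rule can be sketched as a check on the token that follows `later`. The `later_is_keyword` helper below is hypothetical and works on the standard `tokenize` output; it also shows the parenthesized case falling back to the function-call reading.

```python
import io
import tokenize

def later_is_keyword(source):
    # Hypothetical rule: `later` acts as the soft keyword only when the next
    # token is not an operator (parentheses count as operators here).
    tokens = [
        tok for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)
    ]
    for tok, nxt in zip(tokens, tokens[1:]):
        if tok.type == tokenize.NAME and tok.string == "later":
            return nxt.type != tokenize.OP
    return False

print(later_is_keyword("x = later -y"))       # False: stays a subtraction
print(later_is_keyword("x = later f(y)"))     # True
print(later_is_keyword("x = later (0 - y)"))  # False: reads as a call to later()
```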

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/5L3AJ3B65ZXNEZWOWTWDTI36DCLETCDV/


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Rob Cliffe via Python-ideas
Thank you for your proposal David.  At last we have a counter-proposal 
to talk about.  A few points:


(1) (As I pointed out in an earlier post) There is a flaw in using the 
syntax of an expression PRECEDED by a SOFT keyword:

    x     =    later    -y
With your proposal, x is assigned a deferred-evaluation-object which 
will be evaluated at some time later as "minus y", right?
Erm, no.  This is already legal syntax for x being immediately assigned 
a value of "later minus y".

If you put the soft keyword *after* the expression:
    x  =  -y  later
it may or may not read as well (subjective) but AFAICS would work.
Alternatively you could propose a hard keyword.  Or a different syntax 
altogether.
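The ambiguity is easy to confirm with the standard `ast` module: under current Python, `x = later -y` parses as a subtraction whose left operand is a variable named `later`.

```python
import ast

# Parse the statement exactly as current Python sees it.
value = ast.parse("x = later -y").body[0].value

print(type(value).__name__, type(value.op).__name__)   # BinOp Sub
print(value.left.id, value.right.id)                   # later y
```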


(2) Delayed evaluation may be useful for many purposes.  But for the 
specific purpose of providing late-bound function argument defaults, 
having to write the extra line ("n = n" in your example) removes much of 
the appeal.  Two lines of boilerplate (using a sentinel) replaced by one 
obscure one plus one keyword is not much if any of a win, whereas PEP 
671 would remove the boilerplate altogether apart from one sigil.  Under 
your proposal, I for one would probably stick with the sentinel idiom 
which is explicit.  I think "n=n" is confusing to an inexperienced 
Python user.
You may not think this is important.  My opinion is that late-bound 
defaults are important. (We may have to agree to differ.)  Apart from 
anything else: Python fully supports early-bound defaults, why 
discriminate against late-bound ones?
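For reference, the sentinel idiom Rob mentions looks roughly like this; `pad` and `_SENTINEL` are hypothetical names. PEP 671 proposes spelling the same default as `n=>len(items)` in the signature, removing the two boilerplate lines.

```python
_SENTINEL = object()

def pad(items, n=_SENTINEL):
    # Two lines of boilerplate compute the default on each call.
    if n is _SENTINEL:
        n = len(items)
    return items + [None] * n

print(pad([1, 2]))      # [1, 2, None, None]
print(pad([1, 2], 1))   # [1, 2, None]
```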


(3) You talk about "deferred objects" and in one place you actually say
"Evaluate the Deferred".  A "deferred" is an important object, but a
different concept, in Twisted; I think calling it something else would be
better to avoid confusion.


Best wishes
Rob Cliffe


On 21/06/2022 21:53, David Mertz, Ph.D. wrote:
Here is a very rough draft of an idea I've floated often, but not with 
much specification.  Take this as "ideas" with little firm commitment 
to details from me. PRs, or issues, or whatever, can go to 
https://github.com/DavidMertz/peps/blob/master/pep-.rst as well as 
mentioning them in this thread.


PEP: 
Title: Generalized deferred computation
Author: David Mertz 
Discussions-To: 
https://mail.python.org/archives/list/python-ideas@python.org/thread/

Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 21-Jun-2022
Python-Version: 3.12
Post-History:

Abstract
========

This PEP proposes introducing the soft keyword ``later`` to express the concept
of deferred computation.  When an expression is preceded by the keyword, the
expression is not evaluated but rather creates a "thunk" or "deferred object."
Reference to the deferred object later in program flow causes the expression to
be executed at that point, and for both the value and type of the object to
become the result of the evaluated expression.


Motivation
==========

"Lazy" or "deferred" evaluation is a useful paradigm for expressing relationships
among potentially expensive operations prior to their actual computation.  Many
functional programming languages, such as Haskell, build laziness into the
heart of their language.  Within the Python ecosystem, the popular scientific
library `dask-delayed `_ provides a framework for lazy evaluation
that is very similar to that proposed in this PEP.

.. _dask-delayed:
   https://docs.dask.org/en/stable/delayed.html


Examples of Use
===============

While the use of deferred computation is principally useful when computations
are likely to be expensive, the simple examples shown do not necessarily use
such especially spendy computations.  Most of these are directly inspired by
examples used in the documentation of dask-delayed.

In dask-delayed, ``Delayed`` objects are created by functions, and operations
create a *directed acyclic graph* rather than perform actual computations.  For
example::

    >>> import dask
    >>> @dask.delayed
    ... def later(x):
    ...     return x
    ...
    >>> output = []
    >>> data = [23, 45, 62]
    >>> for x in data:
    ...     x = later(x)
    ...     a = x * 3
    ...     b = 2**x
    ...     c = a + b
    ...     output.append(c)
    ...
    >>> total = sum(output)
    >>> total
    Delayed('add-8f4018dbf2d3c1d8e6349c3e0509d1a0')
    >>> total.compute()
    4611721202807865734
    >>> total.visualize()

.. figure:: pep--dag.png
   :align: center
   :width: 50%
   :class: invert-in-dark-mode

   Figure 1.  Dask DAG created from simple operations.
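For comparison with the zero-argument-lambda discussion elsewhere in the thread, the same computation can be sketched with the standard library alone, using lambdas as the "delayed" nodes. This is an illustration, not dask's API; default arguments bind each loop value and each node when it is defined, and nothing is computed until `total()` is called.

```python
output = []
data = [23, 45, 62]
for x in data:
    # Each node is a callable taking no required arguments; defaults freeze
    # the loop values at definition time.
    a = lambda x=x: x * 3
    b = lambda x=x: 2**x
    c = lambda a=a, b=b: a() + b()
    output.append(c)

# Building the graph computed nothing; calling the root evaluates it.
total = lambda: sum(node() for node in output)
print(total())   # 4611721202807865734
```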

Under this PEP, the soft keyword ``later`` would work in a similar manner to
this dask.delayed code.  But rather than requiring calling ``.compute()`` on a
``Delayed`` object to arrive at the result of a computation, every reference to
a binding would perform the "compute" *unless* it was itself a deferred
expression.  So the equivalent code under this PEP would be::

    >>> output = []
    >>> data = [23, 45, 62]
    >>> for later x in data:
    ...     

[Python-ideas] Re: Ellipsis (...) to be roughly synonymous with * in destructuring but without capture.

2022-06-22 Thread Chris Angelico
On Thu, 23 Jun 2022 at 08:56, Steve Jorgensen  wrote:
>
> This is based on previous discussions of possible ways of matching all 
> remaining items during destructuring but without iterating over the remaining final
> items. This is not exactly a direct replacement for that idea though, and 
> skipping iteration of final items might or might not be part of the goal.
>
> In this proposal, the ellipsis (...) can be used in the expression on the 
> left side of the equals sign in destructuring anywhere that `*` can 
> appear and has approximately the same meaning. The difference is that when 
> the ellipsis is used, the matched items are not stored in variables. This can 
> be useful when the matched data might be very large.
>
> ..., last_one = 
> a, ..., z = 
> first_one, ... = 
>
> Additionally, when the ellipsis comes last and the data is being retrieved by 
> iterating, stop retrieving items since that might be expensive and we know 
> that we will not use them.
>
>
> Alternative A:
>
> Still iterate over items when the ellipsis comes last (for side effects) but 
> introduce a new `final_ellipsis` object that is used to stop iteration. The
> negation of `ellipsis` (e.g. `-...`) could return `final_ellipsis` in that 
> case.
>

No need to have an object there - you could just define it as a
syntactic construct instead. Assignment targets aren't themselves
objects (although the same syntax can often be used on the RHS, when
it would resolve to one).

> Alternative B:
>
> Still iterate over items when the ellipsis comes last (for side effects) and 
> don't provide any new means of skipping iteration over final items. The 
> programmer can use islice to achieve that.
>

This is exactly equivalent to using star-underscore, minus the final
step of assigning. Not really very advantageous.
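Alternative B's suggestion can be sketched with `itertools.islice`: take only the items you need and leave the rest unconsumed. `numbers` is a hypothetical generator that records what it produces.

```python
from itertools import islice

produced = []

def numbers(n):
    # Hypothetical generator that records every item it actually yields.
    for i in range(n):
        produced.append(i)
        yield i

# Take the first two items; the remaining 998 are never produced.
a, b = islice(numbers(1_000), 2)
print(a, b, len(produced))   # 0 1 2
```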

Having a way to say "allow additional elements without iterating over
them" would be useful, but creating a new way to spell the
non-assignment wouldn't be of sufficiently great value to justify the
syntax IMO.

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/JJPVSXCDKGW6TPFFDF46G7CZB43DIMFO/


[Python-ideas] Re: Ellipsis (...) to be roughly synonymous with * in destructuring but without capture.

2022-06-22 Thread Steve Jorgensen
Steve Jorgensen wrote:
> This is based on previous discussions of possible ways of matching all 
> remaining items during destructuring but without iterating over the remaining final
> items. This is not exactly a direct replacement for that idea though, and 
> skipping iteration of final items might or might not be part of the goal.
> In this proposal, the ellipsis (...) can be used in the expression on the 
> left side of the equals sign in destructuring anywhere that `*` can 
> appear and has approximately the same meaning. The difference is that when 
> the ellipsis is used, the matched items are not stored in variables. This can 
> be useful when the matched data might be very large.
> ..., last_one = 
> a, ..., z = 
> first_one, ... = 
> Additionally, when the ellipsis comes last and the data is being retrieved by 
> iterating, stop retrieving items since that might be expensive and we know 
> that we will not use them.
> Alternative A:
> Still iterate over items when the ellipsis comes last (for side effects) but 
> introduce a new `final_ellipsis` object that is used to stop iteration. The 
> negation of `ellipsis` (e.g. `-...`) could return `final_ellipsis` in that 
> case.
> Alternative B:
> Still iterate over items when the ellipsis comes last (for side effects) and 
> don't provide any new means of skipping iteration over final items. The 
> programmer can use islice to achieve that.

Correction: "are not stored in variables" should say "are not stored in a 
variable"
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CCQYEZH465W4ARBMBIUWK6YN4J5HNA5B/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Ellipsis (...) to be roughly synonymous with * in destructuring but without capture.

2022-06-22 Thread Steve Jorgensen
This is based on previous discussions of possible ways of matching all 
remaining items during destructuring but without iterating over remaining final 
items. This is not exactly a direct replacement for that idea though, and 
skipping iteration of final items might or might not be part of the goal.

In this proposal, the ellipsis (...) can be used in the expression on the left 
side of the equals sign in destructuring anywhere that `*` can appear 
and has approximately the same meaning. The difference is that when the 
ellipsis is used, the matched items are not stored in variables. This can be 
useful when the matched data might be very large.

..., last_one = 
a, ..., z = 
first_one, ... = 

Additionally, when the ellipsis comes last and the data is being retrieved by 
iterating, stop retrieving items since that might be expensive and we know that 
we will not use them.


Alternative A:

Still iterate over items when the ellipsis comes last (for side effects) but 
introduce a new `final_ellipsis` object that is used to stop iteration. The 
negation of `ellipsis` (e.g. `-...`) could return `final_ellipsis` in that case.


Alternative B:

Still iterate over items when the ellipsis comes last (for side effects) and 
don't provide any new means of skipping iteration over final items. The 
programmer can use islice to achieve that.
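A sketch of the islice workaround from Alternative B, using an illustrative generator that records how many items it has produced: unpacking with `first_one, *_` would drain the whole generator, while `islice` stops after the first item.

```python
from itertools import islice

consumed = []

def items():
    # Illustrative generator that records how far it has been iterated
    for i in range(1_000_000):
        consumed.append(i)
        yield i

# `first_one, *_ = items()` would pull all one million items.
# islice stops after one item and never resumes the generator.
first_one, = islice(items(), 1)
print(first_one, len(consumed))  # 0 1
```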
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QPMFXOOHKQJ6YFM35SJXZMANBQTRZ3FY/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Paul Moore
On Wed, 22 Jun 2022 at 22:52, Martin Di Paola  wrote:
> Perhaps this is the real focus to analyze. The PEP suggests that
> x.compute() does something *fundamentally* different from just executing
> the intermediate expressions.

Hang on, did the PEP change? The version I saw didn't have a compute()
method, deferred objects were just evaluated when they were
referenced.

There's a *huge* difference (in my opinion) between auto-executing
deferred expressions, and a syntax for creating *objects* that can be
asked to calculate their value. And yes, the latter is extremely close
to being nothing more than "a shorter and more composable form of
zero-arg lambda", so it needs to be justifiable in comparison to
zero-arg lambda (which is why I'm more interested in the composability
aspect, building an AST by combining delayed expressions into larger
ones).
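A minimal sketch of the explicit-evaluation, composable model described here, assuming a hypothetical `defer()` helper in place of any new syntax: combining deferred nodes builds a tree that code can inspect before anything runs, and nothing is evaluated until `.compute()` is called. `Node` and `defer` are illustrative names, not part of the PEP or of any library.

```python
import operator

class Node:
    """Hypothetical deferred-expression node.

    Combining nodes with operators builds a tree; nothing is evaluated
    until .compute() is called explicitly.
    """

    def __init__(self, func, *args):
        self.func = func    # callable to apply at compute time
        self.args = args    # plain values or other Nodes

    def __add__(self, other):
        return Node(operator.add, self, other)

    def __truediv__(self, other):
        return Node(operator.truediv, self, other)

    def compute(self):
        # Evaluate child nodes first (depth-first), then apply this node's function
        args = [a.compute() if isinstance(a, Node) else a for a in self.args]
        return self.func(*args)


def defer(func):
    """Hypothetical stand-in for a deferral keyword: wrap a call as a Node."""
    return lambda *args: Node(func, *args)


calls = []

def expensive(n):
    calls.append(n)  # record when the "expensive" work actually runs
    return n

x = (defer(expensive)(6) + defer(expensive)(4)) / defer(expensive)(5)
print(calls)                        # [] : only a tree has been built
print(x.func is operator.truediv)   # True : the root can be inspected
print(x.compute())                  # 2.0
```

Because the tree is ordinary data, a library could walk or rewrite it before calling `compute()`, which is the composability a plain zero-argument lambda does not give.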

Paul
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/P2LAQMQUM3KBXNH6EE5XBON4BLNTS7UR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Martin Di Paola


On Wed, Jun 22, 2022 at 02:30:14PM -0400, David Mertz, Ph.D. wrote:

> The difference is in composition of operations.  I can write a dozen
> zero-argument lambdas easily enough.  But those are all isolated.
> But basically, think about `x = (later expensive1() + later expensive2()) /
> later expensive3()`.  How can we make `x` itself be a zero argument
> lambda? [... see below ...]


The following three constructions are roughly equivalent:

# Using proposed PEP
x = (later expensive1() + later expensive2()) / later expensive3()
x.compute()

# Using zero-argument nested lambdas
x = lambda: ((lambda: expensive1())() + (lambda: expensive2())()) / (lambda: expensive3())()
x() # compute

# Using the good old function
def x():
    return (expensive1() + expensive2()) / expensive3()
x() # compute
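For comparison, a sketch showing that zero-argument callables can also be composed without running them, using small hypothetical helpers (`thunk_add`, `thunk_div` and `make_expensive` are illustrative, not from the PEP). What this form cannot offer is any structure to inspect: `x` is just an opaque function.

```python
calls = []

def make_expensive(n):
    """Hypothetical stand-in for expensive1/2/3: records each call, returns n."""
    def expensive():
        calls.append(n)
        return n
    return expensive

expensive1, expensive2, expensive3 = make_expensive(6), make_expensive(4), make_expensive(5)

def thunk_add(f, g):
    # Combine two zero-argument callables without calling either of them
    return lambda: f() + g()

def thunk_div(f, g):
    return lambda: f() / g()

x = thunk_div(thunk_add(expensive1, expensive2), expensive3)
print(calls)  # [] : composing ran nothing
print(x())    # 2.0
```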


> [... cont ...] And not just with those exact operations on the Deferreds, but
> arbitrary combinations of Deferreds to create more complex Deferreds,
> without performing the intermediate computations?


Perhaps this is the real focus to analyze. The PEP suggests that
x.compute() does something *fundamentally* different from just executing
the intermediate expressions.

The *key* difference between the PEP proposal and the use of
zero-argument lambdas or good-old functions is that no intermediate
expression is computed (expensive1, expensive2 and expensive3 separately)
but the whole combined expression is computed by *xxx* allowing it to perform
*yyy*.

Having said that, I want to point out that the "xxx" and "yyy" above are
missing from the PEP.

The PEP does not explain who executes these deferred computations, nor
how (the xxx and yyy).

As I mentioned in my previous email, the PEP gives an example of how
Dask uses deferred expressions to build and distribute complex
arithmetic/statistics operations over a partitioned dataframe,
but this example cannot be used to assume that Python
will do it in the same way.

In my previous email I mentioned other libs that implement deferred expressions
and "should" benefit from the PEP; however, I cannot see how.

Perhaps a concrete example borrowed from selectq
could help spot the missing pieces (xxx and yyy):

# from selectq
red_divs = sQ.select("div", class_="red")   # this is deferred
blue_divs = sQ.select("div", class_="blue") # this is deferred
both = red_divs | blue_divs # this is deferred
count = both.count()  # the count() forces the real computation

Now, imagine rewriting it using the PEP:

# using PEP
red_divs = later sQ.select("div", class_="red")
blue_divs = later sQ.select("div", class_="blue")
both = later (red_divs | blue_divs)
count = later both.count()
count = count.compute()

...but how does the Python VM know what "compute()" means or how to do
it? (the yyy). Is it the Python VM that should do it? (the xxx)

I would assume that it is not the Python VM but the library...

 - how will the library know what compute() means?
 - how will it know about the intermediate red_divs and blue_divs?
 - what benefits would this PEP bring to libs like PySpark, selectq and
   Django's ORM (to mention a few)?
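One way to picture the missing "xxx" and "yyy": a sketch of a selectq-like library in which the deferred objects themselves carry the planning logic. `Sel`, `select`, `count(run_xpath)` and `fake_run` are hypothetical and are not selectq's real API; the point is only that the library, not the interpreter, decides to merge the selections into a single XPath.

```python
class Sel:
    """Hypothetical selectq-like deferred selection (NOT selectq's real API).

    Selections only record an XPath fragment; '|' combines fragments.
    The library decides how to run the combined plan: here it builds
    ONE XPath string and hands it to an executor.
    """

    def __init__(self, xpath):
        self.xpath = xpath

    @classmethod
    def select(cls, tag, class_):
        return cls(f"//{tag}[@class='{class_}']")

    def __or__(self, other):
        return Sel(f"{self.xpath} | {other.xpath}")

    def count(self, run_xpath):
        # The "plan": a single round trip with the merged query
        return run_xpath(f"count({self.xpath})")


queries = []

def fake_run(xpath):
    # Stand-in for a real backend such as a browser driver; just records the query
    queries.append(xpath)
    return 3

red_divs = Sel.select("div", class_="red")    # deferred
blue_divs = Sel.select("div", class_="blue")  # deferred
both = red_divs | blue_divs                   # still deferred
print(both.count(fake_run))  # 3
print(queries)               # ["count(//div[@class='red'] | //div[@class='blue'])"]
```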

So far the PEP *does not* explain that and it only talks about how to
delay plain python code.

In this sense then, I would *agree* with Eric Smith.

David may have a different intention, but in the current form the PEP
is equivalent to zero-argument lambdas.

Thanks,
Martin.



___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to 

[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Barry Scott


> On 22 Jun 2022, at 19:09, Paul Moore  wrote:
> 
> I suspect that you consider evaluation-on-reference as an important
> feature of your proposal, but could you consider explicit evaluation
> as an alternative? Or at the very least address in the PEP the fact
> that this would close the door on future explicit evaluation models?

I can think of ways to implement evaluation-on-reference, but they all have the
effect of making Python slower.

The simple

a = b

will need to slow down so that the object in b can be checked to see if it
needs evaluating.

How will you avoid making Python slower with this feature?

Barry

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QBL6JFDKDKFBURKEXOHEOBCK6WULEEQR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Neil Girdhar
Could this idea be used to specify the default factory of a dataclass 
field?  For example,

@dataclass
class X:
  x: list[int] = deferred []

instead of

@dataclass
class X:
  x: list[int] = field(default_factory=list)

If so, it might be worth adding to your proposal as another common 
motivating example?
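For reference, today's spelling that such a `deferred []` default would replace; the `deferred` syntax itself is hypothetical and cannot be run. Whatever form it took, the default would need to produce a fresh list for each instance, which is what `default_factory` guarantees.

```python
from dataclasses import dataclass, field

@dataclass
class X:
    # Today's spelling: the factory is called once per instance
    x: list[int] = field(default_factory=list)

a, b = X(), X()
a.x.append(1)
print(a.x, b.x)  # [1] []

# A plain mutable default is rejected when the class is created
try:
    @dataclass
    class Y:
        y: list[int] = []
except ValueError as exc:
    print("ValueError:", exc)
```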

Best,

Neil

On Wednesday, June 22, 2022 at 2:23:13 PM UTC-4 David Mertz, Ph.D. wrote:

> On Wed, Jun 22, 2022 at 10:47 AM Martin Di Paola  
> wrote:
>
>> Hi David, I read the PEP and I think it would be useful to expand the
>> Motivation and Examples sections.
>> While indeed Dask uses lazy evaluation to build a complex computation
>> without executing it, I don't think that it is the whole story.
>> Dask takes this deferred complex computation and *plans* how to execute it
>> and then it *executes* it in non-obvious/direct ways.
>>
>
> Dask is very clever about execution planning.  Ray is possibly even more 
> clever in that regard.
>
> However, I think that that should be an explicit non-goal of the PEP.  
> DeferredObjects should create a DAG, yes.  But I think Python itself should 
> not think about being *clever* in evaluating that DAG, nor in itself think 
> about parallelism.  If my PEP were adopted, that would be something other 
> libraries like Dask or Django could build on top of with more elaborate 
> evaluation plans.
>
> But just the DAG itself gets you more than just "wait until needed to do 
> the final computation." It allows for intermediate computation of nodes of 
> the DAG lower down than the final result.  For example, imagine 
> dependencies like this (where all the computation steps are expensive):
>
> A -> B
>  B -> Z
>  B -> Y
>  B -> X
> A -> C
>  C -> X
>   X -> F
>   X -> G
>  C -> W
>  C -> V
> A -> D
>  D -> V
>  D -> U
>  D -> T
>  
> Hopefully you either see my ASCII art in fixed font, or it's at least 
> intelligible.  If I want to force evaluation of A, I need to do 
> everything.  But if it turns out all I need within my program is C, then I 
> have to do computations C, X, F, G, W, V.  Which is maybe still expensive, 
> but at least I don't worry about B, Z, Y, U, T, or A.
>
> Yes, I should add something like this to the PEP.
>
>
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/US5MCILKVRPUQBUERK352JKTH7RNJMTJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread David Mertz, Ph.D.
On Wed, Jun 22, 2022 at 2:17 PM Eric V. Smith  wrote:

> Every time I’ve looked at this, I come back to: other than the clunky
> syntax, how is explicit evaluation different from a zero-argument lambda?


The difference is in composition of operations.  I can write a dozen
zero-argument lambdas easily enough.  But those are all isolated.

I've enhanced the PEP, so maybe look at the link for some of my updates;
but I need to add a bunch more, so don't want to repost each small draft
change.

But basically, think about `x = (later expensive1() + later expensive2()) /
later expensive3()`.  How can we make `x` itself be a zero argument
lambda?  And not just with those exact operations on the Deferreds, but
arbitrary combinations of Deferreds to create more complex Deferreds,
without performing the intermediate computations?


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/Y6LVDCFYU23XC6A7ZO2R7VUNTG344GC7/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread David Mertz, Ph.D.
On Wed, Jun 22, 2022 at 10:47 AM Martin Di Paola 
wrote:

> Hi David, I read the PEP and I think it would be useful to expand the
> Motivation and Examples sections.
> While indeed Dask uses lazy evaluation to build a complex computation
> without executing it, I don't think that it is the whole story.
> Dask takes this deferred complex computation and *plans* how to execute it
> and then it *executes* it in non-obvious/direct ways.
>

Dask is very clever about execution planning.  Ray is possibly even more
clever in that regard.

However, I think that that should be an explicit non-goal of the PEP.
DeferredObjects should create a DAG, yes.  But I think Python itself should
not think about being *clever* in evaluating that DAG, nor in itself think
about parallelism.  If my PEP were adopted, that would be something other
libraries like Dask or Django could build on top of with more elaborate
evaluation plans.

But just the DAG itself gets you more than just "wait until needed to do
the final computation." It allows for intermediate computation of nodes of
the DAG lower down than the final result.  For example, imagine
dependencies like this (where all the computation steps are expensive):

A -> B
 B -> Z
 B -> Y
 B -> X
A -> C
 C -> X
  X -> F
  X -> G
 C -> W
 C -> V
A -> D
 D -> V
 D -> U
 D -> T

Hopefully you either see my ASCII art in fixed font, or it's at least
intelligible.  If I want to force evaluation of A, I need to do
everything.  But if it turns out all I need within my program is C, then I
have to do computations C, X, F, G, W, V.  Which is maybe still expensive,
but at least I don't worry about B, Z, Y, U, T, or A.

Yes, I should add something like this to the PEP.
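A small sketch of the partial-evaluation idea, written in plain Python with memoization rather than any proposed syntax (`deps` and `force` are illustrative names). Forcing C touches only C, X, F, G, W and V; a later request for A reuses the cached X, V and C.

```python
# Dependencies from the diagram above: each node lists the nodes it needs.
deps = {
    "A": ["B", "C", "D"],
    "B": ["Z", "Y", "X"],
    "C": ["X", "W", "V"],
    "X": ["F", "G"],
    "D": ["V", "U", "T"],
}

def force(node, cache, log):
    """Evaluate `node` after its dependencies, computing each node at most once."""
    if node in cache:
        return cache[node]
    inputs = [force(dep, cache, log) for dep in deps.get(node, [])]
    log.append(node)
    # Stand-in for an expensive computation: just record the shape of the call
    cache[node] = f"{node}({', '.join(inputs)})"
    return cache[node]

cache, log = {}, []
force("C", cache, log)
print(log)  # ['F', 'G', 'X', 'W', 'V', 'C']

log.clear()
force("A", cache, log)  # reuses X, V and C from the cache
print(log)  # ['Z', 'Y', 'B', 'U', 'T', 'D', 'A']
```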
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4CN3PRNH4M7T5TF36IRDZ6GZMHQS26TW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Eric V. Smith via Python-ideas

> On Jun 22, 2022, at 2:12 PM, Paul Moore  wrote:
> 
> On Wed, 22 Jun 2022 at 18:35, David Mertz, Ph.D.  
> wrote:
>> 
>> Hi Martin,
>> 
>> Short answer: yes, I agree.
>> Slightly longer: I would be eternally grateful if you wish to contribute to 
>> the PEP with any such expansion of the Motivation and Expansion.
> 
> One concern I have, triggered by Martin's Dask, PySpark and Django
> examples, is that we've seen proposals in the past for "deferred
> expression" objects that capture an unevaluated expression, and make
> its AST available for user code to manipulate. The three examples here
> could all use such a feature, as could other ORMs (and I'm sure there
> are other use cases). This is in contrast to your proposal, which
> doesn't seem to help those use cases (if it does, I'd like to
> understand how).
> 
> The key distinction seems to be that with your proposal, evaluation is
> "on reference" and unavoidable, whereas in the other proposals I've
> seen, evaluation happens on demand (and as a result, it's also
> possible to work with the expression AST *before* evaluation). My
> concern is that we're unlikely to be able to justify *two* forms of
> "deferred expression" construct in Python, and your proposal, by
> requiring transparent evaluation on reference, would preclude any
> processing (such as optimisation, name injection, or other forms of
> AST manipulation) of the expression before evaluation.
> 
> I suspect that you consider evaluation-on-reference as an important
> feature of your proposal, but could you consider explicit evaluation
> as an alternative? Or at the very least address in the PEP the fact
> that this would close the door on future explicit evaluation models?

Every time I’ve looked at this, I come back to: other than the clunky syntax, 
how is explicit evaluation different from a zero-argument lambda?

Eric
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7OFY7ES6ON3I2QQEVAAATUX2OUHD5L2A/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Paul Moore
On Wed, 22 Jun 2022 at 18:35, David Mertz, Ph.D.  wrote:
>
> Hi Martin,
>
> Short answer: yes, I agree.
> Slightly longer: I would be eternally grateful if you wish to contribute to 
> the PEP with any such expansion of the Motivation and Expansion.

One concern I have, triggered by Martin's Dask, PySpark and Django
examples, is that we've seen proposals in the past for "deferred
expression" objects that capture an unevaluated expression, and make
its AST available for user code to manipulate. The three examples here
could all use such a feature, as could other ORMs (and I'm sure there
are other use cases). This is in contrast to your proposal, which
doesn't seem to help those use cases (if it does, I'd like to
understand how).

The key distinction seems to be that with your proposal, evaluation is
"on reference" and unavoidable, whereas in the other proposals I've
seen, evaluation happens on demand (and as a result, it's also
possible to work with the expression AST *before* evaluation). My
concern is that we're unlikely to be able to justify *two* forms of
"deferred expression" construct in Python, and your proposal, by
requiring transparent evaluation on reference, would preclude any
processing (such as optimisation, name injection, or other forms of
AST manipulation) of the expression before evaluation.
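A sketch of the kind of pre-evaluation processing mentioned above, under the assumption that a captured expression is available as source text; the PEP does not specify any such representation. It uses the standard `ast` module to inspect the tree and inject values for names before compiling and evaluating it. `InjectNames` is a hypothetical helper.

```python
import ast

class InjectNames(ast.NodeTransformer):
    """Replace selected names with constants before evaluation (name injection)."""

    def __init__(self, values):
        self.values = values

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id in self.values:
            return ast.copy_location(ast.Constant(self.values[node.id]), node)
        return node

# A captured-but-unevaluated expression, represented here as source text
tree = ast.parse("(a + b) / c", mode="eval")
print(type(tree.body.op).__name__)  # Div : inspected before anything runs

tree = ast.fix_missing_locations(InjectNames({"a": 6, "b": 4, "c": 5}).visit(tree))
result = eval(compile(tree, "<deferred>", "eval"))
print(result)  # 2.0
```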

I suspect that you consider evaluation-on-reference as an important
feature of your proposal, but could you consider explicit evaluation
as an alternative? Or at the very least address in the PEP the fact
that this would close the door on future explicit evaluation models?

Paul
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/GDMURWK3M4WOQVGEMWFY33ZBRSKQMGYX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread David Mertz, Ph.D.
Hi Martin,

Short answer: yes, I agree.
Slightly longer: I would be eternally grateful if you wish to contribute to
the PEP with any such expansion of the Motivation and Expansion.

On Wed, Jun 22, 2022 at 10:47 AM Martin Di Paola 
wrote:

> Hi David, I read the PEP and I think it would be useful to expand the
> Motivation and Examples sections.
>
> While indeed Dask uses lazy evaluation to build a complex computation
> without executing it, I don't think that it is the whole story.
>
> Dask takes this deferred complex computation and *plans* how to execute it
> and then it *executes* it in non-obvious/direct ways.
>
> For example, the computation of the min() of a dataframe can be done
> by computing the min() of each partition of the dataframe and then
> computing the min() of them. This is where the planning and execution
> stages come into play.
>
> All of this is hidden from the developer. From his/her perspective the
> min() is called once over the whole dataframe.
>
> Dask's deferred computations are "useless" without the
> planning/execution plan.
>
> PySpark, like Dask, does exactly the same.
>
> But what about Django's ORM? Indeed, Django allows you to build a SQL
> query without executing it. You can then perform more subqueries,
> joins and group by without executing them.
>
> The query is executed only when you need the real data.
>
> This is another example of deferred execution similar to Dask/PySpark;
> however, when we consider the planning/execution stages, the similarities
> end there.
>
> Django's ORM writes a SQL query and sends it to a SQL database.
>
> Another example of deferred execution would be my library to interact
> with web pages programmatically: selectq.
>
> Very much like an ORM, you can select elements from a web page, perform
> subselections and unions without really interacting with the web page.
>
> The deferred computations are executed only when you want to get the data
> from the page, and, like an ORM, the plan made by selectq is to build a
> single XPath and then execute it using Selenium.
>
> So...
>
> Three cases: Dask/PySpark, Django's ORM and selectq. All of them
> implement deferred expressions but all of them "compute" them in very
> specific ways (aka, they plan and execute the computation differently).
>
> Would those libs (and probably others) benefit from the PEP? How?
>
> Thanks,
> Martin.
>
> On Tue, Jun 21, 2022 at 04:53:44PM -0400, David Mertz, Ph.D. wrote:
> >Here is a very rough draft of an idea I've floated often, but not with much
> >specification.  Take this as "ideas" with little firm commitment to details
> >from me. PRs, or issues, or whatever, can go to
> >https://github.com/DavidMertz/peps/blob/master/pep-.rst as well as
> >mentioning them in this thread.
> >
> >PEP: 
> >Title: Generalized deferred computation
> >Author: David Mertz 
> >Discussions-To:
> >https://mail.python.org/archives/list/python-ideas@python.org/thread/
> >Status: Draft
> >Type: Standards Track
> >Content-Type: text/x-rst
> >Created: 21-Jun-2022
> >Python-Version: 3.12
> >Post-History:
> >
> >Abstract
> >========
> >
> >This PEP proposes introducing the soft keyword ``later`` to express the
> >concept of deferred computation.  When an expression is preceded by the
> >keyword, the expression is not evaluated but rather creates a "thunk" or
> >"deferred object."  Reference to the deferred object later in program flow
> >causes the expression to be executed at that point, and for both the value
> >and type of the object to become the result of the evaluated expression.
> >
> >
> >Motivation
> >==========
> >
> >"Lazy" or "deferred" evaluation is a useful paradigm for expressing
> >relationships among potentially expensive operations prior to their actual
> >computation.  Many functional programming languages, such as Haskell, build
> >laziness into the heart of their language.  Within the Python ecosystem, the
> >popular scientific library `dask-delayed `_ provides a framework for lazy
> >evaluation that is very similar to that proposed in this PEP.
> >
> >.. _dask-delayed:
> >   https://docs.dask.org/en/stable/delayed.html
> >
> >
> >Examples of Use
> >===============
> >
> >While the use of deferred computation is principally useful when
> >computations are likely to be expensive, the simple examples shown do not
> >necessarily use such especially spendy computations.  Most of these are
> >directly inspired by examples used in the documentation of dask-delayed.
> >
> >In dask-delayed, ``Delayed`` objects are created by functions, and operations
> >create a *directed acyclic graph* rather than perform actual computations.
> >For example::
> >
> >   >>> import dask
> >   >>> @dask.delayed
> >   ... def later(x):
> >   ...     return x
> >   ...
> >   >>> output = []
> >   >>> data = [23, 45, 62]
> >   >>> for x in data:
> >   ...     x = later(x)
> >   ...     a = x * 3
> >   ...     b = 2**x
> >   ...     c = a + b
> >   ...

[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Steve Jorgensen
I think I have an idea how to do something like what you're asking with less 
magic, and I think an example implementation of this could actually be done in 
pure Python code (though a more performant implementation would need support at 
the C level).

What if a deferred object has one magic method (__isdeferred__) that is invoked
directly rather than causing a thunk, while invocation of any other method does
cause a thunk? For the example implementation, a thunk would simply mean that
the value is computed and stored within the instance, and method calls on the 
wrapper are now delegated to that. In the proper implementation, the object 
would change its identity to become its computed result.
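A minimal pure-Python sketch of that wrapper, under stated assumptions: `Deferred` is a hypothetical name, "causing a thunk" is modeled as computing and caching the value, and only a few operators are forwarded. Because Python looks up special methods on the type rather than the instance, `__getattr__` alone cannot forward operators like `+`, so each one has to be defined explicitly. Unlike the proper implementation described, this wrapper keeps its own identity after forcing.

```python
class Deferred:
    """Hypothetical wrapper: only __isdeferred__ answers without forcing."""

    _UNSET = object()  # sentinel meaning "not computed yet"

    def __init__(self, thunk):
        self._thunk = thunk
        self._value = Deferred._UNSET

    def __isdeferred__(self):
        # True until the value has been computed
        return self._value is Deferred._UNSET

    def _force(self):
        # Compute once, cache the result, drop the thunk
        if self._value is Deferred._UNSET:
            self._value = self._thunk()
            self._thunk = None
        return self._value

    def __getattr__(self, name):
        # Only reached for attributes not found on the wrapper itself
        return getattr(self._force(), name)

    # Implicit special-method lookup bypasses __getattr__, so operators
    # must be forwarded explicitly; only a few are shown here.
    def __add__(self, other):
        return self._force() + other

    def __radd__(self, other):
        return other + self._force()

    def __eq__(self, other):
        return self._force() == other

    def __repr__(self):
        return repr(self._force())


calls = []
d = Deferred(lambda: calls.append(1) or 40)  # hypothetical expensive computation
print(d.__isdeferred__())  # True : nothing computed yet
print(d + 2)               # 42
print(d.bit_length())      # 6 : delegated to the int 40
print(d.__isdeferred__())  # False
```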
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RNZXM55GFZ5DHOHP6QZZ744HUVNDB2BV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Generalized deferred computation in Python

2022-06-22 Thread Martin Di Paola

Hi David, I read the PEP and I think it would be useful to expand the
Motivation and Examples sections.

While indeed Dask uses lazy evaluation to build a complex computation
without executing it, I don't think that it is the whole story.

Dask takes this deferred complex computation and *plans* how to execute it
and then it *executes* it in non-obvious/direct ways.

For example, the computation of the min() of a dataframe can be done
by computing the min() of each partition of the dataframe and then
computing the min() of them. This is where the planning and execution
stages come into play.

All of this is hidden from the developer. From his/her perspective the
min() is called once over the whole dataframe.
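The partition-wise min() described above, sketched in plain Python. This is not Dask's code; `partitioned_min` and the sample data are illustrative. Each partition is reduced on its own, then the partial results are reduced again.

```python
def partitioned_min(partitions):
    """Plan sketch: reduce each partition locally, then reduce the partial results."""
    partials = [min(p) for p in partitions]  # independent per partition, so parallelizable
    return min(partials)

column = [7, 3, 9, 1, 8, 4, 6, 2]
partitions = [column[i:i + 3] for i in range(0, len(column), 3)]
print(partitions)                   # [[7, 3, 9], [1, 8, 4], [6, 2]]
print(partitioned_min(partitions))  # 1
```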

Dask's deferred computations are "useless" without the
planning/execution plan.

PySpark, like Dask, does exactly the same.

But what about Django's ORM? Indeed, Django allows you to build a SQL
query without executing it. You can then perform more subqueries,
joins and group by without executing them.

The query is executed only when you need the real data.

This is another example of deferred execution similar to Dask/PySpark;
however, when we consider the planning/execution stages, the similarities
end there.

Django's ORM writes a SQL query and sends it to a SQL database.

Another example of deferred execution would be my library to interact
with web pages programmatically: selectq.

Very much like an ORM, you can select elements from a web page, perform
subselections and unions without really interacting with the web page.

The deferred computations are executed only when you want to get the data
from the page, and, like an ORM, the plan made by selectq is to build a
single XPath and then execute it using Selenium.

So...

Three cases: Dask/PySpark, Django's ORM and selectq. All of them
implement deferred expressions but all of them "compute" them in very
specific ways (aka, they plan and execute the computation differently).

Would those libs (and probably others) benefit from the PEP? How?

Thanks,
Martin.

On Tue, Jun 21, 2022 at 04:53:44PM -0400, David Mertz, Ph.D. wrote:

Here is a very rough draft of an idea I've floated often, but not with much
specification.  Take this as "ideas" with little firm commitment to details
from me. PRs, or issues, or whatever, can go to
https://github.com/DavidMertz/peps/blob/master/pep-.rst as well as
mentioning them in this thread.

PEP: 
Title: Generalized deferred computation
Author: David Mertz 
Discussions-To:
https://mail.python.org/archives/list/python-ideas@python.org/thread/
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 21-Jun-2022
Python-Version: 3.12
Post-History:

Abstract
========

This PEP proposes introducing the soft keyword ``later`` to express the
concept of deferred computation.  When an expression is preceded by the
keyword, the expression is not evaluated but rather creates a "thunk" or
"deferred object."  Reference to the deferred object later in program flow
causes the expression to be executed at that point, and for both the value
and type of the object to become the result of the evaluated expression.


Motivation
==========

"Lazy" or "deferred" evaluation is a useful paradigm for expressing
relationships among potentially expensive operations prior to their actual
computation.  Many functional programming languages, such as Haskell, build
laziness into the heart of their language.  Within the Python ecosystem, the
popular scientific library `dask-delayed `_ provides a framework for lazy
evaluation that is very similar to that proposed in this PEP.

.. _dask-delayed:
  https://docs.dask.org/en/stable/delayed.html


Examples of Use
===============

While the use of deferred computation is principally useful when
computations are likely to be expensive, the simple examples shown do not
necessarily use such especially spendy computations.  Most of these are
directly inspired by examples used in the documentation of dask-delayed.

In dask-delayed, ``Delayed`` objects are created by functions, and operations
create a *directed acyclic graph* rather than perform actual computations.
For example::

   >>> import dask
   >>> @dask.delayed
   ... def later(x):
   ...     return x
   ...
   >>> output = []
   >>> data = [23, 45, 62]
   >>> for x in data:
   ...     x = later(x)
   ...     a = x * 3
   ...     b = 2**x
   ...     c = a + b
   ...     output.append(c)
   ...
   >>> total = sum(output)
   >>> total
   Delayed('add-8f4018dbf2d3c1d8e6349c3e0509d1a0')
   >>> total.compute()
   4611721202807865734
   >>> total.visualize()

.. figure:: pep--dag.png
  :align: center
  :width: 50%
  :class: invert-in-dark-mode

  Figure 1.  Dask DAG created from simple operations.

Under this PEP, the soft keyword ``later`` would work in a similar manner to
this dask.delayed code.  But rather than requiring calling ``.compute()`` on a
``Delayed`` object to arrive at the result of a computation, every reference
to a binding would