[Python-ideas] Delayed computation

David Mertz Fri, 29 May 2020 11:42:50 -0700

On Fri, May 29, 2020 at 1:56 PM Rhodri James <rho...@kynesim.co.uk> wrote:


> Presumably "delayed" is something that would be automatically applied to
> the actual parameter given, otherwise your call graphs might or might
> not actually be call graphs depending on how the function was called.
> What happens if I call "foo(y=0)" for instance?
>

I am slightly hijacking the thread.  I think the "solution" to the narrow
"problem" of mutable default arguments is not at all worth having.  So yes,
if that was the only, or even main, purpose of a hypothetical 'delayed'
keyword and 'DelayedType', it would absolutely not be worthwhile.  It would
just happen to solve that problem as a side effect.

Where I think it is valuable is the idea of letting all the normal
operations work on EITHER a DelayedType or whatever type the operation
would otherwise operate on.  So no, `foo(y=0)` would pass in a concrete
type and do greedy calculations, nothing delayed, no in-memory call graph
(other than whatever is implicit in the bytecode).

However, I don't really love the 'concretize' operation, which you
correctly identify as a source of new bugs.  I think it is necessary
occasionally, and actually there is no reason it cannot be a plain old
function rather than a keyword.  The DelayedType object is simply some way
of storing a call graph, which is has some kind of underlying
representation.  So the only slightly special function concretize() could
just walk the graph.

I think it would be better if MANY operations simply *implied* the need to
concretize the call tree into a result value.  The problem is, I'm really
not sure exactly which operations, and probably a lot of them could have
arguments in either direction.  My thinking here is inspired by several
data frame libraries that address this issue.

* Pandas is always eager.  Call a method, it does the computation right
away, and usually returns the mutated data frame object (or some new
similar object) to chain methods in a "fluent style."
* Dask.dataframe is always lazy.  It implements 95% of the Pandas API (by
putting Pandas "under the hood").  You can still chain methods, but
everything returns a call graph that the next method uses to build a larger
call graph.  The ONLY time any computation is done when you explicitly call
`df.compute()`
* Vaex is another data frame library that is very interesting.  It is
*almost always* lazy, and likewise allows chaining methods.  You can easily
build up call graphs behind the scenes (either by chaining or with
intermediate names for "computations").  However, the RIGHT collection of
methods know they need to concretize... and they just do the right thing.

Of these, I've used Vaex the least by far.  However, my impression is that
it makes the right decisions, and users rarely need to think about what is
going on behind the scenes (other than that operations on big data is darn
fast).  Still, "things with data frames" is more specialized than
"everything one can do in Python."  I'm not sure what the right answers are.

For example, my first intuition is that `print(x)` is a good occasion to
implicitly concretize.  If you really want the underlying delayed, maybe
you'd write `print(x.callgraph)` or something similar.  But then, sticking
in a debugging print might wind up being an expensive thing to do in that
case, so...


-- 
The dead increasingly dominate and strangle both the living and the
not-yet born.  Vampiric capital and undead corporate persons abuse
the lives and control the thoughts of homo faber. Ideas, once born,
become abortifacients against new conceptions.

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/G7ZYBSFC6L3FFPFODPZWMHV72MLIQPMU/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Delayed computation

Reply via email to