[issue44604] [Enhancement] Asyncio task decorator to provide functionality similar to dask's delayed interface

2021-07-11 Thread Aritn Sarraf


New submission from Aritn Sarraf:

For those not familiar, the dask delayed interface allows a user to define a 
DAG through a functional invocation interface. Dask docs here: 
https://docs.dask.org/en/latest/delayed.html
Another example of this kind of interface is airflow's new TaskFlow api: 
https://airflow.apache.org/docs/apache-airflow/stable/concepts/taskflow.html

The proposed solution would look something like this. Essentially all we're 
doing is defining a decorator that will allow you to pass in coroutines to 
another coroutine, and will resolve the dependent coroutines before passing the 
results to your dependent coroutine.

import asyncio
import inspect

# Note0: can be removed, see Note2 below
async def task_wrapper(val):
    return val

def task(afunc):  # open to other names for the decorator since it might be a bit ambiguous
    async def inner(*args):  # Note1: real solution would be expanded to args/kwargs
        # Note2: `task_wrapper` is kind of unnecessary; we can just
        # conditionally not gather in those cases
        args = [arg if inspect.isawaitable(arg) else task_wrapper(arg) for arg in args]
        args = await asyncio.gather(*args)
        return await afunc(*args)
    return inner
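Notes 1 and 2 above could be addressed along the following lines. This is only a sketch under assumptions, not a proposed final implementation: it forwards keyword arguments as well as positional ones and gathers only the arguments that are actually awaitable, so `task_wrapper` is no longer needed. The names `add` and `one` are hypothetical helpers used purely for illustration.

```python
import asyncio
import functools
import inspect


def task(afunc):
    """Resolve awaitable arguments concurrently, then call ``afunc``."""

    @functools.wraps(afunc)
    async def inner(*args, **kwargs):
        # Flatten positional and keyword values into one list so they can
        # be resolved with a single gather call.
        values = [*args, *kwargs.values()]
        # Only awaitable inputs are gathered; plain values pass through.
        pending = [(i, v) for i, v in enumerate(values) if inspect.isawaitable(v)]
        if pending:
            results = await asyncio.gather(*(v for _, v in pending))
            for (i, _), result in zip(pending, results):
                values[i] = result
        resolved_args = values[:len(args)]
        resolved_kwargs = dict(zip(kwargs, values[len(args):]))
        return await afunc(*resolved_args, **resolved_kwargs)

    return inner


# Hypothetical helpers, for illustration only.
@task
async def add(x, y=0):
    return x + y


async def one():
    await asyncio.sleep(0)
    return 1
```

With these definitions, `asyncio.run(add(one(), y=one()))` resolves both inputs before calling `add`, and plain values such as `add(2, y=3)` are passed through unchanged.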


The advantage this gives us in asyncio is that we can easily build processing 
pipelines where each piece is completely independent and does not know anything 
about any other piece of the pipeline. Obviously this is already possible 
currently, but this simple wrapper will provide a very clean way to connect it 
all together.

Take the following example, where we want to fetch data for various ids and 
post process/upload them.

@task
async def fetch(x):
    # Note3: timings here are chosen to demo the expected async behavior
    # in the completion order of the print statements
    sleep_time = {'a1': 1, 'a2': 2, 'b1': 4, 'b2': 0.5, 'c1': 6, 'c2': 3.5}[x]
    await asyncio.sleep(sleep_time)
    ret_val = f'f({x})'
    print(f'Done {ret_val}')
    return ret_val

async def process(x1, x2):
    await asyncio.sleep(1)
    ret_val = f'p({x1}, {x2})'
    print(f'Done {ret_val}')
    return ret_val

Notice we didn't decorate `process`; this is to demonstrate how you can still 
use the interface on functions that you can't or don't want to decorate. Now, 
to define and execute our pipeline, we can simply do this:


async def main():
    fa1 = fetch('a1')
    fa2 = fetch('a2')
    fb1 = fetch('b1')
    fb2 = fetch('b2')
    fc1 = fetch('c1')
    fc2 = fetch('c2')
    pa = task(process)(fa1, fa2)
    pb = task(process)(fb1, fb2)
    pc = task(process)(fc1, fc2)
    return await asyncio.gather(pa, pb, pc)

loop = asyncio.new_event_loop()
loop.run_until_complete(main())

This would be a simple, non-breaking addition to the library that would allow 
users to build clean, straightforward asynchronous processing pipelines/DAGs.
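To make the expected concurrency of the pipeline above concrete, here is a self-contained sketch that records completion order instead of printing. It assumes a `SCALE` factor (not in the original) that shrinks the sleep times so the demo finishes quickly, and it builds the three pipelines in a loop rather than spelling out each variable; the `completed` list is likewise an illustrative addition.

```python
import asyncio
import inspect

SCALE = 0.05     # assumption: shrink the original sleep times for a fast demo
completed = []   # records results in the order the coroutines finish


def task(afunc):
    async def inner(*args):
        async def passthrough(val):
            return val
        # Wrap plain values so every argument can be gathered uniformly.
        awaitables = [a if inspect.isawaitable(a) else passthrough(a) for a in args]
        resolved = await asyncio.gather(*awaitables)
        return await afunc(*resolved)
    return inner


@task
async def fetch(x):
    delays = {'a1': 1, 'a2': 2, 'b1': 4, 'b2': 0.5, 'c1': 6, 'c2': 3.5}
    await asyncio.sleep(delays[x] * SCALE)
    ret_val = f'f({x})'
    completed.append(ret_val)
    return ret_val


async def process(x1, x2):
    await asyncio.sleep(1 * SCALE)
    ret_val = f'p({x1}, {x2})'
    completed.append(ret_val)
    return ret_val


async def main():
    # One process step per group, each depending on two independent fetches.
    pipelines = [task(process)(fetch(f'{g}1'), fetch(f'{g}2')) for g in 'abc']
    return await asyncio.gather(*pipelines)
```

Running `asyncio.run(main())` returns the three processed results in submission order. Given the timings, `completed` should show the fetches and process steps interleaved across groups (for example, `p(f(a1), f(a2))` finishing before `f(c1)`), since each `process` call starts as soon as its own two inputs are ready rather than waiting for every fetch.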

--
components: asyncio
messages: 397274
nosy: asarraf, asvetlov, yselivanov
priority: normal
severity: normal
status: open
title: [Enhancement] Asyncio task decorator to provide functionality similar to 
dask's delayed interface
type: enhancement
versions: Python 3.11

___
Python tracker 
<https://bugs.python.org/issue44604>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43929] Raise on threading.Event.__bool__ due to ambiguous nature

2021-04-24 Thread Aritn Sarraf


Aritn Sarraf added the comment:

Understood. Thanks both, for taking the time to look.

--

___
Python tracker 
<https://bugs.python.org/issue43929>
___



[issue43929] Raise on threading.Event.__bool__ due to ambiguous nature

2021-04-24 Thread Aritn Sarraf


Aritn Sarraf added the comment:

Hi Steve, a couple of things to preface my following comment. (1) I didn't 
mean to suggest that the current behavior is a bug. I don't think it is a bug, 
but rather that it can easily lead to bugs. (2) Sorry for tagging the previous 
versions; I'm not familiar with the ticket system and didn't realize I was 
asking for (nor do I want) this to be changed in previous versions as well. I 
thought the field just indicated which versions the ticket was relevant to.

I do realize and appreciate the basic object model regarding `bool(obj)`, and 
there is nothing in the threading documentation or the language as a whole that 
would lead me to believe that Events should be evaluated for their truthiness 
directly. However, I would like to expand on my case further before closing the 
ticket.

I believe there is a fundamental difference in the "perception" of what an 
Event object represents vs most other objects, *with respect to how bool should 
be evaluated on it*. It undeniably draws very clear parallels to a true boolean 
flag, and is often used as a surrogate for such. I realize it is more than 
that, that it's used for synchronization as well as other things, but the fact 
that Event is so often assigned to a variable called "flag" or a variable that 
implies a discrete boolean state (like my original stop_thread example), rather 
than a variable name that encompasses the full concept of an event, is a good 
indication that this is how people view/use the object. 

Given that the concepts of an Event and a boolean flag are so closely 
intertwined, I think that (but am *not* suggesting the following) it could even 
be considered appropriate for `bool(not_set_event)` to evaluate to False. 
Again, I am not suggesting this, as I realize that an Event is more than just 
its underlying "set" state. But this is why I think that, more often than not, 
it is ambiguous what a developer actually intended by directly evaluating such.

Now, consider what the current behavior enables us to do; in other words, what 
abilities in the language/threading framework would we lose by adopting this 
change? The only thing I can think of is the ability to do this: `event = event 
or Event()`. I don't have statistics on this, but I would assume, and I believe 
safely, that in the vast majority of situations where an Event shows up in a 
boolean context (e.g. if, while, not, and/or), it appears as `event.is_set()`. 
I would even go so far as to assume that if someone does write 
`event or/and other_variable`, there is a decently high probability that it was 
done in error. However, this is nothing but an assumption, with no evidence to 
back it up, so really the point I want to get across here is that there is not 
much utility in `bool(event)` and I don't think we're hindering the language in 
any way by forbidding it.

With respect to backwards compatibility: while the change is not backwards 
compatible, it is very refactor friendly. First of all, there will not be many 
legitimate uses (e.g. `event = event or Event()`) where this would show up in 
the first place. And second, since we're raising an exception rather than 
returning a different value, any affected code fails loudly instead of silently 
changing behavior.
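As an illustration of the kind of refactor this would require, the `event = event or Event()` idiom can be rewritten with an explicit `None` check, which never evaluates the Event's truthiness. The function name `my_func` follows the original submission; returning the event is an addition here purely so the behavior can be observed.

```python
import threading


def my_func(event=None):
    # Explicit None check: the truth value of the Event itself is never used,
    # so this keeps working even if bool(event) were to raise.
    if event is None:
        event = threading.Event()
    return event
```

A caller-supplied event is returned unchanged, while omitting the argument yields a fresh `threading.Event`.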

All this said, any bugs that the current behavior can lead to are most likely 
easily caught and resolved. But I think that this change would be purely a 
benefit to developing threaded applications in Python.

--

___
Python tracker 
<https://bugs.python.org/issue43929>
___



[issue43929] Raise on threading.Event.__bool__ due to ambiguous nature

2021-04-24 Thread Aritn Sarraf


New submission from Aritn Sarraf:

I'll sometimes find myself accidentally doing something like this (especially 
after a long break from using the threading module): 
```
stop_thread = threading.Event()
...
while not stop_thread:  # bug - bool(stop_thread) will always evaluate to True
    ...
```

Since the intention behind bool(event) is ambiguous and it is most likely often 
used improperly, I think that it would be a good idea to protect against this 
easy-to-produce bug by overriding __bool__ to raise. There is precedent for 
this behavior in the popular numpy library, see here:
https://github.com/numpy/numpy/blob/623bc1fae1d47df24e7f1e29321d0c0ba2771ce0/numpy/core/src/multiarray/number.c#L829
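The pitfall described above can be checked directly. The snippet below is a small illustration (the variable names other than `stop_thread` are made up for this example) showing that an Event is truthy regardless of its state, while `is_set()` reflects the underlying flag.

```python
import threading

stop_thread = threading.Event()

# Event defines neither __bool__ nor __len__, so the default object
# truthiness applies: it is truthy whether or not it has been set.
truthy_before = bool(stop_thread)
stop_thread.set()
truthy_after = bool(stop_thread)

# The intended check reads the underlying flag instead.
stop_thread.clear()
flag_before = stop_thread.is_set()
stop_thread.set()
flag_after = stop_thread.is_set()
```

Here `truthy_before` and `truthy_after` are both `True`, whereas `flag_before` is `False` and `flag_after` is `True`, which is why `while not stop_thread:` never enters its body.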

Expanding on my thoughts:
1) Most operations on a threading.Event are associated with checking the 
truthiness of the underlying state of the Event. Meaning that there are many 
opportunities for bool(event) to be called improperly.
2) I can't think of any cases where you would want to evaluate truthiness on 
anything other than the underlying "set" state of the Event. The one exception 
I can think of being the following (however, I believe this is generally 
accepted to be an anti-pattern, which I don't think should be considered a 
redeeming case for allowing bool(event)):
```
def my_func(event=None):
    event = event or threading.Event()
    ...
```
3) It is easy to protect against this, simply by raising in __bool__.
4) The only backwards incompatibilities this could create are in cases where 
the event is being evaluated for truthiness incorrectly, and in the 
anti-pattern case described in point 2.
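The proposal in point 3 can be approximated today with a subclass. This is a minimal sketch, not the actual change to `threading.Event`: the class name `StrictEvent`, the choice of `TypeError`, and the message text are all assumptions made for illustration (numpy, for comparison, raises a `ValueError` for ambiguous array truth values).

```python
import threading


class StrictEvent(threading.Event):
    """Hypothetical Event variant whose truth value is an error."""

    def __bool__(self):
        # Refuse implicit truthiness so `while not event:` fails loudly.
        raise TypeError(
            "the truth value of an Event is ambiguous; use event.is_set()"
        )


def check_truthiness(event):
    """Return True if evaluating bool(event) raises TypeError."""
    try:
        bool(event)
    except TypeError:
        return True
    return False
```

With this subclass, `while not stop_thread:` raises immediately instead of silently skipping the loop, while `is_set()`, `set()`, `clear()`, and `wait()` behave as usual.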

--
components: Library (Lib)
messages: 391771
nosy: asarraf
priority: normal
severity: normal
status: open
title: Raise on threading.Event.__bool__ due to ambiguous nature
type: behavior
versions: Python 3.10, Python 3.11, Python 3.6, Python 3.7, Python 3.8, Python 
3.9

___
Python tracker 
<https://bugs.python.org/issue43929>
___