Re: [Async-sig] Asyncio loop instrumentation

2018-01-02 Thread Yury Selivanov
I understand why it could be useful to have this in asyncio. But I'm a big -1 
on rushing this functionality into 3.7.

asyncio is no longer provisional, so we have to be careful when we design new 
APIs for it.

Example: I wanted to add support for Task groups to asyncio. A similar concept 
exists in curio and trio and I like it; it can be a big improvement over 
asyncio.gather. But there are too many caveats about handling multiple 
exceptions properly (MultiError?) and some issues with cancellation. That's why 
I decided that it's safer to prototype TaskGroups in a separate package than 
to push a poorly thought-out new API into 3.7.

The same applies to your proposal. You can easily publish a package on PyPI 
that provides an improved version of the asyncio event loop. You won't even 
need to write a lot of code, just override a few methods.
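
As a rough illustration of how little code such a package might need (a
sketch under assumptions, not a proposed API): `asyncio.SelectorEventLoop`
accepts a selector argument, so a selector subclass can time every
`select()` call without patching asyncio itself. The names `TimedSelector`
and `run_instrumented` are made up for this example.

```python
import asyncio
import selectors
import time


class TimedSelector(selectors.DefaultSelector):
    """Hypothetical selector that records how long each select() call takes."""

    def __init__(self):
        super().__init__()
        self.poll_time = 0.0   # total seconds spent inside select()
        self.poll_calls = 0    # number of select() calls made by the loop

    def select(self, timeout=None):
        start = time.monotonic()          # would be "poll_start"
        try:
            return super().select(timeout)
        finally:
            self.poll_time += time.monotonic() - start   # would be "poll_end"
            self.poll_calls += 1


def run_instrumented(coro_func):
    """Run coro_func() on a loop built around a TimedSelector."""
    selector = TimedSelector()
    loop = asyncio.SelectorEventLoop(selector)
    try:
        started = time.monotonic()
        result = loop.run_until_complete(coro_func())
        elapsed = time.monotonic() - started
    finally:
        loop.close()
    return result, selector.poll_time, selector.poll_calls, elapsed
```

This only covers selector-based loops; a proactor loop or uvloop would need
its own hook point.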

Yury

Sent from my iPhone

> On Jan 2, 2018, at 8:00 PM, Pau Freixes  wrote:
> 
> Agree, poll_start and poll_end suit much better.
> 
> Thanks for the feedback.
> 
> On Tue, Jan 2, 2018 at 1:34 AM, INADA Naoki  wrote:
>>>>> For this proposal [4], POC, I've preferred make a reduced list of events:
>>>>> 
>>>>> * `loop_start` : Executed when the loop starts for the first time.
>>>>> * `tick_start` : Executed when a new loop tick is started.
>>>>> * `io_start` : Executed when a new IO process starts.
>>>>> * `io_end` : Executed when the IO process ends.
>>>>> * `tick_end` : Executed when the loop tick ends.
>>>>> * `loop_stop` : Executed when the loop stops.
>>>> 
>>>> What do you call a "IO process" in this context?
>>> 
>>> Basically the call to the `select/poll/whatever` syscall that will ask
>>> for read or write to a set of file descriptors.
>> 
>> `select/poll/whatever` syscalls doesn't ask for read or write.
>> It waits for read or write (more accurate, waits for readable or
>> writable state).
>> 
>> So poll_start / poll_end looks better name to me.
>> 
>> INADA Naoki  
>> 
>> 
>>> 
>>> Thanks,
>>> 
>>> --
>>> --pau
> 
> 
> 
> -- 
> --pau
___
Async-sig mailing list
Async-sig@python.org
https://mail.python.org/mailman/listinfo/async-sig
Code of Conduct: https://www.python.org/psf/codeofconduct/


Re: [Async-sig] Asyncio loop instrumentation

2018-01-02 Thread Pau Freixes
Agreed, poll_start and poll_end suit much better.

Thanks for the feedback.

On Tue, Jan 2, 2018 at 1:34 AM, INADA Naoki  wrote:
>>>> For this proposal [4], POC, I've preferred make a reduced list of events:
>>>>
>>>> * `loop_start` : Executed when the loop starts for the first time.
>>>> * `tick_start` : Executed when a new loop tick is started.
>>>> * `io_start` : Executed when a new IO process starts.
>>>> * `io_end` : Executed when the IO process ends.
>>>> * `tick_end` : Executed when the loop tick ends.
>>>> * `loop_stop` : Executed when the loop stops.
>>>
>>> What do you call a "IO process" in this context?
>>
>> Basically the call to the `select/poll/whatever` syscall that will ask
>> for read or write to a set of file descriptors.
>
> `select/poll/whatever` syscalls doesn't ask for read or write.
> It waits for read or write (more accurate, waits for readable or
> writable state).
>
> So poll_start / poll_end looks better name to me.
>
> INADA Naoki  
>
>
>>
>> Thanks,
>>
>> --
>> --pau



-- 
--pau


Re: [Async-sig] Asyncio loop instrumentation

2018-01-02 Thread Pau Freixes
Hi Yury,

It's good to know that we are on the same page regarding the lack of a
feature that should be a must. Since Asyncio has become stable and
widely used by many organizations - such as ours [1] - the need for
tools that allow us to instrument asynchronous code running on top of
Asyncio has increased.

A good example is how some changes in Aiohttp were implemented [2] -
disclaimer, I'm the author of this part of the code - to allow
developers to gather more information about how HTTP calls perform at
both the application and protocol layers.

This proposal, just a POC, goes in the same direction and tries to
fill this gap for the event loop. The related work on the `load`
method is incidental, but it helps to show why this feature is so
important.

I still believe that we can start to fill the gap for Python 3.7; if
the time window to implement it closes before all the work is done, at
least we will have part of it done.

I still have some open questions whose answers might help to focus
this work in the right direction. Some of them concern the rationale,
for example how tightly this feature has to be coupled to the
AbstractEventLoop, which would make it a specification for other loop
implementations. Others are purely technical. But it's true that we
should only go further with these questions if we believe that we can
take advantage of all of this effort.

Regards,

[1] https://medium.com/@SkyscannerEng/running-aiohttp-at-scale-2656b7a83a09
[2] https://github.com/aio-libs/aiohttp/pull/2429

On Sun, Dec 31, 2017 at 8:12 PM, Yury Selivanov  wrote:
> When PEP 567 is accepted, I plan to implement advanced instrumentation in 
> uvloop, to monitor basically all io/callback/loop events. I'm still -1 to do 
> this in asyncio at least in 3.7, because i'd like us to have some time to 
> experiment with such instrumentation in real production code (preferably at 
> scale)
>
> Yury
>
> Sent from my iPhone
>
>> On Dec 31, 2017, at 10:02 PM, Antoine Pitrou  wrote:
>>
>> On Sun, 31 Dec 2017 18:32:21 +0100
>> Pau Freixes  wrote:
>>>
>>> These new implementation of the load method - remember that it returns
>>> a load factor between 0.0 and 1.0 that inform you about how bussy is
>>> your loop -
>>
>> What does it mean exactly? Is it the ratio of CPU time over wall clock
>> time?
>>
>> Depending on your needs, the `psutil` library (*) and/or the new
>> `time.thread_time` function (**) may also help.
>>
>> (*) https://psutil.readthedocs.io/en/latest/
>> (**) https://docs.python.org/3.7/library/time.html#time.thread_time
>>
>>> For this proposal [4], POC, I've preferred make a reduced list of events:
>>>
>>> * `loop_start` : Executed when the loop starts for the first time.
>>> * `tick_start` : Executed when a new loop tick is started.
>>> * `io_start` : Executed when a new IO process starts.
>>> * `io_end` : Executed when the IO process ends.
>>> * `tick_end` : Executed when the loop tick ends.
>>> * `loop_stop` : Executed when the loop stops.
>>
>> What do you call a "IO process" in this context?
>>
>> Regards
>>
>> Antoine.
>>
>>



-- 
--pau


Re: [Async-sig] Asyncio loop instrumentation

2018-01-01 Thread INADA Naoki
>>> For this proposal [4], POC, I've preferred make a reduced list of events:
>>>
>>> * `loop_start` : Executed when the loop starts for the first time.
>>> * `tick_start` : Executed when a new loop tick is started.
>>> * `io_start` : Executed when a new IO process starts.
>>> * `io_end` : Executed when the IO process ends.
>>> * `tick_end` : Executed when the loop tick ends.
>>> * `loop_stop` : Executed when the loop stops.
>>
>> What do you call a "IO process" in this context?
>
> Basically the call to the `select/poll/whatever` syscall that will ask
> for read or write to a set of file descriptors.

`select/poll/whatever` syscalls don't ask for read or write.
They wait for read or write (more accurately, they wait for a readable
or writable state).

So poll_start / poll_end look like better names to me.
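
To make the distinction concrete, here is a small sketch using the standard
`selectors` module and a socket pair. The helper name `wait_then_read` is
only for this example. `select()` reports which descriptors are ready; the
actual read is a separate call made afterwards.

```python
import selectors
import socket


def wait_then_read():
    """Show that select() only reports readiness; recv() does the read."""
    a, b = socket.socketpair()
    sel = selectors.DefaultSelector()
    try:
        sel.register(b, selectors.EVENT_READ)

        # Nothing has been written yet, so no descriptor is ready.
        before = sel.select(timeout=0)

        a.sendall(b"ping")

        # select() waits until `b` becomes readable; it transfers no data.
        ready = sel.select(timeout=1)
        events = [(key.fileobj is b, mask) for key, mask in ready]

        # The read itself happens here, after the poll has returned.
        data = b.recv(4)
        return before, events, data
    finally:
        sel.close()
        a.close()
        b.close()
```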

INADA Naoki  


>
> Thanks,
>
> --
> --pau


Re: [Async-sig] Asyncio loop instrumentation

2018-01-01 Thread Pau Freixes
Hi Antoine,

Regarding your questions

>
> What does it mean exactly? Is it the ratio of CPU time over wall clock
> time?

This can be considered a metric that tells you how much CPU your loop
is consuming. In the best-case scenario, where yours is the only
process, this metric will match the CPU usage - note that it will match
the usage of the CPU where your process is executed. With many
processes fighting for the same CPU, this number will be significantly
different, since the resources are being divided among many consumers.

Therefore I would like to point out that this load is relative to your
loop rather than an objective value taken from the CPU metrics.

To do the same with `psutil` you must gather the CPU usage of the
specific CPU where your loop is currently running. It's not an
impossible problem, but it turns something trivial into something more
complicated.

In the case of `time.thread_time`, I can't see how I could do that.
You would gather information related to the thread where your loop is
currently running, but there's nothing straightforward that would help
you take into account other threads that are fighting for that
specific CPU.

The solution presented is not perfect, and there are still some corner
cases where the load factor might not be accurate enough. The way the
`load` method guesses whether the loop is fighting for CPU resources
with other processes is basically by attributing at most the timeout
as sleeping time, for example:

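import select
import time

t0 = time.monotonic()
select.select(fds, [], [], 1)  # fds: the watched descriptors; timeout of 1 second
t1 = time.monotonic()
sleeping_time = min(t1 - t0, 1)  # count at most the timeout as sleeping time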

Therefore, if the call to select took more than 1 second because the
scheduler decided to give the CPU to another process, the extra time
beyond 1 second will be considered resource usage time. As you can
imagine, the problem with that is what happens when select was ready
before 1 second had passed but the scheduler did not give the CPU back
because a higher-priority process was running: in that case, the
waiting time will be attributed as sleeping time.
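
A minimal sketch of that accounting follows. It is only an illustration of
the clamping rule described above, not the code of the POC. The function
names and the idea of summing over a fixed window are assumptions made for
the example.

```python
def sleeping_time(elapsed, timeout):
    """Portion of one poll call counted as idle: never more than the timeout."""
    return min(elapsed, timeout)


def load_factor(polls, window):
    """Estimate the load over `window` seconds.

    `polls` is an iterable of (elapsed, timeout) pairs, one per poll call
    made during the window. Everything that is not counted as sleeping is
    treated as time consumed by (or stolen from) the loop.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    idle = sum(sleeping_time(e, t) for e, t in polls)
    # Clamp to the documented 0.0 .. 1.0 range.
    return min(1.0, max(0.0, 1.0 - idle / window))
```

For instance, a poll that took 1.5 seconds with a 1 second timeout counts
1 second as sleep, so over a 2 second window the load comes out as 0.5.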


>> For this proposal [4], POC, I've preferred make a reduced list of events:
>>
>> * `loop_start` : Executed when the loop starts for the first time.
>> * `tick_start` : Executed when a new loop tick is started.
>> * `io_start` : Executed when a new IO process starts.
>> * `io_end` : Executed when the IO process ends.
>> * `tick_end` : Executed when the loop tick ends.
>> * `loop_stop` : Executed when the loop stops.
>
> What do you call a "IO process" in this context?

Basically the call to the `select/poll/whatever` syscall that will ask
for read or write to a set of file descriptors.

Thanks,

-- 
--pau


Re: [Async-sig] Asyncio loop instrumentation

2017-12-31 Thread Antoine Pitrou
On Sun, 31 Dec 2017 18:32:21 +0100
Pau Freixes  wrote:
> 
> These new implementation of the load method - remember that it returns
> a load factor between 0.0 and 1.0 that inform you about how bussy is
> your loop -

What does it mean exactly? Is it the ratio of CPU time over wall clock
time?

Depending on your needs, the `psutil` library (*) and/or the new
`time.thread_time` function (**) may also help.

(*) https://psutil.readthedocs.io/en/latest/
(**) https://docs.python.org/3.7/library/time.html#time.thread_time
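
For reference, a ratio of CPU time over wall clock time for the current
thread could be measured roughly like this. It is a sketch only: `cpu_ratio`
and `spin` are names invented for the example, and `time.thread_time`
requires Python 3.7 or later.

```python
import time


def cpu_ratio(func):
    """Return thread CPU time divided by wall clock time while running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.thread_time()
    func()
    cpu = time.thread_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def spin(seconds=0.2):
    """Burn CPU for roughly `seconds` of wall clock time."""
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass
```

A sleeping function should give a ratio close to 0, while a busy loop such
as `spin` should give a much higher one on an otherwise idle machine.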

> For this proposal [4], POC, I've preferred make a reduced list of events:
> 
> * `loop_start` : Executed when the loop starts for the first time.
> * `tick_start` : Executed when a new loop tick is started.
> * `io_start` : Executed when a new IO process starts.
> * `io_end` : Executed when the IO process ends.
> * `tick_end` : Executed when the loop tick ends.
> * `loop_stop` : Executed when the loop stops.

What do you call an "IO process" in this context?

Regards

Antoine.




[Async-sig] Asyncio loop instrumentation

2017-12-31 Thread Pau Freixes
Hi, folks

First of all, I hope that you have had a good 2017 and I wish you
the best for 2018.

This email is the continuation of a plan B for the first proposal [1]
to articulate a way to measure the load of the Asyncio loop. The main
objections to the first implementation focused on the technical debt
that it imposed, taking into account that the feature was definitely
outside the main scope of the Asyncio loop.

Nathaniel proposed a plan B based on implementing some kind of
instrumentation that would allow developers to implement features such
as the load one. I put off the plan for a while because I had the
feeling, wrongly, that wiring the loop with the proper events would
hurt the loop's performance. Far from it: the performance penalty of
the suggested implementation is almost negligible, at least for what I
consider the happy path, which means that there are no instruments
listening for these events.

This new implementation of the load method - remember that it returns
a load factor between 0.0 and 1.0 that tells you how busy your loop
is - based on an instrument can be checked with the following
snippet:

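import asyncio

# `LoadInstrument` is defined in the proof-of-concept package [2] (`load` is
# assumed to come from the same package); `loop.add_instrument` only exists
# in the patched CPython branch [4].

async def coro(loop, idx):
    await asyncio.sleep(idx % 10)
    if load() > 0.9:
        # The loop is too busy: abandon the work.
        return False
    # Simulate 20 ms of CPU-bound work.
    start = loop.time()
    while loop.time() - start < 0.02:
        pass
    return True

async def run(loop, n):
    tasks = [coro(loop, i) for i in range(n)]
    results = await asyncio.gather(*tasks)
    abandoned = len([r for r in results if not r])
    print("Load reached for {} coros/sec: {}, abandoned {}/{}".format(
        n / 10, load(), abandoned, n))

async def main(loop):
    await run(loop, 100)

loop = asyncio.get_event_loop()
loop.add_instrument(LoadInstrument)
loop.run_until_complete(main(loop))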

The `LoadInstrument` [2] meets the contract of the `LoopInstrument` [3],
which allows it to listen to the proper loop signals that will be used
to calculate the load of the loop.

For this proposal [4], a POC, I've preferred to make a reduced list of events:

* `loop_start` : Executed when the loop starts for the first time.
* `tick_start` : Executed when a new loop tick is started.
* `io_start` : Executed when a new IO process starts.
* `io_end` : Executed when the IO process ends.
* `tick_end` : Executed when the loop tick ends.
* `loop_stop` : Executed when the loop stops.

The idea of giving just this short list of events is to avoid
overcomplicating third-party loop implementations, by requiring only
the minimum set of events that a typical reactor has to implement.
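
To show how such a contract might look in practice, here is a toy sketch.
It is not the code from [3] or [4]: `LoopInstrument` is redefined here with
no-op hooks, and `TickCounter` and `ToyLoop` are hypothetical stand-ins that
only show where each event in the list above would fire.

```python
class LoopInstrument:
    """Hypothetical base class: one no-op hook per event in the list above."""

    def loop_start(self): pass
    def tick_start(self): pass
    def io_start(self): pass
    def io_end(self): pass
    def tick_end(self): pass
    def loop_stop(self): pass


class TickCounter(LoopInstrument):
    """Example instrument that overrides only the hooks it cares about."""

    def __init__(self):
        self.ticks = 0
        self.polls = 0
        self.running = False

    def loop_start(self):
        self.running = True

    def io_end(self):
        self.polls += 1

    def tick_end(self):
        self.ticks += 1

    def loop_stop(self):
        self.running = False


class ToyLoop:
    """Stand-in for an event loop; it does no I/O and runs no callbacks."""

    def __init__(self):
        self._instruments = []

    def add_instrument(self, instrument):
        self._instruments.append(instrument)

    def _emit(self, name):
        # With no instruments registered this loop body never runs,
        # which is the cheap "happy path".
        for instrument in self._instruments:
            getattr(instrument, name)()

    def run(self, ticks):
        self._emit("loop_start")
        for _ in range(ticks):
            self._emit("tick_start")
            self._emit("io_start")
            # A real loop would call select()/poll() here.
            self._emit("io_end")
            # A real loop would run the ready callbacks here.
            self._emit("tick_end")
        self._emit("loop_stop")
```

Note that this sketch registers an instrument instance, whereas the snippet
above passes the `LoadInstrument` class itself.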

I would like to gather your feedback on this new approach and, if you
believe that it might be interesting, on which next steps should be
taken.

Cheers,

[1] https://mail.python.org/pipermail/async-sig/2017-August/000382.html
[2] https://github.com/pfreixes/asyncio_load_instrument/blob/master/asyncio_load_instrument/instrument.py#L8
[3] https://github.com/pfreixes/cpython/blob/asyncio_loop_instrumentation/Lib/asyncio/loop_instruments.py#L9
[4] https://github.com/pfreixes/cpython/commit/adc3ba46979394997c40aa89178b4724442b28eb



-- 
--pau