Re: [Twisted-Python] Scalability of timers

2014-08-10 Thread Tobias Oberstein
> Just a final note.. a single no-fds call to select with a 0 timeout seems to 
> take
> around 280ns on my Core 2. Presumably the better interfaces (e.g. epoll, but

Are you sure there is actually a context switch happening with this syscall 
using no FDs and timeout 0?

280ns means your machine can do 3.5 millions selects() per second. Mmh. I have 
a hard time believing this would sustain with timeout>0 and/or FDs - anywhere 
near that magnitude. 

http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html
 
> not poll) will also take around the same time.
> 
> It's really hard to write even a single Python function that gets anywhere 
> below
> 1usec CPU time, and given how function-heavy Twisted is, I'd be surprised

PyPy:


import timeit
import time

def f1():
   pass

def f2(x = 3):
   return x*x

def f3():
   return map(lambda x: x^2, range(10))

for f in [f1, f2, f3]:
   for i in range(5):
  print f, timeit.timeit(f, number = 100)
  time.sleep(2)


oberstet@vbox-ubuntu1310:~/scm/scratchbox/python/twisted/timers$ 
~/pypy-2.3-linux64/bin/pypy test2.py 
 0.00407981872559
 0.00138401985168
 0.00138092041016
 0.00148296356201
 0.00153708457947
 0.0269131660461
 0.0319480895996
 0.0243380069733
 0.0251939296722
 0.025454044342
 0.474506855011
 0.452003002167
 0.446584939957
 0.441245079041
 0.467758893967


This is in a VM on low-end gear. My notebook actually.

With trivial functions like above, PyPy might even optimize away funs 
altogether and inline those. Not sure though.

> considerations like this factored usefully into a design at all :)
> 
> Adjusting timer coalescing to extreme settings might even worsen your app's
> performance, since it'll cause interpreter time and syscalls to all be
> compressed around the slack intervals, leaving the CPU idle more often, rather
> than running evenly spaced over time. This might produce less desirable app
> behavior overall (e.g. it has knock-on effects for network interface queues,
> bursts of disk IO/SQL queries, network switch buffer overruns, or whatever
> else).

I can see those points. Will keep in mind and measure.

However, I am running a multi-process server, so as long as those processes 
tick slightly pitched w.r.t. each other, and there are more processes than 
cores, I guess I can saturate the CPUs.

And then this is a network server, so there will be FD activity (lots of).

It's just that I want to limit the impacts of having massive amounts of timers 
associated with network connections (ping/pong to detect lost TCP).

In any case: thanks a lot for all your hints!

/Tobias

> 
> 
> David
> 
> ___
> Twisted-Python mailing list
> Twisted-Python@twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] Scalability of timers

2014-08-10 Thread Tobias Oberstein
>>I want to trade less precision (timers fire at less exact times) for higher 
>>efficiency (less context switches). 

>It's easy enough to write one yourself. This might work:
>from twisted.internet.task import Clock, LoopingCall
>
>clock = Clock()
>LoopingCall(lambda: clock.advance(0.001)).start(0.001)
>Now just do "clock.callLater" instead of "reactor.callLater".

Oh, cool. That make me smile;)

Does what I want, is simple and portable. Great.

Only worry is

http://twistedmatrix.com/trac/browser/tags/releases/twisted-14.0.0/twisted/internet/task.py#L767

Why does it sort after each and every callLater?

And:
http://twistedmatrix.com/trac/browser/tags/releases/twisted-14.0.0/twisted/internet/task.py#L793

It also sorts after each firing of a delayed call. Presumably because that 
delayed call might reschedule another call that might also fire in same time 
period?

/Tobias

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] Scalability of timers

2014-08-10 Thread Itamar Turner-Trauring

On 08/10/2014 06:23 PM, Tobias Oberstein wrote:
I want to trade less precision (timers fire at less exact times) for 
higher efficiency (less context switches).


It's easy enough to write one yourself. This might work:

   from twisted.internet.task import Clock, LoopingCall

   clock = Clock()
   LoopingCall(lambda: clock.advance(0.001)).start(0.001)

Now just do "clock.callLater" instead of "reactor.callLater".
___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] Scalability of timers

2014-08-10 Thread dw+twisted-python
On Sun, Aug 10, 2014 at 03:31:11PM -0700, Tobias Oberstein wrote:

> Thanks a lot for those hints! I will read into this material.

Just a final note.. a single no-fds call to select with a 0 timeout
seems to take around 280ns on my Core 2. Presumably the better
interfaces (e.g. epoll, but not poll) will also take around the same
time.

It's really hard to write even a single Python function that gets
anywhere below 1usec CPU time, and given how function-heavy Twisted is,
I'd be surprised considerations like this factored usefully into a
design at all :)

Adjusting timer coalescing to extreme settings might even worsen your
app's performance, since it'll cause interpreter time and syscalls to
all be compressed around the slack intervals, leaving the CPU idle more
often, rather than running evenly spaced over time. This might produce
less desirable app behavior overall (e.g. it has knock-on effects for
network interface queues, bursts of disk IO/SQL queries, network switch
buffer overruns, or whatever else).


David

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] Scalability of timers

2014-08-10 Thread Tobias Oberstein
> > But the coalescing you describe would also apply to those implicit
> > timers (created from select(timeout = ..) within the Linux kernel)?
> 
> It applies to all kernel timers not created by realtime processes.
> 

Alright. I see.

> 
> > This is all fine. But how do I _explicitly_ limit the rate at which
> > select() is called to say 1000Hz (at the expense of timer precision)?
> 
> > I don't want to let the box hit it's syscall rate limit. Because the
> > box will spend a fair amount of resources for context switching all
> > the time with to real gain.
> 
> From a reading of http://lwn.net/Articles/296578/, at time of writing the
> default select() implementation coalesces sub-second timeouts to 50us
> boundaries, and this can be adjusted via prctl(PR_SET_TIMERSLACK)
> (http://linux.die.net/man/2/prctl) on a per-process basis.

Even more interesting! This might be what I'm looking for .. need to read 
through.

> 
> That article is from 2008, and though the relevant kernel code seems to match
> the article content, a huge amount of power-efficiency related changes went
> into the kernel since that time. My assumption is nowadays the kernel rounds
> more aggressively than the default of 50us documented by that article.

Ah, ok. 50us corresponds to 20k syscalls/sec .. which seems well below the 
syscall rate limit on modern boxes.

> 
> Short answer is yes, you can set the max hz of select(), but in all 
> likelihood you
> won't have to. As always, benchmarking and profiling real code might reveal
> this to be a non-issue.

Thanks a lot for those hints! I will read into this material.

/Tobias

> 
> 
> David
> 
> >
> > Thanks for your hints and patience,
> > Tobias
> >
> > >
> > >
> > > On Sun, Aug 10, 2014 at 05:16:51AM -0700, Tobias Oberstein wrote:
> > > > Hi,
> > > >
> > > > I have a question regarding scalability of timers in Twisted.
> > > >
> > > > Say I have a massive number of periodic timers (lets say each with
> > > > period 1s,
> > > but all slightly time shifted to each other).
> > > >
> > > > As far as I understand, timers are implemented ultimately by
> > > > setting the
> > > timeout parameter when calling into OS select/poll/epoll/kqueue.
> > > >
> > > > If this  is true, then the number of timers scales linearly with
> > > > the number of
> > > syscalls. This can get limiting (the total number of syscalls a
> > > Linux box can sustain is a couple of 100k's per second). As more and
> > > more timers are setup, the timeout essentially will approach 0. On
> > > the upside, timers will fire precisely.
> > > >
> > > > However, say I am fine with a precision of 1ms.
> > > >
> > > > Is there a way that limits the syscall rate to 1000/s (given no FD
> > > > activity
> > > happens) _independently_ of the number of timers setup?
> > > >
> > > > Timers that fall into a certain ms slice would all fire roughly at
> > > > the same time
> > > (still ordered).
> > > >
> > > > Is that possible?
> > > >
> > > > Thanks,
> > > > Tobias
> > > >
> > > > ___
> > > > Twisted-Python mailing list
> > > > Twisted-Python@twistedmatrix.com
> > > > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
> > >
> > > ___
> > > Twisted-Python mailing list
> > > Twisted-Python@twistedmatrix.com
> > > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
> >
> > ___
> > Twisted-Python mailing list
> > Twisted-Python@twistedmatrix.com
> > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
> 
> ___
> Twisted-Python mailing list
> Twisted-Python@twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] Scalability of timers

2014-08-10 Thread Tobias Oberstein
> >What I am after is to explicitly _control_ the maximum syscall rate to
> >select() - not simply max. out the box on syscall rate.
> >
> >Like: limit syscall rate to select() at 1000Hz - regardless how many
> >timers I issue per second.
> 
> In other words:
> 
> If you ask Twisted to wake up N times per second, is there an API to make
> Twisted wake up M (M 
> Is that what you're looking for?

Yes, exactly.

I want to trade less precision (timers fire at less exact times) for higher 
efficiency (less context switches).

/Tobias

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] Scalability of timers

2014-08-10 Thread exarkun

On 09:38 pm, tobias.oberst...@tavendo.de wrote:


What I am after is to explicitly _control_ the maximum syscall rate to 
select() - not simply max. out the box on syscall rate.


Like: limit syscall rate to select() at 1000Hz - regardless how many 
timers I issue per second.


In other words:

If you ask Twisted to wake up N times per second, is there an API to 
make Twisted wake up M (M

Is that what you're looking for?

Jean-Paul

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] Scalability of timers

2014-08-10 Thread dw+twisted-python
On Sun, Aug 10, 2014 at 02:51:02PM -0700, Tobias Oberstein wrote:

> But the coalescing you describe would also apply to those implicit
> timers (created from select(timeout = ..) within the Linux kernel)?

It applies to all kernel timers not created by realtime processes.


> This is all fine. But how do I _explicitly_ limit the rate at which
> select() is called to say 1000Hz (at the expense of timer precision)?

> I don't want to let the box hit it's syscall rate limit. Because the
> box will spend a fair amount of resources for context switching all
> the time with to real gain.

>From a reading of http://lwn.net/Articles/296578/, at time of writing
the default select() implementation coalesces sub-second timeouts to
50us boundaries, and this can be adjusted via prctl(PR_SET_TIMERSLACK)
(http://linux.die.net/man/2/prctl) on a per-process basis.

That article is from 2008, and though the relevant kernel code seems to
match the article content, a huge amount of power-efficiency related
changes went into the kernel since that time. My assumption is nowadays
the kernel rounds more aggressively than the default of 50us documented
by that article.

Short answer is yes, you can set the max hz of select(), but in all
likelihood you won't have to. As always, benchmarking and profiling real
code might reveal this to be a non-issue.


David

> 
> Thanks for your hints and patience,
> Tobias
> 
> > 
> > 
> > On Sun, Aug 10, 2014 at 05:16:51AM -0700, Tobias Oberstein wrote:
> > > Hi,
> > >
> > > I have a question regarding scalability of timers in Twisted.
> > >
> > > Say I have a massive number of periodic timers (lets say each with period 
> > > 1s,
> > but all slightly time shifted to each other).
> > >
> > > As far as I understand, timers are implemented ultimately by setting the
> > timeout parameter when calling into OS select/poll/epoll/kqueue.
> > >
> > > If this  is true, then the number of timers scales linearly with the 
> > > number of
> > syscalls. This can get limiting (the total number of syscalls a Linux box 
> > can
> > sustain is a couple of 100k's per second). As more and more timers are 
> > setup,
> > the timeout essentially will approach 0. On the upside, timers will fire
> > precisely.
> > >
> > > However, say I am fine with a precision of 1ms.
> > >
> > > Is there a way that limits the syscall rate to 1000/s (given no FD 
> > > activity
> > happens) _independently_ of the number of timers setup?
> > >
> > > Timers that fall into a certain ms slice would all fire roughly at the 
> > > same time
> > (still ordered).
> > >
> > > Is that possible?
> > >
> > > Thanks,
> > > Tobias
> > >
> > > ___
> > > Twisted-Python mailing list
> > > Twisted-Python@twistedmatrix.com
> > > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
> > 
> > ___
> > Twisted-Python mailing list
> > Twisted-Python@twistedmatrix.com
> > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
> 
> ___
> Twisted-Python mailing list
> Twisted-Python@twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] Scalability of timers

2014-08-10 Thread Tobias Oberstein
> Individual OS have their own mechanisms for avoiding the kind of waste you're
> describing. For example, Linux quite aggressively rounds up the expiry of
> certain classes of timer at progressively less granular intervals the further 
> in
> the future they're scheduled ("timer coalescing").

As far as I understand, Twisted implements timers via the timeout parameter to 
select(), not using Linux explicit timers.

But the coalescing you describe would also apply to those implicit timers 
(created from select(timeout = ..) within the Linux kernel)?

> When Twisted wakes, there is no guarantee that that only one timer has
> expired by then. In fact under load you would expect the select loop to always
> be running (and thus timing out) late, and so each iteration may process
> several timers simultaneously.
> 
> Twisted will set the select() timeout to the timer due to expire the earliest.
> Finding this timer is a constant time operation. There is only ever one active
> select() (or select-equivalent) call active at a time.
> 
> The Twisted timer implementation internally uses a heap, so scheduling and
> expiry are quite efficint O(logN). With 4 billion timers active, scheduling a 
> new
> timer in the worst case would require 32 array elements to be swapped.

This is all fine. But how do I _explicitly_ limit the rate at which select() is 
called to say 1000Hz (at the expense of timer precision)?

I don't want to let the box hit it's syscall rate limit. Because the box will 
spend a fair amount of resources for context switching all the time with to 
real gain.

Thanks for your hints and patience,
Tobias

> 
> 
> On Sun, Aug 10, 2014 at 05:16:51AM -0700, Tobias Oberstein wrote:
> > Hi,
> >
> > I have a question regarding scalability of timers in Twisted.
> >
> > Say I have a massive number of periodic timers (lets say each with period 
> > 1s,
> but all slightly time shifted to each other).
> >
> > As far as I understand, timers are implemented ultimately by setting the
> timeout parameter when calling into OS select/poll/epoll/kqueue.
> >
> > If this  is true, then the number of timers scales linearly with the number 
> > of
> syscalls. This can get limiting (the total number of syscalls a Linux box can
> sustain is a couple of 100k's per second). As more and more timers are setup,
> the timeout essentially will approach 0. On the upside, timers will fire
> precisely.
> >
> > However, say I am fine with a precision of 1ms.
> >
> > Is there a way that limits the syscall rate to 1000/s (given no FD activity
> happens) _independently_ of the number of timers setup?
> >
> > Timers that fall into a certain ms slice would all fire roughly at the same 
> > time
> (still ordered).
> >
> > Is that possible?
> >
> > Thanks,
> > Tobias
> >
> > ___
> > Twisted-Python mailing list
> > Twisted-Python@twistedmatrix.com
> > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
> 
> ___
> Twisted-Python mailing list
> Twisted-Python@twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] Scalability of timers

2014-08-10 Thread Tobias Oberstein
>There is only one select() call (or whatever) at any given time, regardless of 
>how many timers.

Yes, I do understand this.

>Syscalls are thus O(1). Timers are stored in sorted order. When event loop 
>wakes up it removes timers that have been reached, which is fast because 
>they're sorted so when you hit one that is still in future you can stop. So 
>that's pretty scalable.

syscalls are O(1). But the constant is non zero. A syscall is still quite 
expensive. try doing 1 mio. syscalls/sec on any x86 box (Linux, BSD, whatever). 
DEC Alphas and Itanium might be able to do more, but the context switching 
overhead of x86 architecture is "huge".

But I feel I failed in formulating what I am asking.

Could you please correct where my thinking below goes wrong?

Let's say I issue 1 mio. timers with expirary times t0, t0+1us, t0+2us, .., 
t0+1s

That is 1 mio. timers expiring in 1us pitch within 1s.

That will mean 1 mio. select() syscalls done in 1s each with timeout set to 1us

Since it's unlikely that a box supports a rate of 1 mio. syscalls/sec, that 
means a select with timeout 1us won't return after 1us, but >1us. Twisted will 
process all timers that have expired in the meantime. But there is no way of 
letting 1 mio. timers fire in 1us pitch within 1s using a syscall.

What I am after is to explicitly _control_ the maximum syscall rate to select() 
- not simply max. out the box on syscall rate.

Like: limit syscall rate to select() at 1000Hz - regardless how many timers I 
issue per second.

In above example, with syscall max set to 1000Hz, each time select() returns 
1000 timers would have expired. That's fine. That's what I want.

How can I do that?

Thanks!
Tobias


___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] Scalability of timers

2014-08-10 Thread dw+twisted-python
Hey Tobias,

Individual OS have their own mechanisms for avoiding the kind of waste
you're describing. For example, Linux quite aggressively rounds up the
expiry of certain classes of timer at progressively less granular
intervals the further in the future they're scheduled ("timer
coalescing").

When Twisted wakes, there is no guarantee that that only one timer has
expired by then. In fact under load you would expect the select loop to
always be running (and thus timing out) late, and so each iteration may
process several timers simultaneously.

Twisted will set the select() timeout to the timer due to expire the
earliest. Finding this timer is a constant time operation. There is only
ever one active select() (or select-equivalent) call active at a time.

The Twisted timer implementation internally uses a heap, so scheduling
and expiry are quite efficint O(logN). With 4 billion timers active,
scheduling a new timer in the worst case would require 32 array elements
to be swapped.


On Sun, Aug 10, 2014 at 05:16:51AM -0700, Tobias Oberstein wrote:
> Hi,
> 
> I have a question regarding scalability of timers in Twisted.
> 
> Say I have a massive number of periodic timers (lets say each with period 1s, 
> but all slightly time shifted to each other).
> 
> As far as I understand, timers are implemented ultimately by setting the 
> timeout parameter when calling into OS select/poll/epoll/kqueue.
> 
> If this  is true, then the number of timers scales linearly with the number 
> of syscalls. This can get limiting (the total number of syscalls a Linux box 
> can sustain is a couple of 100k's per second). As more and more timers are 
> setup, the timeout essentially will approach 0. On the upside, timers will 
> fire precisely.
> 
> However, say I am fine with a precision of 1ms.
> 
> Is there a way that limits the syscall rate to 1000/s (given no FD activity 
> happens) _independently_ of the number of timers setup?
> 
> Timers that fall into a certain ms slice would all fire roughly at the same 
> time (still ordered).
> 
> Is that possible?
> 
> Thanks,
> Tobias
> 
> ___
> Twisted-Python mailing list
> Twisted-Python@twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] Scalability of timers

2014-08-10 Thread Itamar Turner-Trauring
 

There is only one select() call (or whatever) at any given time,
regardless of how many timers. Syscalls are thus O(1). Timers are stored
in sorted order. When event loop wakes up it removes timers that have
been reached, which is fast because they're sorted so when you hit one
that is still in future you can stop. So that's pretty scalable. ___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


[Twisted-Python] Scalability of timers

2014-08-10 Thread Tobias Oberstein
Hi,

I have a question regarding scalability of timers in Twisted.

Say I have a massive number of periodic timers (lets say each with period 1s, 
but all slightly time shifted to each other).

As far as I understand, timers are implemented ultimately by setting the 
timeout parameter when calling into OS select/poll/epoll/kqueue.

If this  is true, then the number of timers scales linearly with the number of 
syscalls. This can get limiting (the total number of syscalls a Linux box can 
sustain is a couple of 100k's per second). As more and more timers are setup, 
the timeout essentially will approach 0. On the upside, timers will fire 
precisely.

However, say I am fine with a precision of 1ms.

Is there a way that limits the syscall rate to 1000/s (given no FD activity 
happens) _independently_ of the number of timers setup?

Timers that fall into a certain ms slice would all fire roughly at the same 
time (still ordered).

Is that possible?

Thanks,
Tobias

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python