Re: [Twisted-Python] Scalability of timers
> Just a final note.. a single no-fds call to select with a 0 timeout seems to > take > around 280ns on my Core 2. Presumably the better interfaces (e.g. epoll, but Are you sure there is actually a context switch happening with this syscall using no FDs and timeout 0? 280ns means your machine can do 3.5 millions selects() per second. Mmh. I have a hard time believing this would sustain with timeout>0 and/or FDs - anywhere near that magnitude. http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html > not poll) will also take around the same time. > > It's really hard to write even a single Python function that gets anywhere > below > 1usec CPU time, and given how function-heavy Twisted is, I'd be surprised PyPy: import timeit import time def f1(): pass def f2(x = 3): return x*x def f3(): return map(lambda x: x^2, range(10)) for f in [f1, f2, f3]: for i in range(5): print f, timeit.timeit(f, number = 100) time.sleep(2) oberstet@vbox-ubuntu1310:~/scm/scratchbox/python/twisted/timers$ ~/pypy-2.3-linux64/bin/pypy test2.py 0.00407981872559 0.00138401985168 0.00138092041016 0.00148296356201 0.00153708457947 0.0269131660461 0.0319480895996 0.0243380069733 0.0251939296722 0.025454044342 0.474506855011 0.452003002167 0.446584939957 0.441245079041 0.467758893967 This is in a VM on low-end gear. My notebook actually. With trivial functions like above, PyPy might even optimize away funs altogether and inline those. Not sure though. > considerations like this factored usefully into a design at all :) > > Adjusting timer coalescing to extreme settings might even worsen your app's > performance, since it'll cause interpreter time and syscalls to all be > compressed around the slack intervals, leaving the CPU idle more often, rather > than running evenly spaced over time. This might produce less desirable app > behavior overall (e.g. it has knock-on effects for network interface queues, > bursts of disk IO/SQL queries, network switch buffer overruns, or whatever > else). I can see those points. Will keep in mind and measure. However, I am running a multi-process server, so as long as those processes tick slightly pitched w.r.t. each other, and there are more processes than cores, I guess I can saturate the CPUs. And then this is a network server, so there will be FD activity (lots of). It's just that I want to limit the impacts of having massive amounts of timers associated with network connections (ping/pong to detect lost TCP). In any case: thanks a lot for all your hints! /Tobias > > > David > > ___ > Twisted-Python mailing list > Twisted-Python@twistedmatrix.com > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python ___ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
Re: [Twisted-Python] Scalability of timers
>>I want to trade less precision (timers fire at less exact times) for higher >>efficiency (less context switches). >It's easy enough to write one yourself. This might work: >from twisted.internet.task import Clock, LoopingCall > >clock = Clock() >LoopingCall(lambda: clock.advance(0.001)).start(0.001) >Now just do "clock.callLater" instead of "reactor.callLater". Oh, cool. That make me smile;) Does what I want, is simple and portable. Great. Only worry is http://twistedmatrix.com/trac/browser/tags/releases/twisted-14.0.0/twisted/internet/task.py#L767 Why does it sort after each and every callLater? And: http://twistedmatrix.com/trac/browser/tags/releases/twisted-14.0.0/twisted/internet/task.py#L793 It also sorts after each firing of a delayed call. Presumably because that delayed call might reschedule another call that might also fire in same time period? /Tobias ___ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
Re: [Twisted-Python] Scalability of timers
On 08/10/2014 06:23 PM, Tobias Oberstein wrote: I want to trade less precision (timers fire at less exact times) for higher efficiency (less context switches). It's easy enough to write one yourself. This might work: from twisted.internet.task import Clock, LoopingCall clock = Clock() LoopingCall(lambda: clock.advance(0.001)).start(0.001) Now just do "clock.callLater" instead of "reactor.callLater". ___ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
Re: [Twisted-Python] Scalability of timers
On Sun, Aug 10, 2014 at 03:31:11PM -0700, Tobias Oberstein wrote: > Thanks a lot for those hints! I will read into this material. Just a final note.. a single no-fds call to select with a 0 timeout seems to take around 280ns on my Core 2. Presumably the better interfaces (e.g. epoll, but not poll) will also take around the same time. It's really hard to write even a single Python function that gets anywhere below 1usec CPU time, and given how function-heavy Twisted is, I'd be surprised considerations like this factored usefully into a design at all :) Adjusting timer coalescing to extreme settings might even worsen your app's performance, since it'll cause interpreter time and syscalls to all be compressed around the slack intervals, leaving the CPU idle more often, rather than running evenly spaced over time. This might produce less desirable app behavior overall (e.g. it has knock-on effects for network interface queues, bursts of disk IO/SQL queries, network switch buffer overruns, or whatever else). David ___ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
Re: [Twisted-Python] Scalability of timers
> > But the coalescing you describe would also apply to those implicit > > timers (created from select(timeout = ..) within the Linux kernel)? > > It applies to all kernel timers not created by realtime processes. > Alright. I see. > > > This is all fine. But how do I _explicitly_ limit the rate at which > > select() is called to say 1000Hz (at the expense of timer precision)? > > > I don't want to let the box hit it's syscall rate limit. Because the > > box will spend a fair amount of resources for context switching all > > the time with to real gain. > > From a reading of http://lwn.net/Articles/296578/, at time of writing the > default select() implementation coalesces sub-second timeouts to 50us > boundaries, and this can be adjusted via prctl(PR_SET_TIMERSLACK) > (http://linux.die.net/man/2/prctl) on a per-process basis. Even more interesting! This might be what I'm looking for .. need to read through. > > That article is from 2008, and though the relevant kernel code seems to match > the article content, a huge amount of power-efficiency related changes went > into the kernel since that time. My assumption is nowadays the kernel rounds > more aggressively than the default of 50us documented by that article. Ah, ok. 50us corresponds to 20k syscalls/sec .. which seems well below the syscall rate limit on modern boxes. > > Short answer is yes, you can set the max hz of select(), but in all > likelihood you > won't have to. As always, benchmarking and profiling real code might reveal > this to be a non-issue. Thanks a lot for those hints! I will read into this material. /Tobias > > > David > > > > > Thanks for your hints and patience, > > Tobias > > > > > > > > > > > On Sun, Aug 10, 2014 at 05:16:51AM -0700, Tobias Oberstein wrote: > > > > Hi, > > > > > > > > I have a question regarding scalability of timers in Twisted. > > > > > > > > Say I have a massive number of periodic timers (lets say each with > > > > period 1s, > > > but all slightly time shifted to each other). > > > > > > > > As far as I understand, timers are implemented ultimately by > > > > setting the > > > timeout parameter when calling into OS select/poll/epoll/kqueue. > > > > > > > > If this is true, then the number of timers scales linearly with > > > > the number of > > > syscalls. This can get limiting (the total number of syscalls a > > > Linux box can sustain is a couple of 100k's per second). As more and > > > more timers are setup, the timeout essentially will approach 0. On > > > the upside, timers will fire precisely. > > > > > > > > However, say I am fine with a precision of 1ms. > > > > > > > > Is there a way that limits the syscall rate to 1000/s (given no FD > > > > activity > > > happens) _independently_ of the number of timers setup? > > > > > > > > Timers that fall into a certain ms slice would all fire roughly at > > > > the same time > > > (still ordered). > > > > > > > > Is that possible? > > > > > > > > Thanks, > > > > Tobias > > > > > > > > ___ > > > > Twisted-Python mailing list > > > > Twisted-Python@twistedmatrix.com > > > > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python > > > > > > ___ > > > Twisted-Python mailing list > > > Twisted-Python@twistedmatrix.com > > > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python > > > > ___ > > Twisted-Python mailing list > > Twisted-Python@twistedmatrix.com > > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python > > ___ > Twisted-Python mailing list > Twisted-Python@twistedmatrix.com > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python ___ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
Re: [Twisted-Python] Scalability of timers
> >What I am after is to explicitly _control_ the maximum syscall rate to > >select() - not simply max. out the box on syscall rate. > > > >Like: limit syscall rate to select() at 1000Hz - regardless how many > >timers I issue per second. > > In other words: > > If you ask Twisted to wake up N times per second, is there an API to make > Twisted wake up M (M > Is that what you're looking for? Yes, exactly. I want to trade less precision (timers fire at less exact times) for higher efficiency (less context switches). /Tobias ___ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
Re: [Twisted-Python] Scalability of timers
On 09:38 pm, tobias.oberst...@tavendo.de wrote: What I am after is to explicitly _control_ the maximum syscall rate to select() - not simply max. out the box on syscall rate. Like: limit syscall rate to select() at 1000Hz - regardless how many timers I issue per second. In other words: If you ask Twisted to wake up N times per second, is there an API to make Twisted wake up M (M Is that what you're looking for? Jean-Paul ___ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
Re: [Twisted-Python] Scalability of timers
On Sun, Aug 10, 2014 at 02:51:02PM -0700, Tobias Oberstein wrote: > But the coalescing you describe would also apply to those implicit > timers (created from select(timeout = ..) within the Linux kernel)? It applies to all kernel timers not created by realtime processes. > This is all fine. But how do I _explicitly_ limit the rate at which > select() is called to say 1000Hz (at the expense of timer precision)? > I don't want to let the box hit it's syscall rate limit. Because the > box will spend a fair amount of resources for context switching all > the time with to real gain. >From a reading of http://lwn.net/Articles/296578/, at time of writing the default select() implementation coalesces sub-second timeouts to 50us boundaries, and this can be adjusted via prctl(PR_SET_TIMERSLACK) (http://linux.die.net/man/2/prctl) on a per-process basis. That article is from 2008, and though the relevant kernel code seems to match the article content, a huge amount of power-efficiency related changes went into the kernel since that time. My assumption is nowadays the kernel rounds more aggressively than the default of 50us documented by that article. Short answer is yes, you can set the max hz of select(), but in all likelihood you won't have to. As always, benchmarking and profiling real code might reveal this to be a non-issue. David > > Thanks for your hints and patience, > Tobias > > > > > > > On Sun, Aug 10, 2014 at 05:16:51AM -0700, Tobias Oberstein wrote: > > > Hi, > > > > > > I have a question regarding scalability of timers in Twisted. > > > > > > Say I have a massive number of periodic timers (lets say each with period > > > 1s, > > but all slightly time shifted to each other). > > > > > > As far as I understand, timers are implemented ultimately by setting the > > timeout parameter when calling into OS select/poll/epoll/kqueue. > > > > > > If this is true, then the number of timers scales linearly with the > > > number of > > syscalls. This can get limiting (the total number of syscalls a Linux box > > can > > sustain is a couple of 100k's per second). As more and more timers are > > setup, > > the timeout essentially will approach 0. On the upside, timers will fire > > precisely. > > > > > > However, say I am fine with a precision of 1ms. > > > > > > Is there a way that limits the syscall rate to 1000/s (given no FD > > > activity > > happens) _independently_ of the number of timers setup? > > > > > > Timers that fall into a certain ms slice would all fire roughly at the > > > same time > > (still ordered). > > > > > > Is that possible? > > > > > > Thanks, > > > Tobias > > > > > > ___ > > > Twisted-Python mailing list > > > Twisted-Python@twistedmatrix.com > > > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python > > > > ___ > > Twisted-Python mailing list > > Twisted-Python@twistedmatrix.com > > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python > > ___ > Twisted-Python mailing list > Twisted-Python@twistedmatrix.com > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python ___ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
Re: [Twisted-Python] Scalability of timers
> Individual OS have their own mechanisms for avoiding the kind of waste you're > describing. For example, Linux quite aggressively rounds up the expiry of > certain classes of timer at progressively less granular intervals the further > in > the future they're scheduled ("timer coalescing"). As far as I understand, Twisted implements timers via the timeout parameter to select(), not using Linux explicit timers. But the coalescing you describe would also apply to those implicit timers (created from select(timeout = ..) within the Linux kernel)? > When Twisted wakes, there is no guarantee that that only one timer has > expired by then. In fact under load you would expect the select loop to always > be running (and thus timing out) late, and so each iteration may process > several timers simultaneously. > > Twisted will set the select() timeout to the timer due to expire the earliest. > Finding this timer is a constant time operation. There is only ever one active > select() (or select-equivalent) call active at a time. > > The Twisted timer implementation internally uses a heap, so scheduling and > expiry are quite efficint O(logN). With 4 billion timers active, scheduling a > new > timer in the worst case would require 32 array elements to be swapped. This is all fine. But how do I _explicitly_ limit the rate at which select() is called to say 1000Hz (at the expense of timer precision)? I don't want to let the box hit it's syscall rate limit. Because the box will spend a fair amount of resources for context switching all the time with to real gain. Thanks for your hints and patience, Tobias > > > On Sun, Aug 10, 2014 at 05:16:51AM -0700, Tobias Oberstein wrote: > > Hi, > > > > I have a question regarding scalability of timers in Twisted. > > > > Say I have a massive number of periodic timers (lets say each with period > > 1s, > but all slightly time shifted to each other). > > > > As far as I understand, timers are implemented ultimately by setting the > timeout parameter when calling into OS select/poll/epoll/kqueue. > > > > If this is true, then the number of timers scales linearly with the number > > of > syscalls. This can get limiting (the total number of syscalls a Linux box can > sustain is a couple of 100k's per second). As more and more timers are setup, > the timeout essentially will approach 0. On the upside, timers will fire > precisely. > > > > However, say I am fine with a precision of 1ms. > > > > Is there a way that limits the syscall rate to 1000/s (given no FD activity > happens) _independently_ of the number of timers setup? > > > > Timers that fall into a certain ms slice would all fire roughly at the same > > time > (still ordered). > > > > Is that possible? > > > > Thanks, > > Tobias > > > > ___ > > Twisted-Python mailing list > > Twisted-Python@twistedmatrix.com > > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python > > ___ > Twisted-Python mailing list > Twisted-Python@twistedmatrix.com > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python ___ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
Re: [Twisted-Python] Scalability of timers
>There is only one select() call (or whatever) at any given time, regardless of >how many timers. Yes, I do understand this. >Syscalls are thus O(1). Timers are stored in sorted order. When event loop >wakes up it removes timers that have been reached, which is fast because >they're sorted so when you hit one that is still in future you can stop. So >that's pretty scalable. syscalls are O(1). But the constant is non zero. A syscall is still quite expensive. try doing 1 mio. syscalls/sec on any x86 box (Linux, BSD, whatever). DEC Alphas and Itanium might be able to do more, but the context switching overhead of x86 architecture is "huge". But I feel I failed in formulating what I am asking. Could you please correct where my thinking below goes wrong? Let's say I issue 1 mio. timers with expirary times t0, t0+1us, t0+2us, .., t0+1s That is 1 mio. timers expiring in 1us pitch within 1s. That will mean 1 mio. select() syscalls done in 1s each with timeout set to 1us Since it's unlikely that a box supports a rate of 1 mio. syscalls/sec, that means a select with timeout 1us won't return after 1us, but >1us. Twisted will process all timers that have expired in the meantime. But there is no way of letting 1 mio. timers fire in 1us pitch within 1s using a syscall. What I am after is to explicitly _control_ the maximum syscall rate to select() - not simply max. out the box on syscall rate. Like: limit syscall rate to select() at 1000Hz - regardless how many timers I issue per second. In above example, with syscall max set to 1000Hz, each time select() returns 1000 timers would have expired. That's fine. That's what I want. How can I do that? Thanks! Tobias ___ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
Re: [Twisted-Python] Scalability of timers
Hey Tobias, Individual OS have their own mechanisms for avoiding the kind of waste you're describing. For example, Linux quite aggressively rounds up the expiry of certain classes of timer at progressively less granular intervals the further in the future they're scheduled ("timer coalescing"). When Twisted wakes, there is no guarantee that that only one timer has expired by then. In fact under load you would expect the select loop to always be running (and thus timing out) late, and so each iteration may process several timers simultaneously. Twisted will set the select() timeout to the timer due to expire the earliest. Finding this timer is a constant time operation. There is only ever one active select() (or select-equivalent) call active at a time. The Twisted timer implementation internally uses a heap, so scheduling and expiry are quite efficint O(logN). With 4 billion timers active, scheduling a new timer in the worst case would require 32 array elements to be swapped. On Sun, Aug 10, 2014 at 05:16:51AM -0700, Tobias Oberstein wrote: > Hi, > > I have a question regarding scalability of timers in Twisted. > > Say I have a massive number of periodic timers (lets say each with period 1s, > but all slightly time shifted to each other). > > As far as I understand, timers are implemented ultimately by setting the > timeout parameter when calling into OS select/poll/epoll/kqueue. > > If this is true, then the number of timers scales linearly with the number > of syscalls. This can get limiting (the total number of syscalls a Linux box > can sustain is a couple of 100k's per second). As more and more timers are > setup, the timeout essentially will approach 0. On the upside, timers will > fire precisely. > > However, say I am fine with a precision of 1ms. > > Is there a way that limits the syscall rate to 1000/s (given no FD activity > happens) _independently_ of the number of timers setup? > > Timers that fall into a certain ms slice would all fire roughly at the same > time (still ordered). > > Is that possible? > > Thanks, > Tobias > > ___ > Twisted-Python mailing list > Twisted-Python@twistedmatrix.com > http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python ___ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
Re: [Twisted-Python] Scalability of timers
There is only one select() call (or whatever) at any given time, regardless of how many timers. Syscalls are thus O(1). Timers are stored in sorted order. When event loop wakes up it removes timers that have been reached, which is fast because they're sorted so when you hit one that is still in future you can stop. So that's pretty scalable. ___ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
[Twisted-Python] Scalability of timers
Hi, I have a question regarding scalability of timers in Twisted. Say I have a massive number of periodic timers (lets say each with period 1s, but all slightly time shifted to each other). As far as I understand, timers are implemented ultimately by setting the timeout parameter when calling into OS select/poll/epoll/kqueue. If this is true, then the number of timers scales linearly with the number of syscalls. This can get limiting (the total number of syscalls a Linux box can sustain is a couple of 100k's per second). As more and more timers are setup, the timeout essentially will approach 0. On the upside, timers will fire precisely. However, say I am fine with a precision of 1ms. Is there a way that limits the syscall rate to 1000/s (given no FD activity happens) _independently_ of the number of timers setup? Timers that fall into a certain ms slice would all fire roughly at the same time (still ordered). Is that possible? Thanks, Tobias ___ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python