[Haskell-cafe] Re: Joels Time Leak
Bulat Ziganshin wrote: Hello Simon, Wednesday, January 04, 2006, 7:33:22 PM, you wrote: The minimum time between context switches is 20 milliseconds. SM> Sure, there's no reason why we couldn't do this. Of course, even SM> idle Haskell processes will be ticking away in the background, so SM> there's a reason not to make the interval too short. What do SM> you think is reasonable? Simon, the talk is about changing GHC _tick_, which is a _minimal_ possible context switch interval SM> Yes, I know. in this case, why you say that idle processes will be ticking in background? it's entirely up to program writers/users if they wants to decrease this interval. by default switches will occur in the same 20 ms, lesser tick will have no impact on ordinary programs as long as default switching time will be not changed It will have an impact, because the tick signal has to be delivered to the process and the signal handler run. The impact on CPU time is small, but could be noticeable if we were to choose too small an interval. Furthermore, the ticker prevents idle Haskell processes from being completely swapped out (that problem already exists). Cheers, Simon ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Joels Time Leak
Bulat Ziganshin wrote: Tuesday, January 03, 2006, 7:43:21 PM, you wrote: The minimum time between context switches is 20 milliseconds. Is there any good reason why 0.02 seconds is the best that you can get here? Couldn't GHC's internal timer tick at a _much_ faster rate (like 50-100µs or so)? SM> Sure, there's no reason why we couldn't do this. Of course, even SM> idle Haskell processes will be ticking away in the background, so SM> there's a reason not to make the interval too short. What do SM> you think is reasonable? Simon, the talk is about changing GHC _tick_, which is a _minimal_ possible context switch interval. so, we want to decrease this tick and retain current 20 ms _default_ switch interval. this will make possible to decrease switch interval for programs that really need it, which is currently entirely impossible. Yes, I know. Cheers, Simon ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Re[2]: [Haskell-cafe] Re: Joels Time Leak
This is my latest version. Based on Don's tweaks. {-# INLINE sequ #-} sequ :: (b -> a) -> PU a -> (a -> PU b) -> PU b sequ a b c | a `seq` b `seq` c `seq` False = undefined sequ f pa k = PU fn1 fn2 fn3 where {-# INLINE fn1 #-} fn1 ptr b = case f b of a -> case k a of pb -> do ptr' <- appP pa ptr a appP pb ptr' b {-# INLINE fn2 #-} fn2 ptr = do (a, ptr') <- appU pa ptr case k a of pb -> appU pb ptr' {-# INLINE fn3 #-} fn3 b = case f b of a -> case k a of pb -> do sz1 <- appS pa a sz2 <- appS pb b return $! sz1 + sz2 On Jan 4, 2006, at 4:18 PM, Bulat Ziganshin wrote: are you tried to inline it? and all other pickling combinators the problem is what when you write put (Cmd a b) = do putWord16 a; putWord32 b and inline putWord16/putWord32, you can be sure that you will get sequencing for free. but what is a pickling combinators? it's a high-order functions, which combines drivers for simple types like Word16 to final driver which can read entire Command. the principial question - will this final driver be interpreted, i.e. executed as a large number of enclosed calls to pickler combination functions, or it will be compiled, i.e. executed as simple sequence of getByte calls, which then builds the final value. in first case your program will spend all its time in these combinator funtions calls, so you will not have much effect from using Ptrs in elementary picklers -- http://wagerlabs.com/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Joels Time Leak
Bulat Ziganshin wrote: Tuesday, January 03, 2006, 7:43:21 PM, you wrote: The minimum time between context switches is 20 milliseconds. Is there any good reason why 0.02 seconds is the best that you can get here? Couldn't GHC's internal timer tick at a _much_ faster rate (like 50-100µs or so)? SM> Sure, there's no reason why we couldn't do this. Of course, even SM> idle Haskell processes will be ticking away in the background, so SM> there's a reason not to make the interval too short. What do SM> you think is reasonable? Simon, the talk is about changing GHC _tick_, which is a _minimal_ possible context switch interval. so, we want to decrease this tick and retain current 20 ms _default_ switch interval. this will make possible to decrease switch interval for programs that really need it, which is currently entirely impossible. Yes, I know. Cheers, Simon ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[2]: [Haskell-cafe] Re: Joels Time Leak
Hello Joel, Wednesday, January 04, 2006, 12:42:24 AM, you wrote: JR> contribute to my delays and timeouts. There are also quite a few JR> unanswered questions at the moment (why is 'sequ' slow? are you tried to inline it? and all other pickling combinators the problem is what when you write put (Cmd a b) = do putWord16 a; putWord32 b and inline putWord16/putWord32, you can be sure that you will get sequencing for free. but what is a pickling combinators? it's a high-order functions, which combines drivers for simple types like Word16 to final driver which can read entire Command. the principial question - will this final driver be interpreted, i.e. executed as a large number of enclosed calls to pickler combination functions, or it will be compiled, i.e. executed as simple sequence of getByte calls, which then builds the final value. in first case your program will spend all its time in these combinator funtions calls, so you will not have much effect from using Ptrs in elementary picklers because of this issue i said you that pickler combinators can't guarantee performance, in constrast to Binary package which uses an artless approach to combine individual "picklers" - separate get and put fucntions so that each one becomes an straightforward imperative program as opposite to tuple carrying several functional values JR> does the scheduler need to be tuned?) try something like this: forever recvPacket withMVar global unzip unpickle runScript sendAnswer yield it will ensure that commands will be processed sequentially; i think it's the best you can do in this program: all tasks inside lock are cpu-bound, so it is better to finish them in one thread before going to another in production code you will also need to guard whole withMVar block with small timeout (say, 0.02s) -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Joels Time Leak
Joel Reymont wrote: I don't think CPU usage is the issue. An individual thread will take a fraction of a second to deserialize a large packet. It's a combination of CPU usage by the pickler and GC load. Those 50k packets take 0.03 seconds to unpickle (version of unstuff.hs to measure that is attached). With 100 threads running, even with a completely fair scheduler the time taken for one thread to unpickle that packet is going to be 3 seconds. If optimisation is turned on, the time to unpickle that packet goes down to 0.007 seconds (on my machine). So, it should be more like 400 threads before a fair scheduler would run into problems, but that doesn't take into account GC load, which increases with more threads running, so in fact you still run into trouble with 50 threads. You can reduce the GC load using RTS options: eg. +RTS -H256m and that will reduce the number of timeouts you get. GHC's scheduler may not be completely fair, but I haven't found anything gratuitous in my investigations. Sometimes the time between context switches is more like 0.04 seconds instead of 0.02 seconds, and I still don't understand exactly why, but that's not a serious issue. Somtimes a thread is unlucky enough to have to do a major GC during its timeslice, so it doesn't get its fair share of CPU, but the effect is random and therefore amortized (this isn't a realtime system, after all). What can you do about this? (a) improve performance of the unpickler. As you say, a 2x boost here will double the number of threads you can run in parallel in the time limits you have. (b) try to reduce your heap residency, which will reduce GC load which again means you can run more threads in parallel without hitting your limits. (b) try to manage your latency better, by limiting the number of threads that try to unpickle in parallel. You may reduce the GC load this way, too. Or failing that, just get faster hardware. Or more CPUs, and use GHC's new SMP support. I'm surprised if your real application is this CPU-bound, though. The network communication latency should mean you can run a lot more threads, provided you can improve that pickler so it isn't the bottleneck. I don't have a lot of time to investigate the unpickling code in detail, but I have worked on similar problems in the past and I know that the unpickler in GHC is very fast, for example. It is derived from the original nhc98 interface, with tweaks by me to improve performance, and later NewBinary was derived from it. I haven't measured NewBinary's performance relative to GHC's Binary library, but I don't expect there to be much difference. Cheers, Simon module Main where import System.IO import System.Time import System.Environment import Control.Monad import Control.Concurrent import Control.Exception import Foreign import Pickle import Endian import Util import ZLib import Records import Prelude hiding (read) import Text.Printf main = do args <- getArgs process (head args) 100 waitToFinish {-# NOINLINE lock #-} lock :: MVar () lock = unsafePerformIO $ newMVar () trace s = withMVar lock $ const $ putStrLn s process _ 0 = return () process file n = do h <- openBinaryFile file ReadMode forkChild $ read_ h process file (n - 1) read_ :: Handle -> IO () read_ h = do cmd <- read h (\_ -> return ()) -- lots of ALERTs -- you should not get any alerts if you pass in trace -- below and comment the line above. the lock synch seems -- to have a magical effect -- cmd <- read h trace eof <- hIsEOF h unless eof $ read_ h read :: Handle -> (String -> IO ()) -> IO Command read h trace = do TOD time1 _ <- getClockTime allocaBytes 4 $ \p1 -> do hGetBuf h p1 4 TOD time2 _ <- getClockTime (size', _) <- unpickle endian32 p1 0 TOD time3 _ <- getClockTime let size = fromIntegral $ size' - 4 allocaBytes size $ \packet -> do TOD time4 _ <- getClockTime hGetBuf h packet size TOD time5 _ <- getClockTime cmd <- unstuff packet 0 size TOD time6 _ <- getClockTime trace $ "read: " ++ cmdDesc cmd ++ ": " ++ show (time6 - time1) ++ "s: " ++ show (time2 - time1) ++ "s, " ++ show (time3 - time2) ++ "s, " ++ show (time4 - time3) ++ "s, " ++ show (time5 - time4) ++ "s, " ++ show (time6 - time5) ++ "s" when (time6 - time5 > 3) $ fail $ "RED ALERT: time: " ++ show (time6 - time5) ++ "s, size: " ++ show size' ++ ", cmd
Re[2]: [Haskell-cafe] Re: Joels Time Leak
Hello Simon, Tuesday, January 03, 2006, 7:43:21 PM, you wrote: >> The minimum time between context switches is 20 milliseconds. >> >> Is there any good reason why 0.02 seconds is the best that you can get >> here? Couldn't GHC's internal timer tick at a _much_ faster rate (like >> 50-100µs or so)? SM> Sure, there's no reason why we couldn't do this. Of course, even SM> idle Haskell processes will be ticking away in the background, so SM> there's a reason not to make the interval too short. What do SM> you think is reasonable? Simon, the talk is about changing GHC _tick_, which is a _minimal_ possible context switch interval. so, we want to decrease this tick and retain current 20 ms _default_ switch interval. this will make possible to decrease switch interval for programs that really need it, which is currently entirely impossible. -C[]: Sets the context switch interval to seconds. A context switch will occur at the next heap block allocation after the timer expires (a heap block allocation occurs every 4k of allocation). With -C0 or -C, context switches will occur as often as possible (at every heap block allocation). By default, context switches occur every 20ms milliseconds. Note that GHC's internal timer ticks every 20ms, and the context switch timer is always a multiple of this timer, so 20ms is the maximum granularity available for timed context switches. -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Re: Joels Time Leak
On 03 January 2006 17:32, Chris Kuklewicz wrote: > Thanks for the answer, but I should I written a longer comment. I have > added such a longer comment below: > > Simon Marlow wrote: >> Chris Kuklewicz wrote: >> >>> Another comment: between 1000's of threads and writing a custom >>> continuation based scheduler, what about using a thread pool? Does >>> anyone have a library with a "fork-IO-Pool" command? >> >> >> You don't need a thread pool, because threads are so cheap. Thread >> pools are just a workaround for lack of lightweight concurrency. >> >> Cheers, >> Simon >> > > Since the round-robin scheduler has (0.02 * N) seconds of delay for N > therads, then one could trade off latency between time spent waiting > for the thread pool to start a job and time spend running the job and > getting interrupted. > > In the limit of 1 worker thread, all the latency is waiting to get > run, and there are no interruptions, so the time taken *while > running* is very short. With 10 threads, there can be a delay to > start, and each interruption adds 0.2 seconds to the job's run time > once it has started. > > For a server, the client requests queue up and wait for room in the > thread pool, and the pool is kept small enough that the > round-robin-schedular-delay keeps requests from timing out while being > serviced. Otherwise 1000 client requests would cause 20 seconds of > reschedule penalty for all threads and they could all timeout. With a > thread pool, one can drop threads that have been waiting for too long > instead of running them, so those threads will timeout. But the pool > keeps servicing at least some of the client requests on time. > > All hypothetical to me, of course. I see - you want to use a thread pool as a way of restricting parallelism, so as to control latency. Good idea, it would make a useful library. Cheers, Simon ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Joels Time Leak
The timeleak code is just a repro case. In real life I'm reading from sockets as opposed to a file. All I'm trying to do is run poker bots. They talk to the server and play poker. Of course some events are more important than others, a request to make a bet is more important than, say, a table update. I do need to run as many poker bots as I can. I think that my customer's goal of 4,000 bots is unattainable in a single app. It's probably possible per machine. Overall, I find this too complex to manage with Haskell as there are many factors that can contribute to my delays and timeouts. There are also quite a few unanswered questions at the moment (why is 'sequ' slow? does the scheduler need to be tuned?) that leave me scratching my head. On Jan 3, 2006, at 9:17 PM, S. Alexander Jacobson wrote: You should be grouping incoming events into queues by expected workload/event. Then you can give the client fairly reliable information about how long it will have to wait based on the size of the queue on which event is waiting. And if you have no way to differentiate between event workloads a priori then you really can't be giving clients response guarantees and need to rethink your business logic. -- http://wagerlabs.com/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Joels Time Leak
Joel, In most cases, it just doesn't make sense to run 1000 threads simultaneously that are all bottlenecked on the same resource (e.g. CPU/memory) See e.g. http://www.eecs.harvard.edu/~mdw/proj/seda/ You should be grouping incoming events into queues by expected workload/event. Then you can give the client fairly reliable information about how long it will have to wait based on the size of the queue on which event is waiting. And if you have no way to differentiate between event workloads a priori then you really can't be giving clients response guarantees and need to rethink your business logic. FYI: I actually created a Haskell application server based on this logic called HAppS (see http://happs.org) and am in the process of getting binaryIO added to it. -Alex- __ S. Alexander Jacobson tel:917-770-6565 http://alexjacobson.com On Tue, 3 Jan 2006, Joel Reymont wrote: Simon, I don't think CPU usage is the issue. An individual thread will take a fraction of a second to deserialize a large packet. The issue is that, as you pointed out, you can get alerts even with 50 threads. Those fractions of a second add up in a certain way that's detrimental to the performance of the app. The timeleak code uses Ptr Word8 to pickle which should be very efficient. I believe the delay comes from the way 'sequ' is compiled by GHC. I'll take the liberty of quoting Andrew Kennedy (your colleague from MS Research) who wrote the picklers: -- My original pickler implementation was for SML. It was used in the MLj compiler, and is still used in the SML.NET compiler, and has acceptable performance (few ms pickling/unpickling for typical intermediate language object files). I must admit that I've not used the Haskell variant in anger. Apart from the inherent slowdown associated with laziness, is there a particular reason for poor performance? -- 'sequ' by itself does not seem like a big deal but when used to model records it builds a large nested lambda-list and I don't think that list is being compiled efficiently. I would appreciate if you could look at that and issue a verdict now that Andrew cofirms using the picklers in a real-life environment and w/o major problems. Suppose I chose a different implementation of binary IO and disposed of pickler combinators. Suppose I gained a 2x speed-up by doing so. I would now be getting alerts with 100 threads instead of 50, no? That's still far from ideal. Joel On Jan 3, 2006, at 4:43 PM, Simon Marlow wrote: The reason things are the way they are is that a large number of *running* threads is not a workload we've optimised for. In fact, Joel's program is the first one I've seen with a lot of running threads, apart from our testsuite. And I suspect that when Joel uses a better binary I/O implementation a lot of that CPU usage will disappear. -- http://wagerlabs.com/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Joels Time Leak
On Tue, Jan 03, 2006 at 02:30:53PM +, Simon Marlow wrote: > I measured the time taken to unpickle those large 50k packets as 0.3 > seconds on my amd64 box (program compiled *without* optimisation), so > the thread can get descheduled twice during while unpickling a large > packet, giving a >4s delay with 100 threads running. I might have made an error when counting the packets. I simply placed a putStrLn in "read", but some of the packets are nested (SrvCompressedCommands), so "read" is called more than once for a top-level packet. Best regards Tomasz -- I am searching for programmers who are good at least in (Haskell || ML) && (Linux || FreeBSD || math) for work in Warsaw, Poland ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Joels Time Leak
Thanks for the answer, but I should I written a longer comment. I have added such a longer comment below: Simon Marlow wrote: > Chris Kuklewicz wrote: > >> Another comment: between 1000's of threads and writing a custom >> continuation based scheduler, what about using a thread pool? Does >> anyone have a library with a "fork-IO-Pool" command? > > > You don't need a thread pool, because threads are so cheap. Thread > pools are just a workaround for lack of lightweight concurrency. > > Cheers, > Simon > Since the round-robin scheduler has (0.02 * N) seconds of delay for N therads, then one could trade off latency between time spent waiting for the thread pool to start a job and time spend running the job and getting interrupted. In the limit of 1 worker thread, all the latency is waiting to get run, and there are no interruptions, so the time taken *while running* is very short. With 10 threads, there can be a delay to start, and each interruption adds 0.2 seconds to the job's run time once it has started. For a server, the client requests queue up and wait for room in the thread pool, and the pool is kept small enough that the round-robin-schedular-delay keeps requests from timing out while being serviced. Otherwise 1000 client requests would cause 20 seconds of reschedule penalty for all threads and they could all timeout. With a thread pool, one can drop threads that have been waiting for too long instead of running them, so those threads will timeout. But the pool keeps servicing at least some of the client requests on time. All hypothetical to me, of course. -- Chris ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Joels Time Leak
Simon, I don't think CPU usage is the issue. An individual thread will take a fraction of a second to deserialize a large packet. The issue is that, as you pointed out, you can get alerts even with 50 threads. Those fractions of a second add up in a certain way that's detrimental to the performance of the app. The timeleak code uses Ptr Word8 to pickle which should be very efficient. I believe the delay comes from the way 'sequ' is compiled by GHC. I'll take the liberty of quoting Andrew Kennedy (your colleague from MS Research) who wrote the picklers: -- My original pickler implementation was for SML. It was used in the MLj compiler, and is still used in the SML.NET compiler, and has acceptable performance (few ms pickling/unpickling for typical intermediate language object files). I must admit that I've not used the Haskell variant in anger. Apart from the inherent slowdown associated with laziness, is there a particular reason for poor performance? -- 'sequ' by itself does not seem like a big deal but when used to model records it builds a large nested lambda-list and I don't think that list is being compiled efficiently. I would appreciate if you could look at that and issue a verdict now that Andrew cofirms using the picklers in a real-life environment and w/o major problems. Suppose I chose a different implementation of binary IO and disposed of pickler combinators. Suppose I gained a 2x speed-up by doing so. I would now be getting alerts with 100 threads instead of 50, no? That's still far from ideal. Joel On Jan 3, 2006, at 4:43 PM, Simon Marlow wrote: The reason things are the way they are is that a large number of *running* threads is not a workload we've optimised for. In fact, Joel's program is the first one I've seen with a lot of running threads, apart from our testsuite. And I suspect that when Joel uses a better binary I/O implementation a lot of that CPU usage will disappear. -- http://wagerlabs.com/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Joels Time Leak
On 1/3/06, Simon Marlow <[EMAIL PROTECTED]> wrote: > On 03 January 2006 15:37, Sebastian Sylvan wrote: > > > On 1/3/06, Simon Marlow <[EMAIL PROTECTED]> wrote: > >> Tomasz Zielonka wrote: > >>> On Thu, Dec 29, 2005 at 01:20:41PM +, Joel Reymont wrote: > >>> > Why does it take a fraction of a second for 1 thread to unpickle > and several seconds per thread for several threads to do it at the > same time? I think this is where the mistery lies. > >>> > >>> > >>> Have you considered any of this: > >>> > >>> - too big memory pressure: more memory means more frequent and more > >>> expensive GCs, 1000 threads using so much memory means bad cache > >>> performance - a deficiency of GHC's thread scheduler - giving too > >>> much time one thread steals it from others (Simons, don't get > >>> angry at me - I am probably wrong here ;-) > >> > >> I don't think there's anything really strange going on here. > >> > >> The default context switch interval in GHC is 0.02 seconds, measured > >> in CPU time by default. GHC's scheduler is stricly round-robin, so > >> therefore with 100 threads in the system it can be 2 seconds between > >> a thread being descheduled and scheduled again. > > > > According to this: > > http://www.haskell.org/ghc/docs/latest/html/users_guide/sec-using-parallel.html#parallel-rts-opts > > > > The minimum time between context switches is 20 milliseconds. > > > > Is there any good reason why 0.02 seconds is the best that you can get > > here? Couldn't GHC's internal timer tick at a _much_ faster rate (like > > 50-100µs or so)? > > Sure, there's no reason why we couldn't do this. Of course, even idle > Haskell processes will be ticking away in the background, so there's a reason > not to make the interval too short. What do you think is reasonable? Not sure. Could it be configurable via a command line flag? If the profiler could report the % of time spent doing context switches (or maybe it already does?) the user could fine tune this to his liking. For the (hypothetical) real-time simulation app I would *guess* that something along the lines of 500µs would be more than enough to not introduce any unnecessary lag in rendering (seeing as the target frame time would be around 15ms, and you'd want to have a good amount of context switches to allow some of the next frame to be computed in parallell to all the render-surface optimizations etc. for the current frame). But then again, there may be other apps which need it to be even lower.. So a command line flag sure would be nice. /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Re: Joels Time Leak
On 03 January 2006 15:37, Sebastian Sylvan wrote: > On 1/3/06, Simon Marlow <[EMAIL PROTECTED]> wrote: >> Tomasz Zielonka wrote: >>> On Thu, Dec 29, 2005 at 01:20:41PM +, Joel Reymont wrote: >>> Why does it take a fraction of a second for 1 thread to unpickle and several seconds per thread for several threads to do it at the same time? I think this is where the mistery lies. >>> >>> >>> Have you considered any of this: >>> >>> - too big memory pressure: more memory means more frequent and more >>> expensive GCs, 1000 threads using so much memory means bad cache >>> performance - a deficiency of GHC's thread scheduler - giving too >>> much time one thread steals it from others (Simons, don't get >>> angry at me - I am probably wrong here ;-) >> >> I don't think there's anything really strange going on here. >> >> The default context switch interval in GHC is 0.02 seconds, measured >> in CPU time by default. GHC's scheduler is stricly round-robin, so >> therefore with 100 threads in the system it can be 2 seconds between >> a thread being descheduled and scheduled again. > > According to this: > http://www.haskell.org/ghc/docs/latest/html/users_guide/sec-using-parallel.html#parallel-rts-opts > > The minimum time between context switches is 20 milliseconds. > > Is there any good reason why 0.02 seconds is the best that you can get > here? Couldn't GHC's internal timer tick at a _much_ faster rate (like > 50-100µs or so)? Sure, there's no reason why we couldn't do this. Of course, even idle Haskell processes will be ticking away in the background, so there's a reason not to make the interval too short. What do you think is reasonable? > Apart from meaning big trouble for applications with a large number of > threads (such as Joels) it'll also make life difficult for any sort of > real-time application. For instance if you want to use HOpenGL to > render a simulation engine and you split it up into tons of concurrent > processes (say one for each dynamic entity in the engine), the 20ms > granularity would make it quite hard to achieve 60 frames per second > in that case... The reason things are the way they are is that a large number of *running* threads is not a workload we've optimised for. In fact, Joel's program is the first one I've seen with a lot of running threads, apart from our testsuite. And I suspect that when Joel uses a better binary I/O implementation a lot of that CPU usage will disappear. Cheers, Simon ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Joels Time Leak
General follow-up questions: Would adding Control.Concurrent.yield commands cause a context switch more often than every 0.02 seconds? Is there any command in GHC to allow a thread to prevent itself from being rescheduled while computing something? Another comment: between 1000's of threads and writing a custom continuation based scheduler, what about using a thread pool? Does anyone have a library with a "fork-IO-Pool" command? -- Chris Simon Marlow wrote: > Tomasz Zielonka wrote: > >> On Thu, Dec 29, 2005 at 01:20:41PM +, Joel Reymont wrote: >> >>> Why does it take a fraction of a second for 1 thread to unpickle and >>> several seconds per thread for several threads to do it at the same >>> time? I think this is where the mistery lies. >> >> >> >> Have you considered any of this: >> >> - too big memory pressure: more memory means more frequent and more >> expensive GCs, 1000 threads using so much memory means bad cache >> performance >> - a deficiency of GHC's thread scheduler - giving too much time one >> thread steals it from others (Simons, don't get angry at me - I am >> probably wrong here ;-) > > > I don't think there's anything really strange going on here. > > The default context switch interval in GHC is 0.02 seconds, measured in > CPU time by default. GHC's scheduler is stricly round-robin, so > therefore with 100 threads in the system it can be 2 seconds between a > thread being descheduled and scheduled again. > > I measured the time taken to unpickle those large 50k packets as 0.3 > seconds on my amd64 box (program compiled *without* optimisation), so > the thread can get descheduled twice during while unpickling a large > packet, giving a >4s delay with 100 threads running. > > The actual context switch interval seems to often be larger than 0.2 > seconds; I'm not sure exactly why this is, it might be due to delays in > the OS delivering the signal. This does mean that the timeleak program > reports alerts for as little as 50 threads, though. > > Cheers, > Simon > > ___ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: Joels Time Leak
On 1/3/06, Simon Marlow <[EMAIL PROTECTED]> wrote: > Tomasz Zielonka wrote: > > On Thu, Dec 29, 2005 at 01:20:41PM +, Joel Reymont wrote: > > > >>Why does it take a fraction of a second for 1 thread to unpickle and > >>several seconds per thread for several threads to do it at the same > >>time? I think this is where the mistery lies. > > > > > > Have you considered any of this: > > > > - too big memory pressure: more memory means more frequent and more > > expensive GCs, 1000 threads using so much memory means bad cache > > performance > > - a deficiency of GHC's thread scheduler - giving too much time one > > thread steals it from others (Simons, don't get angry at me - I am > > probably wrong here ;-) > > I don't think there's anything really strange going on here. > > The default context switch interval in GHC is 0.02 seconds, measured in > CPU time by default. GHC's scheduler is stricly round-robin, so > therefore with 100 threads in the system it can be 2 seconds between a > thread being descheduled and scheduled again. According to this: http://www.haskell.org/ghc/docs/latest/html/users_guide/sec-using-parallel.html#parallel-rts-opts The minimum time between context switches is 20 milliseconds. Is there any good reason why 0.02 seconds is the best that you can get here? Couldn't GHC's internal timer tick at a _much_ faster rate (like 50-100µs or so)? Apart from meaning big trouble for applications with a large number of threads (such as Joels) it'll also make life difficult for any sort of real-time application. For instance if you want to use HOpenGL to render a simulation engine and you split it up into tons of concurrent processes (say one for each dynamic entity in the engine), the 20ms granularity would make it quite hard to achieve 60 frames per second in that case... /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Re: Joels Time Leak
Tomasz Zielonka wrote: On Thu, Dec 29, 2005 at 01:20:41PM +, Joel Reymont wrote: Why does it take a fraction of a second for 1 thread to unpickle and several seconds per thread for several threads to do it at the same time? I think this is where the mistery lies. Have you considered any of this: - too big memory pressure: more memory means more frequent and more expensive GCs, 1000 threads using so much memory means bad cache performance - a deficiency of GHC's thread scheduler - giving too much time one thread steals it from others (Simons, don't get angry at me - I am probably wrong here ;-) I don't think there's anything really strange going on here. The default context switch interval in GHC is 0.02 seconds, measured in CPU time by default. GHC's scheduler is stricly round-robin, so therefore with 100 threads in the system it can be 2 seconds between a thread being descheduled and scheduled again. I measured the time taken to unpickle those large 50k packets as 0.3 seconds on my amd64 box (program compiled *without* optimisation), so the thread can get descheduled twice during while unpickling a large packet, giving a >4s delay with 100 threads running. The actual context switch interval seems to often be larger than 0.2 seconds; I'm not sure exactly why this is, it might be due to delays in the OS delivering the signal. This does mean that the timeleak program reports alerts for as little as 50 threads, though. Cheers, Simon ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe