On 2010-03-02 11:33 +0900 (Tue), Simon Cranshaw wrote:

> I can confirm that without tweaking the RTS settings we were seeing
> over 100ms GC pauses.

Actually, we can't quite confirm that yet. We're seeing large amounts of time go by in our main trading loop, but I'm still building the profiling tools to see what exactly is going on there. However, GC is high on our list of suspects, since twiddling the GC parameters can improve things drastically.

On 2010-03-02 00:06 -0500 (Tue), John Van Enk wrote:

> Would a more predictable GC or a faster GC be better in your case? (Of
> course, both would be nice.)

Well, as on 2010-03-01 17:18 -0600 (Mon), Jeremy Shaw wrote:

> For audio apps, there is a callback that happens every few milliseconds. As
> often as 4ms. The callback needs to finish as quickly as possible to avoid a
> buffer underruns.

I think we're in about the same position. Ideally we'd never have to stop for GC, but that's obviously not practical; what will hurt pretty badly, and we should be able to prevent, is us gathering up a bunch of market data, making a huge pause for a big GC, and then generating orders based on that now oldish market data. We'd be far better off doing the GC first, and then looking at the state of the market and doing our thing, because though the orders will still not get out as quickly as they would without the GC, at least they'll be using more recent data.

I tried invoking System.Mem.performGC at the start of every loop, but that didn't help. Now that I know it was invoking a major GC, I can see why. :-)

But really, before I go much further with this:

On 2010-03-01 14:41 +0100 (Mon), Peter Verswyvelen wrote:

> Sounds like we need to come up with some benchmarking programs so we
> can measure the GC latency and soft-realtimeness...

Exactly. Right now I'm working from logs made by my own logging and profiling system. These are timestamped, and they're "good enough" to get a sense of what's going on, but incomplete.

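
To make the "GC first, then look at the market" ordering and the latency measurement a bit more concrete, here is a rough sketch. It is not our trading loop: fakeStep is a hypothetical stand-in for "read market data, generate orders", and the code assumes a base library new enough to export System.Mem.performMinorGC and GHC.Clock.getMonotonicTimeNSec (6.12's does not). Whether a minor collection up front actually helps is exactly the kind of thing that would need measuring.

```haskell
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)
import System.Mem (performMinorGC)

-- | Run one loop iteration: collect first, then time the latency-sensitive
-- part. Returns the step's result and the elapsed wall-clock nanoseconds.
gcThenTimed :: IO a -> IO (a, Word64)
gcThenTimed step = do
  performMinorGC               -- minor collection only; performGC forces a major one
  t0 <- getMonotonicTimeNSec
  r  <- step
  t1 <- getMonotonicTimeNSec
  pure (r, t1 - t0)

-- | Hypothetical stand-in for "read market data, generate orders".
fakeStep :: Int -> IO Int
fakeStep n = do
  let s = sum [1 .. n]
  s `seq` pure s

-- | Run the loop the given number of times and return the worst-case
-- latency of the timed part, in nanoseconds.
worstLatency :: Int -> IO Word64
worstLatency iterations = go iterations 0
  where
    go :: Int -> Word64 -> IO Word64
    go 0 worst = pure worst
    go k worst = do
      (_, dt) <- gcThenTimed (fakeStep 100000)
      go (k - 1) $! max worst dt
```

Something like `main = worstLatency 1000 >>= print` would give one number to compare against a variant with the performMinorGC call removed, or with different RTS settings. It only measures the step itself, so the cost of the up-front collection has to be accounted for separately.
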
I also have the information from the new event logging system, which is excellent in terms of knowing exactly when things are starting and stopping, but doesn't include information about my program, nor does it include any sort of GC stats. Then we have the GC statistics we get with -S, which don't have timestamps.

My plan is to bring all of this together. The first step was to fix GHC.Exts.traceEvent so that we can use that to report information about what the application is doing. In 6.12.1 it segfaults, but we have a fix (see http://hackage.haskell.org/trac/ghc/ticket/3874) and it looks as if it will go into 6.12.2, even.

The next step is to start recording the information generated by -S in the eventlog as well, so that not only do we know when a GC started or stopped in relation to our application code, but we know what generation it was, how big the heap was at the time, how much was collected, and so on and so forth. Someone mentioned that there were various other stats that were collected but not printed by -S; we should probably throw those in too.

With all of that information it should be much easier to figure out where and when GC behaviour is causing us pain in low-latency applications.

However: now that Simon's spent a bunch of time experimenting with the runtime's GC settings and found a set that's mitigated much of our problem, other things are pushing their way up my priority list. Between that and an upcoming holiday, I'm probably not going to get back to this for a few weeks. But I'd be happy to discuss my ideas with anybody else who's interested in similar things, even if just to know what would be useful to others.

What do you guys think about setting up a separate mailing list for this? I have to admit, I don't follow haskell-cafe much due to the high volume of the list. (Thus my late presence in this thread.) I would be willing to keep much closer track of a low-volume list that dealt with only GC stuff.
I'd even be open to providing hosting for the list, using my little baby mailing list manager written in Haskell (mhailist). It's primitive, but it does handle subscribing, unsubscribing and forwarding of messages.

cjs
--
Curt Sampson <c...@cynic.net> +81 90 7737 2974
http://www.starling-software.com
The power of accurate observation is commonly called
cynicism by those who have not got it. --George Bernard Shaw
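
P.S. For anyone who wants to play with the "application events plus GC stats in one timestamped log" idea, here is a userland approximation. It is not the RTS change described in this message, just a sketch under some assumptions: a GHC new enough to ship Debug.Trace.traceEventIO and GHC.Stats.getRTSStats (neither is in 6.12), a program run with event logging on (+RTS -l) and with stats collection on (+RTS -T). The names gcSummary and traceWithGcStats are made up for illustration.

```haskell
import Debug.Trace (traceEventIO)
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | One-line summary of the most recent collection, built from the
-- counters the RTS exposes when stats collection is enabled.
gcSummary :: RTSStats -> String
gcSummary s =
  let d = gc s  -- details of the most recent GC
  in "gc gen=" ++ show (gcdetails_gen d)
       ++ " live_bytes=" ++ show (gcdetails_live_bytes d)
       ++ " elapsed_ns=" ++ show (gcdetails_elapsed_ns d)
       ++ " total_gcs=" ++ show (gcs s)

-- | Write an application marker to the event log and, if the RTS is
-- collecting stats, a summary of the latest GC right after it. The event
-- log timestamps both entries; without +RTS -T only the marker is written.
traceWithGcStats :: String -> IO ()
traceWithGcStats label = do
  traceEventIO ("app: " ++ label)
  enabled <- getRTSStatsEnabled
  if enabled
    then getRTSStats >>= traceEventIO . gcSummary
    else pure ()
```

Calling traceWithGcStats "loop start" and traceWithGcStats "loop end" around each iteration would put the application's view and the GC's view on the same timeline. It samples only at the points where it is called, so a collection that happens between two calls shows up only indirectly through the counters.
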