Will,

I was trying to get a feeling for what those coloured squares actually denoted. Typically we examine this sort of performance information as CDFs (cumulative distribution functions [1]), trying to pull apart the issues that are “mean” affecting (i.e. the typical path through the code/system) from those that are “tail” affecting (i.e. exceptions - and a GC run could be seen as an “exception”, one that you can manage and time-shift in the relative timing).
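As a rough illustration of the CDF view described above, here is a minimal Haskell sketch that turns a list of response-time samples into an empirical CDF and its complement (1-CDF), plus the log-log form of the tail. The function names are hypothetical, and this is not the tooling actually used for the analysis; it only assumes the samples fit in memory as a list of Doubles.

```haskell
import Data.List (sort)

-- | Empirical CDF: pairs each sorted sample x with the fraction of
-- samples <= x. With tied samples, x appears more than once and the
-- last entry for x carries the true CDF value.
empiricalCDF :: [Double] -> [(Double, Double)]
empiricalCDF xs = zip sorted [fromIntegral i / n | i <- [1 .. length sorted]]
  where
    sorted = sort xs
    n = fromIntegral (length xs)

-- | Complementary CDF (1 - CDF): the fraction of samples greater than x.
tailCDF :: [Double] -> [(Double, Double)]
tailCDF xs = [(x, 1 - p) | (x, p) <- empiricalCDF xs]

-- | Tail on log10/log10 axes. Points with x <= 0 or a tail probability
-- of 0 (always the largest sample) cannot be drawn on a log scale,
-- so they are dropped.
logLogTail :: [Double] -> [(Double, Double)]
logLogTail xs =
  [(logBase 10 x, logBase 10 q) | (x, q) <- tailCDF xs, x > 0, q > 0]
```

Plotting `empiricalCDF` shows where most of the probability mass sits (the “mean” effects), while `logLogTail` spreads out the rare slow responses (the “tail” effects) so they can be examined separately.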
I’m assuming that messages have a similar “cost” (i.e. similar work to complete), so that a uniform arrival rate equates to a uniform rate of work arriving.

Neil

[1] We plot the CDFs in two ways: the “usual” way for the major part of the probability mass, and then as (1-CDF) on a log-log scale to expose the tail behaviour.

On 29 Sep 2015, at 10:35, Will Sewell <m...@willsewell.com> wrote:

> Thank you for the reply Neil.
>
> The spikes are in response time. The graph I linked to shows the
> distribution of response times in a given window of time (darkness of
> the square is the number of messages in a particular window of
> response time). So the spikes are in the mean and also the max
> response time. Having said that I'm not exactly sure what you mean by
> "mean values".
>
> I will have a look into -I0.
>
> Yes the arrival of messages is constant. This graph shows the number
> of messages that have been published to the system:
> http://i.imgur.com/ADzMPIp.png
>
> On 29 September 2015 at 10:16, Neil Davies
> <semanticphilosop...@gmail.com> wrote:
>> Will
>>
>> is your issue with the spikes in response time, rather than the mean
>> values?
>>
>> If so, once you’ve reduced the amount of unnecessary mutation, you might
>> want to take more control over when the GC is taking place. You might
>> want to disable GC on timer (-I0) and force GC to occur at points you
>> select - we found this useful.
>>
>> Lastly, is the arrival pattern (and distribution pattern) of messages
>> constant or variable? Just making sure that you are not trying to fight
>> basic queueing theory here.
>>
>>
>> Neil
>>
>> On 29 Sep 2015, at 10:03, Will Sewell <m...@willsewell.com> wrote:
>>
>>> Thanks for the reply Greg.
>>> I have already tried tweaking these values
>>> a bit, and this is what I found:
>>>
>>> * I first tried -A256k because the L2 cache is that size (Simon Marlow
>>> mentioned this can lead to good performance
>>> http://stackoverflow.com/a/3172704/1018290)
>>> * I then tried a value of -A2048k because he also said "using a very
>>> large young generation size might outweigh the cache benefits". I
>>> don't exactly know what he meant by "a very large young generation
>>> size", so I guessed at this value. Is it in the right ballpark?
>>> * With -H, I tried values of -H8m, -H32m, -H128m, -H512m, -H1024m
>>>
>>> But all led to worse performance than the defaults (and -H didn't
>>> really have much effect at all).
>>>
>>> I will try your suggestion of setting -A to the L3 cache size.
>>>
>>> Are there any other values I should try setting these at?
>>>
>>> As for your final point, I have run space profiling, and it looks like
>>> more than 90% of the memory is used for our message index, which is a
>>> temporary store of messages that have gone through the system. These
>>> messages are stored in aligned chunks in memory that are merged together.
>>> I initially thought this was causing the spikes, but they were still
>>> there even after I removed the component. I will try and run space
>>> profiling in the build with the message index.
>>>
>>> Thanks again.
>>>
>>> On 28 September 2015 at 19:02, Gregory Collins <g...@gregorycollins.net>
>>> wrote:
>>>>
>>>> On Mon, Sep 28, 2015 at 9:08 AM, Will Sewell <m...@willsewell.com> wrote:
>>>>>
>>>>> If it is the GC, then is there anything that can be done about it?
>>>>
>>>> * Increase value of -A (the default is too small) -- best value for this
>>>> is L3 cache size of the chip
>>>> * Increase value of -H (total heap size) -- this will use more ram but
>>>> you'll run GC less often
>>>> * This will sound flip, but: generate less garbage.
>>>> Frequency of GC runs is
>>>> proportional to the amount of garbage being produced, so if you can lower
>>>> mutator allocation rate then you will also increase net productivity.
>>>> Built-up thunks can transparently hide a lot of allocation so fire up the
>>>> profiler and tighten those up (there's an 80-20 rule here). Reuse output
>>>> buffers if you aren't already, etc.
>>>>
>>>> G
>>>>
>>>> --
>>>> Gregory Collins <g...@gregorycollins.net>
>>> _______________________________________________
>>> Glasgow-haskell-users mailing list
>>> Glasgow-haskell-users@haskell.org
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users
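
The suggestion quoted in this thread, disabling the timer-driven GC with -I0 and forcing collections at points you select, could look roughly like the sketch below. It is only an illustration: `handleMessage` and `processBatch` are hypothetical stand-ins for the real message handling, and the choice of "end of a batch" as the collection point is an assumption. `performGC` is the existing function from `System.Mem` in base.

```haskell
import System.Mem (performGC)

-- | Hypothetical stand-in for the real per-message work.
handleMessage :: Int -> IO Int
handleMessage m = pure (m * 2)

-- | Handle one batch of messages, then force a major GC at a point we
-- choose (assumed here to be a quiet moment between batches), rather
-- than leaving the timing entirely to the runtime.
processBatch :: [Int] -> IO Int
processBatch batch = do
  results <- mapM handleMessage batch
  let total = sum results
  -- Evaluate the result first so the batch's garbage is dead
  -- before the collection runs.
  total `seq` performGC
  pure total
```

A program built with `-rtsopts` could then be started as `./server +RTS -I0 -A8m -RTS`, where `server` and the `-A8m` value are placeholders; the -A and -H values discussed above would need to be measured against the actual workload.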