That's interesting. I have not done this kind of work before, and had not come across CDFs. I can see why it makes sense to look at both the mean and the tail.
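To make the tail-plot idea from Neil's footnote concrete, something like the rough Haskell sketch below would produce the (x, 1 - CDF(x)) points for a log-log tail plot. The sample response times and the output format are made up purely for illustration; the largest observation gets survival 0 and would normally be dropped before plotting on a log scale.

    -- Rough sketch only: emit (x, 1 - CDF(x)) pairs from a sample of
    -- response times, suitable for a log-log tail plot.
    import Data.List (sort)

    -- Empirical survival function: for the i-th smallest observation,
    -- the fraction of observations strictly greater than it.
    tailPoints :: [Double] -> [(Double, Double)]
    tailPoints xs =
      let sorted = sort xs
          n      = fromIntegral (length sorted)
      in  [ (x, 1 - fromIntegral i / n) | (i, x) <- zip [1 :: Int ..] sorted ]

    main :: IO ()
    main = mapM_ print (tailPoints [0.8, 1.1, 0.9, 12.0, 1.0, 0.95])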
Your assumption is correct. The messages have a similar cost, which is why the graph I posted is relatively flat most of the time. The spikes suggest to me that it is a tail-affecting issue, because the messages follow the same code path as when the system is running normally.

On 29 September 2015 at 11:45, Neil Davies <semanticphilosop...@gmail.com> wrote:
> Will
>
> I was trying to get a feeling for what those coloured squares actually
> denoted - typically we examine this sort of performance information
> as CDFs (cumulative distribution functions [1]), trying to pull apart the
> issues that are "mean" affecting (i.e. the typical path through the code/system) and
> those that are "tail" affecting (i.e. exceptions - and GC running could
> be seen as an "exception", one that you can manage and time-shift
> in the relative timing).
>
> I'm assuming that messages have a similar "cost" (i.e. similar work
> to complete) - so that a uniform arrival rate equates to a uniform
> rate of work to be done arriving.
>
> Neil
>
> [1] We plot the CDFs in two ways: the "usual" way for the major part
> of the probability mass, and then as (1 - CDF) on a log-log scale to
> expose the tail behaviour.
>
> On 29 Sep 2015, at 10:35, Will Sewell <m...@willsewell.com> wrote:
>
>> Thank you for the reply Neil.
>>
>> The spikes are in response time. The graph I linked to shows the
>> distribution of response times in a given window of time (the darkness of
>> the square is the number of messages in a particular window of
>> response time). So the spikes are in the mean and also the max
>> response time. Having said that, I'm not exactly sure what you mean by
>> "mean values".
>>
>> I will have a look into -I0.
>>
>> Yes, the arrival of messages is constant. This graph shows the number
>> of messages that have been published to the system:
>> http://i.imgur.com/ADzMPIp.png
>>
>> On 29 September 2015 at 10:16, Neil Davies
>> <semanticphilosop...@gmail.com> wrote:
>>> Will
>>>
>>> Is your issue with the spikes in response time, rather than the mean values?
>>>
>>> If so, once you've reduced the amount of unnecessary mutation, you might want
>>> to take more control over when the GC is taking place. You might want to disable
>>> GC on timer (-I0) and force GC to occur at points you select - we found
>>> this useful.
>>>
>>> Lastly, is the arrival pattern (and distribution pattern) of messages constant or
>>> variable? Just making sure that you are not trying to fight basic queueing
>>> theory here.
>>>
>>> Neil
>>>
>>> On 29 Sep 2015, at 10:03, Will Sewell <m...@willsewell.com> wrote:
>>>
>>>> Thanks for the reply Greg. I have already tried tweaking these values
>>>> a bit, and this is what I found:
>>>>
>>>> * I first tried -A256k because the L2 cache is that size (Simon Marlow
>>>>   mentioned this can lead to good performance:
>>>>   http://stackoverflow.com/a/3172704/1018290).
>>>> * I then tried a value of -A2048k because he also said "using a very
>>>>   large young generation size might outweigh the cache benefits". I
>>>>   don't know exactly what he meant by "a very large young generation
>>>>   size", so I guessed at this value. Is it in the right ballpark?
>>>> * With -H, I tried values of -H8m, -H32m, -H128m, -H512m and -H1024m.
>>>>
>>>> But all led to worse performance than the defaults (and -H didn't
>>>> really have much effect at all).
>>>>
>>>> I will try your suggestion of setting -A to the L3 cache size.
>>>>
>>>> Are there any other values I should try setting these at?
>>>>
>>>> As for your final point, I have run space profiling, and it looks like
>>>> >90% of the memory is used for our message index, which is a temporary
>>>> store of messages that have gone through the system. These messages
>>>> are stored in aligned chunks in memory that are merged together. I
>>>> initially thought this was causing the spikes, but they were still
>>>> there even after I removed the component. I will try and run space
>>>> profiling in the build with the message index.
>>>>
>>>> Thanks again.
>>>>
>>>> On 28 September 2015 at 19:02, Gregory Collins <g...@gregorycollins.net> wrote:
>>>>>
>>>>> On Mon, Sep 28, 2015 at 9:08 AM, Will Sewell <m...@willsewell.com> wrote:
>>>>>>
>>>>>> If it is the GC, then is there anything that can be done about it?
>>>>>
>>>>> Increase the value of -A (the default is too small) -- the best value for this
>>>>> is the L3 cache size of the chip.
>>>>> Increase the value of -H (total heap size) -- this will use more RAM, but
>>>>> you'll run GC less often.
>>>>> This will sound flip, but: generate less garbage. The frequency of GC runs is
>>>>> proportional to the amount of garbage being produced, so if you can lower the
>>>>> mutator allocation rate then you will also increase net productivity.
>>>>> Built-up thunks can transparently hide a lot of allocation, so fire up the
>>>>> profiler and tighten those up (there's an 80-20 rule here). Reuse output
>>>>> buffers if you aren't already, etc.
>>>>>
>>>>> G
>>>>>
>>>>> --
>>>>> Gregory Collins <g...@gregorycollins.net>
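For reference, a minimal sketch of the explicit GC control Neil describes: disable the idle-GC timer with +RTS -I0 and call performMajorGC (from System.Mem, which does exist in base) at a point of your choosing, e.g. between batches of work. The batch loop and sizes below are invented for illustration; -A and -H would still be tuned via RTS flags as Greg suggests.

    -- Sketch only: force a major GC at a chosen quiescent point instead of
    -- letting the idle-GC timer decide.  Run with something like:
    --   ./app +RTS -I0 -A8m -RTS
    -- (-I0 disables GC-on-idle; -A sets the nursery size; -H could be added too.)
    import Control.Monad (forM_)
    import System.Mem (performMajorGC)

    -- Hypothetical stand-in for processing one batch of messages.
    processBatch :: Int -> IO ()
    processBatch n = print (sum [1 .. n])

    main :: IO ()
    main = forM_ [100000, 200000, 300000] $ \batch -> do
      processBatch batch
      -- Collect between batches, when a pause is least disruptive.
      performMajorGC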