On Sat, Oct 31, 2009 at 09:52:15AM -0700, Thorsten von Eicken wrote:
> Quick follow-up. I decided to add another 3k updates per second (extra
> 30k tree nodes) to my test run. See results in
> http://www.voneicken.com/dl/rrd/rrdcached-7.png
> What's interesting is that the server got somewhat overloaded sitting a
> lot in I/O wait. By and large the flush queue length remained under
> control, except when doing backups (10pm, 8:30am). Memory usage by
> rrdcached and collectd remained under control, but there is a long term
> upward-trending slope to rrdcached's memory usage which is not good.
> Possibly related to the power-of-two allocator patch that Florian
> provided. The graph I find the most interesting one is the disk sdk disk
> ops (3rd from the end). Before adding the last chunk of traffic the disk
> load was write-dominated, which means that rrds were mostly cached in
> memory (5-6 GB left after the processes). After adding the extra load
> the disk load became read-dominated indicating that the rrd working set
> exceeded memory.

Thorsten,

If you're becoming read dominated, you should consider lowering your
file update/sec rate by increasing your -w/-f timers. This just trades
one kind of cache memory (f/s blocks) for another (update strings).

I'm sending a linear chunk allocator along for allocating
cache_item_t.values in operator-defined block sizes. I'd appreciate it
if you'd test it with your load to see if it reduces your CPU usage
related to frequent realloc().

--
 kevin brintnall =~ /kbr...@rufus.net/

> Thorsten
>
>
> Thorsten von Eicken wrote:
> > Thorsten von Eicken wrote:
> >
> >>> 37.1 % of the time it spent in "handle_request_update" the daemon is
> >>> actually waiting for "realloc". This is (to me) very unexpected and a
> >>> schoolbook example of "measure before you optimize".
> >>>
> >>> I think we can get rid of this bottleneck by writing a specialized
> >>> version of "rrd_add_strdup" which reallocates powers of ten. Something
> >>> like:
> >>>
> >>> [...]
> >>>
> >>> It'd be great if you could give the attached patch a try
> >>>
> >> spends all of its time (more than 99 %) in "realloc". So in consequence
> >> Test is running, including Kevin's simplification... Thanks for the
> >> help!!!
> >>
> > Things are again looking much better, almost great I should say! The one
> > thing that still makes me a bit uncomfortable is that at the end of the
> > second hour of run-time there was a cpu spike which caused collectd to
> > grow rapidly. (Still using -w 3600 -z 3600 -f 7200, I put a load of ~50k
> > tree nodes right from the start.) You can see the graphs at
> > http://www.voneicken.com/dl/rrd/ look for the rrdcached-6* series. It
> > flattens out nicely after the spike, but it's one of those things that
> > tend to bite sooner or later. I'm not sure what to do about it.
> > Thorsten
> >
> > _______________________________________________
> > rrd-developers mailing list
> > rrd-developers@lists.oetiker.ch
> > https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers
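
For anyone reading this thread without the attachment: the idea of growing
the per-file value array in fixed chunks, rather than calling realloc() for
every queued update, can be sketched roughly as below. This is not the patch
mentioned above. The type value_list_t, the functions add_strdup_chunked()
and value_list_free(), and the reallocs counter are all made up for
illustration; only the general technique (grow by a caller-chosen number of
slots at a time) is taken from the discussion.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for illustration; NOT rrdcached's cache_item_t. */
typedef struct {
    char  **values;     /* queued update strings */
    size_t  used;       /* slots currently holding a string */
    size_t  allocated;  /* slots available in the pointer array */
    size_t  reallocs;   /* realloc() calls so far (instrumentation only) */
} value_list_t;

/* Append a copy of src. The pointer array grows by `chunk` slots at a time,
 * so realloc() runs once per `chunk` appends instead of once per append.
 * Returns 0 on success, -1 on bad arguments or allocation failure; on
 * failure the list is left unchanged. */
static int add_strdup_chunked(value_list_t *l, const char *src, size_t chunk)
{
    if (l == NULL || src == NULL || chunk == 0)
        return -1;

    if (l->used == l->allocated) {
        size_t new_alloc = l->allocated + chunk;
        char **tmp = realloc(l->values, new_alloc * sizeof(*tmp));
        if (tmp == NULL)
            return -1;
        l->values = tmp;
        l->allocated = new_alloc;
        l->reallocs++;
    }

    char *copy = malloc(strlen(src) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, src);

    l->values[l->used++] = copy;
    return 0;
}

/* Release every stored string and the pointer array, then reset the list. */
static void value_list_free(value_list_t *l)
{
    assert(l != NULL);
    for (size_t i = 0; i < l->used; i++)
        free(l->values[i]);
    free(l->values);
    memset(l, 0, sizeof(*l));
}
```

With a chunk of 4, ten appends cost three realloc() calls instead of ten. A
larger chunk means fewer calls but more allocated-and-unused slots per file,
which adds up across tens of thousands of tree nodes, so the block size is
the knob to tune against memory usage.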