heh. I'll stop the Donaldson references after this, I promise.
I don't know if I'm addressing the symptoms or actually fixing anything at
this point, but I've gotten RRD updates to smooth out considerably for me
by adding a simple retry-on-failure mechanism to the *RRD_update()
functions in gmetad/rrd_helpers.c.
Tossing a sleep(1) in before the retry addresses the following errors:
* "Updates must be at least one second apart" error.
* "Could not lock RRD for updating" error.
* "Expected 1 value for data source (got 0)" error.
(This one's weird: it's as if the argv array is not being parsed properly
by rrd_update(). When I hand-update using the debug-printed params,
it *works* ... but the error message is consistent with passing the
filename as the data source entry! Something is off by one every
so often...)
* "xxxx.rrd is not a valid RRD file" error.
I've seen these for both summaries and regular RRDs.
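
For anyone who wants to try the same thing, here is a rough sketch of the retry-with-sleep idea. It is not the actual change to gmetad/rrd_helpers.c: retry_update(), update_fn, flaky_update() and failures_left are names I made up for illustration, and flaky_update() is only a stand-in that fails a set number of times. In the real code the callback would presumably wrap rrd_update() and report whether rrd_test_error() fired, clearing the error with rrd_clear_error() before each attempt.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical callback type: returns 0 on success, nonzero on failure. */
typedef int (*update_fn)(int argc, char **argv);

/*
 * Call update() up to max_tries times, sleeping delay_secs between
 * failed attempts. Returns 0 as soon as one attempt succeeds,
 * otherwise the return code of the last failed attempt.
 */
int retry_update(update_fn update, int argc, char **argv,
                 int max_tries, unsigned int delay_secs)
{
    int rc = -1;
    int attempt;

    assert(update != NULL && max_tries > 0);

    for (attempt = 1; attempt <= max_tries; attempt++) {
        rc = update(argc, argv);
        if (rc == 0)
            return 0;
        fprintf(stderr, "update attempt %d/%d failed (rc=%d)\n",
                attempt, max_tries, rc);
        /* Only sleep if another attempt is coming. */
        if (attempt < max_tries)
            sleep(delay_secs);
    }
    return rc;
}

/* Stand-in for the real update call: fails failures_left more times. */
int failures_left = 0;

int flaky_update(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    if (failures_left > 0) {
        failures_left--;
        return -1;
    }
    return 0;
}
```

A call like retry_update(flaky_update, argc, argv, 2, 1) gives one retry after a one-second pause, which matches what I described above. A one-second delay at least lines up with the "at least one second apart" error; for the others it is just a guess that the condition is transient.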
This hasn't totally smoothed out my data collection, but it's improved
things noticeably and I'm now looking into exactly why I'm getting socket
errors. At the very least I'll add another dumb retry to it (there seems
to be no rhyme or reason to the socket issues).
Rolling this into a patch is going to be annoying. I've got debug
statements EVERYWHERE.