heh.  I'll stop the Donaldson references after this, I promise.

I don't know if I'm addressing the symptoms or actually fixing anything at this point, but I've gotten RRD updates to smooth out considerably for me by adding a simple retry-on-failure mechanism to the *RRD_update() functions in gmetad/rrd_helpers.c.

Tossing a sleep(1) in there before the retry addresses the following errors:

*  "Updates must be at least one second apart" error.
*  "Could not lock RRD for updating" error.
*  "Expected 1 value for data source (got 0)" error.
   (this one's weird, it's like the argv array is not being parsed properly
    by rrd_update() - when I hand-update using the debug-printed params,
    it *works* ... but the error message is consistent with passing the
    filename as the data source entry!  Someone's offset by one every
    so often...)
*  "xxxx.rrd is not a valid RRD file" error.

I've seen these for both summaries and regular RRDs.
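
For anyone who wants the gist without wading through my debug output, the retry idea boils down to roughly the sketch below. This is NOT the actual change to rrd_helpers.c: retry_update(), update_fn and flaky_update() are made-up names for illustration, and the stand-in callback sits where the real code would call rrd_update() and check its return value.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical callback type: returns 0 on success, nonzero on failure.
   In gmetad this role would be played by whatever wraps rrd_update(). */
typedef int (*update_fn)(void *ctx);

/* Call fn up to max_attempts times, sleeping delay_secs after each
   failed attempt except the last.  Returns 0 on the first success,
   otherwise the return code of the final failed attempt. */
static int
retry_update(update_fn fn, void *ctx, int max_attempts, unsigned int delay_secs)
{
   int rc = -1;
   int attempt;

   assert(fn != NULL && max_attempts > 0);

   for (attempt = 1; attempt <= max_attempts; attempt++)
      {
         rc = fn(ctx);
         if (rc == 0)
            return 0;

         fprintf(stderr, "update failed (attempt %d of %d)\n",
                 attempt, max_attempts);

         /* Give the other writer (or the clock) time to move on. */
         if (attempt < max_attempts)
            sleep(delay_secs);
      }

   return rc;
}

/* Stand-in for a flaky update (assumption, not gmetad code):
   fails until the counter behind ctx reaches zero. */
static int
flaky_update(void *ctx)
{
   int *failures_left = ctx;

   if (*failures_left > 0)
      {
         (*failures_left)--;
         return -1;
      }
   return 0;
}
```

With one retry and a one-second delay, that would be called as retry_update(flaky_update, &failures, 2, 1). A full second of sleep is the smallest delay that can get past the "at least one second apart" complaint, which is why I didn't bother with anything finer-grained.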

This hasn't totally smoothed out my data collection, but it's improved things noticeably, and I'm now looking into exactly why I'm getting socket errors. At the very least I'll add another dumb retry around the socket code (there seems to be no rhyme or reason to the socket issues).

Rolling this into a patch is going to be annoying. I've got debug statements EVERYWHERE.

