I am very hungry and am going to go get a burrito.

I suspect some state is not being reset between loop iterations, because check out what my copious print statements tell me in gmetad:

RRD_update(): error expected 1 data source readings (got 0) from /www/gmetad/rrds/SOME_CLUSTER/SOME_HOST/cpu_idle.rrd:...
updating /www/gmetad/rrds/SOME_CLUSTER/SOME_HOST/cpu_idle.rrd with value N:0.6
process_xml.c: Call to write_data_to_rrd(SOME_CLUSTER,SOME_HOST,cpu_idle,0.6) was nonzero
...
RRD_update(): error expected 1 data source readings (got 0) from /www/gmetad/rrds/SOME_CLUSTER/SOME_HOST/cpu_idle.rrd:...
updating /www/gmetad/rrds/SOME_OTHER_CLUSTER/SOME_OTHER_HOST/mem_cached.rrd with value N:0
process_xml.c: Call to write_data_to_rrd(SOME_OTHER_CLUSTER,SOME_OTHER_HOST,mem_cached,0) was nonzero
...

So first of all, 0.6 is not a zero value. That's a little freaky. Second, the second value passed *is* zero, but RRD_update() is returning the same error message, still naming cpu_idle.rrd even though the update was for mem_cached.rrd. So either the error string wasn't updated, or the string passed to rrd_update() wasn't updated. This data appears to be shared between the two threads. And that just doesn't make sense.
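
To make the "error string wasn't updated" theory concrete, here is a minimal sketch of how a stale error buffer can make a perfectly good second call look like a repeat of the first failure. Everything in it (last_error, fake_update, the two wrappers) is a hypothetical stand-in invented for illustration. It is not rrdtool's or gmetad's actual code, and it only shows one way the symptom above could arise.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a library-wide error buffer.
 * This is an assumption for illustration, not rrdtool internals. */
static char last_error[256] = "";

static void clear_error(void) { last_error[0] = '\0'; }
static int  test_error(void)  { return last_error[0] != '\0'; }

/* Hypothetical update: records an error message only when `fail` is nonzero. */
static void fake_update(const char *path, int fail)
{
    if (fail)
        snprintf(last_error, sizeof last_error,
                 "expected 1 data source readings (got 0) from %s", path);
}

/* Buggy wrapper: never clears the buffer, so an old error
 * is reported again for a call that actually succeeded. */
static int update_without_clear(const char *path, int fail)
{
    fake_update(path, fail);
    return test_error() ? 1 : 0;
}

/* Safer wrapper: clears first, so any error seen belongs to this call. */
static int update_with_clear(const char *path, int fail)
{
    clear_error();
    fake_update(path, fail);
    return test_error() ? 1 : 0;
}
```

In this sketch, a failing update_without_clear() on cpu_idle.rrd followed by a succeeding one on mem_cached.rrd returns nonzero both times, and the buffer still names cpu_idle.rrd. With two threads, the clear, the update, and the check would presumably also have to happen under a single lock, or one thread could read the other's message.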

I'm also noticing a couple of RRD writes failing due to locking issues. But those are fairly few and far between.

This error, btw, trips the xml_data.rval flag and causes no further RRDs to be updated in this pass. Bummer. But not necessarily the same thing as what's causing ALL my sources to intermittently "die."
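
For anyone picking this up, a rough sketch of the difference between "first failure stops the pass" and "note the failure but keep going". Only the rval name comes from the code discussed above. The struct, the function names, and the array of simulated return codes are assumptions made up for the example, not what process_xml.c actually does.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-pass state; only `rval` is a real name. */
struct pass_state {
    int rval;     /* nonzero once any write has failed */
    int written;  /* number of successful writes in this pass */
};

/* results[i] simulates the return code of the i-th RRD write (0 = success). */

/* Behaviour as described: once rval is set, nothing else is updated. */
static void run_pass_abort_on_error(struct pass_state *st,
                                    const int *results, size_t n)
{
    st->rval = 0;
    st->written = 0;
    for (size_t i = 0; i < n; i++) {
        if (st->rval)
            break;              /* earlier failure: skip the rest of the pass */
        if (results[i] != 0)
            st->rval = 1;
        else
            st->written++;
    }
}

/* Alternative: remember the failure, but still update the remaining RRDs. */
static void run_pass_continue_on_error(struct pass_state *st,
                                       const int *results, size_t n)
{
    st->rval = 0;
    st->written = 0;
    for (size_t i = 0; i < n; i++) {
        if (results[i] != 0) {
            st->rval = 1;
            continue;
        }
        st->written++;
    }
}
```

With simulated results of success, failure, success, success, the first version writes one RRD and the second writes three, and both leave rval set.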

Anyway, I'm just sharing here before I leave today. If someone wants to take this up before I do in the morning, have at it. :)

