Re: [collectd] long sleeps when using collectd and ntpd
Hello Lode,

On Tue, Sep 24, 2013 at 10:53:02AM +0200, danta wrote:
> Whenever ntpd sets the clock in the past (because of a clock drift),
> collectd sleeps until it is back at its 'normal' time.

> When trying to debug the problem, we found that the do_loop function
> in collectd.c determines the amount of time to sleep based on times
> obtained from the "gettimeofday" function. Was this a design choice?
> Wouldn't it be better to use a monotonic clock?

It would be possible to use a monotonic clock _on Linux_, but according to POSIX (i.e. to be portable) one must use a real time clock.

Also, I don't think this is causing the behavior; the problem is a bit more complicated. Each callback, after it returns, is put into a heap which is sorted by the absolute, real time when it should be called next. I.e. the loop in src/collectd.c is still waking up periodically, but next to no work is done inside this loop, especially not the reads.

If you were to "fix" this, you'd for example end up with the metric for 12:00:00 being followed by the metric for 11:45:10, i.e. almost 15 minutes into the "past". This would result in collectd refusing this measurement because it is "too old". This would continue until the time has progressed enough.

Last, but not least, metrics _have_ to use the wall clock time. Otherwise you won't be able to correlate between metrics and alerts / behavior you observe.

Hope this helps,
best regards,
—octo
--
collectd – The system statistics collection daemon
Website: http://collectd.org
Google+: http://collectd.org/+
GitHub: https://github.com/collectd
Twitter: http://twitter.com/collectd

___
collectd mailing list
collectd@verplant.org
http://mailman.verplant.org/listinfo/collectd
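The scheduling behavior described in the reply above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not collectd's actual code: the names due_callbacks and simulate, the 10-second interval and the "cpu" callback are all hypothetical, and the clock is a plain variable so the run is deterministic. It shows a heap keyed by absolute wall-clock time: the loop keeps waking up, but after a 15-minute backward step nothing in the heap is due.

```python
import heapq


def due_callbacks(heap, now):
    """Pop and return every (when, name) entry whose scheduled time is <= now."""
    due = []
    while heap and heap[0][0] <= now:
        due.append(heapq.heappop(heap))
    return due


def simulate(step_back_s, interval_s=10, wakeups=6):
    """Count reads dispatched over `wakeups` loop iterations after a clock step.

    Hypothetical model: one read callback, rescheduled at `now + interval_s`
    (an absolute wall-clock time) every time it runs.
    """
    wall = 36000.0                       # 10:00:00 as seconds since midnight
    heap = [(wall + interval_s, "cpu")]  # next read is due at 10:00:10
    wall -= step_back_s                  # the wall clock is stepped back
    dispatched = 0
    for _ in range(wakeups):
        wall += interval_s               # the loop still wakes up every interval
        for _when, name in due_callbacks(heap, wall):
            dispatched += 1
            heapq.heappush(heap, (wall + interval_s, name))
    return dispatched


no_step = simulate(step_back_s=0)      # clock untouched: one read per wakeup
with_step = simulate(step_back_s=900)  # clock set back 15 minutes: no reads
print(no_step, with_step)
```

In this model the loop itself never blocks for 15 minutes; it simply finds no due entry until the wall clock passes the old scheduled time again, which matches the "next to no work is done inside this loop" description.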
Re: [collectd] long sleeps when using collectd and ntpd
On 9/24/2013 4:53 AM, danta wrote:
> We have some problems using collectd and ntpd.
>
> Whenever ntpd sets the clock in the past (because of a clock drift),
> collectd sleeps until it is back at its 'normal' time.
> To give an example:
> - Suppose it's 10:00am,
> - ntpd sees a large clock drift and sets the clock to 09:45am
> - collectd will sleep till 10:00am

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Virtualization/chap-Virtualization-KVM_guest_timing_management.html
http://kb.vmware.com/kb/1006427

How big is the time step? You may want to look at the -g and -x options that are being passed to the ntpd daemon. Under RHEL, these are set in /etc/sysconfig/ntpd.

http://www.eecis.udel.edu/~mills/ntp/html/ntpd.html

Also make sure that there is no periodic process running "ntpdate" somewhere in the background. You may want to check the logfiles to figure out whether it is ntpd that is stepping the clock or some other process.
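To make the effect of the -g and -x options more concrete, here is a rough model of how ntpd is documented to react to a given offset. It is a sketch, not ntpd's logic: the function names are made up for illustration, and the thresholds (128 ms default step threshold, 600 s with -x, 1000 s panic threshold, 0.5 ms/s maximum slew rate) are the values given on the ntpd manual page linked above, so they should be checked against the version actually installed.

```python
STEP_THRESHOLD_S = 0.128      # default step threshold (assumed from the ntpd page)
STEP_THRESHOLD_X_S = 600.0    # step threshold when -x is given
PANIC_THRESHOLD_S = 1000.0    # default panic threshold
SLEW_COST_S_PER_S = 2000      # 0.5 ms/s slew rate: 2000 s per second corrected


def expected_action(offset_s, x_option=False, g_first_adjust=False):
    """Return "slew", "step" or "exit" for an offset (hypothetical helper).

    g_first_adjust models -g, which lifts the panic limit for the first
    adjustment only.
    """
    magnitude = abs(offset_s)
    if magnitude > PANIC_THRESHOLD_S and not g_first_adjust:
        return "exit"
    threshold = STEP_THRESHOLD_X_S if x_option else STEP_THRESHOLD_S
    return "slew" if magnitude < threshold else "step"


def slew_duration_s(offset_s):
    """Seconds needed to slew away an offset at the maximum slew rate."""
    return abs(offset_s) * SLEW_COST_S_PER_S


# The 15-minute example from the thread is stepped with or without -x,
# because 900 s is above both step thresholds.
print(expected_action(-900.0), expected_action(-900.0, x_option=True))
# A 5-minute offset is slewed under -x, but that takes about a week.
print(expected_action(-300.0, x_option=True), slew_duration_s(-300.0) / 86400)
```

Under these assumptions -x alone would not prevent the 15-minute backward step from the example, which is one more reason to find out how big the step really is.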
Re: [collectd] long sleeps when using collectd and ntpd
Hello,

I have already noticed this problem too.

Forget about ntpd. Otherwise some may try to work out what ntpd has to do with it and suggest solutions for ntpd. The problem is the time on the machine.

How to reproduce:
0/ Disable ntpd (so you can see that ntpd has nothing to do with it).
1/ Set the date&time in the future on your server (the current time is "t0"; set it to t1, for example t1 = "t0 + 2 hours").
2/ Wait 2 or 3 minutes so that collectd collects and sends data to the main collectd server (and breaks your rrd files).
3/ Set the date&time back to the correct value (t0 + the time elapsed since step 1).
4/ No data will be written to the rrd files until the real date&time reaches t1 (in our example, we lose 2 hours of data).

The problem is inside the rrd files. You can run "rrdtool last .rrd" and see the timestamp of the last value written. As far as I know about rrd files and rrdtool, you cannot write data with timestamps before that value.

So if you set a bad date&time in the future, it will continue to work. But when you set a date&time in the past (or, like in the example, back from the future), it's broken.

It is a bigger problem when the date&time goes crazy and jumps years ahead. When this happens, you can consider your rrd file corrupted.

Well, here is my version of the description of the problem. If anybody has an idea on how to fix it, I'm interested too.

Regards,
Yves

On 2013-09-24 10:53, danta wrote:
> We have some problems using collectd and ntpd.
>
> Whenever ntpd sets the clock in the past (because of a clock drift),
> collectd sleeps until it is back at its 'normal' time.
> To give an example:
> - Suppose it's 10:00am,
> - ntpd sees a large clock drift and sets the clock to 09:45am
> - collectd will sleep till 10:00am
>
> When trying to debug the problem, we found that the do_loop function in
> collectd.c determines the amount of time to sleep based on times
> obtained from the "gettimeofday" function. Was this a design choice?
> Wouldn't it be better to use a monotonic clock?
>
> Greetz,
> Lode

--
- Homepage - http://ymettier.free.fr
- GPG key - http://ymettier.free.fr/gpg.txt
- C en action - http://ymettier.free.fr/livres/C_en_action_ed2.html
- Guide Survie C - http://www.pearson.fr/livre/?GCOI=27440100673730
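The reproduction steps above can be replayed in a toy model. This is a sketch, not rrdtool: LastUpdateStore and replay are hypothetical names, and the only rule modelled is the one described in the message, namely that an update whose timestamp is not newer than the last stored one is refused. The 10-second interval and the 3-minute wait are assumptions chosen to match the example.

```python
class LastUpdateStore:
    """Toy stand-in for an rrd file: refuses timestamps that are not newer
    than the last accepted one."""

    def __init__(self):
        self.last = None
        self.accepted = []

    def update(self, ts, value):
        if self.last is not None and ts <= self.last:
            return False          # too old: the update is dropped
        self.last = ts
        self.accepted.append((ts, value))
        return True


def replay(interval_s=10, wait_s=180, jump_s=2 * 3600):
    """Replay steps 1/ to 4/ and return (future_writes, rejected, first_ok)."""
    store = LastUpdateStore()
    t0 = 0
    t1 = t0 + jump_s
    # Steps 1/ and 2/: the clock runs in the future for wait_s seconds.
    for i in range(wait_s // interval_s):
        store.update(t1 + i * interval_s, i)
    future_writes = len(store.accepted)
    # Step 3/: the clock is set back to the correct time, t0 + wait_s.
    now = t0 + wait_s
    rejected = 0
    first_ok = None
    # Step 4/: every update is refused until real time passes the last write.
    while first_ok is None:
        if store.update(now, 0):
            first_ok = now
        else:
            rejected += 1
        now += interval_s
    return future_writes, rejected, first_ok


future_writes, rejected, first_ok = replay()
print(future_writes, rejected, first_ok)
```

With these numbers the model refuses 720 updates of 10 seconds each, i.e. the 2 hours of lost data mentioned in step 4/.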
Re: [collectd] long sleeps when using collectd and ntpd
On 24. September 2013 at 12:12:15, danta (merte...@axsguard.net) wrote:
>
>Hi Dave,
>
>We see the problem on virtual machines, where the host machine has its
>clock set to the local time. I realize this is a wrong setup but
>unfortunately we can't change the settings on the host machine. Also the
>host clock seems rather unstable, leading to large drifts if the ntpd
>server can't be reached for long periods of time.
>
>The long sleep problem also occurs if you correct your date to a date in
>the past.
>
>For the moment we patched the collectd code to exit if it detects a
>large time drift (a watchdog will then restart it), but I think this
>isn't a clean solution.
>
>Best regards,
>Lode

Makes sense. OTOH I would imagine this would cause problems with lots of other things too.

From twitter, serendipitously:

http://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time

A+
Dave
Re: [collectd] long sleeps when using collectd and ntpd
Hi Dave,

We see the problem on virtual machines, where the host machine has its clock set to the local time. I realize this is a wrong setup but unfortunately we can't change the settings on the host machine. Also the host clock seems rather unstable, leading to large drifts if the ntpd server can't be reached for long periods of time.

The long sleep problem also occurs if you correct your date to a date in the past.

For the moment we patched the collectd code to exit if it detects a large time drift (a watchdog will then restart it), but I think this isn't a clean solution.

Best regards,
Lode

On 09/24/2013 11:27 AM, Dave Cottlehuber wrote:
> On 24. September 2013 at 10:56:07, danta (merte...@axsguard.net) wrote:
> > We have some problems using collectd and ntpd.
> >
> > Whenever ntpd sets the clock in the past (because of a clock drift),
> > collectd sleeps until it is back at its 'normal' time.
> > To give an example:
> > - Suppose it's 10:00am,
> > - ntpd sees a large clock drift and sets the clock to 09:45am
> > - collectd will sleep till 10:00am
> >
> > When trying to debug the problem, we found that the do_loop function in
> > collectd.c determines the amount of time to sleep based on times
> > obtained from the "gettimeofday" function. Was this a design choice?
> > Wouldn't it be better to use a monotonic clock?
> >
> > Greetz,
> > Lode
>
> Hi Lode
>
> I can't answer for collectd but this doesn't sound like the correct
> usage for ntpd. Is the large drift just an example, and the real
> timescale a delay of a few seconds? Your ntpd should use a driftfile
> which over time will keep things in line most of the time, even if your
> ntpd servers are temporarily unavailable.
>
> Are these VMs that are being hibernated or similar? If so, use host
> clock sync and not ntpd. Obviously the hosts will use ntpd though!
>
> A+
> Dave
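The patch mentioned above is not shown in the thread. One common way to detect a wall-clock step, sketched here purely as an illustration, is to compare how far the wall clock and a monotonic clock have advanced between two checks. ClockStepDetector, its 60-second threshold and the injected clock functions are hypothetical; the demo uses fake clocks so the result is deterministic.

```python
import time


class ClockStepDetector:
    """Report wall-clock jumps by comparing against a monotonic clock.

    Hypothetical helper: `wall` and `mono` are injectable so the detector
    can be exercised with fake clocks.
    """

    def __init__(self, threshold_s=60.0, wall=time.time, mono=time.monotonic):
        self._wall = wall
        self._mono = mono
        self.threshold_s = threshold_s
        self._last_wall = wall()
        self._last_mono = mono()

    def check(self):
        """Return the jump in seconds since the last check, or 0.0 if the
        difference stays below the threshold. Negative means a step back."""
        wall_now, mono_now = self._wall(), self._mono()
        jump = (wall_now - self._last_wall) - (mono_now - self._last_mono)
        self._last_wall, self._last_mono = wall_now, mono_now
        return jump if abs(jump) >= self.threshold_s else 0.0


# Demo with fake clocks: 10 s pass normally, then the wall clock is set
# back by 15 minutes while another 10 s pass.
fake = {"wall": 36000.0, "mono": 100.0}
detector = ClockStepDetector(wall=lambda: fake["wall"], mono=lambda: fake["mono"])

fake["wall"] += 10.0
fake["mono"] += 10.0
normal = detector.check()

fake["wall"] += 10.0 - 900.0
fake["mono"] += 10.0
stepped = detector.check()
print(normal, stepped)
```

A daemon could react to a non-zero result by exiting and letting a watchdog restart it, as described above, or by rescheduling its pending work relative to the new wall-clock time.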
Re: [collectd] long sleeps when using collectd and ntpd
On 24. September 2013 at 10:56:07, danta (merte...@axsguard.net) wrote:
>
>We have some problems using collectd and ntpd.
>
>Whenever ntpd sets the clock in the past (because of a clock drift),
>collectd sleeps until it is back at its 'normal' time.
>To give an example:
>- Suppose it's 10:00am,
>- ntpd sees a large clock drift and sets the clock to 09:45am
>- collectd will sleep till 10:00am
>
>When trying to debug the problem, we found that the do_loop function in
>collectd.c determines the amount of time to sleep based on times
>obtained from the "gettimeofday" function. Was this a design choice?
>Wouldn't it be better to use a monotonic clock?
>
>Greetz,
>Lode

Hi Lode

I can't answer for collectd but this doesn't sound like the correct usage for ntpd. Is the large drift just an example, and the real timescale a delay of a few seconds? Your ntpd should use a driftfile which over time will keep things in line most of the time, even if your ntpd servers are temporarily unavailable.

Are these VMs that are being hibernated or similar? If so, use host clock sync and not ntpd. Obviously the hosts will use ntpd though!

A+
Dave
[collectd] long sleeps when using collectd and ntpd
We have some problems using collectd and ntpd.

Whenever ntpd sets the clock in the past (because of a clock drift), collectd sleeps until it is back at its 'normal' time.
To give an example:
- Suppose it's 10:00am,
- ntpd sees a large clock drift and sets the clock to 09:45am
- collectd will sleep till 10:00am

When trying to debug the problem, we found that the do_loop function in collectd.c determines the amount of time to sleep based on times obtained from the "gettimeofday" function. Was this a design choice? Wouldn't it be better to use a monotonic clock?

Greetz,
Lode