Re: [Linux-HA] Antw: Re: Q: "exec-time" values

Ulrich Windl Wed, 07 Dec 2011 00:16:13 -0800

>>> Lars Ellenberg <lars.ellenb...@linbit.com> wrote on 06.12.2011 at 18:38 in
message <20111206173820.GH14351@barkeeper1-xen.linbit>:
> On Tue, Dec 06, 2011 at 08:31:39AM +0100, Ulrich Windl wrote:
> > > Requirements for the return value of "time_longclock()"
> > > (which is what actually should be used)
> > > are at least:
> > >   It must be monotonic, no jumping backwards.
> > >   It must not care for the wall clock time
> > >   (ignore any jumping around caused by "date --set" or equivalent,
> > >   so gettimeofday or similar are out).
> > >   It must not care which CPU it runs on. 
> > >   It must be portable.
> > > 
> > >   It should be as linear as possible wrt actually elapsed real time,
> > >   not return a diff of 100 for one elapsed second, and some time
> > >   later return a diff of 117 for the same "real time" period.
> > > 
> > > The return value of times(), (wrapped in time_longclock(), if
> > > sizeof(clock_t) < 8) meets those requirements.
> > > 
> > > I don't see how getrusage would meet these requirements.
> > 
> > Oops! ???
> > 
> > I don't see why resource usage shouldn't be
> > monotonic
> > independent of wall-time
> > independent of the CPU it runs on
> > portable
> 
> It measures resource usage.
> Depending on how "busy" this process or the overall system was,
> it will not be "linear" wrt actually elapsed real time.
> "Resource usage units" can not be converted back to "real time ms",
> for example.
> What we typically care about is the elapsed real time.


So I'd use any function to get the time, maybe the most recent one (POSIX
clock_gettime()). Dealing with non-linear time is definitely not the task of the
cluster software, especially as various parts of the cluster software fail
anyway if time jumps (e.g. sbd). I see no reason to polish some execution
statistics if the rest of the software can't deal with time jumps. You are
just making the software complicated without any real benefit.

If NTP (or a comparable protocol) is working properly, these worries about time 
jumps are all non-issues.
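
For illustration, a minimal sketch of measuring elapsed time with POSIX
clock_gettime(). It uses CLOCK_MONOTONIC, the clock mentioned as a candidate
further down in this thread. The helper names monotonic_ns() and elapsed_ms()
are made up for this example and are not part of the cluster code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current CLOCK_MONOTONIC reading in nanoseconds. */
static int64_t monotonic_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);   /* sketch only: real code should handle the error */
    (void)rc;          /* keep the variable "used" when NDEBUG is defined */
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: milliseconds elapsed between two readings. */
static double elapsed_ms(int64_t start_ns, int64_t end_ns)
{
    return (double)(end_ns - start_ns) / 1e6;
}
```

Take one reading before the operation and one after it, then pass both to
elapsed_ms(). The result is not affected by "date --set", because
CLOCK_MONOTONIC cannot be set.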

> 
> Besides, at least on the box I just tried (2.6.32 kernel),
> getrusage returns just the same granularity of 0.01 seconds,
> so you'd be back to square one, even if it met the other requirements
> 
> :)

OK, it's not worse, but has a cleaner interface.
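
To show the interface in question, here is a rough sketch of reading the
process's own resource usage with getrusage(). The helper names timeval_ms()
and self_cpu_ms() are invented for this example. Note that the value is
consumed CPU time (user plus system), not elapsed real time, which is exactly
the objection quoted above.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: convert a struct timeval to milliseconds. */
static double timeval_ms(struct timeval tv)
{
    assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);
    return (double)tv.tv_sec * 1000.0 + (double)tv.tv_usec / 1000.0;
}

/* Hypothetical helper: CPU time (user + system) used so far by this
 * process, in milliseconds, or -1.0 if getrusage() fails. */
static double self_cpu_ms(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return timeval_ms(ru.ru_utime) + timeval_ms(ru.ru_stime);
}
```

A process that mostly sleeps or waits for I/O will see self_cpu_ms() advance
far more slowly than the wall clock.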

> 
> > > So if you want to hack something up with getrusage,
> > > restrict that to a certain usage of time_longclock(),
> > > not to the implementation of time_longclock() itself.
> > 
> > I don't think that "longclock" stuff is actually needed at all (unless I
> > overlooked something).
> 
> You probably have.
> 32bit wrap around, for example.

Hey, I proposed using "struct timeval" or "struct timespec". I can well
remember those complaints in 1984 that UNIX time will wrap in 2038 (or so). As
it turned out, the year 2000 was much more of a problem than that. So you have a
problem with some 32-bit tick counter where getrusage or any of the normal time
functions would not have a problem, and so you add a wrapper around that 32-bit
counter.
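
As an illustration of what such a wrapper around a 32-bit tick counter has to
do, here is a generic sketch of widening a wrapping counter to 64 bits. This
is not the actual time_longclock() code. The names struct tick_ext and
tick_extend() are hypothetical, and the sketch assumes readings arrive in
order and less than one full wrap apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state for widening a wrapping 32-bit tick counter. */
struct tick_ext {
    uint32_t last;   /* most recent raw 32-bit reading */
    uint64_t high;   /* accumulated wraps, kept in bits 32..63 */
};

/* Fold a new raw reading into a 64-bit tick count.
 * Assumption: consecutive readings are less than 2^32 ticks apart. */
static uint64_t tick_extend(struct tick_ext *st, uint32_t raw)
{
    assert(st != NULL);
    if (raw < st->last)                 /* value went "backwards": a wrap */
        st->high += UINT64_C(1) << 32;
    st->last = raw;
    return st->high | raw;
}
```

A struct timespec with a 64-bit tv_sec needs none of this state.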

You don't need to change a thing, I'm just wondering.

> 
> > > If you find something that fulfills those requirements,
> > > so could possibly be used instead of times() in the
> > > implementation of cl_times(), go ahead and suggest it.
> > > 
> > > clock_gettime with CLOCK_MONOTONIC may be a candidate.
> > > 
> > > But really: just to get some sub 10ms granularity of "exec time"?
> > > Why even bother.
> > 
> > Because I can complete a whole complex job in just a fraction of 10ms.
> 
> Even if that statement may be true, what is its relevance?
> Having lrmd report "5ms" instead of "10ms" (or "0ms", I did not check
> if it does ceil, round, or floor the value) makes a difference
> because ...?
> 
> It makes you feel better?

Yes, because if a task needs 7ms to complete, that's neither 0ms, nor 10ms.
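
The arithmetic behind this, as a small sketch: with the 0.01 second (10 ms)
granularity mentioned above, a 7 ms task is reported as 0 ms if the value is
rounded down and as 10 ms if it is rounded to nearest. Which of the two lrmd
does was not checked in this thread, so both are shown. The function names
are made up for this example.

```c
#include <assert.h>

/* Duration in ms as reported through a clock with tick_ms resolution,
 * rounding down to a whole number of ticks. */
static long quantize_floor_ms(long ms, long tick_ms)
{
    assert(ms >= 0 && tick_ms > 0);
    return (ms / tick_ms) * tick_ms;
}

/* Same, but rounding to the nearest tick (halves round up). */
static long quantize_round_ms(long ms, long tick_ms)
{
    assert(ms >= 0 && tick_ms > 0);
    return ((ms + tick_ms / 2) / tick_ms) * tick_ms;
}
```

With tick_ms = 10, a 7 ms duration yields 0 and 10 respectively. With
tick_ms = 1 both functions return 7.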

> 
> There is no better reason than that, I suppose :)
> 
> As I said: if you want to change how time is tracked in the mainloop queue,
> waiting queue because of "max-children" or for whatever reason, and the
> actual exec time from fork to sigchld processing, that's a specific
> scope, no problem to replace time_longclock there with gettimeofday,
> or whatever you feel gives you the most pleasure
> (getrusage is not it, I dare say).
> If the patch is small and consistent, it may even go upstream ;)
> 
> The point I was trying to make is:
> if you attempt to replace time_longclock altogether,
> be aware that this would have more implications.

If you can get rid of some complex stuff that isn't needed, it's always a win 
IMHO.

Regards,
Ulrich


 
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
