Re: [Pixman] RFC: Pixman benchmark CPU time measurement
On Tue, 2 Jun 2015 15:03:01 -0700 Bill Spitzak spit...@gmail.com wrote:

> I would have the first call return 0.0 and all the others return the
> difference between current time and when that first call was done. Then
> there is no worry about accuracy of floating point. I do not think any
> callers are interested in the absolute time, only in subtracting two
> results to get an elapsed time.

OpenMP threading must be taken into account, but that'd be doable. It
would indeed allow removing the elapsed() function from my proposal.

There would be a problem with clock() wrapping around too often, but
that's an orthogonal issue.

> Not sure if cpu time is what the benchmarks want. This does not include
> blocking waiting for the X server or for the GPU or for reading files.
> Elapsed real time is probably more useful.

This is Pixman, not Cairo. There are no connections to X, no GPUs, and
the benchmarks are not reading any files during the benchmark either.

Thanks,
pq

_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] RFC: Pixman benchmark CPU time measurement
On Wed, 3 Jun 2015 08:38:25 +0300 Siarhei Siamashka siarhei.siamas...@gmail.com wrote:

> On Tue, 2 Jun 2015 10:32:35 +0300 Pekka Paalanen ppaala...@gmail.com wrote:
>
> > Hi,
> >
> > most pixman performance benchmarks currently rely on gettime() from
> > test/util.[ch]:
> > - lowlevel-blt-bench
> > - prng-test
> > - radial-perf-test
> > - scaling-bench
> >
> > Furthermore, affine-bench has its own gettimei() which is essentially
> > gettime() but with uint32_t instead of double.
> >
> >     double
> >     gettime (void)
> >     {
> >     #ifdef HAVE_GETTIMEOFDAY
> >         struct timeval tv;
> >
> >         gettimeofday (&tv, NULL);
> >         return (double)((int64_t)tv.tv_sec * 1000000 + tv.tv_usec) / 1000000.;
> >     #else
> >         return (double)clock() / (double)CLOCKS_PER_SEC;
> >     #endif
> >     }
> >
> > This definition of gettime() has several potential drawbacks:
> >
> > 1. clock() will wrap around often; the manual page warns that in some
> >    cases it wraps around every 72 minutes. As the code in Pixman never
> >    expects a wraparound, this is subtly broken. This is a fallback
> >    path for systems that do not provide gettimeofday(), so it is
> >    rarely used if at all.
>
> Yes, the clock() fallback is there just to fill the void and make this
> code compile on the systems which don't have gettimeofday(). It also
> measures the CPU time instead of wall-clock time, but this can't be
> helped.
>
> > 2. gettimeofday() measures wall-clock time, which might not be the
> >    best way to measure code performance on a CPU, because all other
> >    load in the system will affect the result. It's probably not a
> >    significant problem on fast systems where you know to run your
> >    benchmarks uncontended.
>
> In my opinion, wall-clock time is exactly what we need to measure.

Hi Siarhei,

any other reason than the below? Btw. you don't need to spend time
answering this, I already got the high level feeling I was looking for:
nothing to fix here. The rest is just details for the enthusiastic. :-)

> If you have something else running concurrently in the system, the
> benchmark results are already screwed.
> Relying on the OS scheduler to correctly account the CPU time still
> does not compensate, for example, the cache thrashing effects.

Further down below, you give the same arguments why measuring on rpi is
inaccurate as I would use here to justify using process time (the kernel
drivers and other processes, even if the system is seemingly idle). :-)

> Basically, with the wall-clock time measurements it is easier to notice
> that something is wrong if the benchmark is not run uncontended. It
> happened to me a few times to accidentally run benchmarks with the
> compilation running in the background. Fortunately, the results were
> very much off and this was sufficient to raise a suspicion :-)
>
> Maybe trying to measure both wall-clock and CPU time at the same time
> and comparing the results in the end would be an even better safety
> guard?

Would even that detect a contended system if there were multiple CPU
cores?

> > 3. gettimeofday() is not only subject to NTP adjustments but is also
> >    affected by setting the system clock. IOW, this is not a
> >    guaranteed monotonic clock. Again, unlikely to be a problem in
> >    most cases, as benchmarks run long enough to even out NTP skews,
> >    but short enough to avoid accidentally hitting clock changes. (Or
> >    so I would assume.)
>
> Yes, this is unfortunate. But this is hardly the worst problem.

Yeah, none of these are really bad problems. We could fix just this one
by using CLOCK_MONOTONIC when it's available. I'm not completely sure if
that requires a runtime check for the existence of CLOCK_MONOTONIC
particularly. I think CLOCK_MONOTONIC_RAW would need it.

> > 4. Using double to store an absolute timestamp is suspicious to me.
> >    In the end, you always compute the difference between two
> >    timestamps, and using a floating point type may cause catastrophic
> >    cancellation [1] depending on absolute values. However, [1] also
> >    explains that a double is enough.
> > But, given that we read an arbitrary system clock whose starting
> > point is unknown (ok, Epoch for the moment), we might already be
> > getting values too large to maintain the expected accuracy (for
> > floats, sure; for doubles, who knows?)
>
> The double type for storing the time in seconds (since Epoch) was
> selected on purpose.
>
> First, there is no accuracy problem. The time in seconds since Epoch is
> going to fit in a 32-bit value at least until 2038. The mantissa in
> double has 53 bits. Which leaves us with 53 - 32 = 21 bits for the
> fractional part. As such, the granularity of the time representation
> can be estimated as 1 / 2^21 = ~0.48 microseconds, which amounts to a
> few thousand clock cycles even on a hypothetical 8 GHz processor. The
> calculations may be off by one here or there, but we are interested in
> the order of magnitude. This is more than enough.

Yes, I totally expected you to have that justification. However, if we
change to CLOCK_MONOTONIC or something, the Epoch is no
Re: [Pixman] RFC: Pixman benchmark CPU time measurement
On Wed, 3 Jun 2015 10:51:25 +0300 Pekka Paalanen ppaala...@gmail.com wrote:

> On Wed, 3 Jun 2015 08:38:25 +0300 Siarhei Siamashka siarhei.siamas...@gmail.com wrote:
>
> > On Tue, 2 Jun 2015 10:32:35 +0300 Pekka Paalanen ppaala...@gmail.com wrote:
> >
> > > I would propose the following:
> > >
> > > - runtime clock selection with this priority order:
> > >   1. clock_gettime(CLOCK_PROCESS_CPUTIME_ID)
> > >   2. getrusage(RUSAGE_SELF) - rusage.ru_utime (user time)
> > >   3. gettimeofday()
> > >   4. clock()
> > >
> > >   Naturally with build time checks, too. For 3 and 4 we would print
> > >   a warning about inaccurate measurements.
> > >
> > >   clock_gettime(CLOCK_MONOTONIC) is not in the list because I would
> > >   assume getrusage() is more widely available and I'd like to use
> > >   process time before wall-clock delta.
> > >
> > > - A separate void init_gettime(void) for choosing the clock.
> > > - void gettime(struct timespec *ts) for reading the clock.
> > > - double elapsed(const struct timespec *begin,
> > >                  const struct timespec *end)
> > >   for getting the elapsed time in seconds.
> >
> > This API looks excessively complicated to me.
>
> Only if you keep the assumption that using a double for absolute
> timestamps is always fine.
>
> > You always compute time intervals anyway.
>
> This would even give a nice common place to detect clock wraparounds or
> going backwards, so you'd be guaranteed to never get a negative
> interval length.
>
> If we need runtime detection of available clocks, then having
> init_gettime() would be useful to detect once instead of per OpenMP
> thread.

Sorry, nevermind this, there is no threading in benchmarks. So, the
explicit init call is not necessary, and like Bill pointed out, one
could during the first call not just init but also take the base
timestamp and always return elapsed time from that.

I could keep the API exactly like it is right now.

Thanks,
pq
Re: [Pixman] RFC: Pixman benchmark CPU time measurement
On 06/03/2015 12:36 AM, Pekka Paalanen wrote:

> On Tue, 2 Jun 2015 15:03:01 -0700 Bill Spitzak spit...@gmail.com wrote:
>
> > I would have the first call return 0.0 and all the others return the
> > difference between current time and when that first call was done.
> > Then there is no worry about accuracy of floating point. I do not
> > think any callers are interested in the absolute time, only in
> > subtracting two results to get an elapsed time.
>
> OpenMP threading must be taken into account, but that'd be doable. It
> would indeed allow removing the elapsed() function from my proposal.
>
> There would be a problem with clock() wrapping around too often, but
> that's an orthogonal issue.

But it would only wrap 72 minutes after the test started, rather than at
some random point during the test.
Re: [Pixman] RFC: Pixman benchmark CPU time measurement
On Wed, 03 Jun 2015 08:51:25 +0100, Pekka Paalanen ppaala...@gmail.com wrote:

> If we fixed gettime() for clock() wraparounds rather than ignoring
> them, I suppose there would be no reason to have gettimei(). Ben?

Well, that would help, but I still don't like the idea of using floating
point in the middle of benchmarking loops when an integer version works
just as well and floating point doesn't really gain you anything. Even
the division by 1000000 is nearly always undone by the time we reach the
printf, because megapixels per second or megabytes per second are
practical units - and those are the same things as pixels or bytes per
microsecond. Nobody is currently doing more to the times than adding or
subtracting them.

I know they're increasingly rare these days, but a machine with no
hardware FPU might take an appreciable time to do the integer-to-float
conversion and the floating point maths. Even if you have an FPU, it
might be powered down on each context switch and only have its state
restored lazily on the first floating point instruction encountered,
resulting in a random timing element. In both cases this can be avoided
by sticking to integers.

Ben
Re: [Pixman] RFC: Pixman benchmark CPU time measurement
On Wed, 03 Jun 2015 08:40:37 +0100, Pekka Paalanen ppaala...@gmail.com wrote:

> On Tue, 02 Jun 2015 17:03:30 +0100 Ben Avison bavi...@riscosopen.org wrote:
>
> > If I were to make one change to gettimei() now, looking back, it
> > would be to make it return int32_t instead. This is because most
> > often you'd be subtracting two sample outputs of the function, and
> > it's more often useful to consider time intervals as signed (say if
> > you're comparing the current time against a timeout time which may
> > be in the past or the future). If gettimei() returns a signed
> > integer, then C's type promotion rules make the result of the
> > subtraction signed automatically.
>
> Wasn't overflow well-defined only for unsigned integers?

You're right, I'd forgotten that aspect of C. To be fully portable, a
modulo-2^32-arithmetic comparison needs to be written

    uint32_t time1, time2;
    if (((time1 - time2) & (1u << 31)) != 0)

rather than

    int32_t time1, time2;
    if ((time1 - time2) < 0)

or

    uint32_t time1, time2;
    if ((int32_t) (time1 - time2) < 0)

even though the latter two are more readable, and all three are actually
equivalent if you're using two's complement integers, which (nearly?)
everybody does nowadays. Sometimes I wish C had an inbuilt modulo
integer type. At the assembly level for a lot of CPUs it's a third type
of comparison with similar ease of use as signed and unsigned
comparisons, but it's a pain to express in C.

Anyway, this doesn't change the fact that struct timeval currently uses
signed long ints, and will enter undefined-behaviour territory in 2038.
I think it's reasonable to assume that in practice, tv_sec will actually
contain an unsigned long int value (albeit one that has been cast to
signed) after that, as that would break the least software.

Ben