Tom Lane wrote:
"Michael Paesold" <[EMAIL PROTECTED]> writes:
To add another data point, I have retested the patches on a single-cpu Intel P4
3GHz w/ HT (i.e. 2 virtual cpus), no EM64T. Compared to the 2.4 GHz dual-Xeon
results it's clear that this is in reality only one cpu: while the runtime
for N=1 is better than on the other system, for N=4 it's already worse. The
situation with the patches is quite different, though. Unfortunately.

CVS tip from 2005-09-12:
1: 36s   2: 77s (cpu ~85%)    4: 159s (cpu ~98%)

only slock-no-cmpb:
1: 36s   2: 81s (cpu ~79%)    4: 177s (cpu ~94%)
(doesn't help this time)

Hm.  This is the first configuration we've seen in which slock-no-cmpb
was a loss.  Could you double-check that result?

The first tests were compiled with CFLAGS='-O2 -mcpu=pentium4 -march=pentium4'.
I redid the tests with just CFLAGS='-O2' yesterday; the difference was only
about a second, and the result with the patch was the same. The N=4 and N=8
results show the negative effect even more clearly.

configure: CFLAGS='-O2' --enable-cassert
On RHEL 4.1, gcc (GCC) 3.4.3 20050227 (Red Hat 3.4.3-22.1)

CVS tip from 2005-09-12:
1: 37s   2: 78s      4: 159s       8: 324s

only slock-no-cmpb:
1: 37s   2: 82s (+5%) 4: 178s (+12%) 8: 362s (+12%)


(Btw. I have always done "make clean ; make ; make install" between tests)

Best Regards,
Michael Paesold

Tom Lane wrote:

I can't see any reasonable way to do runtime switching of the cmpb test
--- whatever logic we put in to control it would cost as much or more
than the cmpb anyway :-(.  I think that has to be a compile-time choice.
From my perspective it'd be acceptable to remove the cmpb only for
x86_64, since only there does it seem to be a really significant win.
On the other hand it seems that removing the cmpb is a net win on most
x86 setups too, so maybe we should just do it and accept that there are
some cases where it's not perfect.
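
For anyone who hasn't looked at the patch itself: the cmpb in question is the
non-locking pre-test in the x86 tas() inline assembly in
src/include/storage/s_lock.h. Below is a minimal sketch (not the actual patch)
of the two variants being compared; the SLOCK_NO_CMPB symbol is made up here
only to illustrate what a compile-time choice could look like.

/*
 * Sketch only, NOT the actual patch: the two TAS variants under
 * discussion, loosely modeled on the x86 tas() in
 * src/include/storage/s_lock.h.  SLOCK_NO_CMPB is a hypothetical
 * symbol standing in for whatever compile-time switch gets chosen.
 */
typedef unsigned char slock_t;

static __inline__ int
tas(volatile slock_t *lock)
{
	register slock_t _res = 1;

#ifdef SLOCK_NO_CMPB
	/* Patched behaviour: go straight for the bus-locked xchgb. */
	__asm__ __volatile__(
		"lock; xchgb %0,%1"
		: "+q" (_res), "+m" (*lock)
		:
		: "memory", "cc");
#else
	/*
	 * Current behaviour: a plain (non-locking) cmpb peeks at the lock
	 * byte first, so a spinning CPU keeps re-reading its cached copy
	 * and only pays for the locked xchgb once the lock looks free.
	 */
	__asm__ __volatile__(
		"cmpb  $0,%1      \n\t"
		"jne   1f         \n\t"
		"lock; xchgb %0,%1\n\t"
		"1:"
		: "+q" (_res), "+m" (*lock)
		:
		: "memory", "cc");
#endif

	return (int) _res;		/* 0 means we got the lock */
}

Whether skipping the locked operation while the lock is busy is a win
evidently depends on the processor, which is exactly what the numbers above
and below are probing.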

How many test cases do we have so far?
A summary of the effects without the cmpb instruction seems to be:

8-way Opteron:           better
Dual/HT Xeon w/o EM64T:  better
Dual/HT EM64T:           better for N<=cpus, worse for N>cpus (Stephen's)
HT P4 w/o EM64T:         worse (more so for N>cpus)

Have I missed other reports that did test the slock-no-cmpb.patch alone?
Two of the systems with positive effects are x86_64, one is an older
high-end Intel x86 chip. The negative effect is on a low-cost Pentium 4
with only hyperthreading. According to the mentioned thread's title, this
was an optimization for hyperthreading, not for regular multi-cpu systems.

We could use more data, especially from newer and high-end systems. Could
some of you test the slock-no-cmpb.patch? You'll need an otherwise idle
system to get repeatable results.

http://archives.postgresql.org/pgsql-hackers/2005-09/msg00565.php
http://archives.postgresql.org/pgsql-hackers/2005-09/msg00566.php

I have re-attached the relevant files from Tom's posts because in the
archive it is no longer clear what should go into which file. See the
instructions in the first message above.

The patch applies to CVS tip with
patch -p1 < slock-no-cmpb.patch

Best Regards,
Michael Paesold

Attachment: slock-no-cmpb.patch
Description: Binary data

Attachment: test_setup.sql
Description: Binary data

Attachment: test_run_small.sql
Description: Binary data

Attachment: startn.sh
Description: Binary data

