Re: [HACKERS] [PATCH v3] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-19 Thread Robert Haas
On Tue, Jun 30, 2009 at 3:08 AM, Jeremy Kerrj...@ozlabs.org wrote:
 Move the shift-and-test login into a separate fls() function, which
 can use __builtin_clz() if it's available.

 This requires a new check for __builtin_clz in the configure script.

 Results in a ~2% performance increase on PowerPC.

There are some comments on this patch that have been posted to
commitfest.postgresql.org rather than emailed to -hackers.  Note to
all: please don't do this.  The purpose of commitfest.postgresql.org
is to summarize the mailing list discussion for easy reference, not to
replace the mailing lists.

That having been said, Jeremy, you probably want to take a look at
those comments and I have a few responses to them as well.

Dan Colish, the round-robin reviewer assigned to this patch, added the
following comment:
 Applied and built cleanly. Regress passes. Trying to hunt down ppc box to see 
 if performance enhancement
 can be seen.

Question: are we only doing this because of PowerPC?  What is going to
happen on x86 and other architectures?  x86 is not the center of the
universe, but at a very minimum we need to make sure that things don't
go backwards on what is a very common platform.  Has anyone done any
benchmarking of this?

A related question: the original email for this patch says that it
results in a performance increase of about 2% on PPC.  But since it
gives no details on exactly what the test was that improved by 2%,
it's hard to judge the impact of this.  If this means that
AllocSetFreeIndex() is 2% faster, I think we should reject this patch
and move on.  It's not worth introducing architecture-specific code to
get a performance improvement that probably won't even move the needle
on a macro-benchmark.  On the other hand, if the test was something
like a pgbench run, then this is really very impressive.  But we don't
know which it is.

Zdenek Kotala added this comment:
 I have several objections:

 - inline is forbidden to use in PostgreSQL - you need exception or do it 
 differently

 - type mismatch - Size vs unsigned int vs 32. you should use only Size and 
 sizeof(Size)

 And general comment:

 Do we want to have this kind of code which is optimized for one compiler and 
 platform. See openSSL or
 MySQL, they have many optimizations which work fine on one platform with 
 specific version of compiler and
 specific version of OS. But if you select different version of compiler or 
 different compiler you can get very
 different performance result (e.g. MySQL 30% degradation, OpenSSL RSA 3x 
 faster and so on).

 I think in this case, it makes sense to have optimization here, but we should 
 do it like spinlocks are
 implemented and put this code into /port. It should help to implemented 
 special code for SPARC and SUN
 Studio for example.

I don't have any thoughts on this part beyond what I stated above, but
hopefully Jeremy does.

I am going to mark this Waiting on author pending a response to all
of the above, though hopefully Dan Colish will continue with his
reviewing efforts in the meantime.

Thanks,

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v3] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-19 Thread Stefan Kaltenbrunner

Robert Haas wrote:

On Tue, Jun 30, 2009 at 3:08 AM, Jeremy Kerrj...@ozlabs.org wrote:

Move the shift-and-test login into a separate fls() function, which
can use __builtin_clz() if it's available.

This requires a new check for __builtin_clz in the configure script.

Results in a ~2% performance increase on PowerPC.


There are some comments on this patch that have been posted to
commitfest.postgresql.org rather than emailed to -hackers.  Note to
all: please don't do this.  The purpose of commitfest.postgresql.org
is to summarize the mailing list discussion for easy reference, not to
replace the mailing lists.

That having been said, Jeremy, you probably want to take a look at
those comments and I have a few responses to them as well.

Dan Colish, the round-robin reviewer assigned to this patch, added the
following comment:

Applied and built cleanly. Regress passes. Trying to hunt down ppc box to see 
if performance enhancement
can be seen.


Question: are we only doing this because of PowerPC?  What is going to
happen on x86 and other architectures?  x86 is not the center of the
universe, but at a very minimum we need to make sure that things don't
go backwards on what is a very common platform.  Has anyone done any
benchmarking of this?

A related question: the original email for this patch says that it
results in a performance increase of about 2% on PPC.  But since it
gives no details on exactly what the test was that improved by 2%,
it's hard to judge the impact of this.  If this means that
AllocSetFreeIndex() is 2% faster, I think we should reject this patch
and move on.  It's not worth introducing architecture-specific code to
get a performance improvement that probably won't even move the needle
on a macro-benchmark.  On the other hand, if the test was something
like a pgbench run, then this is really very impressive.  But we don't
know which it is.


There are numbers in the original thread this patch 
http://archives.postgresql.org/pgsql-hackers/2009-06/msg00159.php 
resulted in.


Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v3] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-19 Thread Jeremy Kerr
Hi Robert,

 That having been said, Jeremy, you probably want to take a look at
 those comments and I have a few responses to them as well.

OK, thanks for the heads-up.

 following comment:
  Applied and built cleanly. Regress passes. Trying to hunt down ppc
  box to see if performance enhancement can be seen.

 Question: are we only doing this because of PowerPC?

No, this patch will benefit any architecture that has a gcc 
implementation of __builtin_clz. I know that both x86 and powerpc gcc 
support this.

 What is going to happen on x86 and other architectures?  x86 is not
 the center of the universe, but at a very minimum we need to make sure
 that things don't go backwards on what is a very common platform.  Has
 anyone done any benchmarking of this?

Yes, Atsushi Ogawa did some benchmarking with this patch on x86, his 
numbers are here:

 http://archives.postgresql.org/message-id/4a2895c4.9050...@hi-ho.ne.jp

In fact, Atushi originally submitted a patch using inline asm (using 
bsr) to do this on x86. Coincidentally, I was working on a powerpc 
implementation (using cntlz) at the same time, so submitted a patch 
using the gcc builtin that would work on both (and possibly other) 
architectures.

 A related question: the original email for this patch says that it
 results in a performance increase of about 2% on PPC.  But since it
 gives no details on exactly what the test was that improved by 2%,
 it's hard to judge the impact of this.  If this means that
 AllocSetFreeIndex() is 2% faster, I think we should reject this patch
 and move on.  It's not worth introducing architecture-specific code
 to get a performance improvement that probably won't even move the
 needle on a macro-benchmark.  On the other hand, if the test was
 something like a pgbench run, then this is really very impressive. 
 But we don't know which it is.

The 2% improvement I saw is for a sysbench OLTP run. I'm happy to re-do 
the run and report specific numbers if you need them.

 Zdenek Kotala added this comment:
  I have several objections:
 
  - inline is forbidden to use in PostgreSQL - you need exception or
  do it differently
 
  - type mismatch - Size vs unsigned int vs 32. you should use only
  Size and sizeof(Size)

OK, happy to make these changes. What's the commitfest process here, 
should I redo the patch and re-send to -hackers?

(inline again: should I just make this a static, the compiler can inline 
where possible? or do you want a macro?)

  And general comment:
 
  Do we want to have this kind of code which is optimized for one
  compiler and platform.

One compiler, multiple platforms.

  See openSSL or MySQL, they have many
  optimizations which work fine on one platform with specific version
  of compiler and specific version of OS. But if you select different
  version of compiler or different compiler you can get very
  different performance result (e.g. MySQL 30% degradation, OpenSSL
  RSA 3x faster and so on).

I don't think we're going to see a lot of variation here (besides the 
difference where __builtin_clz isn't available). Given that this is 
implemented with a couple of instructions, I'm confident that we won't 
see any degradation for platforms that support __builtin_clz. For the 
other cases, we just use the exiting while-loop algorithm so there 
should be no change (unless we mess up the inlining...).

  I think in this case, it makes sense to have optimization here, but
  we should do it like spinlocks are implemented and put this code
  into /port.

Unless I'm missing something, spinlocks are done the same way - we have 
one file, src/include/storage/s_lock.h, which is mostly different 
implementations of the lock primitives for different platforms, 
separated by preprocessor tests.

Essentially, this is just one line of code, protected by 
HAVE_BUILTIN_CLZ (which is a feature-test, rather than a specific 
platform or compiler test), and is only used in one place. I don't think 
that warrants a separate file.

Cheers,


Jeremy

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v3] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-19 Thread Tom Lane
Jeremy Kerr j...@ozlabs.org writes:
 - inline is forbidden to use in PostgreSQL - you need exception or
 do it differently

 (inline again: should I just make this a static, the compiler can inline 
 where possible? or do you want a macro?)

I don't know where Zdenek got the idea that we have something against
inline.

So far as I can see, recent versions of gcc claim to support
__builtin_clz on all supported architectures.  On some it might be no
faster than our existing loop, but it seems unlikely to be slower.

The two comments I have are

* do something other than the hardwired 32 for word size; perhaps
sizeof(int) * BITS_PER_BYTE.

* do not use the separate fls function.  On a compiler that fails to
inline it, this patch would be a net performance loss, which we're not
likely to tolerate for a patch that has no other reason to live than
performance.  Just #if the builtin right into the one place where it
will be used.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers