Normally I'd try a small lookup table (1-byte index to 1-byte value) in this 
case. But if the bitscan instruction were even close in performance, it'd be 
preferable, due to its more reliable caching behavior; it should be possible to 
select between the two at code-configuration time (aligned so as to produce an 
optimal result for each test case; see below).

The specific code for large-versus-small testing would be useful; did I 
overlook it?

Note that instruction alignment with respect to words is not the only potential 
instruction-alignment issue. In the past, when optimizing code to an extreme, 
I've run into cache-line issues where a small change that should have produced 
a small improvement instead resulted in a largish performance loss until 
further work was done. Lookup tables can have an analogous issue; in a 
simplistic test, this could explain an anomalous large-better-than-small 
result, if part of the large lookup table remains cached. (Do any modern CPUs 
attempt to address this?) This is difficult to tune in a multiplatform code 
base, so the numbers from a particular benchmark do not tell the whole tale; 
you'd need to make a judgment call, and perhaps allow a code-configuration 
override.

David Hudson
