On Fri, 11 Nov 2022 at 22:20, John Naylor <john.nay...@enterprisedb.com> wrote: > > > On Wed, Oct 12, 2022 at 4:37 PM David Rowley <dgrowle...@gmail.com> wrote: > > [v3] > > + /* > + * Compute a shift that guarantees that shifting chunksPerBlock with it > + * yields is smaller than SLAB_FREELIST_COUNT - 1 (one freelist is used for > full blocks). > + */ > + slab->freelist_shift = 0; > + while ((slab->chunksPerBlock >> slab->freelist_shift) >= > (SLAB_FREELIST_COUNT - 1)) > + slab->freelist_shift++; > > + /* > + * Ensure, without a branch, that index 0 is only used for blocks entirely > + * without free chunks. > + * XXX: There probably is a cheaper way to do this. Needing to shift twice > + * by slab->freelist_shift isn't great. > + */ > + index = (freecount + (1 << slab->freelist_shift) - 1) >> > slab->freelist_shift; > > How about something like > > #define SLAB_FREELIST_COUNT ((1<<3) + 1) > index = (freecount & (SLAB_FREELIST_COUNT - 2)) + (freecount != 0);
Doesn't this create a sort of round-robin use of the free list? What we want is a sort of "histogram" bucket set of free lists so we can group together blocks that have a close-enough free number of chunks. Unless I'm mistaken, I think what you have doesn't do that. I wondered if simply: index = -(-freecount >> slab->freelist_shift); would be faster than Andres' version. I tried it out and on my AMD machine, it's about the same speed. Same on a Raspberry Pi 4. Going by [2], the instructions are very different with each method, so other machines with different latencies on those instructions might show something different. I attached what I used to test if anyone else wants a go. AMD Zen2 $ ./freecount 2000000000 Test 'a' in 0.922766 seconds Test 'd' in 0.922762 seconds (0.000433% faster) RPI4 $ ./freecount 2000000000 Test 'a' in 3.341350 seconds Test 'd' in 3.338690 seconds (0.079672% faster) That was gcc. Trying it with clang, it went in a little heavy-handed and optimized out my loop, so some more trickery might be needed for a useful test on that compiler. David [2] https://godbolt.org/z/dh95TohEG
#include <stdio.h> #include <time.h> #include <stdlib.h> #define SLAB_FREELIST_COUNT 9 int main(int argc, char **argv) { clock_t start, end; double v1_time, v2_time; int freecount; int freelist_shift = 0; int chunksPerBlock; int index; int zerocount = 0; if (argc < 2) { printf("Syntax: %s <chunksPerBlock>\n", argv[0]); return -1; } chunksPerBlock = atoi(argv[1]); printf("chunksPerBlock = %d\n", chunksPerBlock); while ((chunksPerBlock >> freelist_shift) >= (SLAB_FREELIST_COUNT - 1)) freelist_shift++; printf("freelist_shift = %d\n", freelist_shift); start = clock(); for (freecount = 0; freecount <= chunksPerBlock; freecount++) { index = (freecount + (1 << freelist_shift) - 1) >> freelist_shift; /* try to prevent optimizing the above out */ if (index == 0) zerocount++; } end = clock(); v1_time = (double) (end - start) / CLOCKS_PER_SEC; printf("Test 'a' in %f seconds\n", v1_time); printf("zerocount = %d\n", zerocount); zerocount = 0; start = clock(); for (freecount = 0; freecount <= chunksPerBlock; freecount++) { index = -(-freecount >> freelist_shift); /* try to prevent optimizing the above out */ if (index == 0) zerocount++; } end = clock(); v2_time = (double) (end - start) / CLOCKS_PER_SEC; printf("Test 'd' in %f seconds (%f%% faster)\n", v2_time, v1_time / v2_time * 100.0 - 100.0); printf("zerocount = %d\n", zerocount); return 0; }