On Sun, 7 Apr 2024 at 05:45, Matthias van de Meent <boekewurm+postg...@gmail.com> wrote: > Malloc's docs specify the minimum chunk size at 4*sizeof(void*) and itself > uses , so using powers of 2 for chunks would indeed fail to detect 1s in the > 4th bit. I suspect you'll get different results when you check the allocation > patterns of multiples of 8 bytes, starting from 40, especially on 32-bit arm > (where MALLOC_ALIGNMENT is 8 bytes, rather than the 16 bytes on i386 and > 64-bit architectures, assuming [0] is accurate)
I'm prepared to be overruled, but I just don't have strong feelings that 32-bit is worth making these reservations for. Especially so given the rate we're filling these slots. The only system that I see the 4th bit change is Cygwin. It doesn't look like a very easy system to protect against pfreeing of malloc'd chunks as the prior 8-bytes seem to vary depending on the malloc'd size and I see all bit patterns there, including the ones we use for our memory contexts. Since we can't protect against every possible bit pattern there, we need to draw the line somewhere. I don't think 32-bit systems are worth reserving these precious slots for. I'd hazard a guess that there are more instances of Postgres running on Windows today than on 32-bit CPUs and we don't seem too worried about the bit-patterns used for Windows. > In your updated 0001, you don't seem to fill the RESERVED_GLIBC memctx array > entries with BOGUS_MCTX(). Oops. Thanks >> Another reason not to make it 5 bits is that I believe that would make >> the mcxt_methods[] array 2304 bytes rather than 576 bytes. 4 bits >> makes it 1152 bytes, if I'm counting correctly. > > > I don't think I understand why this would be relevant when only 5 of the > contexts are actually in use (thus in caches). Is that size concern about TLB > entries then? It's a static const array. I don't want to bloat the binary with something we'll likely never need. If we one day need it, we can reserve another bit using the same overlapping method. >> I revised the patch to simplify hdrmask logic. This started with me >> having trouble finding the best set of words to document that the >> offset is "half the bytes between the chunk and block". So, instead >> of doing that, I've just made it so these two fields effectively >> overlap. The lowest bit of the block offset is the same bit as the >> high bit of what MemoryChunkGetValue returns. > > > Works for me, I suppose. hmm. I don't detect much enthusiasm for it. Personally, I quite like the method as it adds no extra instructions when encoding the MemoryChunk and only a simple bitwise-AND when decoding it. Your method added extra instructions in the encode and decode. I went to great lengths to make this code as fast as possible, so I know which method that I prefer. We often palloc and never do anything that requires the chunk header to be decoded, so not adding extra instructions on the encoding stage is a big win. The only method I see to avoid adding instructions in encoding and decoding is to reduce the bit-space for the MemoryChunkGetValue field to 29 bits. Effectively, that means non-external chunks can only be 512MB rather than 1GB. As far as I know, that just limits slab.c to only being able to do 512MB pallocs as generation.c and aset.c use external chunks well below that threshold. Restricting slab to 512MB is probably far from the end of the world. Anything close to that would be a terrible use case for slab. I was just less keen on using a bit from there as that's a field we allow the context implementation to do what they like with. Having bitspace for 2^30 possible values in there just seems nice given that it can store any possible value from zero up to MaxAllocSize. David