I wrote:
> FreeBSD 13.0, arm64: Usually the low-order nibble is 0000 or 1111,
> but for some smaller values of N it sometimes comes up as 0010.
> NetBSD 9.2, amd64: results similar to FreeBSD.

I looked into NetBSD's malloc.c, and what I discovered is that their
implementation doesn't have any chunk headers: chunks of the same size
are allocated consecutively within pages, and all the bookkeeping data
is somewhere else.  Presumably FreeBSD is the same.  So the apparent
special case with 0010 is an illusion, even though I saw it on two
different machines (maybe it's a specific value that we're
allocating??).  The most likely case is 0000 due to the immediately
previous word having never been used (note that like palloc, they round
chunk sizes up to powers of two, so unused space at the end of a chunk
is common).  I'm not sure whether the cases I saw with 1111 are chance
artifacts or reflect some real mechanism, but probably the former.
I thought for a bit that that might be the effects of wipe_mem on the
previous chunk, but palloc'd storage would never share the same page
as malloc'd storage under this allocator, because we grab it from
malloc in larger-than-page chunks.

However ... after looking into glibc's malloc.c, I find that it does
use a chunk header, and very conveniently the three bits that we care
about are flag bits (at least on 64-bit machines):

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk, if unallocated (P clear)  |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk, in bytes                     |A|M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             User data starts here...                          .

The A bit is only used when threading, and hence should always be zero
in our usage.  The M bit only gets set in chunks large enough to be
separately mmap'd, so when it is set P must be 0.  If M is not set then
P seems to usually be 1, although it could be 0.  So the three
possibilities for what we can see under glibc are 000, 001, 010
(the last only occurring for chunks larger than 128K).
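
To make the bit layout concrete, here is a rough sketch of decoding
those three flag bits from a size word.  The bit values (P = 0x1,
M = 0x2, A = 0x4) follow glibc's malloc.c, but the macro and function
names are made up for illustration, and peek_preceding_word() is only
an assumption about how one might run the experiment: applied to a
malloc'd pointer it reads outside the allocation, so it means
something only under glibc and is formally not portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flag bits at the low-order end of glibc's chunk size word. */
#define CHUNK_P_BIT ((size_t) 0x1)	/* previous chunk is in use */
#define CHUNK_M_BIT ((size_t) 0x2)	/* chunk was separately mmap'd */
#define CHUNK_A_BIT ((size_t) 0x4)	/* chunk is in a non-main arena */

/* Extract the three flag bits (A, M, P) from a size word. */
unsigned
chunk_flag_bits(size_t size_word)
{
	return (unsigned) (size_word &
					   (CHUNK_A_BIT | CHUNK_M_BIT | CHUNK_P_BIT));
}

/*
 * Hypothetical check: is this one of the patterns (000, 001, 010)
 * expected for a single-threaded program under glibc?
 */
int
flags_expected_single_threaded(unsigned flags)
{
	return flags == 0x0 || flags == 0x1 || flags == 0x2;
}

/*
 * Hypothetical helper: fetch the word stored immediately before "mem".
 * For a pointer returned by malloc this is an out-of-bounds read, so
 * treat it purely as a debugging experiment on glibc.
 */
size_t
peek_preceding_word(const void *mem)
{
	size_t		word;

	assert(mem != NULL);
	memcpy(&word, (const char *) mem - sizeof(size_t), sizeof(size_t));
	return word;
}
```

Something like chunk_flag_bits(peek_preceding_word(ptr)) for a range
of request sizes should reproduce the experiment, with results other
than 000, 001 and 010 being unexpected in a non-threaded program.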

This squares with experimental results on my machine --- I'd not
thought to try sizes above 100K before.  So I'm still inclined to
leave 001 and 010 both unused, but the reason why is different than
I thought before.

Going forward, we could commandeer 010 if we need to without losing
very much debuggability, since malloc'ing more than 128K in a chunk
won't happen often.

			regards, tom lane