On Thu, Nov 10, 2016 at 12:37 AM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: > On Tue, Nov 1, 2016 at 5:06 PM, Thomas Munro > <thomas.mu...@enterprisedb.com> wrote: >> On Wed, Oct 5, 2016 at 11:28 PM, Thomas Munro >> <thomas.mu...@enterprisedb.com> wrote: >>> [dsa-v3.patch] >> >> Here is a new version which just adds CLOBBER_FREED_MEMORY support to >> dsa_free. > > Here is a new version that fixes a bug I discovered in freepage.c today. > > Details: When dsa_free decides to give back a whole superblock back > to the free page manager for a segment with FreePageManagerPut, and > there was already exactly one span of exactly one free page in that > segment, and the span being 'put' is not adjacent to that existing > free page, then the singleton format must be converted to a btree with > the existing page as root and the newly put span as the sole leaf. > But in that special case we forgot to add the newly put span to the > appropriate free list. Not only did we lose track of it, but a future > call to FreePageManagerPut might try to merge it with another adjacent > span, which will try to manipulate the freelist that it expects it to > be in and blow up. The fix is just to add a call to > FreePagePushSpanLeader in this corner case before the early return.
Since a lot of the design of this patch is mine - from my earlier work on sb_alloc - I don't expect to have a ton of objections to it. And I'd like to get it committed, because other parallelism work depends on it (bitmap heap scan and parallel hash join in particular), and because it's got other uses as well. However, I don't want to be perceived as slamming my code (or that of my colleagues) into the tree without due opportunity for other people to provide feedback, so if anyone has questions, comments, concerns, or review to offer, please do. I think we should develop versions of this that (1) allocate from the main shared memory segment and (2) allocate from backend-private memory. Per my previous benchmarking results, allocating from backend-private memory would be a substantial win for tuplesort.c because this allocator is substantially more memory-efficient for large memory contexts than aset.c, and Tomas Vondra tested it out and found that it is also faster for logical decoding than the approach he proposed. Perhaps that's not an argument for holding up his proposed patches for that problem, but I think it IS a good argument for pressing forward with a backend-private version of this allocator. I'm not saying that should be part of the initial commit of this code, but I think it's a good direction to pursue. One question that we need to resolve is where the code should live in the source tree. When I wrote the original patches upon which this work was based, I think that I put all the code in src/backend/utils/mmgr, since it's all memory-management code. In this patch, Thomas left the free page manager code there, but put the allocator itself in src/backend/storage/ipc. There's a certain logic to that because dynamic shared areas (dsa.c) sit right next to dynamic shared memory (dsm.c) but it feels a bit schizophrenic to have half of the code in storage/ipc and the other half in utils/mmgr. I guess my view is that utils/mmgr is a better fit, because I think that this is basically memory management code that happens to use shared memory, rather than basically IPC that happens to be an allocator. If we decide that this stuff goes in storage/ipc then that's basically saying that everything that uses dynamic shared memory is going to end up in that directory, which seems like a mess. The fact that the free-page manager, at least, could be used for allocations not based on DSM strengthens that argument in my view. Other opinions? The #ifdef HAVE__BUILTIN_TYPES_COMPATIBLE_P hack in relptr.h, for which I believe I'm responsible, is ugly. There is probably a compiler out there that has __typeof__ but not __builtin_types_compatible_p, and we could cater to that by adding a separate configure test for __typeof__. A little browsing of the documentation on at https://gcc.gnu.org/onlinedocs/ seems to suggest that __builtin_types_compatible_p didn't exist before GCC 3.1, but __typeof__ seems to be there even in 2.95.3. That's not very interesting, honestly, because 3.1 came out in 2002, but there might be non-GCC compilers that implement __typeof__ but not __builtin_types_compatible_p. I am inclined not to worry about this unless somebody feels otherwise, but it's not beautiful. I wonder if it wouldn't be a good idea to allow the dsa_area_control to be stored wherever instead of insisting that it's got to be located inside the first DSM segment backing the pool itself. For example, you could make dsa_create_dynamic() take a third argument which is a pointer to enough space for a dsa_area_control, and it would initialize it in place. Then you could make dsa_attach_dynamic() take a pointer to that same structure instead of taking a dsa_handle. Actually, I think dsa_handle goes away: it becomes the caller's responsibility to figure out the correct pointer address to pass in the second process. The advantage of this design change is that you could stuff the dsa_area_control into the existing parallel DSM and only create additional DSMs if anything is actually allocated. What would be even cooler is to allow a little bit of space inside the parallel DSM that gets used first, and then, when that overflows, we start creating new DSMs. Say, 64kB. Am I sounding greedy yet? It just seems like a good idea not to needlessly multiply the number of DSMs. + /* Unlink span. */ + /* TODO: Does it even need to be linked in in the first place? */ + LWLockAcquire(DSA_SCLASS_LOCK(area, DSA_SCLASS_SPAN_LARGE), + LW_EXCLUSIVE); + unlink_span(area, span); + LWLockRelease(DSA_SCLASS_LOCK(area, DSA_SCLASS_SPAN_LARGE)); In answer to the TODO, I think this isn't strictly necessary, but it seems like a good idea to do it anyway for debuggability. If we didn't do this, the space occupied by a large object wouldn't be "known" in any way other than by having disappeared from the free page map, whereas this way it's linked into the DSA's listed of allocated chunks like anything else, so for example dsa_dump() can print it. I recommend removing this TODO. + /* + * TODO: We could take Max(fpm->contiguous_pages, result of + * FreePageBtreeCleanup) and give it to FreePageManagerUpdatLargest as a + * starting point for its search, potentially avoiding a bunch of work, + * since there is no way the largest contiguous run is bigger than that. + */ Typo: Updat. + /* + * TODO: Figure out how to avoid setting this every time. It may not be as + * simple as it looks. + */ Something isn't right with this function, because it takes the trouble to calculate a value for contiguous_pages that it then doesn't use for anything. I think the original idea here was that if we calculated a value for contiguous_pages that was less than fpm->contiguous_pages, there was no need to dirty it. If we can't get away with that for some reason, then there's no point in calculating the value in the first place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers