Re: [HACKERS] Dynamic shared memory areas
On Tue, Nov 15, 2016 at 5:31 PM, Robert Haas wrote:
> I think we should develop versions of this that (1) allocate from the
> main shared memory segment and (2) allocate from backend-private
> memory.  Per my previous benchmarking results, allocating from
> backend-private memory would be a substantial win for tuplesort.c
> because this allocator is substantially more memory-efficient for
> large memory contexts than aset.c, and Tomas Vondra tested it out and
> found that it is also faster for logical decoding than the approach he
> proposed.

The approach that I'd prefer to take with tuplesort.c is to have a
buffer for caller tuples that is written to sequentially, and
repalloc()'d as needed, much like the memtuples array.  It would be
slightly tricky to make this work when memtuples needs to be a heap
(I'm mostly thinking of top-N heapsorts here).  That has perhaps
unbeatable efficiency, while also helping cases with significant
physical/logical correlation in their input, which is pretty common.
Creating an index on a serial PK within pg_restore would probably get
notably faster if we went this way.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic shared memory areas
On Fri, Dec 2, 2016 at 3:46 PM, Thomas Munro wrote:
> Here's a patch to provide the right format string for dsa_pointer to
> printf-like functions, which clears a warning coming from dsa_dump (a
> debugging function) on 32 bit systems.

Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Dynamic shared memory areas
On Sat, Dec 3, 2016 at 9:02 AM, Robert Haas wrote:
> On Fri, Dec 2, 2016 at 2:56 PM, Robert Haas wrote:
>> On Fri, Dec 2, 2016 at 1:21 PM, Robert Haas wrote:
>>> On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro wrote:
>>>> Please find attached dsa-v8.patch, and also a small test module for
>>>> running random allocate/free exercises and dumping the internal
>>>> allocator state.
>>>
>>> OK, I've committed the main patch.
>>
>> ...but the buildfarm isn't very happy about it.
>>
>> tern complains:
>>
>> In file included from dsa.c:58:0:
>> ../../../../src/include/utils/dsa.h:59:1: error: unknown type name
>> 'pg_atomic_uint64'
>> typedef pg_atomic_uint64 dsa_pointer_atomic;
>>
>> ...but that code is only compiled if #if DSA_POINTER_SIZEOF == 4 fails
>> to be true.  And that should always be true unless
>> PG_HAVE_ATOMIC_U64_SUPPORT is defined.  So apparently tern claims to
>> PG_HAVE_ATOMIC_U64_SUPPORT but doesn't actually define
>> pg_atomic_uint64?  That doesn't seem right.
>
> No, that's not the problem.  Just a garden variety thinko in dsa.h.
> Will push a fix presently.

Here's a patch to provide the right format string for dsa_pointer to
printf-like functions, which clears a warning coming from dsa_dump (a
debugging function) on 32 bit systems.

--
Thomas Munro
http://www.enterprisedb.com

fix-dsa-pointer-format.patch
Description: Binary data
Re: [HACKERS] Dynamic shared memory areas
On Fri, Dec 2, 2016 at 2:56 PM, Robert Haas wrote:
> On Fri, Dec 2, 2016 at 1:21 PM, Robert Haas wrote:
>> On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro wrote:
>>> Please find attached dsa-v8.patch, and also a small test module for
>>> running random allocate/free exercises and dumping the internal
>>> allocator state.
>>
>> OK, I've committed the main patch.
>
> ...but the buildfarm isn't very happy about it.
>
> tern complains:
>
> In file included from dsa.c:58:0:
> ../../../../src/include/utils/dsa.h:59:1: error: unknown type name
> 'pg_atomic_uint64'
> typedef pg_atomic_uint64 dsa_pointer_atomic;
>
> ...but that code is only compiled if #if DSA_POINTER_SIZEOF == 4 fails
> to be true.  And that should always be true unless
> PG_HAVE_ATOMIC_U64_SUPPORT is defined.  So apparently tern claims to
> PG_HAVE_ATOMIC_U64_SUPPORT but doesn't actually define
> pg_atomic_uint64?  That doesn't seem right.

No, that's not the problem.  Just a garden variety thinko in dsa.h.
Will push a fix presently.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Dynamic shared memory areas
On Fri, Dec 2, 2016 at 1:21 PM, Robert Haas wrote:
> On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro wrote:
>> Please find attached dsa-v8.patch, and also a small test module for
>> running random allocate/free exercises and dumping the internal
>> allocator state.
>
> OK, I've committed the main patch.

...but the buildfarm isn't very happy about it.

tern complains:

In file included from dsa.c:58:0:
../../../../src/include/utils/dsa.h:59:1: error: unknown type name
'pg_atomic_uint64'
typedef pg_atomic_uint64 dsa_pointer_atomic;

...but that code is only compiled if #if DSA_POINTER_SIZEOF == 4 fails
to be true.  And that should always be true unless
PG_HAVE_ATOMIC_U64_SUPPORT is defined.  So apparently tern claims to
PG_HAVE_ATOMIC_U64_SUPPORT but doesn't actually define
pg_atomic_uint64?  That doesn't seem right.

The failures on several other BF members appear to be similar.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Dynamic shared memory areas
On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro wrote:
> Please find attached dsa-v8.patch, and also a small test module for
> running random allocate/free exercises and dumping the internal
> allocator state.

OK, I've committed the main patch.

As for test-dsa.patch, can we tie that into make check-world so that
committing it delivers some buildfarm coverage for this code?  Of
course, the test settings would have to be fairly conservative given
that some buildfarm machines have very limited resources, but it still
seems worth doing.  test_shm_mq might provide some useful precedent.

Note that you don't need the prototype if you've already used
PG_FUNCTION_INFO_V1.

I'm not sure that using the same random seed every time is a good idea.
Maybe you should provide a way to set the seed as part of starting the
test, or to not do that (pass NULL?) and then elog(LOG, ...) the seed
that's chosen.  Then if the BF crashes, we can see what seed was in use
for that particular test.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Dynamic shared memory areas
On Thu, Dec 1, 2016 at 10:33 PM, Thomas Munro wrote:
> Please find attached dsa-v8.patch, and also a small test module for
> running random allocate/free exercises and dumping the internal
> allocator state.

Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Dynamic shared memory areas
More review:

+ * For large objects, we just stick all of the allocations in fullness class
+ * 0.  Since we can just return the space directly to the free page manager,
+ * we don't really need them on a list at all, except that if someone wants
+ * to bulk release everything allocated using this BlockAreaContext, we
+ * have no other way of finding them.

This comment is out-of-date.

+	/*
+	 * If this is the only span, and there is no active span, then maybe
+	 * we should probably move this span to fullness class 1.  (Otherwise
+	 * if you allocate exactly all the objects in the only span, it moves
+	 * to class 3, then you free them all, it moves to 2, and then is
+	 * given back, leaving no active span).
+	 */

"maybe we should probably" seems to have one more doubt-expressing word
than it needs.

+	if (size_class == DSA_SCLASS_SPAN_LARGE)
+		/* Large object frees give back segments aggressively already. */
+		continue;

We generally use braces in this kind of case.

+	 * Search the fullness class 1 only.  That is where we expect to find

extra "the"

+	/* Call for effect (we don't need the result). */
+	get_segment_by_index(area, index);
...
+	return area->segment_maps[index].mapped_address + offset;

It isn't guaranteed that area->segment_maps[index].mapped_address will
be non-NULL on return from get_segment_by_index, and then this function
will return a completely bogus pointer to the caller.  I think you
should probably elog() instead.

+	elog(ERROR, "dsa: can't attach to area handle %u", handle);

Avoid ":" in elog messages.  You don't really need to - and it isn't
project style to - tag these with "dsa:"; that's what \errverbose or
\set VERBOSITY verbose is for.  In this particular case, I might just
adopt the formulation from parallel.c:

ereport(ERROR,
		(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
		 errmsg("could not map dynamic shared memory segment")));

+	elog(FATAL,
+		 "dsa couldn't find run of pages: fpm_largest out of sync");

Here I'd go with "dsa could not find %u free pages".

+	elog(ERROR, "dsa_pin: area already pinned");

"dsa_area already pinned"

+	elog(ERROR, "dsa_unpin: area not pinned");

"dsa_area not pinned"

+	if (segment == NULL)
+		elog(ERROR, "dsa: can't attach to segment");

As above.

+static dsa_segment_map *
+get_segment_by_index(dsa_area *area, dsa_segment_index index)
+{
+	if (unlikely(area->segment_maps[index].mapped_address == NULL))
+	{
+		dsm_handle	handle;
+		dsm_segment *segment;
+		dsa_segment_map *segment_map;
+
+		handle = area->control->segment_handles[index];

Don't you need to acquire the lock for this?

+	/* Check all currently mapped segments to find what's been freed. */
+	for (i = 0; i <= area->high_segment_index; ++i)
+	{
+		if (area->segment_maps[i].header != NULL &&
+			area->segment_maps[i].header->freed)
+		{
+			dsm_detach(area->segment_maps[i].segment);
+			area->segment_maps[i].segment = NULL;
+			area->segment_maps[i].header = NULL;
+			area->segment_maps[i].mapped_address = NULL;
+		}
+	}
+	area->freed_segment_counter = freed_segment_counter;

And this?

+/*
+ * Release a DSA area that was produced by dsa_create_in_place or
+ * dsa_attach_in_place.  It is preferable to use one of the 'dsa_on_XXX'
+ * callbacks so that this is managed automatically, because failure to release
+ * an area created in-place leaks its segments permanently.
+ */
+void
+dsa_release_in_place(void *place)
+{
+	decrement_reference_count((dsa_area_control *) place);
+}

Since this seems to be the only caller of decrement_reference_count,
you could just put the logic here.

The contract for this function is also a bit unclear from the header
comment.  I initially thought that it was your intention that this
should be called from every process that has either created or attached
the segment.  But that doesn't seem like it will work, because
decrement_reference_count calls dsm_unpin_segment on every segment, and
a segment can only be pinned once, so everybody else would fail.  So
maybe the idea is that ANY ONE process has to call dsa_release_in_place.
But then that could lead to failures in other backends inside
get_segment_by_index(), because segments they don't have mapped might
already be gone.  OK, third try: maybe the idea is that the LAST proce
Re: [HACKERS] Dynamic shared memory areas
On Wed, Nov 23, 2016 at 7:07 AM, Thomas Munro wrote:
> Those let you create an area in existing memory (in a DSM segment,
> traditional inherited shmem).  The in-place versions will still create
> DSM segments on demand as required, though I suppose if you wanted to
> prevent that you could with dsa_set_size_limit(area, size).  One
> complication is that of course the automatic detach feature doesn't
> work if you're in some random piece of memory.  I have exposed
> dsa_on_dsm_detach, so that there is a way to hook it up to the detach
> hook for a pre-existing DSM segment, but that's the caller's
> responsibility.

shm_mq_attach() made the opposite decision about how to solve this
problem, and frankly I think that API is a lot more convenient: if the
first argument to shm_mq_attach() happens to be located inside of a
DSM, you can pass the DSM as the second argument and it registers the
on_dsm_detach() hook for you.  If not, you can pass NULL and deal with
it in some other way.  But this makes the common case very simple.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Dynamic shared memory areas
On Thu, Nov 10, 2016 at 12:37 AM, Thomas Munro wrote:
> On Tue, Nov 1, 2016 at 5:06 PM, Thomas Munro wrote:
>> On Wed, Oct 5, 2016 at 11:28 PM, Thomas Munro wrote:
>>> [dsa-v3.patch]
>>
>> Here is a new version which just adds CLOBBER_FREED_MEMORY support to
>> dsa_free.
>
> Here is a new version that fixes a bug I discovered in freepage.c today.
>
> Details:  When dsa_free decides to give a whole superblock back to the
> free page manager for a segment with FreePageManagerPut, and there was
> already exactly one span of exactly one free page in that segment, and
> the span being 'put' is not adjacent to that existing free page, then
> the singleton format must be converted to a btree with the existing
> page as root and the newly put span as the sole leaf.  But in that
> special case we forgot to add the newly put span to the appropriate
> free list.  Not only did we lose track of it, but a future call to
> FreePageManagerPut might try to merge it with another adjacent span,
> which will try to manipulate the freelist that it expects it to be in
> and blow up.  The fix is just to add a call to FreePagePushSpanLeader
> in this corner case before the early return.

Since a lot of the design of this patch is mine - from my earlier work
on sb_alloc - I don't expect to have a ton of objections to it.  And
I'd like to get it committed, because other parallelism work depends on
it (bitmap heap scan and parallel hash join in particular), and because
it's got other uses as well.  However, I don't want to be perceived as
slamming my code (or that of my colleagues) into the tree without due
opportunity for other people to provide feedback, so if anyone has
questions, comments, concerns, or review to offer, please do.

I think we should develop versions of this that (1) allocate from the
main shared memory segment and (2) allocate from backend-private
memory.  Per my previous benchmarking results, allocating from
backend-private memory would be a substantial win for tuplesort.c
because this allocator is substantially more memory-efficient for
large memory contexts than aset.c, and Tomas Vondra tested it out and
found that it is also faster for logical decoding than the approach he
proposed.  Perhaps that's not an argument for holding up his proposed
patches for that problem, but I think it IS a good argument for
pressing forward with a backend-private version of this allocator.
I'm not saying that should be part of the initial commit of this code,
but I think it's a good direction to pursue.

One question that we need to resolve is where the code should live in
the source tree.  When I wrote the original patches upon which this
work was based, I think that I put all the code in
src/backend/utils/mmgr, since it's all memory-management code.  In
this patch, Thomas left the free page manager code there, but put the
allocator itself in src/backend/storage/ipc.  There's a certain logic
to that because dynamic shared areas (dsa.c) sit right next to dynamic
shared memory (dsm.c), but it feels a bit schizophrenic to have half
of the code in storage/ipc and the other half in utils/mmgr.  I guess
my view is that utils/mmgr is a better fit, because I think that this
is basically memory management code that happens to use shared memory,
rather than basically IPC that happens to be an allocator.  If we
decide that this stuff goes in storage/ipc then that's basically
saying that everything that uses dynamic shared memory is going to end
up in that directory, which seems like a mess.  The fact that the
free-page manager, at least, could be used for allocations not based
on DSM strengthens that argument in my view.  Other opinions?

The #ifdef HAVE__BUILTIN_TYPES_COMPATIBLE_P hack in relptr.h, for
which I believe I'm responsible, is ugly.  There is probably a
compiler out there that has __typeof__ but not
__builtin_types_compatible_p, and we could cater to that by adding a
separate configure test for __typeof__.  A little browsing of the
documentation at https://gcc.gnu.org/onlinedocs/ seems to suggest that
__builtin_types_compatible_p didn't exist before GCC 3.1, but
__typeof__ seems to be there even in 2.95.3.  That's not very
interesting, honestly, because 3.1 came out in 2002, but there might
be non-GCC compilers that implement __typeof__ but not
__builtin_types_compatible_p.  I am inclined not to worry about this
unless somebody feels otherwise, but it's not beautiful.

I wonder if it wouldn't be a good idea to allow the dsa_area_control
to be stored wherever instead of insisting that it's got to be located
inside the first DSM segment backing the pool itself.  For example,
you could make dsa_create_dynamic() take a third argument which is a
pointer to enough space for a dsa_area_control, and it would
initialize it in place.  Then you could make dsa_attach_dynamic() take
a pointer to that same structure instead of taking a dsa_handle.
Actually, I think dsa_handle goes away: it becomes the caller's
responsibility to
Re: [HACKERS] Dynamic shared memory areas
On Wed, Oct 5, 2016 at 3:00 AM, Thomas Munro wrote:
> Here's a new version that does that.

While testing this patch I found an issue:

+	total_size = DSA_INITIAL_SEGMENT_SIZE;
+	total_pages = total_size / FPM_PAGE_SIZE;
+	metadata_bytes =
+		MAXALIGN(sizeof(dsa_area_control)) +
+		MAXALIGN(sizeof(FreePageManager)) +
+		total_pages * sizeof(dsa_pointer);
+	/* Add padding up to next page boundary. */
+	if (metadata_bytes % FPM_PAGE_SIZE != 0)
+		metadata_bytes += FPM_PAGE_SIZE - (metadata_bytes % FPM_PAGE_SIZE);
+	usable_pages =
+		(total_size - metadata_bytes) / FPM_PAGE_SIZE;
+	segment = dsm_create(total_size, 0);
+	dsm_pin_segment(segment);

The problem is that the size of dsa_area_control is bigger than
DSA_INITIAL_SEGMENT_SIZE, but we are allocating a segment of only
DSA_INITIAL_SEGMENT_SIZE:

(gdb) p sizeof(dsa_area_control)
$8 = 67111000
(gdb) p DSA_INITIAL_SEGMENT_SIZE
$9 = 1048576

The problem did not exist in dsa-v1 because DSA_MAX_SEGMENTS was 1024,
but in dsa-v2 I think it's calculated wrongly:

(gdb) p DSA_MAX_SEGMENTS
$10 = 16777216

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com