Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On 17 August 2017 at 01:20, Heikki Linnakangas wrote: > Looks reasonable. I edited the comments and the variable names a bit, to my > liking, and committed. Thanks! Thanks for committing. I've just been catching up on all that went on while I was sleeping. Thanks for handling the cleanup too. I'll feel better once pademelon goes green again. From looking at the latest failure on it, it appears that your swapping of pg_atomic_write_u64 for pg_atomic_init_u64 should fix this. My apologies for that mistake. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
Andres Freund writes: > On August 16, 2017 3:09:27 PM PDT, Tom Lane wrote: >> I wonder whether it's sensible to have --enable-cassert have the effect >> of filling memory allocated by ShmemAlloc or the DSA code with junk (as >> palloc does) instead of leaving it at zeroes. It's not modeling the >> same kind of effect, since we have no shmem-freeing primitives, but >> it might be useful for this sort of thing. > We kind of do - crash restarts... So yes, that's probably a good idea. Crash restart releases the shmem segment and acquires a new one, doesn't it? Or am I misremembering? I thought that it did do so, if only to make darn sure that no old processes remain connected to shmem. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On August 16, 2017 3:09:27 PM PDT, Tom Lane wrote: >Heikki Linnakangas writes: >> On 08/17/2017 12:20 AM, Tom Lane wrote: >>> Indeed, gaur fails with >>> 2017-08-16 17:09:38.315 EDT [13043:11] PANIC: stuck spinlock >detected at pg_atomic_compare_exchange_u64_impl, atomics.c:196 > >> I was able to reproduce this locally, with --disable-atomics, but >only >> after hacking it to fill the struct with garbage, before initializing > >> it. IOW, it failed to fail, because the spinlock happened to be >> initialized correctly by accident. Perhaps that's happening on >piculet, too. > >Oh, right. HPPA is unique among our platforms, I think, in that the >"unlocked" state of a spinlock is not "all zeroes". So if you're >dealing >with pre-zeroed memory, which shmem generally would be, failing to >initialize a spinlock does not cause visible errors except on HPPA. > >I wonder whether it's sensible to have --enable-cassert have the effect >of filling memory allocated by ShmemAlloc or the DSA code with junk (as >palloc does) instead of leaving it at zeroes. It's not modeling the >same kind of effect, since we have no shmem-freeing primitives, but >it might be useful for this sort of thing. We kind of do - crash restarts... So yes, that's probably a good idea. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
Heikki Linnakangas writes: > On 08/17/2017 12:20 AM, Tom Lane wrote: >> Indeed, gaur fails with >> 2017-08-16 17:09:38.315 EDT [13043:11] PANIC: stuck spinlock detected at >> pg_atomic_compare_exchange_u64_impl, atomics.c:196 > I was able to reproduce this locally, with --disable-atomics, but only > after hacking it to fill the struct with garbage, before initializing > it. IOW, it failed to fail, because the spinlock happened to be > initialized correctly by accident. Perhaps that's happening on piculet, too. Oh, right. HPPA is unique among our platforms, I think, in that the "unlocked" state of a spinlock is not "all zeroes". So if you're dealing with pre-zeroed memory, which shmem generally would be, failing to initialize a spinlock does not cause visible errors except on HPPA. I wonder whether it's sensible to have --enable-cassert have the effect of filling memory allocated by ShmemAlloc or the DSA code with junk (as palloc does) instead of leaving it at zeroes. It's not modeling the same kind of effect, since we have no shmem-freeing primitives, but it might be useful for this sort of thing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On 08/17/2017 12:20 AM, Tom Lane wrote: Andres Freund writes: On 2017-08-16 16:20:28 +0300, Heikki Linnakangas wrote: + pg_atomic_write_u64(&target->phs_nallocated, 0); It's not ok to initialize an atomic with pg_atomic_write_u64 - works well enough for "plain" atomics, but the fallback implementation isn't ok with it. You're probably going to get a failure on the respective buildfarm animal soon. Indeed, gaur fails with 2017-08-16 17:09:38.315 EDT [13043:11] PANIC: stuck spinlock detected at pg_at\ omic_compare_exchange_u64_impl, atomics.c:196 2017-08-16 17:09:38.315 EDT [13043:12] STATEMENT: select count(*) from a_star; 2017-08-16 17:09:40.350 EDT [12437:3] LOG: server process (PID 13043) was term\ inated by signal 6 2017-08-16 17:09:40.350 EDT [12437:4] DETAIL: Failed process was running: sele\ ct count(*) from a_star; and I'm sure pademelon will fail once it gets to that. Fixed. I thought we had other buildfarm animals testing the fallback path, though? Yes, at least piculet is building with --disable-atomics. I was able to reproduce this locally, with --disable-atomics, but only after hacking it to fill the struct with garbage, before initializing it. IOW, it failed to fail, because the spinlock happened to be initialized correctly by accident. Perhaps that's happening on piculet, too. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
Andres Freund writes: > On 2017-08-16 16:20:28 +0300, Heikki Linnakangas wrote: > + pg_atomic_write_u64(&target->phs_nallocated, 0); > It's not ok to initialize an atomic with pg_atomic_write_u64 - works > well enough for "plain" atomics, but the fallback implementation isn't > ok with it. You're probably going to get a failure on the respective > buildfarm animal soon. Indeed, gaur fails with 2017-08-16 17:09:38.315 EDT [13043:11] PANIC: stuck spinlock detected at pg_at\ omic_compare_exchange_u64_impl, atomics.c:196 2017-08-16 17:09:38.315 EDT [13043:12] STATEMENT: select count(*) from a_star; 2017-08-16 17:09:40.350 EDT [12437:3] LOG: server process (PID 13043) was term\ inated by signal 6 2017-08-16 17:09:40.350 EDT [12437:4] DETAIL: Failed process was running: sele\ ct count(*) from a_star; and I'm sure pademelon will fail once it gets to that. I thought we had other buildfarm animals testing the fallback path, though? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
Heikki Linnakangas writes: > On 08/16/2017 09:00 PM, Tom Lane wrote: >> I don't buy that argument. A caller might think "Why do I need >> shm_toc_estimate, when I can compute the *exact* size I need?". >> And it would have worked, up till this proposed patch. > Well, no. The size of the shm_toc struct is subtracted from the size > that you give to shm_toc_create. As well as the sizes of the TOC > entries. And those sizes are private to shm_toc.c, so a caller has no > way to know what size it should pass to shm_toc_create(), in order to > have N bytes of space actually usable. You really need to use > shm_toc_estimate() if you want any guarantees on what will fit. Good point --- objection withdrawn. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On 2017-08-16 16:20:28 +0300, Heikki Linnakangas wrote: > On 05/06/2017 04:57 PM, David Rowley wrote: > > Andres mentioned in [2] that it might be worth exploring using atomics > > to do the same job. So I went ahead and did that, and came up with the > > attached, which is a slight variation on what he mentioned in the > > thread. > > > > To keep things a bit more simple, and streamline, I ended up pulling > > out the logic for setting the startblock into another function, which > > we only call once before the first call to > > heap_parallelscan_nextpage(). I also ended up changing phs_cblock and > > replacing it with a counter that always starts at zero. The actual > > block is calculated based on that + the startblock modulo nblocks. > > This makes things a good bit more simple for detecting when we've > > allocated all the blocks to the workers, and also works nicely when > > wrapping back to the start of a relation when we started somewhere in > > the middle due to piggybacking with a synchronous scan. > Looks reasonable. I edited the comments and the variable names a bit, to my > liking, and committed. Thanks! Brief post-commit review: +* phs_nallocated tracks how many pages have been allocated to workers +* already. When phs_nallocated >= rs_nblocks, all blocks have been +* allocated. allocated seems a bit of a confusing terminology. @@ -1635,8 +1637,8 @@ heap_parallelscan_initialize(ParallelHeapScanDesc target, Relation relation, !RelationUsesLocalBuffers(relation) && target->phs_nblocks > NBuffers / 4; SpinLockInit(&target->phs_mutex); - target->phs_cblock = InvalidBlockNumber; target->phs_startblock = InvalidBlockNumber; + pg_atomic_write_u64(&target->phs_nallocated, 0); SerializeSnapshot(snapshot, target->phs_snapshot_data); } It's not ok to initialize an atomic with pg_atomic_write_u64 - works well enough for "plain" atomics, but the fallback implementation isn't ok with it. You're probably going to get a failure on the respective buildfarm animal soon. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On 2017-08-16 14:09:08 -0400, Tom Lane wrote: > >> I'm not sure that that's good enough, and I'm damn sure that it > >> shouldn't be undocumented. > > > 8 byte alignment would be good enough, so BUFFERALIGN ought to be > > sufficient. But it'd be nicer to have a separate more descriptive knob. > > What I meant by possibly not good enough is that pg_atomic_uint64 used > in other places isn't going to be very safe. Well, it's not used otherwise in core so far, leaving test code aside. It's correctly aligned if part of a aligned struct - the atomics code itself can't really do anything about aligning that struct itself isn't aligned. > We might be effectively all right as long as we have a coding rule that > pg_atomic_uint64 can only be placed in memory handed out by ShmemAlloc > or shm_toc_allocate, which both have bigger-than-MAXALIGN alignment > practices. But this needs to be documented. Well, one could argue the alignment checks in every function are that :). But yea, we probably should mention it more than that. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On 08/16/2017 09:00 PM, Tom Lane wrote: Robert Haas writes: On Wed, Aug 16, 2017 at 1:44 PM, Tom Lane wrote: I was feeling a bit uncomfortable with the BUFFERALIGN_DOWN() for a different reason: if the caller has specified the exact amount of space it needs, having shm_toc_create discard some could lead to an unexpected failure. Well, that's why Heikki also patched shm_toc_estimate. With the patch, if the size being used in shm_toc_create comes from shm_toc_estimate, it will always be aligned and nothing bad will happen. Sure. So the point is entirely about what should happen if someone doesn't use shm_toc_estimate. If the user invents another size out of whole cloth, then they might get a few bytes less than they expect, but that's what you get for not using shm_toc_estimate(). I don't buy that argument. A caller might think "Why do I need shm_toc_estimate, when I can compute the *exact* size I need?". And it would have worked, up till this proposed patch. Well, no. The size of the shm_toc struct is subtracted from the size that you give to shm_toc_create. As well as the sizes of the TOC entries. And those sizes are private to shm_toc.c, so a caller has no way to know what size it should pass to shm_toc_create(), in order to have N bytes of space actually usable. You really need to use shm_toc_estimate() if you want any guarantees on what will fit. I've pushed the patch to fix this, with some extra comments and reformatting shm_toc_estimate. 8 byte alignment would be good enough, so BUFFERALIGN ought to be sufficient. But it'd be nicer to have a separate more descriptive knob. What I meant by possibly not good enough is that pg_atomic_uint64 used in other places isn't going to be very safe. We might be effectively all right as long as we have a coding rule that pg_atomic_uint64 can only be placed in memory handed out by ShmemAlloc or shm_toc_allocate, which both have bigger-than-MAXALIGN alignment practices. But this needs to be documented. Yeah. We are implicitly also relying on ShmemAlloc() to return sufficiently-aligned memory. Palloc() too, although you probably wouldn't use atomic ops on a palloc'd struct. I think we should introduce a new ALIGNOF macro for that, and document that those functions return memory with enough alignment. GENUINEMAX_ALIGNOF? MAXSTRUCT_ALIGNOF? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
Andres Freund writes: > On 2017-08-16 13:40:09 -0400, Tom Lane wrote: >> I was wondering why the shm_toc code was using BUFFERALIGN and not >> MAXALIGN, and I now suspect that the answer is "it's an entirely >> undocumented kluge to make the atomics code not crash on 32-bit >> machines, so long as nobody puts a pg_atomic_uint64 anywhere except in >> a shm_toc". > I don't think there were any atomics in affected code until earlier > today... And given it didn't work for shm_toc anyway, I'm not quite > following. Right, Robert pointed out that it's pre-existing code. My point should be read as "it's just blind luck that shm_toc is using bigger than MAXALIGN alignment, or this would never work on 32-bit machines". >> I'm not sure that that's good enough, and I'm damn sure that it >> shouldn't be undocumented. > 8 byte alignment would be good enough, so BUFFERALIGN ought to be > sufficient. But it'd be nicer to have a separate more descriptive knob. What I meant by possibly not good enough is that pg_atomic_uint64 used in other places isn't going to be very safe. We might be effectively all right as long as we have a coding rule that pg_atomic_uint64 can only be placed in memory handed out by ShmemAlloc or shm_toc_allocate, which both have bigger-than-MAXALIGN alignment practices. But this needs to be documented. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
Robert Haas writes: > On Wed, Aug 16, 2017 at 1:44 PM, Tom Lane wrote: >> I was feeling a bit uncomfortable with the BUFFERALIGN_DOWN() for a >> different reason: if the caller has specified the exact amount of space it >> needs, having shm_toc_create discard some could lead to an unexpected >> failure. > Well, that's why Heikki also patched shm_toc_estimate. With the > patch, if the size being used in shm_toc_create comes from > shm_toc_estimate, it will always be aligned and nothing bad will > happen. Sure. So the point is entirely about what should happen if someone doesn't use shm_toc_estimate. > If the user invents another size out of whole cloth, then > they might get a few bytes less than they expect, but that's what you > get for not using shm_toc_estimate(). I don't buy that argument. A caller might think "Why do I need shm_toc_estimate, when I can compute the *exact* size I need?". And it would have worked, up till this proposed patch. I think the new API spec for such cases ought to be "if you supply an unaligned size, we'll error out", not "if you supply an unaligned size, we'll silently lose some of it". The former is going to behave a lot more predictably. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On August 16, 2017 10:47:23 AM PDT, Robert Haas wrote: >On Wed, Aug 16, 2017 at 1:40 PM, Tom Lane wrote: >> I was wondering why the shm_toc code was using BUFFERALIGN and not >> MAXALIGN, and I now suspect that the answer is "it's an entirely >> undocumented kluge to make the atomics code not crash on 32-bit >> machines, so long as nobody puts a pg_atomic_uint64 anywhere except >> in a shm_toc". > >Well, shm_toc considerably predates 64-bit atomics, so I think the >causality cannot run in that direction. shm_toc.c first appeared in >the tree in January of 2014. src/include/port/atomics didn't show up >until September of that year, and 64-bit atomics weren't actually >usable in practice until e8fdbd58fe564a29977f4331cd26f9697d76fc40 in >April of 2017. Well, not for core code. I certainly know about production code using it, because crusty platforms are considered irrelevant... Independent of that, a comment explaining what the BUFFERALIGN is intending would be good. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On Wed, Aug 16, 2017 at 1:44 PM, Tom Lane wrote: > I was feeling a bit uncomfortable with the BUFFERALIGN_DOWN() for a > different reason: if the caller has specified the exact amount of space it > needs, having shm_toc_create discard some could lead to an unexpected > failure. Well, that's why Heikki also patched shm_toc_estimate. With the patch, if the size being used in shm_toc_create comes from shm_toc_estimate, it will always be aligned and nothing bad will happen. If the user invents another size out of whole cloth, then they might get a few bytes less than they expect, but that's what you get for not using shm_toc_estimate(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On 2017-08-16 13:44:28 -0400, Tom Lane wrote: > Andres Freund writes: > > Don't think we require BUFFERALIGN - MAXALIGN ought to be > > sufficient. > > Uh, see my other message just now. Yup, you're right. > > The use of BUFFERALIGN presumably is to space out things > > into different cachelines, but that doesn't really seem to be important > > with this. Then we can just avoid defining the new macro... > > I was feeling a bit uncomfortable with the BUFFERALIGN_DOWN() for a > different reason: if the caller has specified the exact amount of space it > needs, having shm_toc_create discard some could lead to an unexpected > failure. Well, that's why shm_toc_estimate() increases the size appropriately. > I wonder whether maybe shm_toc_create should just error out if the > number it's handed isn't aligned already. I think that's going to be harder atm, because it's not the size shm_toc computes, it's what the caller to shm_toc_estimate_chunk() provides. And that size is already guaranteed to be upsized by BUFFERALIGN in shm_toc_estimate_chunk(). It's just that the base-offset from where the allocations start isn't aligned. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On Wed, Aug 16, 2017 at 1:40 PM, Tom Lane wrote: > I was wondering why the shm_toc code was using BUFFERALIGN and not > MAXALIGN, and I now suspect that the answer is "it's an entirely > undocumented kluge to make the atomics code not crash on 32-bit > machines, so long as nobody puts a pg_atomic_uint64 anywhere except > in a shm_toc". Well, shm_toc considerably predates 64-bit atomics, so I think the causality cannot run in that direction. shm_toc.c first appeared in the tree in January of 2014. src/include/port/atomics didn't show up until September of that year, and 64-bit atomics weren't actually usable in practice until e8fdbd58fe564a29977f4331cd26f9697d76fc40 in April of 2017. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On 2017-08-16 13:40:09 -0400, Tom Lane wrote: > I wrote: > > I can confirm that on dromedary, that regression test case is attempting > > to create a TOC with a not-well-aligned size: 93268 = 0x16c54 bytes. > > ... although, on closer look, it still seems like we have a fundamental > bit of schizophrenia here, because on this machine > > $ grep ALIGN pg_config.h > #define ALIGNOF_DOUBLE 4 > #define ALIGNOF_INT 4 > #define ALIGNOF_LONG 4 > #define ALIGNOF_LONG_LONG_INT 4 > #define ALIGNOF_SHORT 2 > #define MAXIMUM_ALIGNOF 4 > > Basically, therefore, ISTM that it is not a good thing that the atomics > code thinks it can rely on 8-byte-aligned data when the entire rest of > the system believes that 4-byte alignment is enough for anything. That's a hardware requirement, we can't do much about it. Several [micro-]architectures don't support unaligned atomic 8 byte writes. > I was wondering why the shm_toc code was using BUFFERALIGN and not > MAXALIGN, and I now suspect that the answer is "it's an entirely > undocumented kluge to make the atomics code not crash on 32-bit > machines, so long as nobody puts a pg_atomic_uint64 anywhere except in > a shm_toc". I don't think there were any atomics in affected code until earlier today... And given it didn't work for shm_toc anyway, I'm not quite following. > I'm not sure that that's good enough, and I'm damn sure that it > shouldn't be undocumented. 8 byte alignment would be good enough, so BUFFERALIGN ought to be sufficient. But it'd be nicer to have a separate more descriptive knob. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
Andres Freund writes: > Don't think we require BUFFERALIGN - MAXALIGN ought to be > sufficient. Uh, see my other message just now. > The use of BUFFERALIGN presumably is to space out things > into different cachelines, but that doesn't really seem to be important > with this. Then we can just avoid defining the new macro... I was feeling a bit uncomfortable with the BUFFERALIGN_DOWN() for a different reason: if the caller has specified the exact amount of space it needs, having shm_toc_create discard some could lead to an unexpected failure. I wonder whether maybe shm_toc_create should just error out if the number it's handed isn't aligned already. >> +return BUFFERALIGN( >> +add_size(offsetof(shm_toc, toc_entry), >> + add_size(mul_size(e->number_of_keys, >> sizeof(shm_toc_entry)), >> + e->space_for_chunks))); > I think splitting this into separate statements would be better. +1, it was too complicated already. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
I wrote: > I can confirm that on dromedary, that regression test case is attempting > to create a TOC with a not-well-aligned size: 93268 = 0x16c54 bytes. ... although, on closer look, it still seems like we have a fundamental bit of schizophrenia here, because on this machine $ grep ALIGN pg_config.h #define ALIGNOF_DOUBLE 4 #define ALIGNOF_INT 4 #define ALIGNOF_LONG 4 #define ALIGNOF_LONG_LONG_INT 4 #define ALIGNOF_SHORT 2 #define MAXIMUM_ALIGNOF 4 Basically, therefore, ISTM that it is not a good thing that the atomics code thinks it can rely on 8-byte-aligned data when the entire rest of the system believes that 4-byte alignment is enough for anything. I was wondering why the shm_toc code was using BUFFERALIGN and not MAXALIGN, and I now suspect that the answer is "it's an entirely undocumented kluge to make the atomics code not crash on 32-bit machines, so long as nobody puts a pg_atomic_uint64 anywhere except in a shm_toc". I'm not sure that that's good enough, and I'm damn sure that it shouldn't be undocumented. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
Heikki Linnakangas writes: > On 08/16/2017 08:10 PM, Andres Freund wrote: >> Seems like we'd just have to add alignment of the total size to >> shm_toc_estimate()? > Yeah, that's the gist of it. I can confirm that on dromedary, that regression test case is attempting to create a TOC with a not-well-aligned size: 93268 = 0x16c54 bytes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
Hi, On 2017-08-16 20:19:29 +0300, Heikki Linnakangas wrote: > diff --git a/src/backend/storage/ipc/shm_toc.c > b/src/backend/storage/ipc/shm_toc.c > index 9f259441f0..121d5a1ec9 100644 > --- a/src/backend/storage/ipc/shm_toc.c > +++ b/src/backend/storage/ipc/shm_toc.c > @@ -44,7 +44,7 @@ shm_toc_create(uint64 magic, void *address, Size nbytes) > Assert(nbytes > offsetof(shm_toc, toc_entry)); > toc->toc_magic = magic; > SpinLockInit(&toc->toc_mutex); > - toc->toc_total_bytes = nbytes; > + toc->toc_total_bytes = BUFFERALIGN_DOWN(nbytes); > toc->toc_allocated_bytes = 0; > toc->toc_nentry = 0; Don't think we require BUFFERALIGN - MAXALIGN ought to be sufficient. The use of BUFFERALIGN presumably is to space out things into different cachelines, but that doesn't really seem to be important with this. Then we can just avoid defining the new macro... > @@ -252,7 +252,8 @@ shm_toc_lookup(shm_toc *toc, uint64 key, bool noError) > Size > shm_toc_estimate(shm_toc_estimator *e) > { > - return add_size(offsetof(shm_toc, toc_entry), > - add_size(mul_size(e->number_of_keys, > sizeof(shm_toc_entry)), > - e->space_for_chunks)); > + return BUFFERALIGN( > + add_size(offsetof(shm_toc, toc_entry), > + add_size(mul_size(e->number_of_keys, > sizeof(shm_toc_entry)), > + e->space_for_chunks))); > } I think splitting this into separate statements would be better. I think we also should rather *ALLIGN the size without e->space_for_chunks. The latter is already aligned properly, and the important bit is that we have enough space for e->space_for_chunks afterwards, and only padding is in th eway of that... Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On Wed, Aug 16, 2017 at 1:19 PM, Heikki Linnakangas wrote: > Yeah, that's the gist of it. > > The attached patch seems to fix this. I'm not too familiar with this DSM > stuff, but this seems right to me. Unless someone has a better idea soon, > I'll commit this to make the buildfarm happy. Mmm, clever. Yeah, that looks reasonable at first glance. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On 08/16/2017 08:10 PM, Andres Freund wrote: Afaict shm_create/shm_toc_allocate don't actually guarantee that the end of the toc's memory is suitably aligned. But I didn't yet have any coffee, so ... Robert, I'm not quite sure what the intended behaviour of shm_toc is wrt alignment. I see that individual chunks are BUFFERALIGNed (both during estimation, and allocation). But I don't see how the size of the entire toc is aligned, which seems a requirement, given we allocate from the end. Seems like we'd just have to add alignment of the total size to shm_toc_estimate()? Yeah, that's the gist of it. The attached patch seems to fix this. I'm not too familiar with this DSM stuff, but this seems right to me. Unless someone has a better idea soon, I'll commit this to make the buildfarm happy. - Heikki diff --git a/src/backend/storage/ipc/shm_toc.c b/src/backend/storage/ipc/shm_toc.c index 9f259441f0..121d5a1ec9 100644 --- a/src/backend/storage/ipc/shm_toc.c +++ b/src/backend/storage/ipc/shm_toc.c @@ -44,7 +44,7 @@ shm_toc_create(uint64 magic, void *address, Size nbytes) Assert(nbytes > offsetof(shm_toc, toc_entry)); toc->toc_magic = magic; SpinLockInit(&toc->toc_mutex); - toc->toc_total_bytes = nbytes; + toc->toc_total_bytes = BUFFERALIGN_DOWN(nbytes); toc->toc_allocated_bytes = 0; toc->toc_nentry = 0; @@ -252,7 +252,8 @@ shm_toc_lookup(shm_toc *toc, uint64 key, bool noError) Size shm_toc_estimate(shm_toc_estimator *e) { - return add_size(offsetof(shm_toc, toc_entry), - add_size(mul_size(e->number_of_keys, sizeof(shm_toc_entry)), - e->space_for_chunks)); + return BUFFERALIGN( + add_size(offsetof(shm_toc, toc_entry), + add_size(mul_size(e->number_of_keys, sizeof(shm_toc_entry)), + e->space_for_chunks))); } diff --git a/src/include/c.h b/src/include/c.h index 9066e3c578..af799dc1df 100644 --- a/src/include/c.h +++ b/src/include/c.h @@ -598,6 +598,7 @@ typedef NameData *Name; #define LONGALIGN_DOWN(LEN) TYPEALIGN_DOWN(ALIGNOF_LONG, (LEN)) #define DOUBLEALIGN_DOWN(LEN) TYPEALIGN_DOWN(ALIGNOF_DOUBLE, (LEN)) #define MAXALIGN_DOWN(LEN) TYPEALIGN_DOWN(MAXIMUM_ALIGNOF, (LEN)) +#define BUFFERALIGN_DOWN(LEN) TYPEALIGN_DOWN(ALIGNOF_BUFFER, (LEN)) /* * The above macros will not work with types wider than uintptr_t, like with -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On 2017-08-16 09:57:35 -0700, Peter Geoghegan wrote: > On Wed, Aug 16, 2017 at 9:55 AM, Andres Freund wrote: > > On 2017-08-16 11:16:58 -0400, Tom Lane wrote: > >> Heikki Linnakangas writes: > >> > A couple of 32-bit x86 buildfarm members don't seem to be happy with > >> > this. I'll investigate, but if anyone has a clue, I'm all ears... > >> > >> dromedary's issue seems to be alignment: > >> > >> TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & > >> ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", File: > >> "../../../../src/include/port/atomics.h", Line: 452) > >> 2017-08-16 11:11:38.558 EDT [75693:3] LOG: server process (PID 76277) was > >> terminated by signal 6: Abort trap > >> 2017-08-16 11:11:38.558 EDT [75693:4] DETAIL: Failed process was running: > >> select count(*) from a_star; > >> > >> Not sure if this is your bug or if it's exposing a pre-existing > >> deficiency in the atomics code, viz, failure to ensure that > >> pg_atomic_uint64 is actually a 64-bit-aligned type. Andres? > > > > I suspect it's the former. Suspect that the shared memory that holds > > the "parallel desc" isn't properly aligned: > > Clang has an -fsanitize=alignment option that may be useful here. Given that we're crashing in an assert that does nothing but check for alignment, before the memory is actually used in a "dangerous" manner, I don't quite see how? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
Robert, Tom, On 2017-08-16 09:55:15 -0700, Andres Freund wrote: > > Not sure if this is your bug or if it's exposing a pre-existing > > deficiency in the atomics code, viz, failure to ensure that > > pg_atomic_uint64 is actually a 64-bit-aligned type. Andres? > I suspect it's the former. Suspect that the shared memory that holds > the "parallel desc" isn't properly aligned: Or, well, a mixture of both, because it seems like a deficiency in the shm_toc, code, rather than the atomics code. > Afaict shm_create/shm_toc_allocate don't actually guarantee that the end > of the toc's memory is suitably aligned. But I didn't yet have any > coffee, so ... Robert, I'm not quite sure what the intended behaviour of shm_toc is wrt alignment. I see that individual chunks are BUFFERALIGNed (both during estimation, and allocation). But I don't see how the size of the entire toc is aligned, which seems a requirement, given we allocate from the end. Seems like we'd just have to add alignment of the total size to shm_toc_estimate()? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On Wed, Aug 16, 2017 at 9:55 AM, Andres Freund wrote: > On 2017-08-16 11:16:58 -0400, Tom Lane wrote: >> Heikki Linnakangas writes: >> > A couple of 32-bit x86 buildfarm members don't seem to be happy with >> > this. I'll investigate, but if anyone has a clue, I'm all ears... >> >> dromedary's issue seems to be alignment: >> >> TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & >> ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", File: >> "../../../../src/include/port/atomics.h", Line: 452) >> 2017-08-16 11:11:38.558 EDT [75693:3] LOG: server process (PID 76277) was >> terminated by signal 6: Abort trap >> 2017-08-16 11:11:38.558 EDT [75693:4] DETAIL: Failed process was running: >> select count(*) from a_star; >> >> Not sure if this is your bug or if it's exposing a pre-existing >> deficiency in the atomics code, viz, failure to ensure that >> pg_atomic_uint64 is actually a 64-bit-aligned type. Andres? > > I suspect it's the former. Suspect that the shared memory that holds > the "parallel desc" isn't properly aligned: Clang has an -fsanitize=alignment option that may be useful here. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On 2017-08-16 11:16:58 -0400, Tom Lane wrote: > Heikki Linnakangas writes: > > A couple of 32-bit x86 buildfarm members don't seem to be happy with > > this. I'll investigate, but if anyone has a clue, I'm all ears... > > dromedary's issue seems to be alignment: > > TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & > ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", File: > "../../../../src/include/port/atomics.h", Line: 452) > 2017-08-16 11:11:38.558 EDT [75693:3] LOG: server process (PID 76277) was > terminated by signal 6: Abort trap > 2017-08-16 11:11:38.558 EDT [75693:4] DETAIL: Failed process was running: > select count(*) from a_star; > > Not sure if this is your bug or if it's exposing a pre-existing > deficiency in the atomics code, viz, failure to ensure that > pg_atomic_uint64 is actually a 64-bit-aligned type. Andres? I suspect it's the former. Suspect that the shared memory that holds the "parallel desc" isn't properly aligned: void ExecSeqScanInitializeDSM(SeqScanState *node, ParallelContext *pcxt) { ... pscan = shm_toc_allocate(pcxt->toc, node->pscan_len); heap_parallelscan_initialize(pscan, node->ss.ss_currentRelation, estate->es_snapshot); /* ... * We allocate backwards from the end of the segment, so that the TOC entries * can grow forward from the start of the segment. */ extern void * shm_toc_allocate(shm_toc *toc, Size nbytes) { volatile shm_toc *vtoc = toc; Sizetotal_bytes; Sizeallocated_bytes; Sizenentry; Sizetoc_bytes; /* Make sure request is well-aligned. */ nbytes = BUFFERALIGN(nbytes); ... return ((char *) toc) + (total_bytes - allocated_bytes - nbytes); } /* * Initialize a region of shared memory with a table of contents. */ shm_toc * shm_toc_create(uint64 magic, void *address, Size nbytes) { shm_toc*toc = (shm_toc *) address; Assert(nbytes > offsetof(shm_toc, toc_entry)); toc->toc_magic = magic; SpinLockInit(&toc->toc_mutex); toc->toc_total_bytes = nbytes; toc->toc_allocated_bytes = 0; toc->toc_nentry = 0; return toc; } Afaict shm_create/shm_toc_allocate don't actually guarantee that the end of the toc's memory is suitably aligned. But I didn't yet have any coffee, so ... - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
Heikki Linnakangas writes: > A couple of 32-bit x86 buildfarm members don't seem to be happy with > this. I'll investigate, but if anyone has a clue, I'm all ears... dromedary's issue seems to be alignment: TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", File: "../../../../src/include/port/atomics.h", Line: 452) 2017-08-16 11:11:38.558 EDT [75693:3] LOG: server process (PID 76277) was terminated by signal 6: Abort trap 2017-08-16 11:11:38.558 EDT [75693:4] DETAIL: Failed process was running: select count(*) from a_star; Not sure if this is your bug or if it's exposing a pre-existing deficiency in the atomics code, viz, failure to ensure that pg_atomic_uint64 is actually a 64-bit-aligned type. Andres? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On 08/16/2017 04:20 PM, Heikki Linnakangas wrote: On 05/06/2017 04:57 PM, David Rowley wrote: Andres mentioned in [2] that it might be worth exploring using atomics to do the same job. So I went ahead and did that, and came up with the attached, which is a slight variation on what he mentioned in the thread. To keep things a bit more simple, and streamline, I ended up pulling out the logic for setting the startblock into another function, which we only call once before the first call to heap_parallelscan_nextpage(). I also ended up changing phs_cblock and replacing it with a counter that always starts at zero. The actual block is calculated based on that + the startblock modulo nblocks. This makes things a good bit more simple for detecting when we've allocated all the blocks to the workers, and also works nicely when wrapping back to the start of a relation when we started somewhere in the middle due to piggybacking with a synchronous scan. Looks reasonable. I edited the comments and the variable names a bit, to my liking, and committed. Thanks! A couple of 32-bit x86 buildfarm members don't seem to be happy with this. I'll investigate, but if anyone has a clue, I'm all ears... - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On 05/06/2017 04:57 PM, David Rowley wrote: Andres mentioned in [2] that it might be worth exploring using atomics to do the same job. So I went ahead and did that, and came up with the attached, which is a slight variation on what he mentioned in the thread. To keep things a bit more simple, and streamline, I ended up pulling out the logic for setting the startblock into another function, which we only call once before the first call to heap_parallelscan_nextpage(). I also ended up changing phs_cblock and replacing it with a counter that always starts at zero. The actual block is calculated based on that + the startblock modulo nblocks. This makes things a good bit more simple for detecting when we've allocated all the blocks to the workers, and also works nicely when wrapping back to the start of a relation when we started somewhere in the middle due to piggybacking with a synchronous scan. Looks reasonable. I edited the comments and the variable names a bit, to my liking, and committed. Thanks! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On Sat, May 6, 2017 at 7:27 PM, David Rowley wrote: > > I've also noticed that both the atomics patch and unpatched master do > something that looks a bit weird with synchronous seq-scans. If the > parallel seq-scan piggybacked on another scan, then subsequent > parallel scans will start at the same non-zero block location, even > when no other concurrent scans exist. > That is what we expect. The same is true for non-parallel scans as well, refer below code for non-parallel scans. heapgettup_pagemode() { .. finished = (page == scan->rs_startblock) || (scan->rs_numblocks != InvalidBlockNumber ? --scan->rs_numblocks == 0 : false); /* * Report our new scan position for synchronization purposes. We * don't do that when moving backwards, however. That would just * mess up any other forward-moving scanners. * * Note: we do this before checking for end of scan so that the * final state of the position hint is back at the start of the * rel. That's not strictly necessary, but otherwise when you run * the same query multiple times the starting position would shift * a little bit backwards on every invocation, which is confusing. * We don't guarantee any specific ordering in general, though. */ if (scan->rs_syncscan) ss_report_location(scan->rs_rd, page); .. } -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Atomics for heap_parallelscan_nextpage()
Hi, A while back I did some benchmarking on a big 4 socket machine to explore a bit around the outer limits of parallel aggregates. I discovered along the way that, given enough workers, and a simple enough task, that seq-scan workers were held up waiting for the lock to be released in heap_parallelscan_nextpage(). I've since done a little work in this area to try to improve things. I ended up posting about it yesterday in [1]. My original patch used batching to solve the issue; instead of allocating 1 block at a time, the batching patch allocated a range of 10 blocks for the worker to process. However, the implementation still needed a bit of work around reporting sync-scan locations. Andres mentioned in [2] that it might be worth exploring using atomics to do the same job. So I went ahead and did that, and came up with the attached, which is a slight variation on what he mentioned in the thread. To keep things a bit more simple, and streamline, I ended up pulling out the logic for setting the startblock into another function, which we only call once before the first call to heap_parallelscan_nextpage(). I also ended up changing phs_cblock and replacing it with a counter that always starts at zero. The actual block is calculated based on that + the startblock modulo nblocks. This makes things a good bit more simple for detecting when we've allocated all the blocks to the workers, and also works nicely when wrapping back to the start of a relation when we started somewhere in the middle due to piggybacking with a synchronous scan. Performance: With parallel_workers=71, it looks something like: Query 1: 881 GB, ~6 billion row TPC-H lineitem table. tpch=# select count(*) from lineitem; count 589709 (1 row) -- Master Time: 123421.283 ms (02:03.421) Time: 118895.846 ms (01:58.896) Time: 118632.546 ms (01:58.633) -- Atomics patch Time: 74038.813 ms (01:14.039) Time: 73166.200 ms (01:13.166) Time: 72492.338 ms (01:12.492) -- Batching Patch: Batching 10 pages at a time in heap_parallelscan_nextpage() Time: 76364.215 ms (01:16.364) Time: 75808.900 ms (01:15.809) Time: 74927.756 ms (01:14.928) Query 2: Single int column table with 2 billion rows. tpch=# select count(*) from a; count 20 (1 row) -- Master Time: 5853.918 ms (00:05.854) Time: 5925.633 ms (00:05.926) Time: 5859.223 ms (00:05.859) -- Atomics patch Time: 5825.745 ms (00:05.826) Time: 5849.139 ms (00:05.849) Time: 5815.818 ms (00:05.816) -- Batching Patch: Batching 10 pages at a time in heap_parallelscan_nextpage() Time: 5789.237 ms (00:05.789) Time: 5837.395 ms (00:05.837) Time: 5821.492 ms (00:05.821) I've also attached a text file with the perf report for the lineitem query. You'll notice that the heap_parallelscan_nextpage() is very visible in master, but not on each of the two patches. With the 2nd query, heap_parallelscan_nextpage is fairly insignificant on master's profile, it's only showing up as 0.48%. Likely this must be due to more tuples being read from the page, and more aggregation work getting done before the next page is needed. I'm uncertain why I previously saw a speed up in this case in [1]. I've also noticed that both the atomics patch and unpatched master do something that looks a bit weird with synchronous seq-scans. If the parallel seq-scan piggybacked on another scan, then subsequent parallel scans will start at the same non-zero block location, even when no other concurrent scans exist. I'd have expected this should go back to block 0 again, but maybe I'm just failing to understand the reason for reporting the startblock to ss_report_location() at the end of the scan. I'll now add this to the first commitfest of pg11. I just wanted to note that I've done this, so that it's less likely someone else goes and repeats the same work. [1] https://www.postgresql.org/message-id/CAKJS1f-XhfQ2-%3D85wgYo5b3WtEs%3Dys%3D2Rsq%3DNuvnmaV4ZsM1XQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/20170505023646.3uhnmf2hbwtm63lc%40alap3.anarazel.de -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Unpatched select count(*) from lineitem; 71 workers Children Self Command Shared Object Symbol + 99.83% 0.00% postgres libpthread-2.17.so[.] __restore_rt + 99.83% 0.00% postgres postgres [.] sigusr1_handler + 99.83% 0.00% postgres postgres [.] maybe_start_bgworkers + 99.83% 0.00% postgres postgres [.] do_start_bgworker + 99.83% 0.93% postgres postgres [.] ExecProcNode + 99.83% 0.00% postgres postgres [.] StartBackgroundWorker + 99.83% 0.00% postgres postgres [.] ParallelWorkerMain + 99.83% 0.00% postgres postgres [.] ParallelQueryMain + 99.83% 0.00% postgres postgres [.] ExecutorRun + 99.83% 0.00% post