Re: ERROR: too many dynamic shared memory segments
Robert Haas writes:
> On Fri, Feb 2, 2018 at 3:01 PM, Tom Lane wrote:
>> Do you mind if I change that Assert to a run-time test?

> Hrm, I guess I could have done something like
> shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, (pcxt->nworkers == 0)).

OK, that'd work too.  I'll make it so.

			regards, tom lane
Re: ERROR: too many dynamic shared memory segments
On Fri, Feb 2, 2018 at 3:01 PM, Tom Lane wrote:
> Robert Haas writes:
>> That turned out to produce more than one problem.  I find that the
>> select_parallel test then fails like this:
>> ERROR:  could not find key 18446744073709486082 in shm TOC at 0x10be98040
>> The fix for that problem seems to be:
>
>> 	/* Recreate error queues. */
>> 	error_queue_space =
>> -		shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, false);
>> +		shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, true);
>> +	Assert(pcxt->nworkers == 0 || error_queue_space != NULL);
>
> I happened across this patch while preparing the release notes, and
> I'm quite distressed by it, because it completely breaks the point
> of what I'd done in commit d46633506 (to wit, to not just blindly
> assume, nor just Assert, that shm_toc_lookup always succeeds).
> Do you mind if I change that Assert to a run-time test?

Hrm, I guess I could have done something like
shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, (pcxt->nworkers == 0)).

I don't mind much if you change it, but I will note, for the record, that
before d46633506 we had a theoretical source of bugs, whereas after that
commit we had a bug.  445dbd82a fixed that; if you change this around
again, please take care not to make it buggy again.  Otherwise, I'll be
the one who is quite distressed.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: ERROR: too many dynamic shared memory segments
Robert Haas writes:
> That turned out to produce more than one problem.  I find that the
> select_parallel test then fails like this:
> ERROR:  could not find key 18446744073709486082 in shm TOC at 0x10be98040
> The fix for that problem seems to be:

> 	/* Recreate error queues. */
> 	error_queue_space =
> -		shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, false);
> +		shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, true);
> +	Assert(pcxt->nworkers == 0 || error_queue_space != NULL);

I happened across this patch while preparing the release notes, and
I'm quite distressed by it, because it completely breaks the point
of what I'd done in commit d46633506 (to wit, to not just blindly
assume, nor just Assert, that shm_toc_lookup always succeeds).

Do you mind if I change that Assert to a run-time test?

			regards, tom lane
Re: ERROR: too many dynamic shared memory segments
On Tue, Nov 28, 2017 at 9:45 AM, Dilip Kumar wrote:
>> I haven't checked whether this fixes the bug, but if it does, we can
>> avoid introducing an extra branch in BitmapHeapNext.
>
> With my test it's fixing the problem.

I tested it some more and found that, for me, it PARTIALLY fixes the
problem.  I tested like this:

--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -279,7 +279,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	 * parallelism than to fail outright.
 	 */
 	segsize = shm_toc_estimate(&pcxt->estimator);
-	if (pcxt->nworkers > 0)
+	if (pcxt->nworkers > 0 && false)
 		pcxt->seg = dsm_create(segsize, DSM_CREATE_NULL_IF_MAXSEGMENTS);
 	if (pcxt->seg != NULL)
 		pcxt->toc = shm_toc_create(PARALLEL_MAGIC,

That turned out to produce more than one problem.  I find that the
select_parallel test then fails like this:

ERROR:  could not find key 18446744073709486082 in shm TOC at 0x10be98040

The fix for that problem seems to be:

--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -430,7 +430,8 @@ ReinitializeParallelDSM(ParallelContext *pcxt)
 	/* Recreate error queues. */
 	error_queue_space =
-		shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, false);
+		shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, true);
+	Assert(pcxt->nworkers == 0 || error_queue_space != NULL);
 	for (i = 0; i < pcxt->nworkers; ++i)
 	{
 		char	   *start;

With that fix in place, I then hit a crash in parallel bitmap heap
scan.  After applying no-pstate.patch, which I just committed and
back-patched to v10, things look OK.  I'm going to apply the fix for
the error_queue_space problem also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: ERROR: too many dynamic shared memory segments
On Tue, Nov 28, 2017 at 7:13 PM, Robert Haas wrote:
> On Tue, Nov 28, 2017 at 2:32 AM, Dilip Kumar wrote:
>> I think BitmapHeapScan should check whether the DSA is valid or not;
>> if the DSA is not valid, then it should assume it's a non-parallel plan.
>>
>> Attached patch should fix the issue.
>
> So, create the pstate and then pretend we didn't?  Why not just avoid
> creating it in the first place, like this?

This is a better way to fix it.

> I haven't checked whether this fixes the bug, but if it does, we can
> avoid introducing an extra branch in BitmapHeapNext.

With my test it's fixing the problem.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: ERROR: too many dynamic shared memory segments
On Tue, Nov 28, 2017 at 2:32 AM, Dilip Kumar wrote:
> I think BitmapHeapScan should check whether the DSA is valid or not;
> if the DSA is not valid, then it should assume it's a non-parallel plan.
>
> Attached patch should fix the issue.

So, create the pstate and then pretend we didn't?  Why not just avoid
creating it in the first place, like this?

I haven't checked whether this fixes the bug, but if it does, we can
avoid introducing an extra branch in BitmapHeapNext.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: no-pstate.patch (binary data)
Re: ERROR: too many dynamic shared memory segments
On Tue, Nov 28, 2017 at 4:18 AM, Thomas Munro wrote:
> On Tue, Nov 28, 2017 at 10:05 AM, Jakub Glapa wrote:
>> As for the crash: I dug up the initial log, and it looks like a
>> segmentation fault...
>>
>> 2017-11-23 07:26:53 CET:192.168.10.83(35238):user@db:[30003]: ERROR:  too
>> many dynamic shared memory segments
>
> I think there are two failure modes: one of your sessions showed the
> "too many ..." error (that's good: it ran out of slots and said so, and
> our error machinery worked as it should), and another crashed with a
> segfault, because it tried to use a NULL "area" pointer (bad).  I
> think this is a degenerate case where we completely failed to launch
> parallel query, but we ran the parallel query plan anyway and this
> code thinks that the DSA is available.  Oops.

I think BitmapHeapScan should check whether the DSA is valid or not;
if the DSA is not valid, then it should assume it's a non-parallel plan.

Attached patch should fix the issue.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment: bug_fix_in_pbhs_when_dsa_not_initialized.patch (binary data)