Re: ERROR: too many dynamic shared memory segments

2018-02-02 Thread Tom Lane
Robert Haas  writes:
> On Fri, Feb 2, 2018 at 3:01 PM, Tom Lane  wrote:
>> Do you mind if I change that Assert to a run-time test?

> Hrm, I guess I could have done something like
> shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, (pcxt->nworkers ==
> 0)).

OK, that'd work too.  I'll make it so.

regards, tom lane



Re: ERROR: too many dynamic shared memory segments

2018-02-02 Thread Robert Haas
On Fri, Feb 2, 2018 at 3:01 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> That turned out to produce more than one problem.  I find that the
>> select_parallel test then fails like this:
>> ERROR:  could not find key 18446744073709486082 in shm TOC at 0x10be98040
>> The fix for that problem seems to be:
>
>>  /* Recreate error queues. */
>>  error_queue_space =
>> -shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, false);
>> +shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, true);
>> +Assert(pcxt->nworkers == 0 || error_queue_space != NULL);
>
> I happened across this patch while preparing the release notes, and
> I'm quite distressed by it, because it completely breaks the point
> of what I'd done in commit d46633506 (to wit, to not just blindly
> assume, nor just Assert, that shm_toc_lookup always succeeds).
> Do you mind if I change that Assert to a run-time test?

Hrm, I guess I could have done something like
shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, (pcxt->nworkers ==
0)).

I don't mind much if you change it, but I will note that for the
record, before d46633506, we had a theoretical source of bugs, whereas
after that commit, we had a bug. 445dbd82a fixed that; if you change
this around again, please take care not to make it buggy again.
Otherwise, I'll be the one who is quite distressed.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ERROR: too many dynamic shared memory segments

2018-02-02 Thread Tom Lane
Robert Haas  writes:
> That turned out to produce more than one problem.  I find that the
> select_parallel test then fails like this:
> ERROR:  could not find key 18446744073709486082 in shm TOC at 0x10be98040
> The fix for that problem seems to be:

>  /* Recreate error queues. */
>  error_queue_space =
> -shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, false);
> +shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, true);
> +Assert(pcxt->nworkers == 0 || error_queue_space != NULL);

I happened across this patch while preparing the release notes, and
I'm quite distressed by it, because it completely breaks the point
of what I'd done in commit d46633506 (to wit, to not just blindly
assume, nor just Assert, that shm_toc_lookup always succeeds).
Do you mind if I change that Assert to a run-time test?

regards, tom lane



Re: ERROR: too many dynamic shared memory segments

2017-11-28 Thread Robert Haas
On Tue, Nov 28, 2017 at 9:45 AM, Dilip Kumar  wrote:
>> I haven't checked whether this fixes the bug, but if it does, we can
>> avoid introducing an extra branch in BitmapHeapNext.
>
> With my test it's fixing the problem.

I tested it some more and found that, for me, it PARTIALLY fixes the
problem.  I tested like this:

--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -279,7 +279,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
  * parallelism than to fail outright.
  */
segsize = shm_toc_estimate(&pcxt->estimator);
-if (pcxt->nworkers > 0)
+if (pcxt->nworkers > 0 && false)
 pcxt->seg = dsm_create(segsize, DSM_CREATE_NULL_IF_MAXSEGMENTS);
 if (pcxt->seg != NULL)
 pcxt->toc = shm_toc_create(PARALLEL_MAGIC,

That turned out to produce more than one problem.  I find that the
select_parallel test then fails like this:

ERROR:  could not find key 18446744073709486082 in shm TOC at 0x10be98040

The fix for that problem seems to be:

--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -430,7 +430,8 @@ ReinitializeParallelDSM(ParallelContext *pcxt)

 /* Recreate error queues. */
 error_queue_space =
-shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, false);
+shm_toc_lookup(pcxt->toc, PARALLEL_KEY_ERROR_QUEUE, true);
+Assert(pcxt->nworkers == 0 || error_queue_space != NULL);
 for (i = 0; i < pcxt->nworkers; ++i)
 {
 char   *start;

With that fix in place, I then hit a crash in parallel bitmap heap
scan.  After applying no-pstate.patch, which I just committed and
back-patched to v10, then things look OK.  I'm going to apply the fix
for the error_queue_space problem also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ERROR: too many dynamic shared memory segments

2017-11-28 Thread Dilip Kumar
On Tue, Nov 28, 2017 at 7:13 PM, Robert Haas  wrote:

> On Tue, Nov 28, 2017 at 2:32 AM, Dilip Kumar 
> wrote:
> > I think BitmapHeapScan should check whether the DSA is valid; if it is
> > not, then it should assume this is a non-parallel plan.
> >
> > The attached patch should fix the issue.
>
> So, create the pstate and then pretend we didn't?  Why not just avoid
> creating it in the first place, like this?
>

This is a better way to fix it.

>
> I haven't checked whether this fixes the bug, but if it does, we can
> avoid introducing an extra branch in BitmapHeapNext.


With my test it's fixing the problem.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: ERROR: too many dynamic shared memory segments

2017-11-28 Thread Robert Haas
On Tue, Nov 28, 2017 at 2:32 AM, Dilip Kumar  wrote:
> I think BitmapHeapScan should check whether the DSA is valid; if it is
> not, then it should assume this is a non-parallel plan.
>
> The attached patch should fix the issue.

So, create the pstate and then pretend we didn't?  Why not just avoid
creating it in the first place, like this?

I haven't checked whether this fixes the bug, but if it does, we can
avoid introducing an extra branch in BitmapHeapNext.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


no-pstate.patch
Description: Binary data


Re: ERROR: too many dynamic shared memory segments

2017-11-27 Thread Dilip Kumar
On Tue, Nov 28, 2017 at 4:18 AM, Thomas Munro  wrote:

> On Tue, Nov 28, 2017 at 10:05 AM, Jakub Glapa 
> wrote:
> > As for the crash, I dug up the initial log and it looks like a
> > segmentation fault...
> >
> > 2017-11-23 07:26:53 CET:192.168.10.83(35238):user@db:[30003]: ERROR:
> > too many dynamic shared memory segments
>
> I think there are two failure modes: one of your sessions showed the
> "too many ..." error (that's good, ran out of slots and said so and
> our error machinery worked as it should), and another crashed with a
> segfault, because it tried to use a NULL "area" pointer (bad).  I
> think this is a degenerate case where we completely failed to launch
> parallel query, but we ran the parallel query plan anyway and this
> code thinks that the DSA is available.  Oops.
>

I think BitmapHeapScan should check whether the DSA is valid; if it is
not, then it should assume this is a non-parallel plan.

The attached patch should fix the issue.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


bug_fix_in_pbhs_when_dsa_not_initialized.patch
Description: Binary data