On Mon, Mar 27, 2017 at 10:59 AM, Rushabh Lathia <[email protected]>
wrote:
>
>
> On Mon, Mar 27, 2017 at 3:43 AM, Tomas Vondra <
> [email protected]> wrote:
>
>> On 03/25/2017 05:18 PM, Rushabh Lathia wrote:
>>
>>>
>>>
>>> On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut
>>> <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>> On 3/25/17 09:01, David Rowley wrote:
>>> > On 25 March 2017 at 23:09, Rushabh Lathia <
>>> [email protected] <mailto:[email protected]>> wrote:
>>> >> Also, another point which I think we should fix: when someone sets
>>> >> max_parallel_workers = 0, we should also set
>>> >> max_parallel_workers_per_gather to zero, so that we can avoid
>>> >> generating the gather path with max_parallel_workers = 0.
>>> > I see that it was actually quite useful that it works the way it
>>> does.
>>> > If it had worked the same as max_parallel_workers_per_gather, then
>>> > likely Tomas would never have found this bug.
>>>
>>> Another problem is that the GUC system doesn't really support cases
>>> where the validity of one setting depends on the current value of
>>> another setting. So each individual setting needs to be robust
>>> against
>>> cases of related settings being nonsensical.
>>>
>>>
>>> Okay.
>>>
>>> About the original issue reported by Tomas, I did more debugging and
>>> found that the problem was that gather_merge_clear_slots() was not
>>> returning a cleared slot when nreaders is zero (i.e. nworkers_launched = 0).
>>> Because of that, the scan continued even after all the tuples were
>>> exhausted, and ended up with a server crash at gather_merge_getnext(). In
>>> the patch I also added an Assert in gather_merge_getnext() that the index
>>> must be less than nreaders + 1 (the leader).
>>>
>>> PFA simple patch to fix the problem.
>>>
>>>
>> I think there are two issues at play, here - the first one is that we
>> still produce parallel plans even with max_parallel_workers=0, and the
>> second one is the crash in GatherMerge when nworkers=0.
>>
>> Your patch fixes the latter (thanks for looking into it), which is
>> obviously a good thing - getting 0 workers on a busy system is quite
>> possible, because all the parallel workers can be already chewing on some
>> other query.
>>
>>
> Thanks.
>
>
I was doing more testing with the patch, and I found one more server
crash in the same area: when gather merge is forced for a scan that
returns zero rows.
create table dept (deptno numeric, dname varchar(20));
set parallel_tuple_cost =0;
set parallel_setup_cost =0;
set min_parallel_table_scan_size =0;
set min_parallel_index_scan_size =0;
set force_parallel_mode=regress;
explain analyze select * from dept order by deptno;
This happens because we don't initialize the leader's slot in gm_slots, so
in the case where zero workers are launched and the table has zero rows, we
end up with a NULL slot in the gm_slots array.

Currently gather_merge_clear_slots() clears out the tuple table slots for
each gather merge input and returns a cleared slot. In the patch I modified
gather_merge_clear_slots() to just clear out the tuple table slots, and
gather_merge_getnext() now returns NULL once all the queues and the heap
are exhausted.
>> But it seems a bit futile to produce the parallel plan in the first place,
>> because with max_parallel_workers=0 we can't possibly get any parallel
>> workers ever. I wonder why compute_parallel_worker() only looks at
>> max_parallel_workers_per_gather, i.e. why shouldn't it do:
>>
>> parallel_workers = Min(parallel_workers, max_parallel_workers);
>>
>>
> I agree with you here. Producing the parallel plan when
> max_parallel_workers = 0 is wrong. But rather than your suggested fix, I
> think that we should do something like:
>
> /*
> * In no case use more than max_parallel_workers_per_gather or
> * max_parallel_workers.
> */
> parallel_workers = Min(parallel_workers, Min(max_parallel_workers,
> max_parallel_workers_per_gather));
>
>
>
>> Perhaps this was discussed and is actually intentional, though.
>>
>>
> Yes, I am not quite sure about this.
>
> Regarding handling this at the GUC level - I agree with Peter that that's
>> not a good idea. I suppose we could deal with checking the values in the
>> GUC check/assign hooks, but what we don't have is a way to undo the changes
>> in all the GUCs. That is, if I do
>>
>> SET max_parallel_workers = 0;
>> SET max_parallel_workers = 16;
>>
>> I expect to end up with just max_parallel_workers GUC changed and nothing
>> else.
>>
>> regards
>>
>> --
>> Tomas Vondra http://www.2ndQuadrant.com
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>
>
> --
> Rushabh Lathia
>
Regards,
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 3f0c3ee..62c399e 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -419,10 +419,9 @@ reread:
}
/*
- * Clear out the tuple table slots for each gather merge input,
- * and return a cleared slot.
+ * Clear out the tuple table slots for each gather merge input.
*/
-static TupleTableSlot *
+static void
gather_merge_clear_slots(GatherMergeState *gm_state)
{
int i;
@@ -437,9 +436,6 @@ gather_merge_clear_slots(GatherMergeState *gm_state)
pfree(gm_state->gm_tuple_buffers);
/* Free the binaryheap, which was created for sort */
binaryheap_free(gm_state->gm_heap);
-
- /* return any clear slot */
- return gm_state->gm_slots[0];
}
/*
@@ -479,7 +475,8 @@ gather_merge_getnext(GatherMergeState *gm_state)
if (binaryheap_empty(gm_state->gm_heap))
{
/* All the queues are exhausted, and so is the heap */
- return gather_merge_clear_slots(gm_state);
+ gather_merge_clear_slots(gm_state);
+ return NULL;
}
else
{
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers