From: Robert Haas <robertmh...@gmail.com>
> On Wed, May 5, 2021 at 10:54 PM tsunakawa.ta...@fujitsu.com
> <tsunakawa.ta...@fujitsu.com> wrote:
> > As proposed in this thread and/or "Parallel INSERT SELECT take 2", we
> thought of detecting parallel unsafe function execution during SQL statement
> execution, instead of imposing much overhead to check parallel safety during
> query planning.  Specifically, we add parallel safety check in fmgr_info()
> and/or FunctionCallInvoke().
> 
> I haven't read that thread, but I don't understand how that can work.
> The reason we need to detect it at plan time is because we might need
> to use a different plan. At execution time it's too late for that.

(I forgot to say this in my previous email.  Robert-san, thank you very much 
for taking the time to look at this and give feedback.  It was sad that we had 
to revert our parallel INSERT SELECT for redesign at the very end of the last 
CF.  We need advice and suggestions from knowledgeable and thoughtful people 
like Tom-san, Andres-san, and you in the early stages so that we don't repeat 
that tragedy.)

I'd really like you to have a look at the first mail in [1] and give feedback 
along the lines of "this part should be like ... instead" or "this part would 
probably work, I think."  Without feedback from leading developers, I'm 
somewhat at a loss as to whether and how we can proceed with the proposed 
approach.

To put it shortly, we found that it can take non-negligible time for the 
planner to check the parallel safety of the target table of INSERT SELECT when 
the table has many (hundreds or thousands of) partitions.  The check also 
added a fair amount of complicated code.  So we are inclined to take Tom-san's 
suggestion: let the user specify the parallel safety of the target table with 
CREATE/ALTER TABLE, and have the planner simply choose a query plan based on 
that.  Caching the results of parallel safety checks in the relcache or a new 
shared hash table didn't seem to work well to me, or at least it was beyond me.

One might argue that it's okay to simply trust the user-specified parallel 
safety.  But I thought we could go a step further and do our best to check 
parallel safety during statement execution, provided it's not very invasive in 
terms of performance and code complexity.  The aforementioned idea is that if 
a parallel process finds a called function to be parallel unsafe, it errors 
out.  All ancillary objects of the target table (data types, constraints, 
indexes, triggers, etc.) ultimately come down to some UDF, so it should be 
enough to check parallel safety when the UDF is called.


> Also, it seems potentially quite expensive. A query may be planned
> once and executed many times. Also, a single query execution may call
> the same SQL function many times. I think we don't want to incur the
> overhead of an extra syscache lookup every time anyone calls any
> function. A very simple expression like a+b+c+d+e involves four
> function calls, and + need not be a built-in, if the data type is
> user-defined. And that might be happening for every row in a table
> with millions of rows.

We (optimistically) expect that the overhead won't be serious, because the 
parallel safety information is already at hand in the FmgrInfo struct when the 
function is called; we don't have to look up the syscache on every call.

Of course, adding even a single if statement in a critical path can spell 
disaster, so we need to assess the performance.  I'd also appreciate it if you 
could suggest a good workload to experiment with in the thread above.



[1]
Parallel INSERT SELECT take 2
https://www.postgresql.org/message-id/tyapr01mb29905a9ab82cc8ba50ab0f80fe...@tyapr01mb2990.jpnprd01.prod.outlook.com


Regards
Takayuki Tsunakawa
