On Mon, Dec 28, 2020 at 10:45 AM Dilip Kumar <dilipbal...@gmail.com> wrote: > > On Sun, Dec 27, 2020 at 2:20 PM Bharath Rupireddy > <bharath.rupireddyforpostg...@gmail.com> wrote: > > > > On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar <dilipbal...@gmail.com> wrote: > > > I have reviewed part of v15-0001 patch, I have a few comments, I will > > > continue to review this. > > > > Thanks a lot. > > > > > 1. > > > Why is this temporary hack? and what is the plan for removing this hack? > > > > The changes in xact.c, xact.h and heapam.c are common to all the > > parallel insert patches - COPY, INSERT INTO SELECT. That was the > > initial comment, I forgot to keep it in sync with the other patches. > > Now, I used the comment from INSERT INTO SELECT patch. IIRC, the plan > > was to have these code in all the parallel inserts patch, whichever > > gets to review and commit first, others will update their patches > > accordingly. > > > > > 2. > > > +/* > > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel > > > + * insertion is possible, if yes set the parallel insert state i.e. push > > > down > > > + * the dest receiver to the Gather nodes. > > > + */ > > > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc) > > > +{ > > > + if (!IS_CTAS(into)) > > > + return; > > > > > > When will this hit? The functtion name suggest that it is from CTAS > > > but now you have a check that if it is > > > not for CTAS then return, can you add the comment that when do you > > > expect this case? > > > > Yes it will hit for explain cases, but I choose to remove this and > > check outside in the explain something like: > > if (into) > > ChooseParallelInsertsInCTAS() > > > > > Also the function name should start in a new line > > > i.e > > > void > > > ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc) > > > > Ah, missed that. Modified now. > > > > > 3. > > > +/* > > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel > > > + * insertion is possible, if yes set the parallel insert state i.e. push > > > down > > > + * the dest receiver to the Gather nodes. > > > + */ > > > > > > Push down to the Gather nodes? I think the right statement will be > > > push down below the Gather node. > > > > Modified. > > > > > 4. > > > intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo) > > > { > > > DR_intorel *myState = (DR_intorel *) self; > > > > > > -- Comment ->in parallel worker we don't need to crease dest recv > > > blah blah > > > + if (myState->is_parallel_worker) > > > { > > > --parallel worker handling-- > > > return; > > > } > > > > > > --non-parallel worker code stay right there, instead of moving to else > > > > Done. > > > > > 5. > > > +/* > > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel > > > + * insertion is possible, if yes set the parallel insert state i.e. push > > > down > > > + * the dest receiver to the Gather nodes. > > > + */ > > > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc) > > > +{ > > > > > > From function name and comments it appeared that this function will > > > return boolean saying whether > > > Parallel insert should be selected or not. I think name/comment > > > should be better for this > > > > Yeah that function can still return void because no point in returning > > bool there, since the intention is to see if parallel inserts can be > > performed, if yes, set the state otherwise exit. I changed the > > function name to TryParallelizingInsertsInCTAS(). Let me know your > > suggestions if that doesn't work out. > > > > > 6. > > > /* > > > + * For parallelizing inserts in CTAS i.e. making each parallel > > > worker > > > + * insert the tuples, we must send information such as into > > > clause (for > > > + * each worker to build separate dest receiver), object id (for > > > each > > > + * worker to open the created table). > > > > > > Comment is saying we need to pass object id but the code under this > > > comment is not doing so. > > > > Improved the comment. > > > > > 7. > > > + /* > > > + * Since there are no rows that are transferred from workers to > > > Gather > > > + * node, so we set it to 0 to be visible in estimated row count > > > of > > > + * explain plans. > > > + */ > > > + queryDesc->planstate->plan->plan_rows = 0; > > > > > > This seems a bit hackies Why it is done after the planning, I mean > > > plan must know that it is returning a 0 rows? > > > > This exists to show up the estimated row count(in case of EXPLAIN CTAS > > without ANALYZE) in the output. For EXPLAIN ANALYZE CTAS actual tuples > > are shown correctly as 0 because Gather doesn't receive any tuples. > > if (es->costs) > > { > > if (es->format == EXPLAIN_FORMAT_TEXT) > > { > > appendStringInfo(es->str, " (cost=%.2f..%.2f rows=%.0f > > width=%d)", > > plan->startup_cost, plan->total_cost, > > plan->plan_rows, plan->plan_width); > > > > Since it's an estimated row count(which may not be always correct), we > > will let the EXPLAIN plan show that and I think we can remove that > > part. Thoughts? > > > > I removed it in v6 patch set. > > > > > 8. > > > + char *intoclause_space = shm_toc_allocate(pcxt->toc, > > > + intoclause_len); > > > + memcpy(intoclause_space, intoclausestr, intoclause_len); > > > + shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, > > > intoclause_space); > > > > > > One blank line between variable declaration and next code segment, > > > take care at other places as well. > > > > Done. > > > > I'm attaching the v16 patch set. Please note that I added the > > documentation saying that parallel insertions can happen and a sample > > output of the explain to 0003 patch as discussed in [1]. But I didn't > > move the explain output related code to a separate patch because it's > > a small snippet in explain.c. I hope that's okay. > > > > [1] - > > https://www.postgresql.org/message-id/CAA4eK1JqwXGYoGa1%2B3-f0T50dBGufvKaKQOee_AfFhygZ6QKtA%40mail.gmail.com > > > > Thanks for working on this, I will have a look at the updated patches soon.
I have completed reviewing 0001, I don't have more comments, just one question. Soon I will review the remaining patches. + /* If parallel inserts are to be allowed, set a few extra information. */ + if (myState->is_parallel) + { + myState->object_id = intoRelationAddr.objectId; + + /* + * We don't need to skip contacting FSM while inserting tuples for + * parallel mode, while extending the relations, workers instead of + * blocking on a page while another worker is inserting, can check the + * FSM for another page that can accommodate the tuples. This results + * in major benefit for parallel inserts. + */ + myState->ti_options = 0; Is there any performance data for this or just theoretical analysis? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com