Hi, It looks like the parsing of newly introduced "PARALLEL" option for COPY FROM command has an issue(in the 0002-Framework-for-leader-worker-in-parallel-copy.patch), Mentioning ....PARALLEL '4ar2eteid'); would pass with 4 workers since atoi() is being used for converting string to integer which just returns 4, ignoring other strings.
I used strtol(), added error checks and introduced the error " improper use of argument to option "parallel"" for the above cases. parallel '4ar2eteid'); ERROR: improper use of argument to option "parallel" LINE 5: parallel '1\'); Along with the updated patch 0002-Framework-for-leader-worker-in-parallel-copy.patch, also attaching all the latest patches from [1]. [1] - https://www.postgresql.org/message-id/CALj2ACW94icER3WrWapon7JkcX8j0TGRue5ycWMTEvgA3X7fOg%40mail.gmail.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com On Tue, Jun 23, 2020 at 12:22 PM vignesh C <vignes...@gmail.com> wrote: > > On Tue, Jun 23, 2020 at 8:07 AM vignesh C <vignes...@gmail.com> wrote: > > I have attached the patch for the same with the fixes. > > The patches were not applying on the head, attached the patches that can be > applied on head. > I have added a commitfest entry[1] for this feature. > > [1] - https://commitfest.postgresql.org/28/2610/ > > > On Tue, Jun 23, 2020 at 8:07 AM vignesh C <vignes...@gmail.com> wrote: >> >> Thanks Ashutosh For your review, my comments are inline. >> On Fri, Jun 19, 2020 at 5:41 PM Ashutosh Sharma <ashu.coe...@gmail.com> >> wrote: >> > >> > Hi, >> > >> > I just got some time to review the first patch in the list i.e. >> > 0001-Copy-code-readjustment-to-support-parallel-copy.patch. As the patch >> > name suggests, it is just trying to reshuffle the existing code for COPY >> > command here and there. There is no extra changes added in the patch as >> > such, but still I do have some review comments, please have a look: >> > >> > 1) Can you please add some comments atop the new function >> > PopulateAttributes() describing its functionality in detail. Further, this >> > new function contains the code from BeginCopy() to set attribute level >> > options used with COPY FROM such as FORCE_QUOTE, FORCE_NOT_NULL, >> > FORCE_NULL etc. in cstate and along with that it also copies the code from >> > BeginCopy() to set other infos such as client encoding type, encoding >> > conversion etc. Hence, I think it would be good to give it some better >> > name, basically something that matches with what actually it is doing. >> > >> >> There is no new code added in this function, some part of code from >> BeginCopy was made in to a new function as this part of code will also >> be required for the parallel copy workers before the workers start the >> actual copy operation. This code was made into a function to avoid >> duplication. Changed the function name to PopulateGlobalsForCopyFrom & >> added few comments. >> >> > 2) Again, the name for the new function CheckCopyFromValidity() doesn't >> > look good to me. From the function name it appears as if it does the >> > sanity check of the entire COPY FROM command, but actually it is just >> > doing the sanity check for the target relation specified with COPY FROM. >> > So, probably something like CheckTargetRelValidity would look more >> > sensible, I think? TBH, I am not good at naming the functions so you can >> > always ignore my suggestions about function and variable names :) >> > >> >> Changed as suggested. >> > 3) Any reason for not making CheckCopyFromValidity as a macro instead of a >> > new function. It is just doing the sanity check for the target relation. >> > >> >> I felt there is reasonable number of lines in the function & it is not >> in performance intensive path, so I preferred function over macro. >> Your thoughts? >> >> > 4) Earlier in CopyReadLine() function while trying to clear the EOL marker >> > from cstate->line_buf.data (copied data), we were not checking if the line >> > read by CopyReadLineText() function is a header line or not, but I can see >> > that your patch checks that before clearing the EOL marker. Any reason for >> > this extra check? >> > >> >> If you see the caller of CopyReadLine, i.e. NextCopyFromRawFields does >> nothing for the header line, server basically calls CopyReadLine >> again, it is a kind of small optimization. Anyway server is not going >> to do anything with header line, I felt no need to clear EOL marker >> for header lines. >> /* on input just throw the header line away */ >> if (cstate->cur_lineno == 0 && cstate->header_line) >> { >> cstate->cur_lineno++; >> if (CopyReadLine(cstate)) >> return false; /* done */ >> } >> >> cstate->cur_lineno++; >> >> /* Actually read the line into memory here */ >> done = CopyReadLine(cstate); >> I think no need to make a fix for this. Your thoughts? >> >> > 5) I noticed the below spurious line removal in the patch. >> > >> > @@ -3839,7 +3953,6 @@ static bool >> > CopyReadLine(CopyState cstate) >> > { >> > bool result; >> > - >> > >> >> Fixed. >> I have attached the patch for the same with the fixes. >> Thoughts? >> >> Regards, >> Vignesh >> EnterpriseDB: http://www.enterprisedb.com