On 2020-04-15 10:12:14 -0400, Robert Haas wrote:
> On Wed, Apr 15, 2020 at 7:15 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > As I understand this, it needs to parse the lines twice (the second
> > time in phase 3), and until the first two phases are over, we can't
> > start the tuple-processing work, which is done in phase 3. So even if
> > the tokenization is done a bit faster, we will lose some time
> > processing the tuples, which might not be an overall win and can in
> > fact be worse than the single-reader approach being discussed. Now,
> > if the work done in tokenization were a major (or significant)
> > portion of COPY, thinking of such a technique might be useful, but
> > that is not the case, as seen in the data shared earlier in this
> > email (the tokenize time is very small compared to the data
> > processing time).
>
> It seems to me that a good first step here might be to forget about
> parallelism for a minute and just write a patch to make the line
> splitting as fast as possible.
+1. Compared to all the rest of the work COPY does, a fast "split rows" implementation should no longer be a bottleneck.