Re: [PATCH] Performance Improvement For Copy From Binary Files

Amit Langote Sun, 12 Jul 2020 19:33:15 -0700

On Mon, Jul 13, 2020 at 10:19 AM Bharath Rupireddy
<[email protected]> wrote:
> > On Mon, Jul 13, 2020 at 4:06 AM Thomas Munro <[email protected]> wrote:
> >
> > This error showed up when cfbot tried it:
> >
> >  COPY BINARY stud_emp FROM
> > '/home/travis/build/postgresql-cfbot/postgresql/src/test/regress/results/stud_emp.data';
> > +ERROR:  could not read from COPY file: Bad address
>
> This is due to the recent commit
> cd22d3cdb9bd9963c694c01a8c0232bbae3ddcfb, in which we restricted the
> raw_buf and line_buf allocations for binary files. Since we are using
> raw_buf for this performance improvement feature, now, it's enough to
> restrict only line_buf for binary files. I made the changes
> accordingly in the v3 patch attached here.
>
> Regression tests(make check & make check-world) ran cleanly with the v3 patch.


Thank you Bharath.  I was a bit surprised that you had also submitted
a patch to NOT allocate raw_buf for COPY FROM ... BINARY. :-)

> Please also find my responses for:
>
> Vignesh's comment:
>
> > On Sun, Jul 12, 2020 at 6:43 PM vignesh C <[email protected]> wrote:
> > I had reviewed the changes. I felt one minor change required:
> > + * CopyReadFromRawBuf
> > + *             Reads 'nbytes' bytes from cstate->copy_file via
> > cstate->raw_buf and
> > + *             writes then to 'saveTo'
> > + *
> > + * Useful when reading binary data from the file.
> > Should "writes then to 'saveTo'" be "writes them to 'dest'"?
> >
>
> Thanks Vignesh for reviewing the patch. Modified 'saveTo' to 'dest' in v3 
> patch.

My bad.

> Amit's comment:
>
> >
> > > For the case where required nbytes may not fit into the buffer in
> > > CopyReadFromRawBuf, I'm sure this can happen only for field data,
> > > (field count , and field size are of fixed length and can fit in the
> > > buffer), instead of reading them in parts of buff size into the buffer
> > > (using CopyLoadRawBuf) and then DRAIN_COPY_RAW_BUF() to the
> > > destination, I think we can detect this condition using requested
> > > bytes and the buffer size and directly read from the file to the
> > > destination buffer and then reload the raw_buffer for further
> > > processing. I think this way, it will be good.
> >
> > Hmm, I'm afraid that this will make the code more complex for
> > apparently small benefit.  Is this really that much of a problem
> > performance wise?
> >
>
> Yes it makes CopyReadFromRawBuf(), code a bit complex from a
> readability perspective. I'm convinced not to have the
> abovementioned(by me) change, due to 3 reasons,1)  the
> readability/understandability 2)  how many use cases can we have where
> requested field size greater than RAW_BUF_SIZE(64KB)? I think very few
> cases. I may be wrong here. 3) Performance wise it may not be much as
> we do one extra memcpy only in situations where field sizes are
> greater than 64KB(as we have already seen and observed by you as well
> in one of the response [1]) that memcpy cost for this case may be
> negligible.

Actually, an extra memcpy is incurred on every call of
CopyReadFromRawBuf(), but I haven't seen it to be very problematic.

By the way, considering the rebase over cd22d3cdb9b, it seemed to me
that we needed to update the comments in CopyStateData struct
definition a bit more.  While doing that, I realized
CopyReadFromRawBuf as a name for the new function might be misleading
as long as we are only using it for binary data.  Maybe
CopyReadBinaryData is more appropriate?  See attached v4 with these
and a few other cosmetic changes.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

CopyFrom-binary-buffering_v4.patch
Description: Binary data

Re: [PATCH] Performance Improvement For Copy From Binary Files

Reply via email to