On Mon, Jul 13, 2020 at 10:19 AM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote: > > On Mon, Jul 13, 2020 at 4:06 AM Thomas Munro <thomas.mu...@gmail.com> wrote: > > > > This error showed up when cfbot tried it: > > > > COPY BINARY stud_emp FROM > > '/home/travis/build/postgresql-cfbot/postgresql/src/test/regress/results/stud_emp.data'; > > +ERROR: could not read from COPY file: Bad address > > This is due to the recent commit > cd22d3cdb9bd9963c694c01a8c0232bbae3ddcfb, in which we restricted the > raw_buf and line_buf allocations for binary files. Since we are using > raw_buf for this performance improvement feature, now, it's enough to > restrict only line_buf for binary files. I made the changes > accordingly in the v3 patch attached here. > > Regression tests(make check & make check-world) ran cleanly with the v3 patch.
Thank you Bharath. I was a bit surprised that you had also submitted a patch to NOT allocate raw_buf for COPY FROM ... BINARY. :-) > Please also find my responses for: > > Vignesh's comment: > > > On Sun, Jul 12, 2020 at 6:43 PM vignesh C <vignes...@gmail.com> wrote: > > I had reviewed the changes. I felt one minor change required: > > + * CopyReadFromRawBuf > > + * Reads 'nbytes' bytes from cstate->copy_file via > > cstate->raw_buf and > > + * writes then to 'saveTo' > > + * > > + * Useful when reading binary data from the file. > > Should "writes then to 'saveTo'" be "writes them to 'dest'"? > > > > Thanks Vignesh for reviewing the patch. Modified 'saveTo' to 'dest' in v3 > patch. My bad. > Amit's comment: > > > > > > For the case where required nbytes may not fit into the buffer in > > > CopyReadFromRawBuf, I'm sure this can happen only for field data, > > > (field count , and field size are of fixed length and can fit in the > > > buffer), instead of reading them in parts of buff size into the buffer > > > (using CopyLoadRawBuf) and then DRAIN_COPY_RAW_BUF() to the > > > destination, I think we can detect this condition using requested > > > bytes and the buffer size and directly read from the file to the > > > destination buffer and then reload the raw_buffer for further > > > processing. I think this way, it will be good. > > > > Hmm, I'm afraid that this will make the code more complex for > > apparently small benefit. Is this really that much of a problem > > performance wise? > > > > Yes it makes CopyReadFromRawBuf(), code a bit complex from a > readability perspective. I'm convinced not to have the > abovementioned(by me) change, due to 3 reasons,1) the > readability/understandability 2) how many use cases can we have where > requested field size greater than RAW_BUF_SIZE(64KB)? I think very few > cases. I may be wrong here. 3) Performance wise it may not be much as > we do one extra memcpy only in situations where field sizes are > greater than 64KB(as we have already seen and observed by you as well > in one of the response [1]) that memcpy cost for this case may be > negligible. Actually, an extra memcpy is incurred on every call of CopyReadFromRawBuf(), but I haven't seen it to be very problematic. By the way, considering the rebase over cd22d3cdb9b, it seemed to me that we needed to update the comments in CopyStateData struct definition a bit more. While doing that, I realized CopyReadFromRawBuf as a name for the new function might be misleading as long as we are only using it for binary data. Maybe CopyReadBinaryData is more appropriate? See attached v4 with these and a few other cosmetic changes. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
CopyFrom-binary-buffering_v4.patch
Description: Binary data