subject:"Improvements in Copy From"

Re: Improvements in Copy From

2020-09-11 Thread Kyotaro Horiguchi

At Fri, 11 Sep 2020 18:44:13 +1000, Peter Smith  wrote 
in 
> On Thu, Sep 10, 2020 at 9:21 PM vignesh C  wrote:
> > > Whether such a micro-optimisation is worth doing is another question.
> > Yes, what you suggested can also be done, but even I have the same
> > question as you. Because we will reduce just one function call, the
> > eof check is present immediately in the function, Should we include
> > this or not?
> 
> I expect the difference from my suggestion is too small to be measured.
> 
> Probably it is not worth changing the already complicated code unless
> those changes can achieve something observable.
> 
> ~~
> 
> FYI, I ran a few performance tests BEFORE/AFTER applying your patch.
> 
> Perf results for \COPY 5GB CSV file to UNLOGGED table.
> 
> perf -a –g 
> psql -d test -c "\copy tbl from '/my/path/data_5GB.csv' with (format csv);”
> perf report –g
> 
> BEFORE
> #1 CopyReadLineText = 12.70%, CopyLoadRawBuf = 0.81%
> #2 CopyReadLineText = 12.54%, CopyLoadRawBuf = 0.81%
> #3 CopyReadLineText = 12.52%, CopyLoadRawBuf = 0.81%
> 
> AFTER
> #1 CopyReadLineText = 12.55%, CopyLoadRawBuf = 1.20%
> #2 CopyReadLineText = 12.15%, CopyLoadRawBuf = 1.10%
> #3 CopyReadLineText = 13.11%, CopyLoadRawBuf = 1.24%
> #4 CopyReadLineText = 12.86%, CopyLoadRawBuf = 1.18%
> 
> I didn't quite know how to interpret those results. It was opposite
> what I expected. Perhaps the original excessive CopyLoadRawBuf calls
> were so brief they could often avoid being sampled? Anyway, I hope you
> have a better understanding of perf than I do and can explain it.
> 
> I then repeated/times same tests but without perf
> 
> BEFORE
> #1 4min.36s
> #2 4min.45s
> #3 4min.43s
> #4 4min.34s
> 
> AFTER
> #1 4min.41s
> #2 4min.37s
> #3 4min.34s
> 
> As you can see, unfortunately, the patch gave no observable benefit
> for my test case.

That observation agrees with my assumption.

At Fri, 11 Sep 2020 15:58:04 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
me> we should do that. On the contrary, if incoming data were
me> intermittently delayed for some reasons (heavy load of client or
me> in-between network), this patch would make things worse by waiting for
me> delayed bits before processing already received bits.

It seems that a slow network is enough to cause that behavior even
without any trouble,

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Improvements in Copy From

2020-09-11 Thread Peter Smith

On Thu, Sep 10, 2020 at 9:21 PM vignesh C  wrote:
> > Whether such a micro-optimisation is worth doing is another question.
> Yes, what you suggested can also be done, but even I have the same
> question as you. Because we will reduce just one function call, the
> eof check is present immediately in the function, Should we include
> this or not?

I expect the difference from my suggestion is too small to be measured.

Probably it is not worth changing the already complicated code unless
those changes can achieve something observable.

~~

FYI, I ran a few performance tests BEFORE/AFTER applying your patch.

Perf results for \COPY 5GB CSV file to UNLOGGED table.

perf -a –g 
psql -d test -c "\copy tbl from '/my/path/data_5GB.csv' with (format csv);”
perf report –g

BEFORE
#1 CopyReadLineText = 12.70%, CopyLoadRawBuf = 0.81%
#2 CopyReadLineText = 12.54%, CopyLoadRawBuf = 0.81%
#3 CopyReadLineText = 12.52%, CopyLoadRawBuf = 0.81%

AFTER
#1 CopyReadLineText = 12.55%, CopyLoadRawBuf = 1.20%
#2 CopyReadLineText = 12.15%, CopyLoadRawBuf = 1.10%
#3 CopyReadLineText = 13.11%, CopyLoadRawBuf = 1.24%
#4 CopyReadLineText = 12.86%, CopyLoadRawBuf = 1.18%

I didn't quite know how to interpret those results. It was opposite
what I expected. Perhaps the original excessive CopyLoadRawBuf calls
were so brief they could often avoid being sampled? Anyway, I hope you
have a better understanding of perf than I do and can explain it.

I then repeated/times same tests but without perf

BEFORE
#1 4min.36s
#2 4min.45s
#3 4min.43s
#4 4min.34s

AFTER
#1 4min.41s
#2 4min.37s
#3 4min.34s

As you can see, unfortunately, the patch gave no observable benefit
for my test case.

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: Improvements in Copy From

2020-09-10 Thread Kyotaro Horiguchi

At Thu, 10 Sep 2020 21:55:27 +0300, Surafel Temesgen  
wrote in 
> On Thu, Sep 10, 2020 at 1:17 PM vignesh C  wrote:
> 
> >
> > >
> > > We have a patch for column matching feature [1] that may need a header
> > line to be further processed. Even without that I think it is preferable to
> > process the header line for nothing than adding those checks to the loop,
> > performance-wise.
> >
> > I had seen that patch, I feel that change to match the header if the
> > header is specified can be addressed in this patch if that patch gets
> > committed first or vice versa. We are doing a lot of processing for
> > the data which we need not do anything. Shouldn't this be skipped if
> > not required. Similar check is present in NextCopyFromRawFields also
> > to skip header.
> >
> 
> The existing check is unavoidable but we can live better without the checks
> added by the patch. For very large files the loop may iterate millions of
> times if it is not in billion and I am sure doing the check that many times
> will incur noticeable performance degradation than further processing a
> single line.

FWIW, I thought the same thing seeing the additional if-conditions. It
gives more loss than gain.

For the first part, the patch reveals COPY_NEW_FE, which I don't think
to be a knowledge for the function, to CopyGetData. Considering that
that doesn't seem to offer noticeable performance gain, I don't think
we should do that. On the contrary, if incoming data were
intermittently delayed for some reasons (heavy load of client or
in-between network), this patch would make things worse by waiting for
delayed bits before processing already received bits.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Improvements in Copy From

2020-09-10 Thread Surafel Temesgen

On Thu, Sep 10, 2020 at 1:17 PM vignesh C  wrote:

>
> >
> > We have a patch for column matching feature [1] that may need a header
> line to be further processed. Even without that I think it is preferable to
> process the header line for nothing than adding those checks to the loop,
> performance-wise.
>
> I had seen that patch, I feel that change to match the header if the
> header is specified can be addressed in this patch if that patch gets
> committed first or vice versa. We are doing a lot of processing for
> the data which we need not do anything. Shouldn't this be skipped if
> not required. Similar check is present in NextCopyFromRawFields also
> to skip header.
>

The existing check is unavoidable but we can live better without the checks
added by the patch. For very large files the loop may iterate millions of
times if it is not in billion and I am sure doing the check that many times
will incur noticeable performance degradation than further processing a
single line.

regards

Surafel

Re: Improvements in Copy From

2020-09-10 Thread vignesh C

On Wed, Sep 9, 2020 at 12:24 PM Peter Smith  wrote:
>
> My basic understanding of first part of your patch is that by
> adjusting the "minread" it now allows it to loop multiple times
> internally within the CopyGetData rather than calling CopyLoadRawBuf
> for every N lines. There doesn't seem to be much change to what other
> code gets executed so the saving is essentially whatever is the cost
> of making 2 x function calls (CopyLoadRawBuff + CopyGetData) x N. Is
> that understanding correct?

Yes you are right, we will avoid the function calls and try to get as
many records as possible from the buffer & insert it to the relation.

> But with that change there seems to be opportunity for yet another
> tiny saving possible. IIUC, now you are processing a lot more data
> within the CopyGetData so it is now very likely that you will also
> gobble the COPY_NEW_FE's 'c' marker. So cstate->reached_eof will be
> set. So this means the calling code of CopyReadLineText may no longer
> need to call the CopyLoadRawBuf one last time just to discover there
> are no more bytes to read - something that it already knows if
> cstate->reached_eof == true.
>
> For example, with your change can't you also modify CopyReadLineText like 
> below:
>
> BEFORE
> if (!CopyLoadRawBuf(cstate))
> hit_eof = true;
>
> AFTER
> if (cstate->reached_eof)
> {
> cstate->raw_buf[0] = '\0';
> cstate->raw_buf_index = cstate->raw_buf_len = 0;
> hit_eof = true;
> }
> else if (!CopyLoadRawBuf(cstate))
> {
> hit_eof = true;
> }
>
> Whether such a micro-optimisation is worth doing is another question.
Yes, what you suggested can also be done, but even I have the same
question as you. Because we will reduce just one function call, the
eof check is present immediately in the function, Should we include
this or not?

Regards,
VIgnesh
EnterpriseDB: http://www.enterprisedb.com

Re: Improvements in Copy From

2020-09-10 Thread vignesh C

On Mon, Sep 7, 2020 at 1:19 PM Surafel Temesgen  wrote:
>
>
> Hi Vignesh
>
> On Wed, Jul 1, 2020 at 3:46 PM vignesh C  wrote:
>>
>> Hi,
>>
>> While reviewing copy from I identified few  improvements for copy from
>> that can be done :
>> a) copy from stdin copies lesser amount of data to buffer even though
>> space is available in buffer because minread was passed as 1 to
>> CopyGetData, Hence it only reads until the data read from libpq is
>> less than minread. This can be fixed by passing the actual space
>> available in buffer, this reduces the unnecessary frequent calls to
>> CopyGetData.
>
>
> why not applying the same optimization on file read ?

For file read this is already taken care as you can see from below code:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
if (ferror(cstate->copy_file))
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from COPY file: %m")));
if (bytesread == 0)
cstate->reached_eof = true;
break;

We do not have any condition to break unlike the case of stdin, we
read 1 * maxread size of data, So no need to do anything for it.

>
>>
>> c) Copy from reads header line and do nothing for the header line, we
>> need not clear EOL & need not convert to server encoding for the
>> header line.
>
>
> We have a patch for column matching feature [1] that may need a header line 
> to be further processed. Even without that I think it is preferable to 
> process the header line for nothing than adding those checks to the loop, 
> performance-wise.

I had seen that patch, I feel that change to match the header if the
header is specified can be addressed in this patch if that patch gets
committed first or vice versa. We are doing a lot of processing for
the data which we need not do anything. Shouldn't this be skipped if
not required. Similar check is present in NextCopyFromRawFields also
to skip header.

Thoughts?

Regards,
VIgnesh
EnterpriseDB: http://www.enterprisedb.com

Re: Improvements in Copy From

2020-09-08 Thread Peter Smith

My basic understanding of first part of your patch is that by
adjusting the "minread" it now allows it to loop multiple times
internally within the CopyGetData rather than calling CopyLoadRawBuf
for every N lines. There doesn't seem to be much change to what other
code gets executed so the saving is essentially whatever is the cost
of making 2 x function calls (CopyLoadRawBuff + CopyGetData) x N. Is
that understanding correct?

But with that change there seems to be opportunity for yet another
tiny saving possible. IIUC, now you are processing a lot more data
within the CopyGetData so it is now very likely that you will also
gobble the COPY_NEW_FE's 'c' marker. So cstate->reached_eof will be
set. So this means the calling code of CopyReadLineText may no longer
need to call the CopyLoadRawBuf one last time just to discover there
are no more bytes to read - something that it already knows if
cstate->reached_eof == true.

For example, with your change can't you also modify CopyReadLineText like below:

BEFORE
if (!CopyLoadRawBuf(cstate))
hit_eof = true;

AFTER
if (cstate->reached_eof)
{
cstate->raw_buf[0] = '\0';
cstate->raw_buf_index = cstate->raw_buf_len = 0;
hit_eof = true;
}
else if (!CopyLoadRawBuf(cstate))
{
hit_eof = true;
}

Whether such a micro-optimisation is worth doing is another question.

---

Kind Regards,
Peter Smith.
Fujitsu Australia

On Sun, Aug 30, 2020 at 5:25 PM vignesh C  wrote:
>
> On Thu, Aug 27, 2020 at 11:02 AM Peter Smith  wrote:
> >
> > Hello.
> >
> > FYI - that patch has conflicts when applied.
> >
>
> Thanks for letting me know. Attached new patch which is rebased on top of 
> head.
>
> Regards,
> VIgnesh
> EnterpriseDB: http://www.enterprisedb.com

Re: Improvements in Copy From

2020-09-07 Thread Surafel Temesgen

Hi Vignesh

On Wed, Jul 1, 2020 at 3:46 PM vignesh C  wrote:

> Hi,
>
> While reviewing copy from I identified few  improvements for copy from
> that can be done :
> a) copy from stdin copies lesser amount of data to buffer even though
> space is available in buffer because minread was passed as 1 to
> CopyGetData, Hence it only reads until the data read from libpq is
> less than minread. This can be fixed by passing the actual space
> available in buffer, this reduces the unnecessary frequent calls to
> CopyGetData.
>

why not applying the same optimization on file read ?

> c) Copy from reads header line and do nothing for the header line, we
> need not clear EOL & need not convert to server encoding for the
> header line.
>

We have a patch for column matching feature [1] that may need a header line
to be further processed. Even without that I think it is preferable to
process the header line for nothing than adding those checks to the loop,
performance-wise.

[1].
https://www.postgresql.org/message-id/flat/caf1-j-0ptcwmeltswwgv2m70u26n4g33gpe1rckqqe6wvqd...@mail.gmail.com

regards

Surafel

Re: Improvements in Copy From

2020-08-30 Thread vignesh C

On Thu, Aug 27, 2020 at 11:02 AM Peter Smith  wrote:
>
> Hello.
>
> FYI - that patch has conflicts when applied.
>

Thanks for letting me know. Attached new patch which is rebased on top of head.

Regards,
VIgnesh
EnterpriseDB: http://www.enterprisedb.com
From a343fe1f8fdf4293d2ef6841e243390b99f29e28 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Sun, 30 Aug 2020 12:31:12 +0530
Subject: [PATCH v2] Improvements in copy from.

There are couple of improvements for copy from in this patch which is detailed
below:
a) copy from stdin copies lesser amount of data to buffer even though space is
available in buffer because minread was passed as 1 to CopyGetData, fixed it by
passing the actual space available in buffer, this reduces the frequent call to
CopyGetData.
b) Copy from reads header line and does nothing for the read line, we need not
clear EOL & need not convert to server encoding for the header line.
---
 src/backend/commands/copy.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db7d24a..c688baa 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -801,14 +801,18 @@ CopyLoadRawBuf(CopyState cstate)
 {
 	int			nbytes = RAW_BUF_BYTES(cstate);
 	int			inbytes;
+	int			minread = 1;
 
 	/* Copy down the unprocessed data if any. */
 	if (nbytes > 0)
 		memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
 nbytes);
 
+	if (cstate->copy_dest == COPY_NEW_FE)
+		minread = RAW_BUF_SIZE - nbytes;
+
 	inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
-		  1, RAW_BUF_SIZE - nbytes);
+		  minread, RAW_BUF_SIZE - nbytes);
 	nbytes += inbytes;
 	cstate->raw_buf[nbytes] = '\0';
 	cstate->raw_buf_index = 0;
@@ -3917,7 +3921,7 @@ CopyReadLine(CopyState cstate)
 			} while (CopyLoadRawBuf(cstate));
 		}
 	}
-	else
+	else if (!(cstate->cur_lineno == 0 && cstate->header_line))
 	{
 		/*
 		 * If we didn't hit EOF, then we must have transferred the EOL marker
@@ -3951,8 +3955,9 @@ CopyReadLine(CopyState cstate)
 		}
 	}
 
-	/* Done reading the line.  Convert it to server encoding. */
-	if (cstate->need_transcoding)
+	/* Done reading the line.  Convert it to server encoding if not header. */
+	if (cstate->need_transcoding &&
+		!(cstate->cur_lineno == 0 && cstate->header_line))
 	{
 		char	   *cvt;
 
-- 
1.8.3.1

Re: Improvements in Copy From

2020-08-26 Thread Peter Smith

Hello.

FYI - that patch has conflicts when applied.

Kind Regards
Peter Smith
Fujitsu Australia.

On Thu, Aug 27, 2020 at 3:11 PM vignesh C  wrote:
>
> On Tue, Jul 14, 2020 at 12:17 PM vignesh C  wrote:
> >
> > On Tue, Jul 14, 2020 at 11:13 AM David Rowley  wrote:
> > >
> > > On Tue, 14 Jul 2020 at 17:22, David Rowley  wrote:
> > > >
> > > > On Thu, 2 Jul 2020 at 00:46, vignesh C  wrote:
> > > > > b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
> > > > > that is not being used, it can be removed.
> > > >
> > > > This was raised in [1]. We decided not to remove it.
> > >
> > > I just added a comment to the function to mention why we want to keep
> > > the parameter. I hope that will save any wasted time proposing its
> > > removal in the future.
> > >
> > > FWIW, the function is inlined.  Removing it will gain us nothing
> > > performance-wise anyway.
> > >
> > > David
> > >
> > > > [1] 
> > > > https://www.postgresql.org/message-id/flat/CAKJS1f-A5aYvPHe10Wy9LjC4RzLsBrya8b2gfuQHFabhwZT_NQ%40mail.gmail.com#3bae9a84be253c527b0e621add0fbaef
> >
> > Thanks David for pointing it out, as this has been discussed and
> > concluded no point in discussing the same thing again. This patch has
> > a couple of other improvements which can still be taken forward. I
> > will remove this change and post a new patch to retain the other
> > issues that were fixed.
> >
>
> I have removed the changes that david had pointed out and retained the
> remaining changes. Attaching the patch for the same.
> Thoughts?
>
> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com

Re: Improvements in Copy From

2020-07-14 Thread vignesh C

On Tue, Jul 14, 2020 at 12:17 PM vignesh C  wrote:
>
> On Tue, Jul 14, 2020 at 11:13 AM David Rowley  wrote:
> >
> > On Tue, 14 Jul 2020 at 17:22, David Rowley  wrote:
> > >
> > > On Thu, 2 Jul 2020 at 00:46, vignesh C  wrote:
> > > > b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
> > > > that is not being used, it can be removed.
> > >
> > > This was raised in [1]. We decided not to remove it.
> >
> > I just added a comment to the function to mention why we want to keep
> > the parameter. I hope that will save any wasted time proposing its
> > removal in the future.
> >
> > FWIW, the function is inlined.  Removing it will gain us nothing
> > performance-wise anyway.
> >
> > David
> >
> > > [1] 
> > > https://www.postgresql.org/message-id/flat/CAKJS1f-A5aYvPHe10Wy9LjC4RzLsBrya8b2gfuQHFabhwZT_NQ%40mail.gmail.com#3bae9a84be253c527b0e621add0fbaef
>
> Thanks David for pointing it out, as this has been discussed and
> concluded no point in discussing the same thing again. This patch has
> a couple of other improvements which can still be taken forward. I
> will remove this change and post a new patch to retain the other
> issues that were fixed.
>

I have removed the changes that david had pointed out and retained the
remaining changes. Attaching the patch for the same.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From fbafa5eaaa84028b3bbfb7cde0cbcc3963fd033a Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Tue, 14 Jul 2020 12:21:37 +0530
Subject: [PATCH] Improvements in copy from.

There are couple of improvements for copy from in this patch which is detailed
below:
a) copy from stdin copies lesser amount of data to buffer even though space is
available in buffer because minread was passed as 1 to CopyGetData, fixed it by
passing the actual space available in buffer, this reduces the frequent call to
CopyGetData.
b) Copy from reads header line and does nothing for the read line, we need not
clear EOL & need not convert to server encoding for the header line.
---
 src/backend/commands/copy.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 44da71c..bc27dfc 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -796,6 +796,7 @@ CopyLoadRawBuf(CopyState cstate)
 {
 	int			nbytes;
 	int			inbytes;
+	int			minread = 1;
 
 	if (cstate->raw_buf_index < cstate->raw_buf_len)
 	{
@@ -807,8 +808,11 @@ CopyLoadRawBuf(CopyState cstate)
 	else
 		nbytes = 0;/* no data need be saved */
 
+	if (cstate->copy_dest == COPY_NEW_FE)
+		minread = RAW_BUF_SIZE - nbytes;
+
 	inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
-		  1, RAW_BUF_SIZE - nbytes);
+		  minread, RAW_BUF_SIZE - nbytes);
 	nbytes += inbytes;
 	cstate->raw_buf[nbytes] = '\0';
 	cstate->raw_buf_index = 0;
@@ -3869,7 +3873,7 @@ CopyReadLine(CopyState cstate)
 			} while (CopyLoadRawBuf(cstate));
 		}
 	}
-	else
+	else if (!(cstate->cur_lineno == 0 && cstate->header_line))
 	{
 		/*
 		 * If we didn't hit EOF, then we must have transferred the EOL marker
@@ -3903,8 +3907,9 @@ CopyReadLine(CopyState cstate)
 		}
 	}
 
-	/* Done reading the line.  Convert it to server encoding. */
-	if (cstate->need_transcoding)
+	/* Done reading the line.  Convert it to server encoding if not header. */
+	if (cstate->need_transcoding &&
+		!(cstate->cur_lineno == 0 && cstate->header_line))
 	{
 		char	   *cvt;
 
-- 
1.8.3.1

Re: Improvements in Copy From

2020-07-13 Thread vignesh C

On Tue, Jul 14, 2020 at 11:13 AM David Rowley  wrote:
>
> On Tue, 14 Jul 2020 at 17:22, David Rowley  wrote:
> >
> > On Thu, 2 Jul 2020 at 00:46, vignesh C  wrote:
> > > b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
> > > that is not being used, it can be removed.
> >
> > This was raised in [1]. We decided not to remove it.
>
> I just added a comment to the function to mention why we want to keep
> the parameter. I hope that will save any wasted time proposing its
> removal in the future.
>
> FWIW, the function is inlined.  Removing it will gain us nothing
> performance-wise anyway.
>
> David
>
> > [1] 
> > https://www.postgresql.org/message-id/flat/CAKJS1f-A5aYvPHe10Wy9LjC4RzLsBrya8b2gfuQHFabhwZT_NQ%40mail.gmail.com#3bae9a84be253c527b0e621add0fbaef

Thanks David for pointing it out, as this has been discussed and
concluded no point in discussing the same thing again. This patch has
a couple of other improvements which can still be taken forward. I
will remove this change and post a new patch to retain the other
issues that were fixed.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Re: Improvements in Copy From

2020-07-13 Thread David Rowley

On Tue, 14 Jul 2020 at 17:22, David Rowley  wrote:
>
> On Thu, 2 Jul 2020 at 00:46, vignesh C  wrote:
> > b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
> > that is not being used, it can be removed.
>
> This was raised in [1]. We decided not to remove it.

I just added a comment to the function to mention why we want to keep
the parameter. I hope that will save any wasted time proposing its
removal in the future.

FWIW, the function is inlined.  Removing it will gain us nothing
performance-wise anyway.

David

> [1] 
> https://www.postgresql.org/message-id/flat/CAKJS1f-A5aYvPHe10Wy9LjC4RzLsBrya8b2gfuQHFabhwZT_NQ%40mail.gmail.com#3bae9a84be253c527b0e621add0fbaef

Re: Improvements in Copy From

2020-07-13 Thread David Rowley

On Thu, 2 Jul 2020 at 00:46, vignesh C  wrote:
> b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
> that is not being used, it can be removed.

This was raised in [1]. We decided not to remove it.

David

[1] 
https://www.postgresql.org/message-id/flat/CAKJS1f-A5aYvPHe10Wy9LjC4RzLsBrya8b2gfuQHFabhwZT_NQ%40mail.gmail.com#3bae9a84be253c527b0e621add0fbaef

Re: Improvements in Copy From

2020-07-13 Thread vignesh C

On Wed, Jul 1, 2020 at 6:16 PM vignesh C  wrote:
> Attached patch has the changes for the same.
> Thoughts?
>

Added a commitfest entry for this:
https://commitfest.postgresql.org/29/2642/

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Improvements in Copy From

2020-07-01 Thread vignesh C

Hi,

While reviewing copy from I identified few  improvements for copy from
that can be done :
a) copy from stdin copies lesser amount of data to buffer even though
space is available in buffer because minread was passed as 1 to
CopyGetData, Hence it only reads until the data read from libpq is
less than minread. This can be fixed by passing the actual space
available in buffer, this reduces the unnecessary frequent calls to
CopyGetData.
b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
that is not being used, it can be removed.
c) Copy from reads header line and do nothing for the header line, we
need not clear EOL & need not convert to server encoding for the
header line.

Attached patch has the changes for the same.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From b86edd9bf4f4350598a0518996e64f40f13aea21 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Wed, 1 Jul 2020 17:51:51 +0530
Subject: [PATCH] Improvements in copy from.

There are 3 improvements for copy from in this patch which is detailed below:
a) copy from stdin copies lesser amount of data to buffer even though space is
available in buffer because minread was passed as 1 to CopyGetData, fixed it by
passing the actual space available in buffer, this reduces the frequent call to
CopyGetData.
b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter, this is not
being used, it has been removed.
c) Copy from reads header line and does nothing for the read line, we need not
clear EOL & need not convert to server encoding for the header line.
---
 src/backend/commands/copy.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3e199bd..dfb7d92 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -793,6 +793,7 @@ CopyLoadRawBuf(CopyState cstate)
 {
 	int			nbytes;
 	int			inbytes;
+	int minread = 1;
 
 	if (cstate->raw_buf_index < cstate->raw_buf_len)
 	{
@@ -804,8 +805,11 @@ CopyLoadRawBuf(CopyState cstate)
 	else
 		nbytes = 0;/* no data need be saved */
 
+	if (cstate->copy_dest == COPY_NEW_FE)
+		minread = RAW_BUF_SIZE - nbytes;
+
 	inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
-		  1, RAW_BUF_SIZE - nbytes);
+		  minread, RAW_BUF_SIZE - nbytes);
 	nbytes += inbytes;
 	cstate->raw_buf[nbytes] = '\0';
 	cstate->raw_buf_index = 0;
@@ -2603,8 +2607,7 @@ CopyMultiInsertInfoCleanup(CopyMultiInsertInfo *miinfo)
  * Callers must ensure that the buffer is not full.
  */
 static inline TupleTableSlot *
-CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
-ResultRelInfo *rri)
+CopyMultiInsertInfoNextFreeSlot(ResultRelInfo *rri)
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 	int			nused = buffer->nused;
@@ -2969,8 +2972,7 @@ CopyFrom(CopyState cstate)
 			Assert(resultRelInfo == target_resultRelInfo);
 			Assert(insertMethod == CIM_MULTI);
 
-			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
-	 resultRelInfo);
+			myslot = CopyMultiInsertInfoNextFreeSlot(resultRelInfo);
 		}
 
 		/*
@@ -3117,8 +3119,7 @@ CopyFrom(CopyState cstate)
 /* no other path available for partitioned table */
 Assert(insertMethod == CIM_MULTI_CONDITIONAL);
 
-batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
-			resultRelInfo);
+batchslot = CopyMultiInsertInfoNextFreeSlot(resultRelInfo);
 
 if (map != NULL)
 	myslot = execute_attr_map_slot(map->attrMap, myslot,
@@ -3856,7 +3857,7 @@ CopyReadLine(CopyState cstate)
 			} while (CopyLoadRawBuf(cstate));
 		}
 	}
-	else
+	else if (!(cstate->cur_lineno == 0 && cstate->header_line))
 	{
 		/*
 		 * If we didn't hit EOF, then we must have transferred the EOL marker
@@ -3890,8 +3891,9 @@ CopyReadLine(CopyState cstate)
 		}
 	}
 
-	/* Done reading the line.  Convert it to server encoding. */
-	if (cstate->need_transcoding)
+	/* Done reading the line.  Convert it to server encoding if not header. */
+	if (cstate->need_transcoding &&
+		!(cstate->cur_lineno == 0 && cstate->header_line))
 	{
 		char	   *cvt;
 
-- 
1.8.3.1

Re: Improvements in Copy From

Re: Improvements in Copy From

Re: Improvements in Copy From

Re: Improvements in Copy From

Re: Improvements in Copy From

Re: Improvements in Copy From

Re: Improvements in Copy From

Re: Improvements in Copy From

Re: Improvements in Copy From

Re: Improvements in Copy From

Re: Improvements in Copy From

Re: Improvements in Copy From

Re: Improvements in Copy From

Re: Improvements in Copy From

Re: Improvements in Copy From

Improvements in Copy From

16 matches

Site Navigation

Mail list logo

Footer information