Hi,
On Thu, 12 Mar 2026 at 20:37, Nathan Bossart <[email protected]> wrote:
>
> Here is what I have staged for commit, which I'm planning to do tomorrow.
> Please review and/or test if you are able.
Thank you!
Unfortunately, v15 causes a regression in the 'csv & wide & 1/3' case
on my end: v14 took ~8000ms but v15 took ~9100ms. If we add the
tmp_hit_eof variable, the regression disappears. It also disappears if
I use a struct like the one below:
typedef struct CopyReadLineSIMDResult
{
	int			input_buf_ptr;
	bool		hit_eof;
	bool		result;
} CopyReadLineSIMDResult;
When I removed the tmp_hit_eof variable from v14, I didn't see any
regression, so I really don't understand why this is happening on my
end. Manni's benchmark [1] didn't show any regression either.
I benchmarked v15 and both of the variants above (timings in ms, lower
is better):
------------------------------------------------------------
Results for default_toast_compression = 'lz4':
+--------------------------------------------------+
| Optimization: -O2 |
+-------------------+--------------+---------------+
| | Text | CSV |
+-------------------+------+-------+-------+-------+
| WIDE | None | 1/3 | None | 1/3 |
+-------------------+------+-------+-------+-------+
| Old master | 4260 | 4789 | 5930 | 8276 |
+-------------------+------+-------+-------+-------+
| v14 | 2489 | 4439 | 2529 | 8098 |
+-------------------+------+-------+-------+-------+
| v15 | 2494 | 4235 | 2490 | 9140 |
+-------------------+------+-------+-------+-------+
| v15 + tmp_hit_eof | 2487 | 4539 | 2478 | 8041 |
+-------------------+------+-------+-------+-------+
| v15 + struct | 2490 | 4531 | 2483 | 7756 |
+-------------------+------+-------+-------+-------+
| | | | | |
+-------------------+------+-------+-------+-------+
| | | | | |
+-------------------+------+-------+-------+-------+
| | Text | CSV |
+-------------------+------+-------+-------+-------+
| NARROW | None | 1/3 | None | 1/3 |
+-------------------+------+-------+-------+-------+
| Old master | 9955 | 10056 | 10329 | 10872 |
+-------------------+------+-------+-------+-------+
| v14 | 9917 | 10080 | 10104 | 10510 |
+-------------------+------+-------+-------+-------+
| v15 | 9898 | 10062 | 10232 | 10483 |
+-------------------+------+-------+-------+-------+
| v15 + tmp_hit_eof | 9847 | 10004 | 10192 | 10437 |
+-------------------+------+-------+-------+-------+
| v15 + struct | 9877 | 10008 | 10107 | 10521 |
+-------------------+------+-------+-------+-------+
------------------------------------------------------------
Results for default_toast_compression = 'pglz':
+---------------------------------------------------+
| Optimization: -O2 |
+-------------------+---------------+---------------+
| | Text | CSV |
+-------------------+-------+-------+-------+-------+
| WIDE | None | 1/3 | None | 1/3 |
+-------------------+-------+-------+-------+-------+
| Old master | 10579 | 10927 | 12276 | 14488 |
+-------------------+-------+-------+-------+-------+
| v14 | 8832 | 10646 | 8815 | 14352 |
+-------------------+-------+-------+-------+-------+
| v15 | 8859 | 10489 | 8835 | 15414 |
+-------------------+-------+-------+-------+-------+
| v15 + tmp_hit_eof | 8828 | 10829 | 8840 | 14297 |
+-------------------+-------+-------+-------+-------+
| v15 + struct | 8847 | 10829 | 8846 | 14003 |
+-------------------+-------+-------+-------+-------+
| | | | | |
+-------------------+-------+-------+-------+-------+
| | | | | |
+-------------------+-------+-------+-------+-------+
| | Text | CSV |
+-------------------+-------+-------+-------+-------+
| NARROW | None | 1/3 | None | 1/3 |
+-------------------+-------+-------+-------+-------+
| Old master | 9952 | 10342 | 10112 | 10861 |
+-------------------+-------+-------+-------+-------+
| v14 | 9907 | 10344 | 10103 | 10492 |
+-------------------+-------+-------+-------+-------+
| v15 | 9897 | 10261 | 10126 | 10490 |
+-------------------+-------+-------+-------+-------+
| v15 + tmp_hit_eof | 9848 | 10218 | 10184 | 10425 |
+-------------------+-------+-------+-------+-------+
| v15 + struct | 9858 | 10150 | 10116 | 10464 |
+-------------------+-------+-------+-------+-------+
------------------------------------------------------------
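With 'v15 + struct' (7756ms) and 'v15 + tmp_hit_eof' (8041ms), the
'csv & wide & 1/3' case is much faster than with plain v15 (9140ms) for
lz4, and the same holds for pglz. On the other hand, the
'text & wide & 1/3' case gets a bit slower with both variants (about
4535ms vs. 4235ms for v15 with lz4), though it is still faster than
master (4789ms).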
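For the lz4 'csv & wide & 1/3' column, the relative changes against v14
can be computed with a trivial helper. pct_change() is just an
illustrative name, and the inputs are copied from the lz4 table above.

```c
#include <assert.h>

/* Relative change of value against baseline, in percent (negative = faster). */
static double
pct_change(double baseline, double value)
{
    assert(baseline != 0.0);
    return (value - baseline) / baseline * 100.0;
}

/* lz4, WIDE, CSV, 1/3 timings in ms, copied from the table. */
static const double v14_ms = 8098.0;
static const double v15_ms = 9140.0;
static const double v15_tmp_hit_eof_ms = 8041.0;
static const double v15_struct_ms = 7756.0;
```

This gives roughly +12.9% for v15, -0.7% for v15 + tmp_hit_eof and
-4.2% for v15 + struct, each relative to v14.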
Separately, v15 produces a compiler warning when 'USE_NO_SIMD' is
defined:
copyfromparse.c:1780:1: warning: label ‘out’ defined but not used [-Wunused-label]
The rest of the changes look good to me. v16 is attached; it fixes the
warning by guarding the 'out' label with '#ifndef USE_NO_SIMD', with no
other changes. In addition, I attached the CopyReadLineSIMDResult
struct change as 0002 to get opinions on it.
[1]
https://postgr.es/m/CAKWEB6pMbdMDvhfaX1Z0eSULVQFYhEhssaRHdOxAX_5OYubxKw%40mail.gmail.com
--
Regards,
Nazir Bilal Yavuz
Microsoft
From 49e82abfc752032fb10e2c144f7656f6fdf78366 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <[email protected]>
Date: Thu, 12 Mar 2026 12:32:23 -0500
Subject: [PATCH v16 1/2] Optimize COPY FROM (FORMAT {text,csv}) using SIMD.
Presently, such commands scan the input buffer one byte at a time
looking for special characters. This commit adds a new path that
uses SIMD instructions to skip over chunks of data without any
special characters. This can be much faster.
To avoid regressions, SIMD processing is disabled for the remainder
of the COPY FROM command as soon as we encounter a short line or a
special character (except for end-of-line characters, else we'd
always disable it after the first line). This is perhaps too
conservative, but it could probably be made more lenient in the
future via fine-tuned heuristics.
Author: Nazir Bilal Yavuz <[email protected]>
Co-authored-by: Shinya Kato <[email protected]>
Reviewed-by: Ayoub Kazar <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Tested-by: Manni Wood <[email protected]>
Tested-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
src/backend/commands/copyfrom.c | 1 +
src/backend/commands/copyfromparse.c | 182 ++++++++++++++++++++++-
src/include/commands/copyfrom_internal.h | 1 +
3 files changed, 181 insertions(+), 3 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0ece40557c8..95f6cb416a9 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1746,6 +1746,7 @@ BeginCopyFrom(ParseState *pstate,
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
cstate->relname_only = false;
+ cstate->simd_enabled = true;
/*
* Allocate buffers for the input pipeline.
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 84c8809a889..bae3bf6fb0d 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
#include "miscadmin.h"
#include "pgstat.h"
#include "port/pg_bswap.h"
+#include "port/simd.h"
#include "utils/builtins.h"
#include "utils/rel.h"
#include "utils/wait_event.h"
@@ -1311,6 +1312,152 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
return result;
}
+#ifndef USE_NO_SIMD
+/*
+ * Helper function for CopyReadLineText() that uses SIMD instructions to scan
+ * the input buffer for special characters. This can be much faster.
+ *
+ * Note that we disable SIMD for the remainder of the COPY FROM command upon
+ * encountering a special character (except for end-of-line characters) or a
+ * short line. This is perhaps too conservative, but it should help avoid
+ * regressions. It could probably be made more lenient in the future via
+ * fine-tuned heuristics.
+ */
+static bool
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
+ bool *hit_eof_p, int *input_buf_ptr_p)
+{
+ char *copy_input_buf;
+ int input_buf_ptr;
+ int copy_buf_len;
+ bool unique_esc_char; /* for csv, do quote/esc chars differ? */
+ bool first = true;
+ bool result = false;
+ const Vector8 nl_vec = vector8_broadcast('\n');
+ const Vector8 cr_vec = vector8_broadcast('\r');
+ Vector8 bs_or_quote_vec; /* '\' for text, quote for csv */
+ Vector8 esc_vec; /* only for csv */
+
+ if (is_csv)
+ {
+ char quote = cstate->opts.quote[0];
+ char esc = cstate->opts.escape[0];
+
+ bs_or_quote_vec = vector8_broadcast(quote);
+ esc_vec = vector8_broadcast(esc);
+ unique_esc_char = (quote != esc);
+ }
+ else
+ {
+ bs_or_quote_vec = vector8_broadcast('\\');
+ unique_esc_char = false;
+ }
+
+ /*
+ * For a little extra speed within the loop, we copy some state members
+ * into local variables. Note that we need to use a separate local
+ * variable for input_buf_ptr so that the REFILL_LINEBUF macro works. We
+ * copy its value into the input_buf_ptr_p argument before returning.
+ */
+ copy_input_buf = cstate->input_buf;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /*
+ * See the corresponding loop in CopyReadLineText() for more information
+ * about the purpose of this loop. This one does the same thing using
+ * SIMD instructions, although we are quick to bail out to the scalar path
+ * if we encounter a special character.
+ */
+ for (;;)
+ {
+ Vector8 chunk;
+ Vector8 match;
+
+ /* Load more data if needed. */
+ if (copy_buf_len - input_buf_ptr < sizeof(Vector8))
+ {
+ REFILL_LINEBUF;
+
+ CopyLoadInputBuf(cstate);
+ /* update our local variables */
+ *hit_eof_p = cstate->input_reached_eof;
+ input_buf_ptr = cstate->input_buf_index;
+ copy_buf_len = cstate->input_buf_len;
+
+ /*
+ * If we are completely out of data, break out of the loop,
+ * reporting EOF.
+ */
+ if (INPUT_BUF_BYTES(cstate) <= 0)
+ {
+ result = true;
+ break;
+ }
+ }
+
+ /*
+ * If we still don't have enough data for the SIMD path, fall back to
+ * the scalar code. Note that this doesn't necessarily mean we
+ * encountered a short line, so we leave cstate->simd_enabled set to
+ * true.
+ */
+ if (copy_buf_len - input_buf_ptr < sizeof(Vector8))
+ break;
+
+ /*
+ * If we made it here, we have at least enough data to fit in a
+ * Vector8, so we can use SIMD instructions to scan for special
+ * characters.
+ */
+ vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+ /*
+ * Check for \n, \r, \\ (for text), quotes (for csv), and escapes (for
+ * csv, if different from quotes).
+ */
+ match = vector8_eq(chunk, nl_vec);
+ match = vector8_or(match, vector8_eq(chunk, cr_vec));
+ match = vector8_or(match, vector8_eq(chunk, bs_or_quote_vec));
+ if (unique_esc_char)
+ match = vector8_or(match, vector8_eq(chunk, esc_vec));
+
+ /*
+ * If we found a special character, advance to it and hand off to the
+ * scalar path. Except for end-of-line characters, we also disable
+ * SIMD processing for the remainder of the COPY FROM command.
+ */
+ if (vector8_is_highbit_set(match))
+ {
+ uint32 mask;
+ char c;
+
+ mask = vector8_highbit_mask(match);
+ input_buf_ptr += pg_rightmost_one_pos32(mask);
+
+ /*
+ * Don't disable SIMD if we found \n or \r, else we'd stop using
+ * SIMD instructions after the first line. As an exception, we do
+ * disable it if this is the first vector we processed, as that
+ * means the line is too short for SIMD.
+ */
+ c = copy_input_buf[input_buf_ptr];
+ if (first || (c != '\n' && c != '\r'))
+ cstate->simd_enabled = false;
+
+ break;
+ }
+
+ /* That chunk was clear of special characters, so we can skip it. */
+ input_buf_ptr += sizeof(Vector8);
+ first = false;
+ }
+
+ *input_buf_ptr_p = input_buf_ptr;
+ return result;
+}
+#endif /* ! USE_NO_SIMD */
+
/*
* CopyReadLineText - inner loop of CopyReadLine for text mode
*/
@@ -1361,11 +1508,36 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
* input_buf_ptr have been determined to be part of the line, but not yet
* transferred to line_buf.
*
- * For a little extra speed within the loop, we copy input_buf and
- * input_buf_len into local variables.
+ * For a little extra speed within the loop, we copy some state
+ * information into local variables. input_buf_ptr could be changed in
+ * the SIMD path, so we must set that one before it. The others are set
+ * afterwards.
*/
- copy_input_buf = cstate->input_buf;
input_buf_ptr = cstate->input_buf_index;
+
+ /*
+ * We first try to use SIMD for the task described above, falling back to
+ * the scalar path (i.e., the loop below) if needed.
+ */
+#ifndef USE_NO_SIMD
+ if (cstate->simd_enabled)
+ {
+ /*
+ * Using a temporary variable seems to encourage the compiler to keep
+ * it in a register, which is beneficial for performance.
+ */
+ int tmp_input_buf_ptr;
+
+ result = CopyReadLineTextSIMDHelper(cstate, is_csv, &hit_eof,
+ &tmp_input_buf_ptr);
+ input_buf_ptr = tmp_input_buf_ptr;
+
+ if (result)
+ goto out;
+ }
+#endif /* ! USE_NO_SIMD */
+
+ copy_input_buf = cstate->input_buf;
copy_buf_len = cstate->input_buf_len;
for (;;)
@@ -1605,6 +1777,10 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
}
} /* end of outer loop */
+#ifndef USE_NO_SIMD
+out:
+#endif /* ! USE_NO_SIMD */
+
/*
* Transfer any still-uncopied data to line_buf.
*/
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index f892c343157..9d3e244ee55 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -108,6 +108,7 @@ typedef struct CopyFromStateData
* att */
bool *defaults; /* if DEFAULT marker was found for
* corresponding att */
+ bool simd_enabled; /* use SIMD to scan for special chars? */
/*
* True if the corresponding attribute's is a constrained domain. This
--
2.47.3
From a32d853e020b1660510f960e7ba52707bbd6afe3 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Fri, 13 Mar 2026 14:25:45 +0300
Subject: [PATCH v16 2/2] Use CopyReadLineSIMDResult struct
---
src/backend/commands/copyfromparse.c | 44 +++++++++++++++++-----------
src/tools/pgindent/typedefs.list | 1 +
2 files changed, 28 insertions(+), 17 deletions(-)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index bae3bf6fb0d..3e3358af9e0 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1313,6 +1313,17 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
}
#ifndef USE_NO_SIMD
+/*
+ * Result of CopyReadLineTextSIMDHelper, returned by value to avoid
+ * pointer parameters that could inhibit register allocation in the caller.
+ */
+typedef struct CopyReadLineSIMDResult
+{
+ int input_buf_ptr;
+ bool hit_eof;
+ bool result;
+} CopyReadLineSIMDResult;
+
/*
* Helper function for CopyReadLineText() that uses SIMD instructions to scan
* the input buffer for special characters. This can be much faster.
@@ -1323,21 +1334,23 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
* regressions. It could probably be made more lenient in the future via
* fine-tuned heuristics.
*/
-static bool
-CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
- bool *hit_eof_p, int *input_buf_ptr_p)
+static CopyReadLineSIMDResult
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv)
{
+ CopyReadLineSIMDResult ret;
char *copy_input_buf;
int input_buf_ptr;
int copy_buf_len;
bool unique_esc_char; /* for csv, do quote/esc chars differ? */
bool first = true;
- bool result = false;
const Vector8 nl_vec = vector8_broadcast('\n');
const Vector8 cr_vec = vector8_broadcast('\r');
Vector8 bs_or_quote_vec; /* '\' for text, quote for csv */
Vector8 esc_vec; /* only for csv */
+ ret.hit_eof = false;
+ ret.result = false;
+
if (is_csv)
{
char quote = cstate->opts.quote[0];
@@ -1357,7 +1370,7 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
* For a little extra speed within the loop, we copy some state members
* into local variables. Note that we need to use a separate local
* variable for input_buf_ptr so that the REFILL_LINEBUF macro works. We
- * copy its value into the input_buf_ptr_p argument before returning.
+ * copy its value into the return struct before returning.
*/
copy_input_buf = cstate->input_buf;
input_buf_ptr = cstate->input_buf_index;
@@ -1381,7 +1394,7 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
CopyLoadInputBuf(cstate);
/* update our local variables */
- *hit_eof_p = cstate->input_reached_eof;
+ ret.hit_eof = cstate->input_reached_eof;
input_buf_ptr = cstate->input_buf_index;
copy_buf_len = cstate->input_buf_len;
@@ -1391,7 +1404,7 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
*/
if (INPUT_BUF_BYTES(cstate) <= 0)
{
- result = true;
+ ret.result = true;
break;
}
}
@@ -1453,8 +1466,8 @@ CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
first = false;
}
- *input_buf_ptr_p = input_buf_ptr;
- return result;
+ ret.input_buf_ptr = input_buf_ptr;
+ return ret;
}
#endif /* ! USE_NO_SIMD */
@@ -1522,15 +1535,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
#ifndef USE_NO_SIMD
if (cstate->simd_enabled)
{
- /*
- * Using a temporary variable seems to encourage the compiler to keep
- * it in a register, which is beneficial for performance.
- */
- int tmp_input_buf_ptr;
+ CopyReadLineSIMDResult simd_result;
- result = CopyReadLineTextSIMDHelper(cstate, is_csv, &hit_eof,
- &tmp_input_buf_ptr);
- input_buf_ptr = tmp_input_buf_ptr;
+ simd_result = CopyReadLineTextSIMDHelper(cstate, is_csv);
+ hit_eof = simd_result.hit_eof;
+ input_buf_ptr = simd_result.input_buf_ptr;
+ result = simd_result.result;
if (result)
goto out;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0de55183793..2acc40533c6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -538,6 +538,7 @@ CopyMethod
CopyMultiInsertBuffer
CopyMultiInsertInfo
CopyOnErrorChoice
+CopyReadLineSIMDResult
CopySeqResult
CopySource
CopyStmt
--
2.47.3