Hi,

On Mon, 9 Mar 2026 at 21:25, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Mar 04, 2026 at 06:15:53PM +0300, Nazir Bilal Yavuz wrote:
> > +#ifndef USE_NO_SIMD
> > +static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
> > +                                       bool *temp_hit_eof, int *temp_input_buf_ptr);
> > +#endif
>
> Should we inline this, too?

I don't think this function needs to be inlined. In the previous
version, the SIMD code was inside the main for loop, which iterates
over every character of the input, so it added a branch for every
character. In the current version, the SIMD code runs once per call,
before that loop, so the per-character loop no longer has that branch.
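
A minimal, portable sketch of that structure follows. It is not the patch's code: a uint64_t processed with SWAR bit tricks stands in for Vector8, only the text-mode special characters are handled, and all names (find_special, special_mask, and so on) are hypothetical. The chunked pre-pass runs once per call, and the byte-at-a-time loop after it carries no extra check for the fast path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A uint64_t holding 8 input bytes stands in for Vector8 (an assumption). */
#define CHUNK_SIZE 8
#define ONES UINT64_C(0x0101010101010101)
#define LOW7 UINT64_C(0x7F7F7F7F7F7F7F7F)

/* Load 8 bytes so that p[0] becomes the least significant byte. */
static uint64_t
load_le64(const char *p)
{
	uint64_t	w = 0;

	for (int i = 0; i < CHUNK_SIZE; i++)
		w |= (uint64_t) (unsigned char) p[i] << (8 * i);
	return w;
}

/* Set the high bit of every byte of x that is zero; all other bits are 0. */
static uint64_t
zero_bytes(uint64_t x)
{
	return ~(((x & LOW7) + LOW7) | x | LOW7);
}

/* Mark bytes equal to '\n', '\r' or '\\' (the text-mode specials). */
static uint64_t
special_mask(uint64_t w)
{
	return zero_bytes(w ^ (ONES * '\n')) |
		zero_bytes(w ^ (ONES * '\r')) |
		zero_bytes(w ^ (ONES * '\\'));
}

/* Index of the lowest marked byte; the caller guarantees mask != 0. */
static int
first_marked_byte(uint64_t mask)
{
	int			i = 0;

	while ((mask & (UINT64_C(0x80) << (8 * i))) == 0)
		i++;
	return i;
}

static bool
is_special(char c)
{
	return c == '\n' || c == '\r' || c == '\\';
}

/* Return the offset of the first special byte at or after start, or len. */
static int
find_special(const char *buf, int len, int start)
{
	int			pos = start;

	assert(start >= 0 && start <= len);

	/* Fast path, entered once per call: skip chunks with no special byte. */
	while (len - pos >= CHUNK_SIZE)
	{
		uint64_t	mask = special_mask(load_le64(buf + pos));

		if (mask != 0)
			return pos + first_marked_byte(mask);
		pos += CHUNK_SIZE;
	}

	/* Scalar tail: one byte at a time, with no fast-path check inside. */
	for (; pos < len; pos++)
	{
		if (is_special(buf[pos]))
			return pos;
	}
	return len;
}
```

For example, find_special("abcdefghij\n", 11, 0) skips one 8-byte chunk and then finds the newline at offset 10 in the scalar loop.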


> > +                             /*
> > +                              * Do not disable SIMD when we hit EOL or EOF characters. In
> > +                              * practice, it does not matter for EOF because parsing ends
> > +                              * there, but we keep the behavior consistent.
> > +                             if (!(simd_hit_eof || simd_hit_eol))
> > +                                     cstate->simd_enabled = false;
>
> nitpick: I would personally avoid disabling it for EOF.  It probably
> doesn't amount to much, but I don't see any point in the extra
> complexity/work solely for consistency.

Done. I expected that to be a small change, but it removed more
complexity than I thought.


>
> > +                             /*
> > +                              * We encountered a EOL or EOF on the first vector. This means
> > +                              * lines are not long enough to skip fully sized vector. If
> > +                              * this happens two times consecutively, then disable the
> > +                              * SIMD.
> > +                              */
> > +                             if (first_vector)
> > +                             {
> > +                                     if (cstate->simd_failed_first_vector)
> > +                                             cstate->simd_enabled = false;
> > +
> > +                                     cstate->simd_failed_first_vector = true;
> > +                             }
>
> The first time I saw this, my mind immediately went to the extreme case
> where this likely regresses: alternating long and short lines.  We might
> just want to disable it the first time we see a short line, like we do for
> special characters.  This is another thing that we can improve
> independently later on.

I agree with you, done.
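
With that change, the decision reduces to the single condition used in the helper. As a minimal sketch, with a hypothetical SimdState struct standing in for CopyFromStateData: when the chunked scan stops at special character c, SIMD stays enabled only if c is an EOL character and at least one full chunk was skipped first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the simd_enabled field of CopyFromStateData. */
typedef struct SimdState
{
	bool		simd_enabled;
} SimdState;

/*
 * Called when the chunked scan stops at special character c. first_vector
 * is true if c was found in the first chunk scanned for this line, i.e. no
 * full chunk could be skipped.
 */
static void
simd_on_special(SimdState *st, char c, bool first_vector)
{
	bool		is_eol = (c == '\n' || c == '\r');

	assert(st != NULL);

	/* Short line, or a non-EOL special character: stop using SIMD. */
	if (first_vector || !is_eol)
		st->simd_enabled = false;
}
```

For example, a long line ending in '\n' leaves SIMD enabled, while a line shorter than one chunk, or a backslash found after skipping chunks, turns it off for the rest of the input.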


>
> > +     /* First try to run SIMD, then continue with the scalar path */
> > +     if (cstate->simd_enabled)
> > +     {
> > +             int                     temp_input_buf_ptr = input_buf_ptr;
> > +             bool            temp_hit_eof = false;
> > +
> > +             result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
> > +                                                 &temp_input_buf_ptr);
> > +             input_buf_ptr = temp_input_buf_ptr;
> > +             hit_eof = temp_hit_eof;
>
> Given CopyReadLineTextSIMDHelper() doesn't have too much duplicated code,
> moving the SIMD stuff to its own function is nice.  The temp variables seem
> a bit too magical to me, though.  If those really make a difference, IMHO
> there ought to be a big comment explaining why.

I added a comment explaining this; please let me know if you'd prefer
different wording.
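
The pattern the comment describes, shown as a minimal sketch with hypothetical names (skip_prefix, count_newlines): only a short-lived temporary has its address passed to the out-of-line helper, so the hot-loop variable pos never escapes. Whether this actually changes register allocation depends on the compiler and optimization level; it is a hint to the optimizer, not a guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper shaped like CopyReadLineTextSIMDHelper(): it reports
 * its position through a pointer. It skips leading 'x' bytes and returns
 * true if it reached the end of the buffer.
 */
static bool
skip_prefix(const char *buf, int len, int start, int *pos_out)
{
	int			pos = start;

	while (pos < len && buf[pos] == 'x')
		pos++;
	*pos_out = pos;
	return pos == len;
}

/* Count newlines after the 'x' prefix. */
static int
count_newlines(const char *buf, int len)
{
	int			pos = 0;	/* hot-loop variable; its address is never taken */
	int			count = 0;

	assert(len >= 0);

	{
		/* Only this temporary's address is passed to the helper. */
		int			tmp_pos;

		if (skip_prefix(buf, len, pos, &tmp_pos))
			return 0;
		pos = tmp_pos;
	}

	/* Hot scalar loop: pos can stay in a register here. */
	for (; pos < len; pos++)
	{
		if (buf[pos] == '\n')
			count++;
	}
	return count;
}
```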


--
Regards,
Nazir Bilal Yavuz
Microsoft
From de695aaf5c7ceeb4f62d2352fabbb111047a4434 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Wed, 4 Mar 2026 17:28:54 +0300
Subject: [PATCH v12] Speed up COPY FROM text/CSV parsing using SIMD

COPY FROM text and CSV parsing previously scanned the input buffer
one byte at a time looking for special characters (newline, carriage
return, backslash, and in CSV mode, quote and escape characters).
This patch adds a SIMD-accelerated fast path that processes a full
vector-width chunk per iteration, significantly reducing the number
of iterations needed on inputs with long lines.

A new helper function, CopyReadLineTextSIMDHelper(), loads chunks
of the input buffer into vector registers and checks for any special
characters using SIMD comparisons. When no special characters are
found, the entire chunk is skipped at once. When a special character
is found, the helper advances to that position and hands off to the
existing scalar code to handle it correctly.

To avoid a regression on inputs that contain many special characters,
SIMD is disabled for the remainder of the current input once a
non-EOL special character is encountered. SIMD is also disabled when
the processed line is shorter than a full vector, since SIMD can't
provide a benefit there.

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   3 +
 src/backend/commands/copyfromparse.c     | 206 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   3 +
 3 files changed, 205 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2f42f55e229..fe18bd70890 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1747,6 +1747,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD */
+	cstate->simd_enabled = true;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 84c8809a889..166b1c4c415 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 #include "utils/wait_event.h"
@@ -159,6 +160,12 @@ static pg_attribute_always_inline bool NextCopyFromRawFieldsInternal(CopyFromSta
 																	 int *nfields,
 																	 bool is_csv);
 
+/* SIMD functions */
+#ifndef USE_NO_SIMD
+static bool CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv,
+									   bool *temp_hit_eof, int *temp_input_buf_ptr);
+#endif
+
 
 /* Low-level communications functions */
 static int	CopyGetData(CopyFromState cstate, void *databuf,
@@ -1311,6 +1318,155 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	return result;
 }
 
+#ifndef USE_NO_SIMD
+/*
+ * Use SIMD instructions to efficiently scan the input buffer for special
+ * characters (e.g., newline, carriage return, quote, and escape). This is
+ * faster than byte-by-byte iteration, especially on inputs with long lines.
+ *
+ * Note that SIMD may be slower than the scalar path when the input contains
+ * many special characters. To avoid this regression, we disable SIMD for
+ * the rest of the input once we encounter a special character that isn't
+ * EOL. SIMD is also disabled when it encounters a line too short to fill a
+ * full-sized vector.
+ */
+static bool
+CopyReadLineTextSIMDHelper(CopyFromState cstate, bool is_csv, bool *temp_hit_eof, int *temp_input_buf_ptr)
+{
+	char		quotec = '\0';
+	char		escapec = '\0';
+	char	   *copy_input_buf;
+	int			input_buf_ptr;
+	int			copy_buf_len;
+	bool		result = false;
+	bool		unique_escapec = false;
+	bool		first_vector = true;
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+
+	if (is_csv)
+	{
+		quotec = cstate->opts.quote[0];
+		escapec = cstate->opts.escape[0];
+
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+		{
+			unique_escapec = true;
+			escape = vector8_broadcast(escapec);
+		}
+	}
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	input_buf_ptr = cstate->input_buf_index;
+	copy_buf_len = cstate->input_buf_len;
+
+	while (true)
+	{
+		/* Load more data if needed */
+		if (sizeof(Vector8) > copy_buf_len - input_buf_ptr)
+		{
+			REFILL_LINEBUF;
+
+			CopyLoadInputBuf(cstate);
+			/* update our local variables */
+			*temp_hit_eof = cstate->input_reached_eof;
+			input_buf_ptr = cstate->input_buf_index;
+			copy_buf_len = cstate->input_buf_len;
+
+			/*
+			 * If we are completely out of data, break out of the loop,
+			 * reporting EOF.
+			 */
+			if (INPUT_BUF_BYTES(cstate) <= 0)
+			{
+				result = true;
+				break;
+			}
+		}
+
+		if (copy_buf_len - input_buf_ptr >= sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			if (is_csv)
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (unique_escapec)
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, bs));
+			}
+
+			/* Check if we found any special characters */
+			if (vector8_is_highbit_set(match))
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				uint32		mask;
+				int			advance;
+				char		c;
+
+				mask = vector8_highbit_mask(match);
+				advance = pg_rightmost_one_pos32(mask);
+
+				input_buf_ptr += advance;
+				c = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * We encountered a special character in the first vector, which
+				 * means the line is too short to skip a full-sized vector. To be
+				 * cautious, disable SIMD for the rest of the input.
+				 *
+				 * Otherwise, keep SIMD enabled if the character is EOL. There is
+				 * no need to check for EOF because parsing ends there.
+				 */
+				if (first_vector || !(c == '\r' || c == '\n'))
+					cstate->simd_enabled = false;
+
+				break;
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				first_vector = false;
+			}
+		}
+
+		/*
+		 * Even after loading more input, there are not enough bytes left to
+		 * fill a full-sized vector. This does not mean the current line is
+		 * shorter than a vector, so SIMD is not disabled here.
+		 *
+		 * The scalar code will handle the rest of this line, and SIMD will
+		 * resume from the next line.
+		 */
+		else
+		{
+			break;
+		}
+	}
+
+	*temp_input_buf_ptr = input_buf_ptr;
+	return result;
+}
+#endif
+
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
@@ -1339,6 +1495,49 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			escapec = '\0';
 	}
 
+	/*
+	 * Initialize input_buf_ptr here; if the SIMD fast path below runs, it
+	 * may advance it past bytes that have already been scanned.
+	 */
+	input_buf_ptr = cstate->input_buf_index;
+
+#ifndef USE_NO_SIMD
+	/* First try to run SIMD, then continue with the scalar path */
+	if (cstate->simd_enabled)
+	{
+		/*
+		 * Temporary variables are used here instead of passing the actual
+		 * variables (especially input_buf_ptr) directly to the helper. Taking
+		 * the address of a local variable might force the compiler to
+		 * allocate it on the stack rather than in a register.  Because
+		 * input_buf_ptr is used heavily in the hot scalar path below, keeping
+		 * it in a register is important for performance.
+		 */
+		int			temp_input_buf_ptr;
+		bool		temp_hit_eof = hit_eof;
+
+		result = CopyReadLineTextSIMDHelper(cstate, is_csv, &temp_hit_eof,
+											&temp_input_buf_ptr);
+		input_buf_ptr = temp_input_buf_ptr;
+		hit_eof = temp_hit_eof;
+
+		/* Short exit from SIMD */
+		if (result)
+		{
+			/*
+			 * Transfer any still-uncopied data to line_buf.
+			 */
+			REFILL_LINEBUF;
+
+			return result;
+		}
+	}
+#endif
+
+	/* For a little extra speed we copy these into local variables */
+	copy_input_buf = cstate->input_buf;
+	copy_buf_len = cstate->input_buf_len;
+
 	/*
 	 * The objective of this loop is to transfer the entire next input line
 	 * into line_buf.  Hence, we only care for detecting newlines (\r and/or
@@ -1360,14 +1559,7 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	 * character to examine; any characters from input_buf_index to
 	 * input_buf_ptr have been determined to be part of the line, but not yet
 	 * transferred to line_buf.
-	 *
-	 * For a little extra speed within the loop, we copy input_buf and
-	 * input_buf_len into local variables.
 	 */
-	copy_input_buf = cstate->input_buf;
-	input_buf_ptr = cstate->input_buf_index;
-	copy_buf_len = cstate->input_buf_len;
-
 	for (;;)
 	{
 		int			prev_raw_ptr;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index f892c343157..5b020bf4d0b 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,9 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3
