Re: Speed up COPY FROM text/CSV parsing using SIMD

Nazir Bilal Yavuz Fri, 13 Feb 2026 03:46:00 -0800

Hi,

Thanks for the review!


On Thu, 12 Feb 2026 at 01:39, Nathan Bossart <[email protected]> wrote:
>
> On Wed, Feb 11, 2026 at 04:27:50PM +0300, Nazir Bilal Yavuz wrote:
> > I am sharing a v6 which implements (1). My benchmark results show
> > almost no difference for the special-character cases and a nice
> > improvement for the no-special-character cases.
>
> Thanks!
>
> > +     /* Initialize SIMD variables */
> > +     cstate->simd_enabled = false;
> > +     cstate->simd_initialized = false;
>
> > +     /* Initialize SIMD on the first read */
> > +     if (unlikely(!cstate->simd_initialized))
> > +     {
> > +             cstate->simd_initialized = true;
> > +             cstate->simd_enabled = true;
> > +     }
>
> Why do we do this initialization in CopyReadLine() as opposed to setting
> simd_enabled to true when initializing cstate in BeginCopyFrom()?  If we
> can initialize it in BeginCopyFrom, we could probably remove
> simd_initialized.

You're right; I think this is left over from earlier versions.

> > +     if (cstate->simd_enabled)
> > +             result = CopyReadLineText(cstate, is_csv, true);
> > +     else
> > +             result = CopyReadLineText(cstate, is_csv, false);
>
> I know we discussed this upthread, but I'd like to take a closer look at
> this to see whether/why it makes such a big difference.  It's a bit awkward
> that CopyReadLineText() needs to manage both its local simd_enabled and
> cstate->simd_enabled.

I extensively benchmarked this with the new v6 version. If I change
this to either of:

CopyReadLineText(cstate, is_csv);
or
CopyReadLineText(cstate, is_csv, cstate->simd_enabled);

then I see a 5%-10% regression in the scalar path. I ran my
benchmarks with both "meson --buildtype=debugoptimized" and "meson
--buildtype=release", and the results were the same.

Also, if I change this code to:

    if (cstate->simd_enabled)
    {
        if (is_csv)
            result = CopyReadLineText(cstate, true, true);
        else
            result = CopyReadLineText(cstate, false, true);
    }
    else
    {
        if (is_csv)
            result = CopyReadLineText(cstate, true, false);
        else
            result = CopyReadLineText(cstate, false, false);
    }

then I see a ~5% performance improvement in the scalar path compared to master.
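The dispatch pattern above can be illustrated with a minimal, self-contained C sketch. It assumes a GCC/Clang-style `always_inline` attribute (the role `pg_attribute_always_inline` plays in the patch) and replaces the SIMD load/compare with a plain 8-byte block check; `ScanState`, `scan_line()`, `read_line()`, `ALWAYS_INLINE` and `BLOCK` are hypothetical names, not PostgreSQL code. The worker takes the fast-path flag as a parameter and the dispatcher passes a literal `true` or `false`, which gives the compiler the chance to specialize each inlined copy. When the worker meets a backslash it clears both its local flag and the persistent one, mirroring the local/cstate `simd_enabled` bookkeeping discussed above. This only shows the structure; it does not by itself explain the measured differences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pg_attribute_always_inline. */
#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE static inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE static inline
#endif

#define BLOCK 8                 /* bytes checked per fast-path step */

/* Hypothetical stand-in for CopyFromStateData; only the flag matters. */
typedef struct ScanState
{
    bool simd_enabled;          /* sticky across calls, like cstate->simd_enabled */
} ScanState;

static bool
is_special(char c)
{
    return c == '\n' || c == '\r' || c == '\\';
}

/*
 * Return the offset of the first '\n' or '\r' in buf[0..len), or len if
 * none; a backslash consumes the byte after it. "fast_path" plays the
 * role of CopyReadLineText()'s simd_enabled argument.
 */
ALWAYS_INLINE size_t
scan_line(ScanState *state, const char *buf, size_t len, bool fast_path)
{
    size_t i = 0;

    while (i < len)
    {
        if (fast_path && len - i >= BLOCK)
        {
            size_t j = 0;

            while (j < BLOCK && !is_special(buf[i + j]))
                j++;
            if (j == BLOCK)
            {
                i += BLOCK;     /* whole block is ordinary data */
                continue;
            }
            i += j;             /* stop at the special byte */
            if (buf[i] == '\\')
            {
                /* Not EOL: drop the fast path now and for later calls. */
                fast_path = false;
                state->simd_enabled = false;
            }
        }

        if (buf[i] == '\n' || buf[i] == '\r')
            return i;
        i += (buf[i] == '\\' && i + 1 < len) ? 2 : 1;
    }
    return i;
}

/* Each call site passes a literal, so each inlined copy can be specialized. */
static size_t
read_line(ScanState *state, const char *buf, size_t len)
{
    if (state->simd_enabled)
        return scan_line(state, buf, len, true);
    return scan_line(state, buf, len, false);
}
```

For example, on `"abcdefghijklmnop\nrest"` `read_line()` returns 16 and leaves `simd_enabled` set, while on `"ab\\nxyz\n"` (a literal backslash) it returns 7 and clears the flag, so every later call goes to the `false` copy.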

> +                       /* Load a chunk of data into a vector register */
> +                       vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
>
> As mentioned upthread [0], I think it's worth testing whether processing
> multiple vectors worth of data in each loop iteration is worthwhile.
>
> [0] https://postgr.es/m/aSTVOe6BIe5f1l3i%40nathan

Unlike pg_lfind32(), which searches for a single key, CopyReadLineText()
has to look for several special characters at once. I am not sure I
used multiple vectors correctly, but I attached my attempt as 0002;
could you please take a look? I didn't see any performance benefit in
my benchmarks, though.
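For comparison with 0002, here is a minimal, portable sketch of the four-vectors-per-iteration idea with several keys. It assumes a hypothetical 16-byte `vec8` type whose helpers (`v_load`, `v_splat`, `v_eq`, `v_or`, `v_highbit_mask`) emulate vector operations with plain loops; they are not PostgreSQL's `simd.h` functions, and only the text-format specials (`\n`, `\r`, `\\`) are searched. Each iteration ORs the four per-vector match results and branches once; only on a hit does it locate the first matching vector and the lowest set bit inside it, which is the same shape as 0002's `vector8_highbit_mask()` cascade followed by `pg_rightmost_one_pos32()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VLEN 16                 /* bytes per emulated 128-bit vector */

/* Hypothetical portable stand-in for a vector register (not Vector8). */
typedef struct { uint8_t b[VLEN]; } vec8;

static vec8 v_load(const char *p) { vec8 v; memcpy(v.b, p, VLEN); return v; }
static vec8 v_splat(uint8_t c) { vec8 v; memset(v.b, c, VLEN); return v; }

/* Per byte: 0xFF where a == b, else 0x00. */
static vec8
v_eq(vec8 a, vec8 b)
{
    vec8 r;
    for (int i = 0; i < VLEN; i++)
        r.b[i] = (a.b[i] == b.b[i]) ? 0xFF : 0x00;
    return r;
}

static vec8
v_or(vec8 a, vec8 b)
{
    vec8 r;
    for (int i = 0; i < VLEN; i++)
        r.b[i] = a.b[i] | b.b[i];
    return r;
}

/* Bit i of the result is the high bit of byte i. */
static uint32_t
v_highbit_mask(vec8 v)
{
    uint32_t m = 0;
    for (int i = 0; i < VLEN; i++)
        m |= (uint32_t) (v.b[i] >> 7) << i;
    return m;
}

/* Match the text-format specials: '\n', '\r' and '\\'. */
static vec8
v_special(vec8 chunk)
{
    vec8 m = v_or(v_eq(chunk, v_splat('\n')), v_eq(chunk, v_splat('\r')));
    return v_or(m, v_eq(chunk, v_splat('\\')));
}

/* Offset of the first special byte in buf[0..len), or len if none. */
static size_t
find_special(const char *buf, size_t len)
{
    size_t i = 0;

    while (len - i >= 4 * VLEN)
    {
        vec8 m[4];
        for (int k = 0; k < 4; k++)
            m[k] = v_special(v_load(buf + i + k * VLEN));

        /* A single test covers all four vectors (64 bytes). */
        vec8 any = v_or(v_or(m[0], m[1]), v_or(m[2], m[3]));
        if (v_highbit_mask(any) == 0)
        {
            i += 4 * VLEN;
            continue;
        }

        /* Some vector matched: find the first one, then its lowest set bit. */
        int hit = 0;
        uint32_t mask;
        while ((mask = v_highbit_mask(m[hit])) == 0)
            hit++;
        assert(hit < 4);
        int bit = 0;
        while ((mask & (1u << bit)) == 0)
            bit++;
        return i + (size_t) hit * VLEN + (size_t) bit;
    }

    /* Fewer than four vectors left: finish byte by byte. */
    for (; i < len; i++)
        if (buf[i] == '\n' || buf[i] == '\r' || buf[i] == '\\')
            return i;
    return len;
}
```

Because the emulated helpers are scalar loops, this sketch shows only the control flow; it says nothing about the instruction-level parallelism a real vector implementation might or might not get.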

--
Regards,
Nazir Bilal Yavuz
Microsoft

From c4b29849ad9f87f51022b947a9a0ab695dd1cde2 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Fri, 13 Feb 2026 13:28:55 +0300
Subject: [PATCH v7 1/2] Speed up COPY FROM text/CSV parsing using SIMD

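This patch uses SIMD in CopyReadLineText() to scan the input buffer for
special characters. Since SIMD can be slower than the scalar path when
the input contains many special characters, it is disabled for the rest
of the input once it encounters a special character that is neither
EOF nor EOL.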

Author: Shinya Kato <[email protected]>
Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Kazar Ayoub <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Reviewed-by: Neil Conway <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Manni Wood <[email protected]>
Reviewed-by: Mark Wong <[email protected]>
Discussion: https://postgr.es/m/CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig%40mail.gmail.com
---
 src/backend/commands/copyfrom.c          |   3 +
 src/backend/commands/copyfromparse.c     | 125 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |   3 +
 3 files changed, 126 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 25ee20b23db..40dae0bdacc 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1721,6 +1721,9 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->cur_attval = NULL;
 	cstate->relname_only = false;
 
+	/* Initialize SIMD */
+	cstate->simd_enabled = true;
+
 	/*
 	 * Allocate buffers for the input pipeline.
 	 *
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 94d6f415a06..4a127d1af90 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -72,6 +72,7 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/pg_bswap.h"
+#include "port/simd.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
 
@@ -141,12 +142,14 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate, bool is_csv);
-static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 									 Oid typioparam, int32 typmod,
 									 bool *isnull);
+static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate,
+														bool is_csv,
+														bool simd_enabled);
 static pg_attribute_always_inline bool CopyFromTextLikeOneRow(CopyFromState cstate,
 															  ExprContext *econtext,
 															  Datum *values,
@@ -1173,8 +1176,14 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 	resetStringInfo(&cstate->line_buf);
 	cstate->line_buf_valid = false;
 
-	/* Parse data and transfer into line_buf */
-	result = CopyReadLineText(cstate, is_csv);
+	/*
+	 * Parse data and transfer into line_buf. To benefit from inlining, call
+	 * CopyReadLineText() with constant boolean arguments.
+	 */
+	if (cstate->simd_enabled)
+		result = CopyReadLineText(cstate, is_csv, true);
+	else
+		result = CopyReadLineText(cstate, is_csv, false);
 
 	if (result)
 	{
@@ -1241,8 +1250,8 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate, bool is_csv)
+static pg_attribute_always_inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 {
 	char	   *copy_input_buf;
 	int			input_buf_ptr;
@@ -1257,6 +1266,14 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 	char		quotec = '\0';
 	char		escapec = '\0';
 
+#ifndef USE_NO_SIMD
+	Vector8		nl = vector8_broadcast('\n');
+	Vector8		cr = vector8_broadcast('\r');
+	Vector8		bs = vector8_broadcast('\\');
+	Vector8		quote = vector8_broadcast(0);
+	Vector8		escape = vector8_broadcast(0);
+#endif
+
 	if (is_csv)
 	{
 		quotec = cstate->opts.quote[0];
@@ -1264,6 +1281,12 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 		/* ignore special escape processing if it's the same as quotec */
 		if (quotec == escapec)
 			escapec = '\0';
+
+#ifndef USE_NO_SIMD
+		quote = vector8_broadcast(quotec);
+		if (quotec != escapec)
+			escape = vector8_broadcast(escapec);
+#endif
 	}
 
 	/*
@@ -1330,6 +1353,98 @@ CopyReadLineText(CopyFromState cstate, bool is_csv)
 			need_data = false;
 		}
 
+#ifndef USE_NO_SIMD
+
+		/*
+		 * Use SIMD instructions to efficiently scan the input buffer for
+		 * special characters (e.g., newline, carriage return, quote, and
+		 * escape). This is faster than byte-by-byte iteration, especially on
+		 * large buffers.
+		 *
+		 * We do not apply the SIMD fast path in either of the following
+		 * cases: - When the previously processed character was an escape
+		 * character (last_was_esc), since the next byte must be examined
+		 * sequentially. - When the remaining buffer is smaller than one
+		 * vector width (sizeof(Vector8)), since SIMD operates on fixed-size
+		 * chunks.
+		 *
+		 * Note that SIMD may become slower when the input contains many
+		 * special characters. To avoid this regression, we disable SIMD for
+		 * the rest of the input once we encounter a special character which
+		 * is neither EOF nor EOL.
+		 */
+		if (simd_enabled && !last_was_esc && copy_buf_len - input_buf_ptr > sizeof(Vector8))
+		{
+			Vector8		chunk;
+			Vector8		match = vector8_broadcast(0);
+			uint32		mask;
+
+			/* Load a chunk of data into a vector register */
+			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+
+			if (is_csv)
+			{
+				/* \n and \r are not special inside quotes */
+				if (!in_quote)
+					match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+
+				match = vector8_or(match, vector8_eq(chunk, quote));
+				if (escapec != '\0')
+					match = vector8_or(match, vector8_eq(chunk, escape));
+			}
+			else
+			{
+				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				match = vector8_or(match, vector8_eq(chunk, bs));
+			}
+
+			/* Check if we found any special characters */
+			mask = vector8_highbit_mask(match);
+			if (mask != 0)
+			{
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				int			advance = pg_rightmost_one_pos32(mask);
+				char		c1,
+							c2;
+				bool		simd_hit_eol,
+							simd_hit_eof;
+
+				input_buf_ptr += advance;
+				c1 = copy_input_buf[input_buf_ptr];
+
+				/*
+				 * Since we stopped within the chunk and ((copy_buf_len -
+				 * input_buf_ptr) > sizeof(Vector8)) is true,
+				 * copy_input_buf[input_buf_ptr + 1] is guaranteed to be
+				 * readable.
+				 */
+				c2 = copy_input_buf[input_buf_ptr + 1];
+				simd_hit_eol = (c1 == '\r' || c1 == '\n') && (!is_csv || !in_quote);
+				simd_hit_eof = c1 == '\\' && c2 == '.' && !is_csv;
+
+				/*
+				 * Do not disable SIMD when we hit EOL or EOF characters. In
+				 * practice, it does not matter for EOF because parsing ends
+				 * there, but we keep the behavior consistent.
+				 */
+				if (!(simd_hit_eof || simd_hit_eol))
+				{
+					simd_enabled = false;
+					cstate->simd_enabled = false;
+				}
+			}
+			else
+			{
+				/* No special characters found, so skip the entire chunk */
+				input_buf_ptr += sizeof(Vector8);
+				continue;
+			}
+		}
+#endif
+
 		/* OK to fetch a character */
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 822ef33cf69..73ce777c52b 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -89,6 +89,9 @@ typedef struct CopyFromStateData
 	const char *cur_attval;		/* current att value for error messages */
 	bool		relname_only;	/* don't output line number, att, etc. */
 
+	/* SIMD variables */
+	bool		simd_enabled;
+
 	/*
 	 * Working state
 	 */
-- 
2.47.3

From 2de9b5bc18bfa169b3ba3507b6bdf79d277c0ad4 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <[email protected]>
Date: Fri, 13 Feb 2026 13:36:34 +0300
Subject: [PATCH v7 2/2] Use 4 vectors in CopyReadLineText() SIMD

---
 src/backend/commands/copyfromparse.c | 116 +++++++++++++++++++++------
 1 file changed, 92 insertions(+), 24 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 4a127d1af90..caadc40cc8b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1361,6 +1361,9 @@ CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 		 * escape). This is faster than byte-by-byte iteration, especially on
 		 * large buffers.
 		 *
+		 * For better instruction-level parallelism, we try to process four
+		 * vectors at a time.
+		 *
 		 * We do not apply the SIMD fast path in either of the following
 		 * cases: - When the previously processed character was an escape
 		 * character (last_was_esc), since the next byte must be examined
@@ -1373,53 +1376,118 @@ CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 		 * the rest of the input once we encounter a special character which
 		 * is neither EOF nor EOL.
 		 */
-		if (simd_enabled && !last_was_esc && copy_buf_len - input_buf_ptr > sizeof(Vector8))
+		if (simd_enabled && !last_was_esc && copy_buf_len - input_buf_ptr >= 4 * sizeof(Vector8))
 		{
-			Vector8		chunk;
-			Vector8		match = vector8_broadcast(0);
-			uint32		mask;
-
-			/* Load a chunk of data into a vector register */
-			vector8_load(&chunk, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+			Vector8		chunk1,
+						chunk2,
+						chunk3,
+						chunk4;
+			Vector8		match1,
+						match2,
+						match3,
+						match4;
+			Vector8		tmp1,
+						tmp2,
+						result;
+
+			/* Load four chunks of data into vector registers */
+			vector8_load(&chunk1, (const uint8 *) &copy_input_buf[input_buf_ptr]);
+			vector8_load(&chunk2, (const uint8 *) &copy_input_buf[input_buf_ptr + sizeof(Vector8)]);
+			vector8_load(&chunk3, (const uint8 *) &copy_input_buf[input_buf_ptr + 2 * sizeof(Vector8)]);
+			vector8_load(&chunk4, (const uint8 *) &copy_input_buf[input_buf_ptr + 3 * sizeof(Vector8)]);
 
 			if (is_csv)
 			{
 				/* \n and \r are not special inside quotes */
 				if (!in_quote)
-					match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
+				{
+					match1 = vector8_or(vector8_eq(chunk1, nl), vector8_eq(chunk1, cr));
+					match2 = vector8_or(vector8_eq(chunk2, nl), vector8_eq(chunk2, cr));
+					match3 = vector8_or(vector8_eq(chunk3, nl), vector8_eq(chunk3, cr));
+					match4 = vector8_or(vector8_eq(chunk4, nl), vector8_eq(chunk4, cr));
+				}
+				else
+				{
+					match1 = vector8_broadcast(0);
+					match2 = vector8_broadcast(0);
+					match3 = vector8_broadcast(0);
+					match4 = vector8_broadcast(0);
+				}
 
-				match = vector8_or(match, vector8_eq(chunk, quote));
+				match1 = vector8_or(match1, vector8_eq(chunk1, quote));
+				match2 = vector8_or(match2, vector8_eq(chunk2, quote));
+				match3 = vector8_or(match3, vector8_eq(chunk3, quote));
+				match4 = vector8_or(match4, vector8_eq(chunk4, quote));
 				if (escapec != '\0')
-					match = vector8_or(match, vector8_eq(chunk, escape));
+				{
+					match1 = vector8_or(match1, vector8_eq(chunk1, escape));
+					match2 = vector8_or(match2, vector8_eq(chunk2, escape));
+					match3 = vector8_or(match3, vector8_eq(chunk3, escape));
+					match4 = vector8_or(match4, vector8_eq(chunk4, escape));
+				}
 			}
 			else
 			{
-				match = vector8_or(vector8_eq(chunk, nl), vector8_eq(chunk, cr));
-				match = vector8_or(match, vector8_eq(chunk, bs));
+				match1 = vector8_or(vector8_eq(chunk1, nl), vector8_eq(chunk1, cr));
+				match2 = vector8_or(vector8_eq(chunk2, nl), vector8_eq(chunk2, cr));
+				match3 = vector8_or(vector8_eq(chunk3, nl), vector8_eq(chunk3, cr));
+				match4 = vector8_or(vector8_eq(chunk4, nl), vector8_eq(chunk4, cr));
+
+				match1 = vector8_or(match1, vector8_eq(chunk1, bs));
+				match2 = vector8_or(match2, vector8_eq(chunk2, bs));
+				match3 = vector8_or(match3, vector8_eq(chunk3, bs));
+				match4 = vector8_or(match4, vector8_eq(chunk4, bs));
 			}
 
-			/* Check if we found any special characters */
-			mask = vector8_highbit_mask(match);
-			if (mask != 0)
+			/* Combine results to check if any chunk has special characters */
+			tmp1 = vector8_or(match1, match2);
+			tmp2 = vector8_or(match3, match4);
+			result = vector8_or(tmp1, tmp2);
+
+			if (vector8_is_highbit_set(result))
 			{
 				/*
-				 * Found a special character. Advance up to that point and let
-				 * the scalar code handle it.
+				 * Found a special character somewhere in the four chunks.
+				 * Identify the first chunk containing it.
 				 */
-				int			advance = pg_rightmost_one_pos32(mask);
+				uint32		mask;
+				int			advance;
 				char		c1,
 							c2;
 				bool		simd_hit_eol,
 							simd_hit_eof;
 
+				mask = vector8_highbit_mask(match1);
+				if (mask == 0)
+				{
+					input_buf_ptr += sizeof(Vector8);
+					mask = vector8_highbit_mask(match2);
+				}
+				if (mask == 0)
+				{
+					input_buf_ptr += sizeof(Vector8);
+					mask = vector8_highbit_mask(match3);
+				}
+				if (mask == 0)
+				{
+					input_buf_ptr += sizeof(Vector8);
+					mask = vector8_highbit_mask(match4);
+				}
+				Assert(mask != 0);
+
+				/*
+				 * Found a special character. Advance up to that point and let
+				 * the scalar code handle it.
+				 */
+				advance = pg_rightmost_one_pos32(mask);
 				input_buf_ptr += advance;
 				c1 = copy_input_buf[input_buf_ptr];
 
 				/*
-				 * Since we stopped within the chunk and ((copy_buf_len -
-				 * input_buf_ptr) > sizeof(Vector8)) is true,
-				 * copy_input_buf[input_buf_ptr + 1] is guaranteed to be
-				 * readable.
+				 * Since we stopped within the block and ((copy_buf_len -
+				 * input_buf_ptr) >= 4 * sizeof(Vector8)) was true at the
+				 * start, copy_input_buf[input_buf_ptr + 1] is guaranteed to
+				 * be readable.
 				 */
 				c2 = copy_input_buf[input_buf_ptr + 1];
 				simd_hit_eol = (c1 == '\r' || c1 == '\n') && (!is_csv || !in_quote);
@@ -1438,8 +1506,8 @@ CopyReadLineText(CopyFromState cstate, bool is_csv, bool simd_enabled)
 			}
 			else
 			{
-				/* No special characters found, so skip the entire chunk */
-				input_buf_ptr += sizeof(Vector8);
+				/* No special characters found, so skip the entire block */
+				input_buf_ptr += 4 * sizeof(Vector8);
 				continue;
 			}
 		}
-- 
2.47.3
