Hi,

While looking at a profile I happened to notice that we spend a
surprising amount of time in snprintf() and its subsidiary functions.
The source turns out to be
                    if (strcmp(portal->commandTag, "SELECT") == 0)
                        snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
                                 "SELECT " UINT64_FORMAT, nprocessed);

in PortalRun().  That's fairly trivial to optimize: we don't need the
full-blown snprintf machinery here.  A quick benchmark replacing it
with:

                       memcpy(completionTag, "SELECT ", sizeof("SELECT "));
                       pg_lltoa(nprocessed, completionTag + 7);

yields a ~2% increase in TPS, which is larger than I expected.  The code
is obviously less pretty, but it's also not actually that bad.
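
For anyone who wants to try the idea outside the tree, here is a minimal
standalone sketch of the same technique: copy the fixed prefix, then write
the digits directly.  write_u64() and format_select_tag() are hypothetical
names made up for this example, not PostgreSQL functions, and the buffer
size of 64 is an assumption for the sketch.  The helper takes an unsigned
value because nprocessed is formatted with UINT64_FORMAT, whereas
pg_lltoa() takes a signed 64-bit argument as far as I know.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed buffer size for this sketch only. */
#define COMPLETION_TAG_BUFSIZE 64

/*
 * Hypothetical stand-in for pg_lltoa(): write "value" in decimal to dst,
 * NUL-terminate it, and return the number of digits written.
 */
static size_t
write_u64(uint64_t value, char *dst)
{
	char		tmp[20];		/* UINT64_MAX has 20 decimal digits */
	size_t		n = 0;
	size_t		i;

	/* Generate digits least-significant first. */
	do
	{
		tmp[n++] = (char) ('0' + value % 10);
		value /= 10;
	} while (value != 0);

	/* Copy them out in the right order. */
	for (i = 0; i < n; i++)
		dst[i] = tmp[n - 1 - i];
	dst[n] = '\0';

	return n;
}

/*
 * Build "SELECT <nprocessed>" without going through snprintf().
 * completionTag must point to at least COMPLETION_TAG_BUFSIZE bytes;
 * the longest possible result needs 7 + 20 + 1 = 28 bytes.
 */
static void
format_select_tag(char *completionTag, uint64_t nprocessed)
{
	const size_t prefix_len = sizeof("SELECT ") - 1;

	assert(completionTag != NULL);
	memcpy(completionTag, "SELECT ", prefix_len);
	write_u64(nprocessed, completionTag + prefix_len);
}
```

Unlike snprintf(), nothing here checks the destination size at run time,
so the approach depends on the worst-case length being known to fit.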

Attached is the patch I used for benchmarking. I wonder if I just hit
some specific version of glibc that regressed snprintf performance, or
whether others can reproduce this.

If it's actually reproducible, I think we should go for it, but we
should then update the rest of the completionTag writes in the same
file too.

Greetings,

Andres Freund
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 66cc5c35c68..88179e5754a 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -24,6 +24,7 @@
 #include "pg_trace.h"
 #include "tcop/pquery.h"
 #include "tcop/utility.h"
+#include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -780,8 +781,15 @@ PortalRun(Portal portal, long count, bool isTopLevel, bool run_once,
 				if (completionTag && portal->commandTag)
 				{
 					if (strcmp(portal->commandTag, "SELECT") == 0)
+					{
+#if 0
 						snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
 								 "SELECT " UINT64_FORMAT, nprocessed);
+#else
+						memcpy(completionTag, "SELECT ", sizeof("SELECT "));
+						pg_lltoa(nprocessed, completionTag + 7);
+#endif
+					}
 					else
 						strcpy(completionTag, portal->commandTag);
 				}
