Hi,
While looking at a profile I randomly noticed that we spend a surprising
amount of time in snprintf() and its subsidiary functions. That turns
out to be
    if (strcmp(portal->commandTag, "SELECT") == 0)
        snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
                 "SELECT " UINT64_FORMAT, nprocessed);
in PortalRun(). That's actually fairly trivial to optimize - we don't
need the full-blown snprintf machinery here. A quick benchmark
replacing it with:
    memcpy(completionTag, "SELECT ", sizeof("SELECT "));
    pg_lltoa(nprocessed, completionTag + 7);
yields a ~2% increase in TPS, which is larger than I expected. The code
is obviously less pretty, but it's also not actually that bad.
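
For anyone who wants to poke at the idea outside the tree, here is a
minimal standalone sketch of the same technique: copy the fixed prefix
with memcpy() and append the decimal digits by hand. u64_to_str() and
format_select_tag() are hypothetical names made up for this sketch; they
are not pg_lltoa() or anything else in PostgreSQL. The helper takes an
unsigned value on purpose. As far as I know pg_lltoa() takes a signed
int64 while nprocessed is a uint64, so the patch would want an unsigned
variant if counts above INT64_MAX matter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL's pg_lltoa): write the decimal form
 * of an unsigned 64-bit value into dst, NUL-terminate it, and return the
 * number of digits written. dst needs room for 21 bytes (20 digits + NUL).
 */
static size_t
u64_to_str(uint64_t value, char *dst)
{
    char        tmp[20];        /* UINT64_MAX has 20 decimal digits */
    size_t      n = 0;

    /* Produce digits least-significant first; do/while handles value == 0. */
    do
    {
        tmp[n++] = (char) ('0' + value % 10);
        value /= 10;
    } while (value != 0);

    /* Reverse into the destination. */
    for (size_t i = 0; i < n; i++)
        dst[i] = tmp[n - 1 - i];
    dst[n] = '\0';
    return n;
}

/*
 * Hypothetical wrapper: build "SELECT <nprocessed>" without snprintf().
 * tagsize must cover the prefix, 20 digits and the terminating NUL.
 */
static void
format_select_tag(char *tag, size_t tagsize, uint64_t nprocessed)
{
    static const char prefix[] = "SELECT ";
    const size_t prefix_len = sizeof(prefix) - 1;

    assert(tagsize >= prefix_len + 21);
    memcpy(tag, prefix, prefix_len);
    u64_to_str(nprocessed, tag + prefix_len);
}
```

Calling format_select_tag(buf, sizeof(buf), 12345) with a 64-byte buf
leaves "SELECT 12345" in buf, the same string the snprintf() call
produces.
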
Attached is the patch I used for benchmarking. I wonder if I just hit
some specific version of glibc that regressed snprintf performance, or
whether others can reproduce this.
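
To check whether a particular libc's snprintf() is the slow part, a rough
standalone timing loop might help. This is only a sketch and not what I
used for the TPS numbers: tag_writer, write_tag_snprintf() and
time_tag_writer() are hypothetical names, PRIu64 stands in for
UINT64_FORMAT, and clock() measures CPU time only coarsely.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical signature for any function that fills in a completion tag. */
typedef void (*tag_writer) (char *tag, size_t tagsize, uint64_t nprocessed);

/* Baseline writer: the snprintf() form, with PRIu64 replacing UINT64_FORMAT. */
static void
write_tag_snprintf(char *tag, size_t tagsize, uint64_t nprocessed)
{
    snprintf(tag, tagsize, "SELECT %" PRIu64, nprocessed);
}

/* Volatile sink so the compiler cannot discard the formatting work. */
static volatile unsigned char bench_sink;

/* Run fn for the given number of iterations; return CPU seconds used. */
static double
time_tag_writer(tag_writer fn, uint64_t iterations)
{
    char        tag[64];
    clock_t     start = clock();

    for (uint64_t i = 0; i < iterations; i++)
    {
        fn(tag, sizeof(tag), i);
        bench_sink ^= (unsigned char) tag[7];   /* first digit after "SELECT " */
    }
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Timing write_tag_snprintf against a hand-rolled writer with the same
signature, over a few tens of millions of iterations, should show
whether the gap is specific to one glibc version.
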
If it is actually reproducible, I think we should go for it, but we should
also update the rest of the completionTag writes in the same file.
Greetings,
Andres Freund
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 66cc5c35c68..88179e5754a 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -24,6 +24,7 @@
#include "pg_trace.h"
#include "tcop/pquery.h"
#include "tcop/utility.h"
+#include "utils/builtins.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
@@ -780,8 +781,15 @@ PortalRun(Portal portal, long count, bool isTopLevel, bool run_once,
if (completionTag && portal->commandTag)
{
if (strcmp(portal->commandTag, "SELECT") == 0)
+ {
+#if 0
snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
"SELECT " UINT64_FORMAT, nprocessed);
+#else
+ memcpy(completionTag, "SELECT ", sizeof("SELECT "));
+ pg_lltoa(nprocessed, completionTag + 7);
+#endif
+ }
else
strcpy(completionTag, portal->commandTag);
}