Hi,

While looking at a profile I randomly noticed that we spend a surprising amount of time in snprintf() and its subsidiary functions. That turns out to be

	if (strcmp(portal->commandTag, "SELECT") == 0)
		snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
				 "SELECT " UINT64_FORMAT, nprocessed);

in PortalRun(). That's actually fairly trivial to optimize - we don't need the full-blown snprintf machinery here. A quick benchmark replacing it with:

	memcpy(completionTag, "SELECT ", sizeof("SELECT "));
	pg_lltoa(nprocessed, completionTag + 7);

yields a nearly 2% increase in TPS. Larger than I expected. The code is obviously less pretty, but it's also not actually that bad.

Attached is the patch I used for benchmarking. I wonder if I just hit some specific version of glibc that regressed snprintf performance, or whether others can reproduce this.

If it is actually reproducible, I think we should go for it. But we should update the rest of the completionTag writes in the same file too.

Greetings,

Andres Freund
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 66cc5c35c68..88179e5754a 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -24,6 +24,7 @@
 #include "pg_trace.h"
 #include "tcop/pquery.h"
 #include "tcop/utility.h"
+#include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 
@@ -780,8 +781,15 @@ PortalRun(Portal portal, long count, bool isTopLevel, bool run_once,
 				if (completionTag && portal->commandTag)
 				{
 					if (strcmp(portal->commandTag, "SELECT") == 0)
+					{
+#if 0
 						snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
 								 "SELECT " UINT64_FORMAT, nprocessed);
+#else
+						memcpy(completionTag, "SELECT ", sizeof("SELECT "));
+						pg_lltoa(nprocessed, completionTag + 7);
+#endif
+					}
 					else
 						strcpy(completionTag, portal->commandTag);
 				}
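
For anyone who wants to poke at the two approaches outside the backend, here is a minimal standalone sketch. It is not part of the patch: TAG_BUFSIZE, tag_snprintf(), tag_memcpy() and u64_to_str() are hypothetical names made up for this illustration, and u64_to_str() is only a simple stand-in for pg_lltoa(), which is not available outside PostgreSQL. It shows that the memcpy-plus-integer-conversion variant produces the same string as the snprintf() variant; it says nothing about timing by itself.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer size for this sketch, not taken from the patch. */
#define TAG_BUFSIZE 64

/*
 * Hypothetical stand-in for pg_lltoa(): write n in decimal, followed by
 * a terminating NUL, starting at out.
 */
static void
u64_to_str(uint64_t n, char *out)
{
	char		tmp[20];	/* UINT64_MAX has 20 decimal digits */
	int			len = 0;

	/* Collect digits least-significant first. */
	do
	{
		tmp[len++] = (char) ('0' + n % 10);
		n /= 10;
	} while (n != 0);

	/* Emit them in the right order and terminate the string. */
	while (len > 0)
		*out++ = tmp[--len];
	*out = '\0';
}

/* Baseline: build the tag through the general-purpose formatter. */
static void
tag_snprintf(char *buf, uint64_t nprocessed)
{
	assert(buf != NULL);
	snprintf(buf, TAG_BUFSIZE, "SELECT %" PRIu64, nprocessed);
}

/*
 * Specialized variant: copy the fixed prefix (including its NUL), then
 * write the digits over that NUL at offset 7, the length of "SELECT ".
 */
static void
tag_memcpy(char *buf, uint64_t nprocessed)
{
	assert(buf != NULL);
	memcpy(buf, "SELECT ", sizeof("SELECT "));
	u64_to_str(nprocessed, buf + 7);
}
```

To compare the cost, one could call each function in a tight loop over a char buf[TAG_BUFSIZE] and time the loops, keeping in mind that such a microbenchmark will not necessarily match the TPS difference seen in a full server run.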