On Thu, Jan 26, 2023 at 09:37:13AM +0100, Peter Eisentraut wrote:
> Ok, the documentation make sense now.  I wonder what the performance impact
> is.  Probably, nobody cares about microoptimizing CREATE TABLE statements.
> But BEGIN/COMMIT could matter.  However, whatever you do in between the
> BEGIN and COMMIT will already be jumbled, so you're already paying the
> overhead.  Hopefully, jumbling such simple commands will have no noticeable
> overhead.
> 
> In other words, we should test this and hopefully get rid of the 'string'
> method.

Yep.  I have mentioned a few numbers upthread, and this deserves
discussion.

FYI, I have done more micro-benchmarking to compare both methods for
utility queries by hijacking JumbleQuery() to run the computation in a
tight loop run N times (could not come up with a better idea to avoid
the repeated palloc/pfree overhead), as the path to stress is
_jumbleNode().  See the attached, that should be able to apply on top
of the latest patch set (named as .txt to not feed it to the CF bot,
and need to recompile to switch the iteration).

Using that, I can compile the following results for various cases (-O2
and compute_query_id=on):
          query          |  mode  | iterations | avg_runtime_ns | avg_jumble_ns 
-------------------------+--------+------------+----------------+---------------
 begin                   | string |   50000000 |        4.53116 |          4.54
 begin                   | jumble |   50000000 |       30.94578 |         30.94
 commit                  | string |   50000000 |        4.76004 |          4.74
 commit                  | jumble |   50000000 |        31.4791 |         31.48
 create table 1 column   | string |   50000000 |        7.22836 |          7.08
 create table 1 column   | jumble |   50000000 |      152.10852 |        151.96
 create table 5 columns  | string |   50000000 |       12.43412 |         12.28
 create table 5 columns  | jumble |   50000000 |      352.88976 |         349.1
 create table 20 columns | string |    5000000 |         49.591 |          48.2
 create table 20 columns | jumble |    5000000 |      2272.4066 |          2271
 drop table 1 column     | string |   50000000 |        6.70538 |          6.56
 drop table 1 column     | jumble |   50000000 |          50.38 |         50.24
 drop table 5 columns    | string |   50000000 |        6.88256 |          6.74
 drop table 5 columns    | jumble |   50000000 |       50.02898 |          49.9
 SET work_mem            | string |   50000000 |        7.28752 |          7.28
 SET work_mem            | jumble |   50000000 |       91.66588 |         91.64
(16 rows)

avg_runtime_ns is (query runtime / iterations) and avg_jumble_ns is
the same with the difference between the start/end logs in the txt
patch attached.  The overhead to run the query does not matter much if
you compare both.  The time it takes to run a jumble is correlated to
the number of nodes to go through for each query, and there is a
larger gap for more nodes to go through.  Well, a simple "begin" or
"commit" query has its computation time increase from 4ns to 30ns in
average which would be unnoticeable.  The gap is larger for larger
nodes, like SET, still we jump from 7ns to 90ns in this case.  DDLs
take the most hit with this method, where a 20-column CREATE TABLE
jumps from 50ns to 2us (note that the iteration is 10 times lower
here).

At the end, that would be unnoticeable for the average user, I guess,
but here are the numbers I get on my laptop :)
--
Michael
From ee47c4e1137adc857184d76ab62232e9c3c95451 Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Fri, 27 Jan 2023 10:58:34 +0900
Subject: [PATCH 1/1] Add benchmark tweaks to stress jumbling code

---
 src/backend/nodes/queryjumblefuncs.c | 36 +++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/src/backend/nodes/queryjumblefuncs.c 
b/src/backend/nodes/queryjumblefuncs.c
index d55adf5020..9c134cc5ae 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -101,32 +101,48 @@ CleanQuerytext(const char *query, int *location, int *len)
 JumbleState *
 JumbleQuery(Query *query, const char *querytext)
 {
+#define MAX_QUERY_COMPUTE 5000000
+
        JumbleState *jstate = NULL;
 
        Assert(IsQueryIdEnabled());
 
+       elog(WARNING, "start JumbleQuery");
+
        if (query->utilityStmt &&
                utility_query_id == UTILITY_QUERY_ID_STRING)
        {
-               query->queryId = compute_utility_query_id(querytext,
-                                                                               
                  query->stmt_location,
-                                                                               
                  query->stmt_len);
+               for (int i = 0; i < MAX_QUERY_COMPUTE ; i++)
+               {
+                       query->queryId = compute_utility_query_id(querytext,
+                                                                               
                          query->stmt_location,
+                                                                               
                          query->stmt_len);
+               }
        }
        else
        {
                jstate = (JumbleState *) palloc(sizeof(JumbleState));
 
-               /* Set up workspace for query jumbling */
                jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-               jstate->jumble_len = 0;
                jstate->clocations_buf_size = 32;
                jstate->clocations = (LocationLen *)
                        palloc(jstate->clocations_buf_size * 
sizeof(LocationLen));
-               jstate->clocations_count = 0;
-               jstate->highest_extern_param_id = 0;
 
-               /* Compute query ID and mark the Query node with it */
-               _jumbleNode(jstate, (Node *) query);
+               for (int i = 0; i < MAX_QUERY_COMPUTE ; i++)
+               {
+                       memset(jstate->jumble, 0, sizeof(JUMBLE_SIZE));
+                       /* Set up workspace for query jumbling */
+                       jstate->jumble_len = 0;
+                       jstate->clocations_buf_size = 32;
+                       memset(jstate->clocations, 0,
+                                  jstate->clocations_buf_size * 
sizeof(LocationLen));
+                       jstate->clocations_count = 0;
+                       jstate->highest_extern_param_id = 0;
+
+                       /* Compute query ID and mark the Query node with it */
+                       _jumbleNode(jstate, (Node *) query);
+               }
+
                query->queryId = 
DatumGetUInt64(hash_any_extended(jstate->jumble,
                                                                                
                                  jstate->jumble_len,
                                                                                
                                  0));
@@ -139,6 +155,8 @@ JumbleQuery(Query *query, const char *querytext)
                        query->queryId = UINT64CONST(1);
        }
 
+       elog(WARNING, "end JumbleQuery");
+
        return jstate;
 }
 
-- 
2.39.0

Attachment: signature.asc
Description: PGP signature

Reply via email to