David Rowley wrote 2021-04-25 16:36:
On Sun, 25 Apr 2021 at 18:48, Yura Sokolov <y.soko...@postgrespro.ru> wrote:
If you read comments in SH_START_ITERATE, you'll see:

   * Search for the first empty element.  As deletions during iterations
   * are supported, we want to start/end at an element that cannot be
   * affected by elements being shifted.

   * Iterate backwards, that allows the current element to be deleted,
   * even if there are backward shifts.

Therefore it is safe to delete during iteration, and it neither leads to
nor requires a loop restart.

I had only skimmed that with a pre-loaded assumption that it wouldn't
be safe. I didn't do a very good job of reading it, as I failed to
notice that the missing guarantees apply only to deleting items other
than the current one. I hadn't considered the option of finding a free
bucket and then looping backwards to avoid missing entries that are
moved up during a delete.

With that, I changed the patch to get rid of the SH_TRUNCATE and got
rid of the smgrclose_internal which skips the hash delete.  The code is
now much more similar to how it was before.

In regards to the hashing stuff.  I added a new function to
pg_bitutils.h to rotate left and I'm using that instead of the other
expression that was taken from nodeHash.c

For the hash function, I've done some further benchmarking with:

1) The attached v2 patch
2) The attached v2 patch plus use_hash_combine.patch.txt, which uses
hash_combine() instead of pg_rotate_left32()ing the hashkey each time.
3) The attached v2 with use_hash_bytes.patch.txt applied.
4) Master

I've also included the hash stats from each version of the hash function.

I hope the numbers help indicate the reason I picked the hash function
that I did.

1) v2 patch.
Median = 113.375 s

LOG:  size: 4096, members: 2032, filled: 0.496094, total chain: 997,
max chain: 5, avg chain: 0.490650, total_collisions: 422,
max_collisions: 3, avg_collisions: 0.207677

2) v2 patch + use_hash_combine.patch.txt
Median = 119.355 s

LOG:  size: 4096, members: 2032, filled: 0.496094, total chain: 971,
max chain: 6, avg chain: 0.477854, total_collisions: 433,
max_collisions: 4, avg_collisions: 0.213091

3) v2 patch + use_hash_bytes.patch.txt
Median = 117.685 s

LOG:  size: 4096, members: 2032, filled: 0.496094, total chain: 900,
max chain: 4, avg chain: 0.442913, total_collisions: 415,
max_collisions: 4, avg_collisions: 0.204232

4) master
Median = 124.49 s

You can see that the bare v2 patch is quite a bit faster than any of
the alternatives.  We'd be better off with hash_bytes than with
hash_combine(), both for performance and for the seemingly better job
it does at reducing hash collisions.

David

Looks much better! Simpler is almost always better.

Minor remarks:

A comment for SH_ENTRY_INITIALIZER is missing. It could be something like:
- SH_ENTRY_INITIALIZER(a) - if defined, this macro is called for new entries
  before the key or hash is stored in them. For example, it can be used to
  make necessary memory allocations.

`pg_rotate_left32(x, 1) == pg_rotate_right(x, 31)`, therefore there's
no need to add `pg_rotate_left32` at all. Moreover, for hash combining
there's not much difference between `pg_rotate_left32(x, 1)` and
`pg_rotate_right(x, 1)`. (To be honest, there is a bit of difference
due to the murmur construction, but it should not be very big.)

If your test is so sensitive to hash function speed, then I'd suggest
trying something even simpler:

static inline uint32
relfilenodebackend_hash(RelFileNodeBackend *rnode)
{
        uint32          h = 0;
#define step(x) h ^= (uint32) (x) * 0x85ebca6b; h = pg_rotate_right(h, 11); h *= 9;
        step(rnode->node.relNode);
        /* spcNode could be different for the same relNode only during
         * table movement.  Does it pay to hash it? */
        step(rnode->node.spcNode);
        step(rnode->node.dbNode);
        /* Does it matter to hash backend?  It equals InvalidBackendId for
         * non-temporary relations, and temporary relations in the same
         * database never have the same relNode (have they?). */
        step(rnode->backend);
        return murmurhash32(h);
}

I'd like to see the benchmark code. It's quite interesting that this
place became measurable at all.

regards,
Yura Sokolov.
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 8310f73212..33e5cadd03 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -34,8 +34,6 @@ typedef struct SMgrEntry
        SMgrRelation data;                      /* Pointer to the SMgrRelationData */
 } SMgrEntry;
 
-static inline uint32 relfilenodebackend_hash(RelFileNodeBackend *rnode);
-
 /*
  * Because simplehash.h does not provide a stable pointer to hash table
  * entries, we don't make the element type a SMgrRelation directly, instead we
@@ -51,7 +49,7 @@ static inline uint32 relfilenodebackend_hash(RelFileNodeBackend *rnode);
 #define SH_ELEMENT_TYPE        SMgrEntry
 #define SH_KEY_TYPE            RelFileNodeBackend
 #define        SH_KEY                  data->smgr_rnode
-#define SH_HASH_KEY(tb, key)   relfilenodebackend_hash(&key)
+#define SH_HASH_KEY(tb, key)   hash_bytes((const unsigned char *) &key, sizeof(RelFileNodeBackend))
 #define SH_EQUAL(tb, a, b)             (memcmp(&a, &b, sizeof(RelFileNodeBackend)) == 0)
 #define        SH_SCOPE                static inline
 #define SH_STORE_HASH
@@ -133,37 +131,6 @@ static dlist_head unowned_relns;
 /* local function prototypes */
 static void smgrshutdown(int code, Datum arg);
 
-/*
- * relfilenodebackend_hash
- *             Custom rolled hash function for simplehash table.
- *
- * smgropen() is often a bottleneck in CPU bound workloads during crash
- * recovery.  We make use of this custom hash function rather than using
- * hash_bytes as it gives us a little bit more performance.
- *
- * XXX What if sizeof(Oid) is not 4?
- */
-static inline uint32
-relfilenodebackend_hash(RelFileNodeBackend *rnode)
-{
-       uint32          hashkey;
-
-       hashkey = murmurhash32((uint32) rnode->node.spcNode);
-
-       /* rotate hashkey left 1 bit at each step */
-       hashkey = pg_rotate_left32(hashkey, 1);
-       hashkey ^= murmurhash32((uint32) rnode->node.dbNode);
-
-       hashkey = pg_rotate_left32(hashkey, 1);
-       hashkey ^= murmurhash32((uint32) rnode->node.relNode);
-
-       hashkey = pg_rotate_left32(hashkey, 1);
-       hashkey ^= murmurhash32((uint32) rnode->backend);
-
-       return hashkey;
-}
-
-
 /*
  *     smgrinit(), smgrshutdown() -- Initialize or shut down storage
  *                                                               managers.

Attachment: v2-0001-Use-simplehash.h-hashtables-in-SMgr.patch

diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 8310f73212..f291ded795 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -147,18 +147,19 @@ static inline uint32
 relfilenodebackend_hash(RelFileNodeBackend *rnode)
 {
        uint32          hashkey;
+       uint32          hashkey2;
 
        hashkey = murmurhash32((uint32) rnode->node.spcNode);
 
        /* rotate hashkey left 1 bit at each step */
-       hashkey = pg_rotate_left32(hashkey, 1);
-       hashkey ^= murmurhash32((uint32) rnode->node.dbNode);
+       hashkey2 = murmurhash32((uint32) rnode->node.dbNode);
+       hashkey = hash_combine(hashkey, hashkey2);
 
-       hashkey = pg_rotate_left32(hashkey, 1);
-       hashkey ^= murmurhash32((uint32) rnode->node.relNode);
+       hashkey2 = murmurhash32((uint32) rnode->node.relNode);
+       hashkey = hash_combine(hashkey, hashkey2);
 
-       hashkey = pg_rotate_left32(hashkey, 1);
-       hashkey ^= murmurhash32((uint32) rnode->backend);
+       hashkey2 = murmurhash32((uint32) rnode->backend);
+       hashkey = hash_combine(hashkey, hashkey2);
 
        return hashkey;
 }
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index adfc6f67e2..5d34e06eab 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7615,7 +7615,8 @@ StartupXLOG(void)
                                ereport(LOG,
                                                (errmsg("last completed transaction was at log time %s",
                                                                timestamptz_to_str(xtime))));
-
+                       smgrstats();
+                       elog(PANIC, "recovery PANIC");
                        InRedo = false;
                }
                else
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 64a26e06c6..850b51e316 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -183,6 +183,12 @@ smgr_entry_cleanup(SMgrRelation reln)
        pfree(reln);
 }
 
+void
+smgrstats(void)
+{
+       if (SMgrRelationHash != NULL)
+               smgrtable_stat(SMgrRelationHash);
+}
 
 /*
  *     smgrinit(), smgrshutdown() -- Initialize or shut down storage
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index a6fbf7b6a6..ac010af74a 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -77,6 +77,7 @@ typedef SMgrRelationData *SMgrRelation;
 #define SmgrIsTemp(smgr) \
        RelFileNodeBackendIsTemp((smgr)->smgr_rnode)
 
+extern void smgrstats(void);
 extern void smgrinit(void);
 extern SMgrRelation smgropen(RelFileNode rnode, BackendId backend);
 extern bool smgrexists(SMgrRelation reln, ForkNumber forknum);
