Comment simplehash/dynahash trade-offs

James Coleman Thu, 30 Apr 2020 18:54:13 -0700

In another thread [1] I'd mused that "there might be some value in a
README or comments
addition that would be a guide to what the various hash
implementations are useful for...so that we have something to make the
code base a bit more discoverable."


I'd solicited feedback from Andres (as the author of the simplehash
implementation) and gotten further explanation from Tomas (both cc'd
here) and have tried to condense that into the comment changes in this
patch series.

v1-0001-Summarize-trade-offs-between-simplehash-and-dynah.patch
Contains the summaries mentioned above.

v1-0002-Improve-simplehash-usage-notes.patch
I'd noticed while adding a simplehash implementation in that other
thread that the facts that:
- The element type requires a status member.
- Why storing a hash in the element type is useful (i.e., when to
define SH_STORE_HASH/SH_GET_HASH).
- The availability of  private_data member for metadata access from callbacks.
are either undocumented or hard to discover, and so I've added the
information directly to the usage notes section.

v1-0003-Show-sample-simplehash-method-signatures.patch
I find it hard to read the macro code "templating" particularly for
seeing what the available API is and so added sample method signatures
in comments to the macro generated method signature defines.

James

[1]: 
https://www.postgresql.org/message-id/CAAaqYe956a-zbm1qR8pqz%3DiLbF8LW5vBrQGrzXVHXdLk3at5_Q%40mail.gmail.com

From 36f6b670b30194efb4b7d0e622afc3085ecd49b6 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc...@gmail.com>
Date: Thu, 30 Apr 2020 21:39:04 -0400
Subject: [PATCH v1 2/3] Improve simplehash usage notes

---
 src/include/lib/simplehash.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index e5101f2f1d..8daf7c4d1a 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -48,6 +48,19 @@
  *	  - SH_STORE_HASH - if defined the hash is stored in the elements
  *	  - SH_GET_HASH(tb, a) - return the field to store the hash in
  *
+ *    The element type is required to contain a "uint32 status" member.
+ *
+ *    While SH_STORE_HASH (and subsequently SH_GET_HASH) are optional, because
+ *    the hash table implementation needs to compare hashes to move elements
+ *    (particularly when growing the hash), it's preferable, if possible, to
+ *    store the element's hash in the element's data type. If the hash is so
+ *    stored, the hash table will also compare hashes before calling SH_EQUAL
+ *    when comparing two keys.
+ *
+ *    For convenience the hash table create functions accept a void pointer
+ *    will be stored in the hash table type's member private_data. This allows
+ *    callbacks to reference caller provided data.
+ *
  *	  For examples of usage look at tidbitmap.c (file local definition) and
  *	  execnodes.h/execGrouping.c (exposed declaration, file local
  *	  implementation).
-- 
2.17.1

From 9116329e51419cb71f00d31f6819294cf2608e74 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc...@gmail.com>
Date: Thu, 30 Apr 2020 21:40:44 -0400
Subject: [PATCH v1 3/3] Show sample simplehash method signatures

---
 src/include/lib/simplehash.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 8daf7c4d1a..d7f7134541 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -176,24 +176,47 @@ typedef struct SH_ITERATOR
 
 /* externally visible function prototypes */
 #ifdef SH_RAW_ALLOCATOR
+/* <prefix>_hash <prefix>_create(uint32 nelements, void *private_data) */
 SH_SCOPE	SH_TYPE *SH_CREATE(uint32 nelements, void *private_data);
 #else
+/*
+ * <prefix>_hash <prefix>_create(MemoryContext ctx, uint32 nelements,
+ *								 void *private_data)
+ */
 SH_SCOPE	SH_TYPE *SH_CREATE(MemoryContext ctx, uint32 nelements,
 							   void *private_data);
 #endif
+/* void <prefix>_destroy(<prefix>_hash *tb) */
 SH_SCOPE void SH_DESTROY(SH_TYPE * tb);
+/* void <prefix>_reset(<prefix>_hash *tb) */
 SH_SCOPE void SH_RESET(SH_TYPE * tb);
+/* void <prefix>_grow(<prefix>_hash *tb) */
 SH_SCOPE void SH_GROW(SH_TYPE * tb, uint32 newsize);
+/* <element> <prefix>_insert(<prefix>_hash *tb, <key> key, bool *found) */
 SH_SCOPE	SH_ELEMENT_TYPE *SH_INSERT(SH_TYPE * tb, SH_KEY_TYPE key, bool *found);
+/*
+ * <element> <prefix>_insert_hash(<prefix>_hash *tb, <key> key, uint32 hash,
+ * 								  bool *found)
+ */
 SH_SCOPE	SH_ELEMENT_TYPE *SH_INSERT_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash, bool *found);
+/* <element> <prefix>_lookup(<prefix>_hash *tb, <key> key) */
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
+/* <element> <prefix>_lookup_hash(<prefix>_hash *tb, <key> key, uint32 hash) */
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
 											uint32 hash);
+/* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
 SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
+/* void <prefix>_start_iterate(<prefix>_hash *tb, <prefix>_iterator *iter) */
 SH_SCOPE void SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter);
+/*
+ * void <prefix>_start_iterate_at(<prefix>_hash *tb, <prefix>_iterator *iter,
+ *								  uint32 at)
+ */
 SH_SCOPE void SH_START_ITERATE_AT(SH_TYPE * tb, SH_ITERATOR * iter, uint32 at);
+/* <element> <prefix>_iterate(<prefix>_hash *tb, <prefix>_iterator *iter) */
 SH_SCOPE	SH_ELEMENT_TYPE *SH_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter);
+/* void <prefix>_stat(<prefix>_hash *tb */
 SH_SCOPE void SH_STAT(SH_TYPE * tb);
 
 #endif							/* SH_DECLARE */
-- 
2.17.1

From 6033d30015c7c217d6e4d79591cf81ae63e7e9e2 Mon Sep 17 00:00:00 2001
From: James Coleman <jtc...@gmail.com>
Date: Thu, 30 Apr 2020 21:36:07 -0400
Subject: [PATCH v1 1/3] Summarize trade-offs between simplehash and dynahash

When reading the code it's not obvious when one should prefer dynahash
over simplehash and vice-versa, so, for programmer-friendliness,
add comments to inform that decision.
---
 src/backend/utils/hash/dynahash.c | 10 +++++++++-
 src/include/lib/simplehash.h      | 22 ++++++++++++++++++----
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 5b4b9e487f..fc22b76223 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * dynahash.c
- *	  dynamic hash tables
+ *	  dynamic chained hash tables
  *
  * dynahash.c supports both local-to-a-backend hash tables and hash tables in
  * shared memory.  For shared hash tables, it is the caller's responsibility
@@ -41,6 +41,14 @@
  * function must be supplied; comparison defaults to memcmp() and key copying
  * to memcpy() when a user-defined hashing function is selected.
  *
+ * Compared to simplehash, dynahash has the following benefits:
+ *
+ * - It supports partitioning, which is useful for shared memory access using
+ *   locks.
+ * - Because entries don't need to be moved in the case of hash conflicts, has
+ *   better performance for large entries
+ * - Guarantees stable pointers to entries.
+ *
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index f7af921f5a..e5101f2f1d 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -1,10 +1,24 @@
 /*
  * simplehash.h
  *
- *	  Hash table implementation which will be specialized to user-defined
- *	  types, by including this file to generate the required code.  It's
- *	  probably not worthwhile to do so for hash tables that aren't performance
- *	  or space sensitive.
+ *	  When included this file generates a "templated" (by way of macros)
+ *	  open-addressing hash table implementation specialized to user-defined
+ *	  types.
+ *
+ *	  It's probably not worthwhile to generate such a specialized implementation
+ *	  for hash tables that aren't performance or space sensitive.
+ *
+ *	  Compared to dynahash, simplehash has the following benefits:
+ *
+ *	  - Due to the "templated" code generation has known structure sizes and no
+ *	    indirect function calls (which show up substantially in dynahash
+ *	    profiles). These features considerably increase speed for small
+ *	    entries.
+ *	  - Open addressing has better CPU cache behavior than dynahash's chained
+ *	    hashtables.
+ *	  - The generated interface is type-safe and easier to use than dynahash,
+ *	    though at the cost of more complex setup.
+ *	  - Does not require the overhead of a separate memory context.
  *
  * Usage notes:
  *
-- 
2.17.1

Comment simplehash/dynahash trade-offs

Reply via email to