On Tue, Sep 8, 2020 at 11:17 PM Jakub Wartak <jakub.war...@tomtom.com> wrote:
> I agree with both, I just thought it might be interesting finding as this 
> idiv might be (?) present in other common paths like ReadBuffer*() / 
> PinBuffer() (some recent discussions, maybe on NUMA boxes), not just WAL 
> recovery as it seems relatively easy to improve.

I wrote a draft commit message for Jakub's proposed change (attached),
and did a little bit of testing, but I haven't seen a case where it
wins yet; I need to work on something else now but thought I'd share
this much anyway.  One observation is that the rule in init_htab had
better agree with the expansion rule in hash_search_with_hash_value,
otherwise you can size your hash table perfectly for n elements and
then it still has to expand when you insert the nth element, which is
why I changed >= to >.  Does this make sense?  Oh man, I don't like
the mash-up of int, long, uint32, Size dynahash uses in various places
and that are brought into relief by that comparison...
From e5d890bc1178861dc1a5b1f430289a5ee2b4a2fa Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 10 Sep 2020 12:27:25 +1200
Subject: [PATCH] Remove custom fill factor support from dynahash.c.

Since ancient times we have had support for a fill factor (maximum load
factor) to be set for a dynahash hash table, but:

1.  It had to be an integer value >= 1, whereas for in memory hash
tables interesting load factor targets are probably somewhere near the
0.75-1.0 range.

2.  It was implemented in a way that performed an expensive division
operation that regularly showed up in profiles.

3.  We are not aware of anyone ever having used a non-default value.

Therefore, remove support, making fill factor 1 as the implicit value.

Author: Jakub Wartak <jakub.war...@tomtom.com>
Reviewed-by: Alvaro Herrera <alvhe...@2ndquadrant.com>
Reviewed-by: Tomas Vondra <tomas.von...@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.mu...@gmail.com>
Discussion: https://postgr.es/m/VI1PR0701MB696044FC35013A96FECC7AC8F62D0%40VI1PR0701MB6960.eurprd07.prod.outlook.com
---
 src/backend/utils/hash/dynahash.c | 15 ++++-----------
 src/include/utils/hsearch.h       |  2 --
 2 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index f4fbccdd7e..4aed0114fb 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -191,7 +191,6 @@ struct HASHHDR
 	Size		keysize;		/* hash key length in bytes */
 	Size		entrysize;		/* total user element size in bytes */
 	long		num_partitions; /* # partitions (must be power of 2), or 0 */
-	long		ffactor;		/* target fill factor */
 	long		max_dsize;		/* 'dsize' limit if directory is fixed size */
 	long		ssize;			/* segment size --- must be power of 2 */
 	int			sshift;			/* segment shift = log2(ssize) */
@@ -497,8 +496,6 @@ hash_create(const char *tabname, long nelem, HASHCTL *info, int flags)
 		/* ssize had better be a power of 2 */
 		Assert(hctl->ssize == (1L << hctl->sshift));
 	}
-	if (flags & HASH_FFACTOR)
-		hctl->ffactor = info->ffactor;
 
 	/*
 	 * SHM hash tables have fixed directory size passed by the caller.
@@ -603,8 +600,6 @@ hdefault(HTAB *hashp)
 
 	hctl->num_partitions = 0;	/* not partitioned */
 
-	hctl->ffactor = DEF_FFACTOR;
-
 	/* table has no fixed maximum size */
 	hctl->max_dsize = NO_MAX_DSIZE;
 
@@ -670,11 +665,10 @@ init_htab(HTAB *hashp, long nelem)
 			SpinLockInit(&(hctl->freeList[i].mutex));
 
 	/*
-	 * Divide number of elements by the fill factor to determine a desired
-	 * number of buckets.  Allocate space for the next greater power of two
-	 * number of buckets
+	 * Allocate space for the next greater power of two number of buckets,
+	 * assuming a desired maximum load factor of 1.
 	 */
-	nbuckets = next_pow2_int((nelem - 1) / hctl->ffactor + 1);
+	nbuckets = next_pow2_int(nelem);
 
 	/*
 	 * In a partitioned table, nbuckets must be at least equal to
@@ -733,7 +727,6 @@ init_htab(HTAB *hashp, long nelem)
 			"DIRECTORY SIZE  ", hctl->dsize,
 			"SEGMENT SIZE    ", hctl->ssize,
 			"SEGMENT SHIFT   ", hctl->sshift,
-			"FILL FACTOR     ", hctl->ffactor,
 			"MAX BUCKET      ", hctl->max_bucket,
 			"HIGH MASK       ", hctl->high_mask,
 			"LOW  MASK       ", hctl->low_mask,
@@ -975,7 +968,7 @@ hash_search_with_hash_value(HTAB *hashp,
 		 * order of these tests is to try to check cheaper conditions first.
 		 */
 		if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
-			hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+			hctl->freeList[0].nentries > (long) (hctl->max_bucket + 1) &&
 			!has_seq_scans(hashp))
 			(void) expand_table(hashp);
 	}
diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
index f1deb9beab..bebf89b3c4 100644
--- a/src/include/utils/hsearch.h
+++ b/src/include/utils/hsearch.h
@@ -68,7 +68,6 @@ typedef struct HASHCTL
 	long		ssize;			/* segment size */
 	long		dsize;			/* (initial) directory size */
 	long		max_dsize;		/* limit to dsize if dir size is limited */
-	long		ffactor;		/* fill factor */
 	Size		keysize;		/* hash key length in bytes */
 	Size		entrysize;		/* total user element size in bytes */
 	HashValueFunc hash;			/* hash function */
@@ -83,7 +82,6 @@ typedef struct HASHCTL
 #define HASH_PARTITION	0x0001	/* Hashtable is used w/partitioned locking */
 #define HASH_SEGMENT	0x0002	/* Set segment size */
 #define HASH_DIRSIZE	0x0004	/* Set directory size (initial and max) */
-#define HASH_FFACTOR	0x0008	/* Set fill factor */
 #define HASH_ELEM		0x0010	/* Set keysize and entrysize */
 #define HASH_BLOBS		0x0020	/* Select support functions for binary keys */
 #define HASH_FUNCTION	0x0040	/* Set user defined hash function */
-- 
2.20.1

Reply via email to