Changeset: beb7298583cc for MonetDB
URL: http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=beb7298583cc
Modified Files:
        gdk/gdk_search.c
Branch: Oct2014
Log Message:

BAThash(): avoid potential 4x over allocation for dynamic hash mask

In case we choose to set the hash mask size dynamically accoding to the
number of hash collisions, the old code could (and did in case of TPCH)
overallocate a hash mask by (up to) 4x.
This happend (theoretically) in 1/3 of all cases, i.e., whenever
HASHmask() chooses a hash mask that is (just) smaller than cnt.
Then, HASHmask(cnt>>6) is (just) smaller than cnt/64,
and quadrupeling the mask 3 times (4*4*4=64) would still be (just) < cnt,
leading to a 4th quadrupeling, and thus a mask that is just smaller then 4*cnt.

We now start with mask = HASHmask(cnt)>>6, and ensure the mask does not
exceed HASHmask(cnt).

OPEN / TODO:
In case of "too many" (?) collisions with HASHmask(cnt) < cnt,
should we consider growing the mask size to 2*HASHmask(cnt),
i.e., HASHmask(cnt) < cnt < 2*HASHmask(cnt) < 2*cnt ?


diffs (38 lines):

diff --git a/gdk/gdk_search.c b/gdk/gdk_search.c
--- a/gdk/gdk_search.c
+++ b/gdk/gdk_search.c
@@ -252,7 +252,7 @@ BAThash(BAT *b, BUN masksize)
        if (b->H->hash == NULL) {
                unsigned int tpe = ATOMbasetype(b->htype);
                BUN cnt = BATcount(b);
-               BUN mask;
+               BUN mask, maxmask = 0;
                BUN p = BUNfirst(b), q = BUNlast(b), r;
                Hash *h = NULL;
                Heap *hp = NULL;
@@ -287,11 +287,12 @@ BAThash(BAT *b, BUN masksize)
                        mask = HASHmask(cnt);
                } else {
                        /* dynamic hash: we start with
-                        * HASHmask(cnt/64); if there are too many
-                        * collisions we try HASHmask(cnt/16), then
-                        * HASHmask(cnt/4), and finally
+                        * HASHmask(cnt)/64; if there are too many
+                        * collisions we try HASHmask(cnt)/16, then
+                        * HASHmask(cnt)/4, and finally
                         * HASHmask(cnt).  */
-                       mask = HASHmask(cnt >> 6);
+                       maxmask = HASHmask(cnt);
+                       mask = maxmask >> 6;
                        p += (cnt >> 2);        /* try out on first 25% of b */
                        if (p > q)
                                p = q;
@@ -368,7 +369,7 @@ BAThash(BAT *b, BUN masksize)
                                }
                                break;
                        }
-               } while (r < p && mask < cnt && (mask <<= 2));
+               } while (r < p && mask < maxmask && (mask <<= 2));
 
                /* finish the hashtable with the current mask */
                p = r;
_______________________________________________
checkin-list mailing list
checkin-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/checkin-list

Reply via email to