On Tue, Apr 19, 2022 at 3:05 AM Simon Riggs <simon.ri...@enterprisedb.com> wrote: > > Hash index pages are stored in sorted order, but we don't prepare the > data correctly. > > We sort the data as the first step of a hash index build, but we > forget to sort the data by hash as well as by hash bucket. >
I was looking into the nearby comments (Fetch hash keys and mask off bits we don't want to sort by.) and it sounds like we purposefully don't want to sort by the hash key. I see that this comment was originally introduced in the below commit: commit 4adc2f72a4ccd6e55e594aca837f09130a6af62b Author: Tom Lane <t...@sss.pgh.pa.us> Date: Mon Sep 15 18:43:41 2008 +0000 Change hash indexes to store only the hash code rather than the whole indexed value. But even before that, we seem to mask off the bits before comparison. Is it that we are doing so because we want to keep the order of hash keys in a particular bucket so such masking was required? If so, then maybe it is okay to compare the hash keys as you are proposing once we find that the values fall in a particular bucket. Another thing to note is that this code was again changed in ea69a0dead but it seems to be following the intent of the original code. Few comments on the patch: 1. I think it is better to use DatumGetUInt32 to fetch the hash key as the nearby code is using. 2. You may want to change the below comment in HSpool /* * We sort the hash keys based on the buckets they belong to. Below masks * are used in _hash_hashkey2bucket to determine the bucket of given hash * key. */ > > Aside: > > I'm not very sure why tuplesort has private code in it dedicated to > hash indexes, but it does. > Are you talking about tuplesort_begin_index_hash/comparetup_index_hash? I see the corresponding functions for btree as well in that file. > Even more strangely, the Tuplesortstate fixes the size of max_buckets > at tuplesort_begin() time rather than tuplesort_performsort(), forcing > us to estimate the number of tuples ahead of time rather than using > the exact number. Next trick would be to alter the APIs to allow exact > values to be used for sorting, which would allow page at a time > builds. > It is not clear to me what exactly you want to do here but maybe it is a separate topic and we should discuss this separately. -- With Regards, Amit Kapila.