On Mon, May 2, 2022 at 9:28 PM Simon Riggs <simon.ri...@enterprisedb.com> wrote:
>
> On Sat, 30 Apr 2022 at 12:12, Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Tue, Apr 19, 2022 at 3:05 AM Simon Riggs
> > <simon.ri...@enterprisedb.com> wrote:
> > >
> > > Hash index pages are stored in sorted order, but we don't prepare the
> > > data correctly.
> > >
> > > We sort the data as the first step of a hash index build, but we
> > > forget to sort the data by hash as well as by hash bucket.
> > >
> >
> > I was looking into the nearby comments (Fetch hash keys and mask off
> > bits we don't want to sort by.) and it sounds like we purposefully
> > don't want to sort by the hash key. I see that this comment was
> > originally introduced in the below commit:
> >
> > commit 4adc2f72a4ccd6e55e594aca837f09130a6af62b
> > Author: Tom Lane <t...@sss.pgh.pa.us>
> > Date: Mon Sep 15 18:43:41 2008 +0000
> >
> > Change hash indexes to store only the hash code rather than the
> > whole indexed value.
> >
> > But even before that, we seem to mask off the bits before comparison.
> > Is it that we are doing so because we want to keep the order of hash
> > keys in a particular bucket so such masking was required?
>
> We need to sort by both hash bucket and hash value.
>
> Hash bucket id so we can identify the correct hash bucket to insert into.
>
> But then on each bucket/overflow page we store it sorted by hash value
> to make lookup faster, so inserts go faster if they are also sorted.
>

I also think so. So, we should go with this unless someone else sees
any flaw here.

-- 
With Regards,
Amit Kapila.
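
For illustration, the two-level ordering discussed above (bucket first, then hash value within the bucket) can be sketched as follows. This is a minimal Python model, not the PostgreSQL code: the function names, the mask values, and the sample hash values are all assumptions made up for the example, and the bucket mapping is only a simplified version of the usual "mask with highmask, fall back to lowmask" rule.

```python
# Illustrative model only: names, masks and sample values are hypothetical.

def bucket_for(hashkey, maxbucket, highmask, lowmask):
    """Map a hash value to a bucket number using a simplified masking rule."""
    bucket = hashkey & highmask
    if bucket > maxbucket:
        # Bucket does not exist yet, so fall back to the smaller mask.
        bucket &= lowmask
    return bucket


def sort_bucket_only(hashes, maxbucket, highmask, lowmask):
    """Sort by bucket number alone; order inside a bucket is input order."""
    return sorted(
        hashes,
        key=lambda h: bucket_for(h, maxbucket, highmask, lowmask),
    )


def sort_bucket_then_hash(hashes, maxbucket, highmask, lowmask):
    """Sort by bucket number, then by full hash value inside each bucket."""
    return sorted(
        hashes,
        key=lambda h: (bucket_for(h, maxbucket, highmask, lowmask), h),
    )


def is_sorted_within_buckets(ordered, maxbucket, highmask, lowmask):
    """Return True if neighbours in the same bucket are in hash order."""
    for prev, cur in zip(ordered, ordered[1:]):
        same_bucket = (
            bucket_for(prev, maxbucket, highmask, lowmask)
            == bucket_for(cur, maxbucket, highmask, lowmask)
        )
        if same_bucket and prev > cur:
            return False
    return True


if __name__ == "__main__":
    # Four buckets (0..3); highmask covers 8 slots, lowmask covers 4.
    MAXBUCKET, HIGHMASK, LOWMASK = 3, 7, 3
    sample = [0x35, 0x11, 0x29, 0x05, 0x1A, 0x02]

    by_bucket = sort_bucket_only(sample, MAXBUCKET, HIGHMASK, LOWMASK)
    by_both = sort_bucket_then_hash(sample, MAXBUCKET, HIGHMASK, LOWMASK)

    print("bucket only:     ", by_bucket)
    print("bucket then hash:", by_both)
```

With the bucket-only key, tuples arrive at each bucket page in arbitrary hash order, so each insert has to find its place among the entries already on the page. With the compound key they arrive already in page order, which is the case Simon describes as making inserts go faster.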