Re: Hash index build performance tweak from sorting

Amit Kapila Wed, 04 May 2022 03:28:01 -0700

On Mon, May 2, 2022 at 9:28 PM Simon Riggs wrote:
>
> On Sat, 30 Apr 2022 at 12:12, Amit Kapila wrote:
> >
> > On Tue, Apr 19, 2022 at 3:05 AM Simon Riggs wrote:
> > >
> > > Hash index pages are stored in sorted order, but we don't prepare the
> > > data correctly.
> > >
> > > We sort the data as the first step of a hash index build, but we
> > > forget to sort the data by hash as well as by hash bucket.
> > >
> >
> > I was looking into the nearby comments (Fetch hash keys and mask off
> > bits we don't want to sort by.) and it sounds like we purposefully
> > don't want to sort by the hash key. I see that this comment was
> > originally introduced in the below commit:
> >
> > commit 4adc2f72a4ccd6e55e594aca837f09130a6af62b
> > Author: Tom Lane
> > Date:   Mon Sep 15 18:43:41 2008 +0000
> >
> >     Change hash indexes to store only the hash code rather than the whole indexed value.
> >
> > But even before that, we seem to mask off the bits before comparison.
> > Are we doing so because we want to preserve the order of hash keys
> > within a particular bucket, which is why the masking was required?
>
> We need to sort by both hash bucket and hash value.
>
> Hash bucket id so we can identify the correct hash bucket to insert into.
>
> But then on each bucket/overflow page we store it sorted by hash value
> to make lookup faster, so inserts go faster if they are also sorted.
>


I think so too. So we should go with this unless someone else sees a
flaw here.

-- 
With Regards,
Amit Kapila.
