Re: optimize lookups in snapshot [sub]xip arrays

Zhang Mingli Sat, 23 Jul 2022 21:48:54 -0700

Hi, all


> 
>               if (!snapshot->suboverflowed)
>               {
>                       /* we have full data, so search subxip */
> -                     int32           j;
> -
> -                     for (j = 0; j < snapshot->subxcnt; j++)
> -                     {
> -                             if (TransactionIdEquals(xid, snapshot->subxip[j]))
> -                                     return true;
> -                     }
> +                     if (XidInXip(xid, snapshot->subxip, snapshot->subxcnt,
> +                                              &snapshot->subxiph))
> +                             return true;
>  
>                       /* not there, fall through to search xip[] */
>               }


If snapshot->suboverflowed is false, then subxcnt must be less than
PGPROC_MAX_CACHED_SUBXIDS, which is currently 64.

And we won't use the hash table if xcnt is less than XIP_HASH_MIN_ELEMENTS,
which is 128 in the patch currently under discussion.

So the subxid hash table will never be used, right?
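
To make the arithmetic behind the question concrete, here is a minimal sketch.
It takes the premise above at face value (that subxcnt is bounded by
PGPROC_MAX_CACHED_SUBXIDS when suboverflowed is false); whether that premise
holds is the point being asked. would_use_hash() is a hypothetical helper, not
a function from the patch, and the two constants are simply the values quoted
in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in this thread; the second one is still under discussion. */
#define PGPROC_MAX_CACHED_SUBXIDS 64
#define XIP_HASH_MIN_ELEMENTS     128

/*
 * Hypothetical helper: would a lookup over "count" XIDs go through the hash
 * table?  Per the description above, arrays shorter than the threshold are
 * searched linearly.
 */
static bool
would_use_hash(uint32_t count)
{
    return count >= XIP_HASH_MIN_ELEMENTS;
}
```

Under that premise, would_use_hash() is false for every count up to
PGPROC_MAX_CACHED_SUBXIDS, which is why the subxip hash table looks
unreachable to me.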

Regards,

Zhang Mingli


> On Jul 14, 2022, at 01:09, Nathan Bossart <[email protected]> wrote:
> 
> Hi hackers,
> 
> A few years ago, there was a proposal to create hash tables for long
> [sub]xip arrays in snapshots [0], but the thread seems to have fizzled out.
> I was curious whether this idea still showed measurable benefits, so I
> revamped the patch and ran the same test as before [1].  Here are the
> results for 60-second runs on an r5d.24xlarge with the data directory on
> the local NVMe storage:
> 
>     writers  HEAD  patch  diff
>    ----------------------------
>     16       659   664    +1%
>     32       645   663    +3%
>     64       659   692    +5%
>     128      641   716    +12%
>     256      619   610    -1%
>     512      530   702    +32%
>     768      469   582    +24%
>     1000     367   577    +57%
> 
> As before, the hash table approach seems to provide a decent benefit at
> higher client counts, so I felt it was worth reviving the idea.
> 
> The attached patch has some key differences from the previous proposal.
> For example, the new patch uses simplehash instead of open-coding a new
> hash table.  Also, I've bumped up the threshold for creating hash tables to
> 128 based on the results of my testing.  The attached patch waits until a
> lookup of [sub]xip before generating the hash table, so we only need to
> allocate enough space for the current elements in the [sub]xip array, and
> we avoid allocating extra memory for workloads that do not need the hash
> tables.  I'm slightly worried about increasing the number of memory
> allocations in this code path, but the results above seemed encouraging on
> that front.
> 
> Thoughts?
> 
> [0] https://postgr.es/m/35960b8af917e9268881cd8df3f88320%40postgrespro.ru
> [1] https://postgr.es/m/057a9a95-19d2-05f0-17e2-f46ff20e9b3e%402ndquadrant.com
> 
> -- 
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
> <v1-0001-Optimize-lookups-in-snapshot-transactions-in-prog.patch>
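
For readers without the attachment, the lazy-build behavior described in the
quoted message can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the patch: it uses a tiny
open-addressing table in place of simplehash, the names XidSet, xidset_build()
and xid_in_array() are hypothetical, and it assumes the array never contains
XID 0 (which marks an empty slot here).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

#define XIP_HASH_MIN_ELEMENTS 128   /* threshold mentioned in the thread */

/* Hypothetical stand-in for the patch's simplehash table. */
typedef struct XidSet
{
    TransactionId *slots;   /* NULL until first use; 0 marks an empty slot */
    uint32_t    mask;       /* table size - 1; size is a power of two */
} XidSet;

static uint32_t
xid_hash(TransactionId xid)
{
    return (uint32_t) (xid * 2654435761u);  /* simple multiplicative hash */
}

/* Build the table sized for the current elements only (load factor <= 0.5). */
static void
xidset_build(XidSet *set, const TransactionId *xip, uint32_t xcnt)
{
    uint32_t    size = 2;

    while (size < xcnt * 2)
        size <<= 1;
    set->slots = calloc(size, sizeof(TransactionId));
    if (set->slots == NULL)
        return;             /* caller falls back to linear search */
    set->mask = size - 1;
    for (uint32_t i = 0; i < xcnt; i++)
    {
        uint32_t    pos = xid_hash(xip[i]) & set->mask;

        while (set->slots[pos] != 0 && set->slots[pos] != xip[i])
            pos = (pos + 1) & set->mask;
        set->slots[pos] = xip[i];
    }
}

/* Linear search for short arrays; lazily built hash lookup for long ones. */
static bool
xid_in_array(TransactionId xid, const TransactionId *xip, uint32_t xcnt,
             XidSet *set)
{
    if (xcnt >= XIP_HASH_MIN_ELEMENTS && set->slots == NULL)
        xidset_build(set, xip, xcnt);   /* deferred until the first lookup */

    if (xcnt < XIP_HASH_MIN_ELEMENTS || set->slots == NULL)
    {
        for (uint32_t i = 0; i < xcnt; i++)
        {
            if (xip[i] == xid)
                return true;
        }
        return false;
    }

    for (uint32_t pos = xid_hash(xid) & set->mask;
         set->slots[pos] != 0;
         pos = (pos + 1) & set->mask)
    {
        if (set->slots[pos] == xid)
            return true;
    }
    return false;
}
```

Because the table is built on the first lookup, a snapshot that is never
searched, or whose array is shorter than the threshold, allocates nothing.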
