On Tue, 6 Jan 2026 at 22:00, Heikki Linnakangas <[email protected]> wrote:
>
> On 05/01/2026 17:01, David Geier wrote:
> > The script I used for testing is attached. I ran CREATE INDEX three
> > times and took the fastest run. I'm getting the following results on my
> > i9-13950HX dev laptop in release build:
> >
> > Data set            | Patched (ms) | Master (ms)  | Speedup
> > --------------------|--------------|--------------|----------
> > movies(plot)        |   3,409      |  10,311      | 3.02x
> > lineitem(l_comment) | 161,569      | 256,986      | 1.59x
> >
>
> Impressive speedup!
>
> > The attached patches do the following:
> >
> > - v1-0001-Inline-ginCompareAttEntries.patch: Inline
> > ginCompareAttEntries() which is very frequently called by the GIN code.
>
> Looks good.
>
> > - v1-0002-Optimized-comparison-functions.patch: Use FunctionCallInvoke()
> > instead of FunctionCall2Coll(). This saves a bunch of per-comparison
> > setup code, such as calling InitFunctionCallInfoData().
>
> You lose the check for NULL result with this. That's probably still
> worth checking.
>
> > - v1-0003-Use-sort_template.h.patch: Use sort_template.h instead of
> > qsort(), to inline calls to the sort comparator. This is an interim step
> > that is further improved on by patch 0006.
>
> ok
>
> > - v1-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patch
> > ginExtractEntries() deduplicates and sorts the entries returned from the
> > extract value function. In case of pg_trgm, that is completely redundant
> > because the trigrams are already deduplicated and sorted. The current
> > version of this patch is just to demonstrate the potential. We need to
> > think about what we want here. Ideally, we would require the extraction
> > function to provide the entries deduplicated and sorted. Alternatively,
> > we could indicate to ginExtractEntries() if the entries are already
> > deduplicated and sorted. If we don't want to alter the signature of the
> > extract value function, we could e.g. use the MSB of the nentries argument.
>
> Yeah, this seems wrong as it is. You're assuming that if the extract
> function returns nullFlags == NULL, the array is already sorted and deduped.
>
> > - v1-0005-Make-btint4cmp-branchless.patch: Removes branches from
> > btint4cmp(), which is heavily called from the GIN code. This might as
> > well have benefit in other parts of the code base.
>
> Seems reasonable.
>
> > v1-0006-Use-radix-sort.patch: Replace the sort_template.h-based qsort()
> > with radix sort. For the purpose of demonstrating the possible gains,
> > I've only replaced the signed variant for now. I've also tried using
> > simplehash.h for deduplicating followed by a sort_template.h-based sort.
> > But that was slower.
>
> Ok.
>
> > v1-0007-Faster-qunique-comparator.patch: qunique() doesn't require a
> > full sort comparator (-1 = less, 0 = equal, 1 = greater) but only a
> > equal/unequal comparator (e.g. 0 = equal and 1 = unequal). The same
> > optimization can be done in plenty of other places in our code base.
> > Likely, in most of them the gains are insignificant.
>
> Makes sense. I'm a little disappointed the compiler won't do that
> optimization for us..
>
> Perhaps we should introduce a new qunique_eq() function with a different
> callback signature:
>
> /* like qunique(), but the callback function returns true/false rather
> than int */
> static inline size_t
> qunique_eq(void *array, size_t elements, size_t width,
>                 bool (*equal) (const void *, const void *))
>
> > v1-0008-Add-ASCII-fastpath-to-generate_trgm_only.patch: Typically lots
> > of text is actually ASCII. Hence, we provide a fast path for this case
> > which is exercised if the MSB of the current character is unset.
>
> This uses pg_ascii_tolower() when for ASCII characters when built with
> the IGNORECASE. I don't think that's correct, if the proper collation
> would do something more complicated for than what pg_ascii_tolower() does.
>
> Did you measure how big is the impact from each individual patch?
> Patches 1 and 2 seem pretty much ready to be committed, but I wonder if
> they make any difference on their own.
>
> - Heikki
>
>
>


Hi!
patches 0003, 0004 & 0008 applies with whitespace errors.


reshke@yezzey-cbdb-bench:~/pgpure$ git am v1-0003-Use-sort_template.h.patch
Applying: Use sort_template.h
.git/rebase-apply/patch:66: trailing whitespace.

warning: 1 line adds whitespace errors.
reshke@yezzey-cbdb-bench:~/pgpure$ git am
v1-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patch
Applying: Avoid dedup and sort in ginExtractEntries
.git/rebase-apply/patch:30: trailing whitespace.
{
warning: 1 line adds whitespace errors.
reshke@yezzey-cbdb-bench:~/pgpure$ git am
v1-0008-Add-ASCII-fastpath-to-generate_trgm_only.patch
Applying: Add ASCII fastpath to generate_trgm_only()
.git/rebase-apply/patch:101: trailing whitespace.
else
warning: 1 line adds whitespace errors.


I did run benchmarks of my VM using your data. v1-0001 solely improves
runtime by 4-5%. v2-0002 does not show runtime improvement for me.
With parallel GIN build, performance gains are 2-3 % for 2 workers and
not noticeable after that.

I will try to run some more benchmarks on this.

-- 
Best regards,
Kirill Reshke


Reply via email to