Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)

Heikki Linnakangas Sat, 08 Jul 2023 14:57:40 -0700

On 02/07/2023 19:09, Tomas Vondra wrote:

Here's an updated version of the patch series.


I've polished and pushed the first three patches with cleanup, tests to
improve test coverage and so on. I chose not to backpatch those - I
planned to do that to make future backpatches simpler, but the changes
ended up less disruptive than expected.

The remaining patches are just about adding SK_SEARCHARRAY to BRIN.

0001 - adds the optional preprocess procedure, calls it from brinrescan

0002 to 0005 - adds the support to the existing BRIN opclasses

Could you implement this completely in the consistent-function, bycaching the sorted array in fn_extra, without adding the new preprocessprocedure? On first call, when fn_extra == NULL, sort the array andstash it in fn_extra.

I don't think that works, because fn_extra isn't reset when the scankeys change on rescan. We could reset it, and document that you can usefn_extra for per-scankey caching. There's some precedence for notresetting it though, see commit d22a09dc70f. But we could provide anopaque per-scankey scratch space like that somewhere else. In BrinDesc,perhaps.

The new preprocess support function feels a bit too inflexible to me.True, you can store whatever you want in the ScanKey that it returns,but since that's the case, why not just make it void * ?It seems thatthe constraint here was that you need to pass a ScanKey to theconsistent function, because the consistent function's signature is whatit is. But we can change the signature, if we introduce a new supportamproc number for it.

The main open question I have is what exactly does it mean that the
procedure is optional. In particular, should it be supported to have a
BRIN opclass without the "preprocess" procedure but using the other
built-in support procedures?

For example, imagine you have a custom BRIN opclass in an extension (for
a custom data type or something). This does not need to implement any
procedures, it can just call the existing built-in ones. Of course, this
won't get the "preprocess" procedure automatically.

Should we support such opclasses or should we force the extension to be
updated by adding a preprocess procedure? I'd say "optional" means we
should support (otherwise it'd not really optional).

The reason why this matters is that "amsearcharray" is AM-level flag,
but the support procedure is defined by the opclass. So the consistent
function needs to handle SK_SEARCHARRAY keys both with and without
preprocessing.

That's mostly what I did for all existing BRIN opclasses (it's a bit
confusing that opclass may refer to both the "generic" minmax or the
opclass defined for a particular data type). All the opclasses now
handle three cases:

1) scalar keys (just like before, with amsearcharray=fase)

2) array keys with preprocessing (sorted array, array of hashes, ...)

3) array keys without preprocessing (for compatibility with old
    opclasses missing the optional preprocess procedure)

The current code is a bit ugly, because it duplicates a bunch of code,
because the option (3) mostly does (1) in a loop. I'm confident this can
be reduced by refactoring and reusing some of the "shared" code.

The question is if my interpretation of what "optional" procedure means
is reasonable. Thoughts?

The other thing is how to test this "compatibility" code. I assume we
want to have the procedure for all built-in opclasses, so that won't
exercise it. I did test it by temporarily removing the procedure from a
couple pg_amproc.dat entries. I guess creating a custom opclass in the
regression test is the right solution.

It would be unpleasant to force all BRIN opclasses to immediatelyimplement the searcharray-logic. If we don't want to do that, we need toimplement generic SK_SEARCHARRAY handling in BRIN AM itself. That wouldbe pretty easy, right? Just call the regular consistent function forevery element in the array.

If an opclass wants to provide a faster/better implementation, it canprovide a new consistent support procedure that supports that. Let'sassign a new amproc number for new-style consistent function, which mustsupport SK_SEARCHARRAY, and pass it some scratch space where it cancache whatever per-scankey data. Because it gets a new amproc number, wecan define the arguments as we wish. We can pass a pointer to theper-scankey scratch space as a new argument, for example.

We did this backwards-compatibility dance with the 3/4-argument variantsof the current consistent functions. But I think assigning a whole newprocedure number is better than looking at the number of arguments.


--
Heikki Linnakangas
Neon (https://neon.tech)

Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)

Reply via email to