On 02/07/2023 19:09, Tomas Vondra wrote:
Here's an updated version of the patch series.

I've polished and pushed the first three patches with cleanup, tests to
improve test coverage and so on. I chose not to backpatch those - I
planned to do that to make future backpatches simpler, but the changes
ended up less disruptive than expected.

The remaining patches are just about adding SK_SEARCHARRAY to BRIN.

0001 - adds the optional preprocess procedure, calls it from brinrescan

0002 to 0005 - adds the support to the existing BRIN opclasses

Could you implement this completely in the consistent-function, by caching the sorted array in fn_extra, without adding the new preprocess procedure? On first call, when fn_extra == NULL, sort the array and stash it in fn_extra.

I don't think that works, because fn_extra isn't reset when the scan keys change on rescan. We could reset it, and document that you can use fn_extra for per-scankey caching. There's some precedence for not resetting it though, see commit d22a09dc70f. But we could provide an opaque per-scankey scratch space like that somewhere else. In BrinDesc, perhaps.

The new preprocess support function feels a bit too inflexible to me. True, you can store whatever you want in the ScanKey that it returns, but since that's the case, why not just make it void * ?It seems that the constraint here was that you need to pass a ScanKey to the consistent function, because the consistent function's signature is what it is. But we can change the signature, if we introduce a new support amproc number for it.

The main open question I have is what exactly does it mean that the
procedure is optional. In particular, should it be supported to have a
BRIN opclass without the "preprocess" procedure but using the other
built-in support procedures?

For example, imagine you have a custom BRIN opclass in an extension (for
a custom data type or something). This does not need to implement any
procedures, it can just call the existing built-in ones. Of course, this
won't get the "preprocess" procedure automatically.

Should we support such opclasses or should we force the extension to be
updated by adding a preprocess procedure? I'd say "optional" means we
should support (otherwise it'd not really optional).

The reason why this matters is that "amsearcharray" is AM-level flag,
but the support procedure is defined by the opclass. So the consistent
function needs to handle SK_SEARCHARRAY keys both with and without
preprocessing.

That's mostly what I did for all existing BRIN opclasses (it's a bit
confusing that opclass may refer to both the "generic" minmax or the
opclass defined for a particular data type). All the opclasses now
handle three cases:

1) scalar keys (just like before, with amsearcharray=fase)

2) array keys with preprocessing (sorted array, array of hashes, ...)

3) array keys without preprocessing (for compatibility with old
    opclasses missing the optional preprocess procedure)

The current code is a bit ugly, because it duplicates a bunch of code,
because the option (3) mostly does (1) in a loop. I'm confident this can
be reduced by refactoring and reusing some of the "shared" code.

The question is if my interpretation of what "optional" procedure means
is reasonable. Thoughts?

The other thing is how to test this "compatibility" code. I assume we
want to have the procedure for all built-in opclasses, so that won't
exercise it. I did test it by temporarily removing the procedure from a
couple pg_amproc.dat entries. I guess creating a custom opclass in the
regression test is the right solution.

It would be unpleasant to force all BRIN opclasses to immediately implement the searcharray-logic. If we don't want to do that, we need to implement generic SK_SEARCHARRAY handling in BRIN AM itself. That would be pretty easy, right? Just call the regular consistent function for every element in the array.

If an opclass wants to provide a faster/better implementation, it can provide a new consistent support procedure that supports that. Let's assign a new amproc number for new-style consistent function, which must support SK_SEARCHARRAY, and pass it some scratch space where it can cache whatever per-scankey data. Because it gets a new amproc number, we can define the arguments as we wish. We can pass a pointer to the per-scankey scratch space as a new argument, for example.

We did this backwards-compatibility dance with the 3/4-argument variants of the current consistent functions. But I think assigning a whole new procedure number is better than looking at the number of arguments.

--
Heikki Linnakangas
Neon (https://neon.tech)



Reply via email to