Re: POC, WIP: OR-clause support for indexes

Alena Rybakina Mon, 31 Jul 2023 09:38:50 -0700

Hi!

I think it really helps to speed up the search with similar deep
filtering compared to cluster indexes, but do we have cases where we
don't use this algorithm because it takes longer than an usual index?
I thought about the situation with wide indexes (with a lot of multiple
columns) and having a lot of filtering predicates for them.

I think that it should be possible for the optimizer to only use
multi-column SAOP index paths when there is at least likely to be some
small advantage -- that's definitely my goal. Importantly, we may not
really need to accurately model the costs where the new approach turns
out to be much faster. The only essential thing is that we avoid cases
where the new approach is much slower than the old approach. Which is
possible (in at least some cases) by making the runtime behavior
adaptive.


The best decision that the planner can make may be no decision at all.
Better to wait until runtime where at all possible, since that gives
us the latest and most accurate picture of things.

But I'm not sure about this, so it seems to me that this is a problem of
improper use of indexes rather.

It's hard to say how true that is.

Certainly, workloads similar to the TPC-DS benchmark kinda need
something like MDAM. It's just not practical to have enough indexes to
support every possible query -- the benchmark is deliberately designed
to have unpredictable, hard-to-optimize access paths. It seems to
require having fewer, more general indexes that can support
multi-dimensional access reasonably efficiently.

Of course, with OLTP it's much more likely that the workload will have
predictable access patterns. That makes having exactly the right
indexes much more practical. So maybe you're right there. But, I still
see a lot of value in a design that is as forgiving as possible. Users
really like that kind of thing in my experience.

I tend to agree with you, but a runtime estimate cannot give us anaccurate picture when using indexes correctly orany other optimizations due to the unstable state of the environment inwhich the query is executed.

I believe that a more complex analysis is needed here.

Currently, the optimizer doesn't recognize multi-column indexes with
SAOPs on every column as having a valid sort order, except on the
first column. It seems possible that that has consequences for your
patch. (I'm really only guessing, though; don't trust anything that I
say about the optimizer too much.)

Honestly, I couldn't understand your concerns very well, could you
describe it in more detail?

Well, I'm not sure if there is any possible scenario where the
transformation from your patch makes it possible to go from an access
path that has a valid sort order (usually because there is an
underlying index scan) into an access path that doesn't. In fact, the
opposite situation seems more likely (which is good news) --
especially if you assume that my own patch is also present.

Going from a bitmap OR (which cannot return sorted output) to a
multi-column SAOP index scan (which now can) may have significant
value in some specific circumstances. Most obviously, it's really
useful when it enables us to feed tuples into a GroupAggregate without
a separate sort step, and without a hash aggregate (that's why I see
value in combining your patch with my own patch). You just need to be
careful about allowing the opposite situation to take place.

More generally, there is a need to think about strange second order
effects. We want to be open to useful second order effects that make
query execution much faster in some specific context, while avoiding
harmful second order effects. Intuitively, I think that it should be
possible to do this with the transformations performed by your patch.

In other words, "helpful serendipity" is an important advantage, while
"harmful anti-serendipity" is what we really want to avoid. Ideally by
making the harmful cases impossible "by construction".

I noticed only one thing there: when we have unsorted array values inSOAP, the query takes longer thanwhen it has a sorted array. I'll double-check it just in case and writeabout the results later.


I am also testing some experience with multi-column indexes using SAOPs.

--
Regards,
Alena Rybakina
Postgres Professional

Re: POC, WIP: OR-clause support for indexes

Reply via email to