Re: POC, WIP: OR-clause support for indexes

Alena Rybakina Thu, 03 Aug 2023 12:48:04 -0700

On 02.08.2023 21:58, Peter Geoghegan wrote:

On Wed, Aug 2, 2023 at 8:58 AM Alena Rybakina <[email protected]> wrote:

No, I haven't thought about it yet. I studied the example and it would
really be nice to add optimization here. I didn't notice any problems
with its implementation. I also have an obvious example with the "or"
operator, for example
, select * from multi_test, where (a, b ) = ( 1, 1 ) or (a, b ) = ( 2, 1
) ...;


Although I think such a case will be used less often.

Right. As I said, I don't particularly care about the row constructor
syntax -- it's not essential.

In my experience patches like this one that ultimately don't succeed
usually *don't* have specific problems that cannot be fixed. The real
problem tends to be ambiguity about the general high level design. So
more than anything else, ambiguity is the thing that you need to
minimize to be successful here. This is the #1 practical problem, by
far. This may be the only thing about your patch that I feel 100% sure
of.

In my experience it can actually be easier to expand the scope of a
project, and to come up with a more general solution:

https://en.wikipedia.org/wiki/Inventor%27s_paradox

I'm not trying to make your work more difficult by expanding its
scope. I'm actually trying to make your work *easier* by expanding its
scope. I don't claim to know what the specific scope of your patch
should be at all. Just that it might be possible to get a much clearer
picture of what the ideal scope really is by *trying* to generalize it
further -- that understanding is what we lack right now. Even if this
exercise fails in some way, it won't really have been a failure. The
reasons why it fails will still be interesting and practically
relevant.

As I explained to Jim, I am trying to put things in this area on a
more rigorous footing. For example, I have said that the way that the
nbtree code executes SAOP quals is equivalent to DNF. That is
basically true, but it's also my own slightly optimistic
interpretation of history and of the design. That's a good start, but
it's not enough on its own.

My interpretation might still be wrong in some subtle way, that I have
yet to discover. That's really what I'm concerned about with your
patch, too. I'm currently trying to solve a problem that I don't yet
fully understand, so for me "getting a properly working flow of
information" seems like a good practical exercise. I'm trying to
generalize the design of my own patch as far as I can, to see what
breaks, and why it breaks. My intuition is that this will help me with
my own patch by forcing me to gain a truly rigorous understanding of
the problem.

My suggestion about generalizing your approach to cover RowCompareExpr
cases is what I would do, if I were you, and this was my patch. That's
almost exactly what I'm doing with my own patch already, in fact.

It's all right. I understand your position)

I also agree to try to find other optimization cases and generalize them.

I read the wiki article, and as I understand it, in such a situation wesee a difficult problem with finding expressions that need to beconverted into a logically correct expression and simplify execution forthe executor. For example, this is a ROW type. It can have a simplerexpression with AND and OR operations, besides we can excludeduplicates. But some of these transformations may be incorrect or theywill have a more complex representation. We can try to find theproblematic expressions and try to combine them into groups and finallyfind a solutions for each groups or, conversely, discover that theexisting transformation is uncorrected. If I understand correctly, weshould first start searching for "ROW" expressions (define a group forthem) and think about a solution for the group.

It seems to me the most difficult thing is to notice problematic cases
where the transformations are incorrect, but I think it can be implemented.

Right. To be clear, I am sure that it won't be practical to come up
with a 100% theoretically pure approach. If for no other reason than
this: normalizing to CNF in all cases will run into problems with very
complex predicates. It might even be computationally intractable
(could just be very slow). So there is a clear need to keep
theoretical purity in balance with practical considerations. There is
a need for some kind of negotiation between those two things. Probably
some set of heuristics will ultimately be required to keep costs and
benefits in balance.

I don't expect you to determine what set of heuristics will ultimately
be required to determine when and how to perform CNF conversions, in
the general case. But having at least some vague idea seems like it
might increase confidence in your design.

I agree, but I think this will be the second step after solutions arefound.

Do you think that this problem is just an accidental side-effect? It
isn't necessarily the responsibility of your patch to fix such things.
If it's even possible for selectivity estimates to change, then it's
already certain that sometimes they'll be worse than before -- if only
because of chance interactions. The optimizer is often right for the
wrong reasons, and wrong for the right reasons -- we cannot really
expect any patch to completely avoid problems like that.

To be honest, I tried to fix it many times by calling the function tocalculate selectivity, and each time the result of the estimate did notchange. I didn't have any problems in this part after moving thetransformation to the parsing stage. I even tried to perform thistransformation at the planning stage (to the preprocess_qual_conditionsfunction), but I ran into the same problem there as well.

To tell the truth, I think I'm ready to investigate this problem again(maybe I'll be able to see it differently or really find that I missedsomething in previous times).


--
Regards,
Alena Rybakina
Postgres Professional

Re: POC, WIP: OR-clause support for indexes

Reply via email to