Re: Use extended statistics to estimate (Var op Var) clauses

Mark Dilger Wed, 11 Aug 2021 07:53:05 -0700

> On Aug 11, 2021, at 5:08 AM, Dean Rasheed <dean.a.rash...@gmail.com> wrote:
> 
> This feels like rather an artificial example though. Is there any real
> use for this sort of clause?

The test generated random combinations of clauses and then checked if any had 
consistently worse performance.  These came up.  I don't know that they 
represent anything real.

What was not random in the tests was the data in the tables.  I've gotten 
curious if these types of clauses (with columns compared against themselves) 
would still be bad for random rather than orderly data sets.  I'll go check....

   testing....

Wow.  Randomizing the data makes the problems even more extreme.  It seems my 
original test set was actually playing to this patch's strengths, not its 
weaknesses.  I've changed the columns to double precision and filled the 
columns with random() data, where column1 gets random()^1, column2 gets 
random()^2, etc.  So on average the larger numbered columns will be smaller, 
and the mcv list will be irrelevant, since values should not tend to repeat.

Over all queries, 47791 have better estimates after the patch, but 34802 had 
worse estimates after the patch (with the remaining 17407 queries having 
roughly equal quality).

The worst estimates are still ones that have a column compared to itself:

better:0, worse:33:  A <= B or A <= A or A <= A
better:0, worse:33:  A <= B or A = A or not A <> A
better:0, worse:33:  A <= B or A >= A or not A <> A
better:0, worse:33:  A <> B or A <= A
better:0, worse:33:  A <> B or A <= A or A <> A
better:0, worse:33:  A <> B or A <= A or A >= A
better:0, worse:33:  A <> B or A <= A or not A = A
better:0, worse:33:  A <> B or A > A or not A < A
better:0, worse:33:  A <> B or A >= A
better:0, worse:33:  A <> B or A >= A and A <= A
better:0, worse:33:  A = B or not A > A or not A > A
better:0, worse:33:  A >= B or not A <> A or A = A
better:0, worse:39:  B <= A or B <= B or B <= B
better:0, worse:39:  B <= A or B = B or not B <> B
better:0, worse:39:  B <= A or B >= B or not B <> B
better:0, worse:39:  B <> A or B <= B
better:0, worse:39:  B <> A or B <= B or B <> B
better:0, worse:39:  B <> A or B <= B or B >= B
better:0, worse:39:  B <> A or B <= B or not B = B
better:0, worse:39:  B <> A or B > B or not B < B
better:0, worse:39:  B <> A or B >= B
better:0, worse:39:  B <> A or B >= B and B <= B
better:0, worse:39:  B = A or not B > B or not B > B
better:0, worse:39:  B >= A or not B <> B or B = B

But there are plenty that got worse without that, such as the following 
examples:

better:25, worse:39:  A < B and A < B or B > A
better:10, worse:48:  A < B and A < C
better:10, worse:54:  A < B and A < C or C > A

I'll go test random data designed to have mcv lists of significance....

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Use extended statistics to estimate (Var op Var) clauses

Reply via email to