Re: Use extended statistics to estimate (Var op Var) clauses

2022-10-12 Thread Michael Paquier
On Fri, Jul 22, 2022 at 02:17:47PM +0100, Dean Rasheed wrote: > It's a worry that none of the existing regression tests picked up on > that. Perhaps a similar test could be added using the existing test > data. Otherwise, I think it'd be worth adding a new test with similar > data to the above.

Re: Use extended statistics to estimate (Var op Var) clauses

2022-07-22 Thread Dean Rasheed
On Thu, 21 Jul 2022 at 12:42, Tomas Vondra wrote: > > > This needs to account for nullfrac, since x = x is only true if x is not > > null. > > Right, I forgot to account for nullfrac. > Ditto variable <= variable > > I don't like how matching_restriction_variables() is adding a > > non-trivial

Re: Use extended statistics to estimate (Var op Var) clauses

2022-07-09 Thread Dean Rasheed
On Mon, 13 Dec 2021 at 02:21, Tomas Vondra wrote: > > Hi, > > I finally got around to this patch again, focusing mostly on the first > part that simply returns either 1.0 or 0.0 for Var op Var conditions > (i.e. the part not really using extended statistics). > Just starting to look at this

Re: Use extended statistics to estimate (Var op Var) clauses

2021-12-21 Thread Mark Dilger
> On Dec 21, 2021, at 4:28 PM, Mark Dilger wrote: > > Maybe there is some reason this is ok. ... and there is. Sorry for the noise. The planner appears to be smart enough to know that column "salary" is not being changed, and therefore NEW.salary and OLD.salary are equal. If I test a

Re: Use extended statistics to estimate (Var op Var) clauses

2021-12-21 Thread Mark Dilger
> On Dec 12, 2021, at 6:21 PM, Tomas Vondra > wrote: > > <0001-Improve-estimates-for-Var-op-Var-with-the-same-Var.patch> +* It it's (variable = variable) with the same variable on both sides, it's s/It it's/If it's/ 0001 lacks regression coverage. > <0002-simplification.patch>

Re: Use extended statistics to estimate (Var op Var) clauses

2021-12-12 Thread Zhihong Yu
On Sun, Dec 12, 2021 at 6:04 PM Tomas Vondra wrote: > On 8/31/21 00:14, Zhihong Yu wrote: > > Hi, > > For patch 0002, > > > > + s1 = statext_clauselist_selectivity(root, clauses, > > varRelid, > > + jointype, > > sjinfo,

Re: Use extended statistics to estimate (Var op Var) clauses

2021-12-12 Thread Tomas Vondra
Hi, I finally got around to this patch again, focusing mostly on the first part that simply returns either 1.0 or 0.0 for Var op Var conditions (i.e. the part not really using extended statistics). I have been unhappy about using examine_variable, which does various expensive things like

Re: Use extended statistics to estimate (Var op Var) clauses

2021-12-12 Thread Tomas Vondra
On 8/31/21 00:14, Zhihong Yu wrote: Hi, For patch 0002, +                   s1 = statext_clauselist_selectivity(root, clauses, varRelid, +                                                       jointype, sjinfo, rel, + , false); + +      

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-30 Thread Zhihong Yu
On Mon, Aug 30, 2021 at 9:00 AM Tomas Vondra wrote: > On 8/28/21 6:30 PM, Mark Dilger wrote: > > > > > >> On Aug 28, 2021, at 6:52 AM, Tomas Vondra > >> wrote: > >> > >> Part 0003 fixes handling of those clauses so that we don't treat > >> them as simple, but it does that by tweaking > >>

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-30 Thread Tomas Vondra
On 8/28/21 6:30 PM, Mark Dilger wrote: On Aug 28, 2021, at 6:52 AM, Tomas Vondra wrote: Part 0003 fixes handling of those clauses so that we don't treat them as simple, but it does that by tweaking statext_is_compatible_clause(), as suggested by Dean. Function examine_opclause_args()

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-28 Thread Mark Dilger
> On Aug 28, 2021, at 10:18 AM, Zhihong Yu wrote: > > I wonder if the queries with (a=a) or (a complexity to address. > Has anyone seen such clause in production queries ? You might expect clauses like WHERE FALSE to be unusual, but that phrase gets added quite a lot by query generators.

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-28 Thread Zhihong Yu
On Sat, Aug 28, 2021 at 9:30 AM Mark Dilger wrote: > > > > On Aug 28, 2021, at 6:52 AM, Tomas Vondra > wrote: > > > > Part 0003 fixes handling of those clauses so that we don't treat them as > simple, but it does that by tweaking statext_is_compatible_clause(), as > suggested by Dean. > >

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-28 Thread Mark Dilger
> On Aug 28, 2021, at 6:52 AM, Tomas Vondra > wrote: > > Part 0003 fixes handling of those clauses so that we don't treat them as > simple, but it does that by tweaking statext_is_compatible_clause(), as > suggested by Dean. Function examine_opclause_args() doesn't set issimple to

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-28 Thread Zhihong Yu
On Sat, Aug 28, 2021 at 6:53 AM Tomas Vondra wrote: > Hi, > > The attached patch series is modified to improve estimates for these > special clauses (Var op Var with the same var on both sides) without > extended statistics. This is done in 0001, and it seems fairly simple > and cheap. > > The

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-28 Thread Tomas Vondra
Hi, The attached patch series is modified to improve estimates for these special clauses (Var op Var with the same var on both sides) without extended statistics. This is done in 0001, and it seems fairly simple and cheap. The 0002 part is still the same patch as on 2021/07/20. Part 0003

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-20 Thread Robert Haas
On Fri, Aug 20, 2021 at 3:32 PM Tomas Vondra wrote: > Yeah, I agree this seems like the right approach (except I guess you > meant "a != a" and not "a != 0"). Err, yes. > Assuming we want to do something about > these clauses at all - I'm still wondering if those clauses are common > in

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-20 Thread Tomas Vondra
On 8/20/21 8:56 PM, Robert Haas wrote: On Fri, Aug 20, 2021 at 2:21 PM Tomas Vondra wrote: After looking at this for a while, it's clear the main issue is handling of clauses referencing the same Var twice, like for example (a = a) or (a < a). But it's not clear to me if this is something

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-20 Thread Tomas Vondra
On 8/18/21 12:43 PM, Tomas Vondra wrote: Hi Mark, This thread inspired me to do something fairly similar - a generator that generates queries of varying complexity, executes them on table with and without extended statistics. I've been thinking about that before, but this finally pushed me

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-20 Thread Robert Haas
On Fri, Aug 20, 2021 at 2:21 PM Tomas Vondra wrote: > After looking at this for a while, it's clear the main issue is handling > of clauses referencing the same Var twice, like for example (a = a) or > (a < a). But it's not clear to me if this is something worth fixing, or > if extended

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-20 Thread Mark Dilger
> On Aug 20, 2021, at 11:20 AM, Tomas Vondra > wrote: > > I think we can either reject the patch, which would mean we don't consider > (Var op Var) clauses to be common/important enough. Or we need to improve the > existing selectivity functions (even those without extended statistics) to

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-20 Thread Tomas Vondra
On 8/18/21 3:16 PM, Mark Dilger wrote: On Aug 18, 2021, at 3:43 AM, Tomas Vondra wrote: I've pushed everything (generator and results) to this github repo Thanks for the link. I took a very brief look. Perhaps we can combine efforts. I need to make progress on several other patches

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-18 Thread Mark Dilger
> On Aug 18, 2021, at 3:43 AM, Tomas Vondra > wrote: > > I've pushed everything (generator and results) to this github repo Thanks for the link. I took a very brief look. Perhaps we can combine efforts. I need to make progress on several other patches first, but hope to get back to

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-18 Thread Tomas Vondra
Hi Mark, This thread inspired me to do something fairly similar - a generator that generates queries of varying complexity, executes them on table with and without extended statistics. I've been thinking about that before, but this finally pushed me to do that, and some of the results are

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-12 Thread Mark Dilger
> On Aug 11, 2021, at 3:48 PM, Mark Dilger wrote: > > I'm working on a correlated stats test as I write this. I'll get back to you > when I have results. Ok, the tests showed no statistically significant regressions. All tests included the same sorts of whereclause expressions as used in

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-11 Thread Tomas Vondra
On 8/12/21 12:48 AM, Mark Dilger wrote: On Aug 11, 2021, at 3:45 PM, Tomas Vondra wrote: As I said in my last reply, I'm not sure it's particularly useful to look at overall results from data sets with independent columns. That's not what extended statistics are for, and people should not

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-11 Thread Mark Dilger
> On Aug 11, 2021, at 3:45 PM, Tomas Vondra > wrote: > > As I said in my last reply, I'm not sure it's particularly useful to look at > overall results from data sets with independent columns. That's not what > extended statistics are for, and people should not create them in those cases

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-11 Thread Tomas Vondra
On 8/12/21 12:02 AM, Mark Dilger wrote: ... Once the data is made deterministic, the third set looks slightly better than the first, rather than slightly worse. But almost 20% of the query types still look worse after applying the patch. I'm going to dig deeper into those to see if that

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-11 Thread Mark Dilger
> On Aug 11, 2021, at 10:38 AM, Tomas Vondra > wrote: > > So I'm a bit puzzled about the claim that random data make the problems more > extreme. Can you explain? Hmm... you appear to be right. I changed the gentest.pl script to fill the tables with randomized data, but the random data

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-11 Thread Tomas Vondra
On 8/11/21 5:17 PM, Mark Dilger wrote: On Aug 11, 2021, at 7:51 AM, Mark Dilger wrote: I'll go test random data designed to have mcv lists of significance Done. The data for column_i is set to floor(random()^i*20). column_1 therefore is evenly distributed between 0..19, with

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-11 Thread Tomas Vondra
On 8/11/21 4:51 PM, Mark Dilger wrote: On Aug 11, 2021, at 5:08 AM, Dean Rasheed wrote: This feels like rather an artificial example though. Is there any real use for this sort of clause? The test generated random combinations of clauses and then checked if any had consistently worse

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-11 Thread Tomas Vondra
On 8/11/21 4:51 PM, Mark Dilger wrote: On Aug 11, 2021, at 5:08 AM, Dean Rasheed wrote: This feels like rather an artificial example though. Is there any real use for this sort of clause? The test generated random combinations of clauses and then checked if any had consistently worse

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-11 Thread Mark Dilger
> On Aug 11, 2021, at 7:51 AM, Mark Dilger wrote: > > I'll go test random data designed to have mcv lists of significance Done. The data for column_i is set to floor(random()^i*20). column_1 therefore is evenly distributed between 0..19, with successive columns weighted more towards

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-11 Thread Tomas Vondra
On 8/11/21 2:08 PM, Dean Rasheed wrote: On Wed, 11 Aug 2021 at 00:05, Tomas Vondra wrote: So with the statistics, the estimate gets a bit worse. The reason is fairly simple - if you look at the two parts of the OR clause, we get this: clause actualno stats

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-11 Thread Mark Dilger
> On Aug 11, 2021, at 5:08 AM, Dean Rasheed wrote: > > This feels like rather an artificial example though. Is there any real > use for this sort of clause? The test generated random combinations of clauses and then checked if any had consistently worse performance. These came up. I don't

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-11 Thread Dean Rasheed
On Wed, 11 Aug 2021 at 00:05, Tomas Vondra wrote: > > So with the statistics, the estimate gets a bit worse. The reason is > fairly simple - if you look at the two parts of the OR clause, we get this: > > clause actualno statswith stats >

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-10 Thread Tomas Vondra
On 8/9/21 9:19 PM, Mark Dilger wrote: On Jul 20, 2021, at 11:28 AM, Tomas Vondra wrote: Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company <0001-Handling-Expr-op-Expr-clauses-in-extended-stats-20210720.patch> Hi Tomas, I tested this patch against

Re: Use extended statistics to estimate (Var op Var) clauses

2021-08-09 Thread Mark Dilger
> On Jul 20, 2021, at 11:28 AM, Tomas Vondra > wrote: > > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > <0001-Handling-Expr-op-Expr-clauses-in-extended-stats-20210720.patch> Hi Tomas, I tested this patch against master looking for types of

Re: Use extended statistics to estimate (Var op Var) clauses

2021-07-21 Thread Dean Rasheed
On Tue, 20 Jul 2021 at 19:28, Tomas Vondra wrote: > > > The new code in statext_is_compatible_clause_internal() is a little > > hard to follow because some of the comments aren't right > > I ended up doing something slightly different - examine_opclause_args > now "returns" a list of expressions,

Re: Use extended statistics to estimate (Var op Var) clauses

2021-07-20 Thread Tomas Vondra
Hi, On 7/5/21 2:46 PM, Dean Rasheed wrote: > On Sun, 13 Jun 2021 at 21:28, Tomas Vondra > wrote: >> >> Here is a slightly updated version of the patch >> > > Hi, > > I have looked at this in some more detail, and it all looks pretty > good, other than some mostly cosmetic stuff. > Thanks for

Re: Use extended statistics to estimate (Var op Var) clauses

2021-07-05 Thread Dean Rasheed
On Sun, 13 Jun 2021 at 21:28, Tomas Vondra wrote: > > Here is a slightly updated version of the patch > Hi, I have looked at this in some more detail, and it all looks pretty good, other than some mostly cosmetic stuff. The new code in statext_is_compatible_clause_internal() is a little hard

Re: Use extended statistics to estimate (Var op Var) clauses

2021-06-21 Thread Dean Rasheed
On Sun, 13 Jun 2021 at 21:28, Tomas Vondra wrote: > > Here is a slightly updated version of the patch > > The main issue I ran into > is the special case clauselist_selectivity, which does > > if (list_length(clauses) == 1) > return clause_selectivity_ext(...); > > which applies to

Re: Use extended statistics to estimate (Var op Var) clauses

2021-06-14 Thread Tomas Vondra
On 6/14/21 5:36 PM, Mark Dilger wrote: > > >> On Jun 13, 2021, at 1:28 PM, Tomas Vondra >> wrote: >> >> Here is a slightly updated version of the patch > > Thanks for taking this up again! > > Applying the new test cases from your patch, multiple estimates have > gotten better. That looks

Re: Use extended statistics to estimate (Var op Var) clauses

2021-06-14 Thread Mark Dilger
> On Jun 13, 2021, at 1:28 PM, Tomas Vondra > wrote: > > Here is a slightly updated version of the patch Thanks for taking this up again! Applying the new test cases from your patch, multiple estimates have gotten better. That looks good. I wrote a few extra test cases and saw no

Re: Use extended statistics to estimate (Var op Var) clauses

2021-06-13 Thread Zhihong Yu
On Sun, Jun 13, 2021 at 1:29 PM Tomas Vondra wrote: > Hi, > > Here is a slightly updated version of the patch - rebased to current > master and fixing some minor issues to handle expressions (and not just > the Var nodes as before). > > The changes needed to support (Expr op Expr) are mostly

Re: Use extended statistics to estimate (Var op Var) clauses

2021-06-13 Thread Tomas Vondra
Hi, Here is a slightly updated version of the patch - rebased to current master and fixing some minor issues to handle expressions (and not just the Var nodes as before). The changes needed to support (Expr op Expr) are mostly mechanical, though I'm sure the code needs some cleanup. The

Re: Use extended statistics to estimate (Var op Var) clauses

2021-03-01 Thread Tomas Vondra
On 3/1/21 8:58 PM, Mark Dilger wrote: On Nov 12, 2020, at 5:14 PM, Tomas Vondra wrote: <0001-Support-estimation-of-clauses-of-the-form-V-20201113.patch> Your patch no longer applies. Can we get a new version please? I do not plan to work on this patch in the 2021-03 commitfest.

Re: Use extended statistics to estimate (Var op Var) clauses

2021-03-01 Thread Mark Dilger
> On Nov 12, 2020, at 5:14 PM, Tomas Vondra > wrote: > > <0001-Support-estimation-of-clauses-of-the-form-V-20201113.patch> Your patch no longer applies. Can we get a new version please? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

Use extended statistics to estimate (Var op Var) clauses

2020-11-12 Thread Tomas Vondra
Hi, Attached is a patch to allow estimation of (Var op Var) clauses using extended statistics. Currently we only use extended stats to estimate (Var op Const) clauses, which is sufficient for most cases, but it's not very hard to support this second type of clauses. This is not an entirely new