Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-04-09 Thread Tomas Vondra
On Tue, Apr 09, 2019 at 12:14:47PM -0400, Alvaro Herrera wrote: On 2019-Mar-27, Tomas Vondra wrote: Attached are the remaining parts of this patch series - the multivariate histograms, and also a new patch tweaking regression tests for the old statistic types (ndistinct, dependencies) to adopt

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-04-09 Thread Alvaro Herrera
On 2019-Mar-27, Tomas Vondra wrote: > Attached are the remaining parts of this patch series - the multivariate > histograms, and also a new patch tweaking regression tests for the old > statistic types (ndistinct, dependencies) to adopt the function-based > approach instead of the regular

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-04-08 Thread Michael Paquier
On Sat, Mar 30, 2019 at 09:13:01PM +0100, Tomas Vondra wrote: > Hmmm, what's the right status in the CF app when a part of a patch was > committed and the rest should be moved to the next CF? Committed, Moved > to next CF, or something else? This stuff has been around for nine commit fests, and

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-30 Thread Tomas Vondra
On Sun, Mar 31, 2019 at 08:50:53AM +0800, John Naylor wrote: I believe I found a typo in mcv.c, fix attached. Thanks, pushed. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-30 Thread John Naylor
I believe I found a typo in mcv.c, fix attached. -- John Naylorhttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services mcv-comment-fix.patch Description: Binary data

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-30 Thread Tomas Vondra
On Wed, Mar 27, 2019 at 08:55:07PM +0100, Tomas Vondra wrote: Hi, I've now committed the MCV part, ... Hmmm, what's the right status in the CF app when a part of a patch was committed and the rest should be moved to the next CF? Committed, Moved to next CF, or something else? regards --

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-27 Thread Petr Jelinek
On 27/03/2019 20:55, Tomas Vondra wrote: > Hi, > > I've now committed the MCV part, after addressing the last two issues > raised by Dean: > Congrats! -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-27 Thread Tomas Vondra
Hi, I've now committed the MCV part, after addressing the last two issues raised by Dean: * The MCV build now always uses the mincount to decide which of the items to keep in the list. * Both the MCV build and deserialization now uses the same maximum number of list items (10k).

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-26 Thread Tomas Vondra
On Tue, Mar 26, 2019 at 01:37:33PM +, Dean Rasheed wrote: On Tue, 26 Mar 2019 at 11:59, Dean Rasheed wrote: On Mon, 25 Mar 2019 at 23:36, Tomas Vondra wrote: > > Attached is an updated patch... I just looked through the latest set of changes and I have a couple of additional review

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-26 Thread Tomas Vondra
On Tue, Mar 26, 2019 at 11:59:56AM +, Dean Rasheed wrote: On Mon, 25 Mar 2019 at 23:36, Tomas Vondra wrote: Attached is an updated patch, fixing all the issues pointed out so far. Unless there are some objections, I plan to commit the 0001 part by the end of this CF. Part 0002 is a matter

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-26 Thread Dean Rasheed
On Tue, 26 Mar 2019 at 11:59, Dean Rasheed wrote: > > On Mon, 25 Mar 2019 at 23:36, Tomas Vondra > wrote: > > > > Attached is an updated patch... > > I just looked through the latest set of changes and I have a couple of > additional review comments: > I just spotted another issue while

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-26 Thread Dean Rasheed
On Mon, 25 Mar 2019 at 23:36, Tomas Vondra wrote: > > Attached is an updated patch, fixing all the issues pointed out so far. > Unless there are some objections, I plan to commit the 0001 part by the > end of this CF. Part 0002 is a matter for PG13, as previously agreed. > Yes, I think that's

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-25 Thread Tomas Vondra
On 3/24/19 8:36 AM, Dean Rasheed wrote: > On Sun, 24 Mar 2019 at 00:17, David Rowley > wrote: >> >> On Sun, 24 Mar 2019 at 12:41, Tomas Vondra >> wrote: >>> >>> On 3/21/19 4:05 PM, David Rowley wrote: >> 29. Looking at the tests I see you're testing that you get bad estimates

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-25 Thread Tomas Vondra
Hi, Attached is an updated patch, fixing all the issues pointed out so far. Unless there are some objections, I plan to commit the 0001 part by the end of this CF. Part 0002 is a matter for PG13, as previously agreed. On 3/24/19 1:17 AM, David Rowley wrote: > On Sun, 24 Mar 2019 at 12:41, Tomas

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-24 Thread Dean Rasheed
On Sun, 24 Mar 2019 at 00:17, David Rowley wrote: > > On Sun, 24 Mar 2019 at 12:41, Tomas Vondra > wrote: > > > > On 3/21/19 4:05 PM, David Rowley wrote: > > > > 29. Looking at the tests I see you're testing that you get bad > > > estimates without extended stats. That does not really seem

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-23 Thread David Rowley
On Sun, 24 Mar 2019 at 12:41, Tomas Vondra wrote: > > On 3/21/19 4:05 PM, David Rowley wrote: > > 11. In get_mincount_for_mcv_list() it's probably better to have the > > numerical literals of 0.0 instead of just 0. > > Why? Isn't it what we do for float and double literals? > > > 12. I think it

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-23 Thread Tomas Vondra
On 3/21/19 4:05 PM, David Rowley wrote: > On Mon, 18 Mar 2019 at 02:18, Tomas Vondra > wrote: >> Yes, it was using the toasted value directly. The attached patch >> detoasts the value explicitly, similarly to the per-column stats, and it >> also removes the 1MB limit. > > I just made a pass

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-21 Thread David Rowley
On Mon, 18 Mar 2019 at 02:18, Tomas Vondra wrote: > Yes, it was using the toasted value directly. The attached patch > detoasts the value explicitly, similarly to the per-column stats, and it > also removes the 1MB limit. I just made a pass over 0001 and 0002. 0002 is starting to look pretty

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-17 Thread Tomas Vondra
Hi, On 3/17/19 12:47 PM, Dean Rasheed wrote: > On Sat, 16 Mar 2019 at 23:44, Tomas Vondra > wrote: >> >>> 28). I just spotted the 1MB limit on the serialised MCV list size. I >>> think this is going to be too limiting. For example, if the stats >>> target is at its maximum of 1, that only

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-17 Thread Tomas Vondra
On 3/17/19 1:14 PM, Dean Rasheed wrote: > On Sat, 16 Mar 2019 at 23:44, Tomas Vondra > wrote: >>> >>> 16). This regression test fails for me: >>> >>> @@ -654,11 +654,11 @@ >>> -- check change of unrelated column type does not reset the MCV statistics >>> ALTER TABLE mcv_lists ALTER COLUMN d

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-17 Thread Dean Rasheed
On Sat, 16 Mar 2019 at 23:44, Tomas Vondra wrote: > > > > 16). This regression test fails for me: > > > > @@ -654,11 +654,11 @@ > > -- check change of unrelated column type does not reset the MCV statistics > > ALTER TABLE mcv_lists ALTER COLUMN d TYPE VARCHAR(64); > > SELECT * FROM

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-17 Thread Dean Rasheed
On Sat, 16 Mar 2019 at 23:44, Tomas Vondra wrote: > > > 28). I just spotted the 1MB limit on the serialised MCV list size. I > > think this is going to be too limiting. For example, if the stats > > target is at its maximum of 1, that only leaves around 100 bytes > > for each item's values,

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-17 Thread Dean Rasheed
On Sat, 16 Mar 2019 at 23:44, Tomas Vondra wrote: > > > 21). For consistency with other bms_ functions, I think the name of > > the Bitmapset argument for bms_member_index() should just be called > > "a". Nitpicking, I'd also put bms_member_index() immediately after > > bms_is_member() in the

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-16 Thread Tomas Vondra
On 3/16/19 10:26 PM, Dean Rasheed wrote: > On Fri, 15 Mar 2019 at 00:06, Tomas Vondra > wrote: >> ... attached patch ... > > Some more review comments, carrying on from where I left off: > > 16). This regression test fails for me: > > @@ -654,11 +654,11 @@ > -- check change of unrelated

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-16 Thread Dean Rasheed
On Fri, 15 Mar 2019 at 00:06, Tomas Vondra wrote: > ... attached patch ... Some more review comments, carrying on from where I left off: 16). This regression test fails for me: @@ -654,11 +654,11 @@ -- check change of unrelated column type does not reset the MCV statistics ALTER TABLE

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-16 Thread Tomas Vondra
On 3/16/19 11:55 AM, Dean Rasheed wrote: > On Fri, 15 Mar 2019 at 00:06, Tomas Vondra > wrote: >> I've noticed an annoying thing when modifying type of column not >> included in a statistics... >> >> That is, we don't remove the statistics, but the estimate still changes. >> But that's because

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-16 Thread Dean Rasheed
On Fri, 15 Mar 2019 at 00:06, Tomas Vondra wrote: > I've noticed an annoying thing when modifying type of column not > included in a statistics... > > That is, we don't remove the statistics, but the estimate still changes. > But that's because the ALTER TABLE also resets reltuples/relpages: > >

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-15 Thread Robert Haas
On Thu, Mar 14, 2019 at 7:50 AM Kyotaro HORIGUCHI wrote: > Thank you for the kind explanation. I'm not sure but I understand > this as '"lists" is an extension' turned into 'lists are an > extension'. That is, the "lists' expresses a concept rather than > the plurarilty. (But I haven't got a gut

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-14 Thread Tomas Vondra
On 3/14/19 12:56 PM, Kyotaro HORIGUCHI wrote: > At Wed, 13 Mar 2019 19:37:45 +1300, David Rowley > wrote in > >> On Wed, 13 Mar 2019 at 17:20, Kyotaro HORIGUCHI >> wrote: >>> bms_member_index seems working differently than maybe expected. >>> >>> bms_member_index((2, 4), 0) => 0, (I think)

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-14 Thread Kyotaro HORIGUCHI
At Wed, 13 Mar 2019 19:37:45 +1300, David Rowley wrote in > On Wed, 13 Mar 2019 at 17:20, Kyotaro HORIGUCHI > wrote: > > bms_member_index seems working differently than maybe expected. > > > > bms_member_index((2, 4), 0) => 0, (I think) should be -1 > > bms_member_index((2, 4), 1) => 0,

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-14 Thread Kyotaro HORIGUCHI
At Wed, 13 Mar 2019 12:39:30 -0400, Robert Haas wrote in > On Wed, Mar 13, 2019 at 12:20 AM Kyotaro HORIGUCHI > wrote: > > +Multivariate MCV (most-common values) lists are a straightforward > > extension of > > > > "lists are *a*" is wrong? > > No, that's correct. Not sure exactly what

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-13 Thread Robert Haas
On Wed, Mar 13, 2019 at 12:20 AM Kyotaro HORIGUCHI wrote: > +Multivariate MCV (most-common values) lists are a straightforward extension > of > > "lists are *a*" is wrong? No, that's correct. Not sure exactly what your concern is, but it's probably related to the fact that the first parent

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-13 Thread David Rowley
On Wed, 13 Mar 2019 at 17:20, Kyotaro HORIGUCHI wrote: > bms_member_index seems working differently than maybe expected. > > bms_member_index((2, 4), 0) => 0, (I think) should be -1 > bms_member_index((2, 4), 1) => 0, should be -1 > bms_member_index((2, 4), 2) => 0, should be 0 >

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-12 Thread Kyotaro HORIGUCHI
Hello. At Wed, 13 Mar 2019 02:25:40 +0100, Tomas Vondra wrote in <19f76496-dcf3-ccea-dd82-26fbed57b...@2ndquadrant.com> > Hi, > > attached is an updated version of the patch, addressing most of the > issues raised in the recent reviews. There are two main exceptions: > > 1) I haven't reworked

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-12 Thread Tomas Vondra
Hi, attached is an updated version of the patch, addressing most of the issues raised in the recent reviews. There are two main exceptions: 1) I haven't reworked the regression tests to use a function to check cardinality estimates and making them faster. 2) Review handling of bitmap in

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-11 Thread Dean Rasheed
On Sun, 10 Mar 2019 at 22:28, David Rowley wrote: > > On Mon, 11 Mar 2019 at 06:36, Tomas Vondra > wrote: > > > > On 3/9/19 7:33 PM, Dean Rasheed wrote: > > > I wonder if it's possible to write smaller, more targeted tests. > > > Currently "stats_ext" is by far the slowest test in its group,

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-11 Thread Dean Rasheed
On Sun, 10 Mar 2019 at 17:36, Tomas Vondra wrote: > On 3/10/19 2:09 PM, Dean Rasheed wrote: > > 14). The attnums Bitmapset passed to > > statext_is_compatible_clause_internal() is an input/output argument > > that it updates. That should probably be documented. When it calls > > itself

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-10 Thread Tomas Vondra
On 3/10/19 11:27 PM, David Rowley wrote: > On Mon, 11 Mar 2019 at 06:36, Tomas Vondra > wrote: >> >> On 3/9/19 7:33 PM, Dean Rasheed wrote: >>> I wonder if it's possible to write smaller, more targeted tests. >>> Currently "stats_ext" is by far the slowest test in its group, and I'm >>> not

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-10 Thread David Rowley
On Mon, 11 Mar 2019 at 06:36, Tomas Vondra wrote: > > On 3/9/19 7:33 PM, Dean Rasheed wrote: > > I wonder if it's possible to write smaller, more targeted tests. > > Currently "stats_ext" is by far the slowest test in its group, and I'm > > not sure that some of those tests add much. It ought to

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-10 Thread Tomas Vondra
Hi Dean, Thanks for the review. I'll post a patch fixing most of the stuff soon, but a few comments/questions regarding some of the issues: On 3/9/19 7:33 PM, Dean Rasheed wrote: > 5). It's not obvious what some of the new test cases in the > "stats_ext" tests are intended to show. For example,

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-10 Thread Dean Rasheed
On Sun, 10 Mar 2019 at 13:09, Dean Rasheed wrote: > Here are some more comments: > One more thing --- the comment for statext_clauselist_selectivity() says: * So (simple_selectivity - base_selectivity) may be seen as a correction for * the part not covered by the MCV list. That's not quite

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-10 Thread Dean Rasheed
On Sat, 9 Mar 2019 at 18:33, Dean Rasheed wrote: > > On Thu, 28 Feb 2019 at 19:56, Tomas Vondra > wrote: > > Attached is an updated version of this patch series. > > Here are some random review comments. I'll add more later, but I'm out > of energy for today. > Here are some more comments:

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-09 Thread Dean Rasheed
On Thu, 28 Feb 2019 at 19:56, Tomas Vondra wrote: > Attached is an updated version of this patch series. Here are some random review comments. I'll add more later, but I'm out of energy for today. 1). src/test/regress/expected/type_sanity.out has bit-rotted. 2). Duplicate OIDs (3425). 3). It

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-02 Thread David Rowley
On Fri, 1 Mar 2019 at 08:56, Tomas Vondra wrote: > Attached is an updated version of this patch series. I made a quick pass over the 0001 patch. I edited a few small things along the way; patch attached. I'll try to do a more in-depth review soon. -- David Rowley

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-02-28 Thread Tomas Vondra
Hi, Attached is an updated version of this patch series. I've decided to rebase and send both parts (MCV and histograms), although we've agreed to focus on the MCV part for now. I don't want to leave the histogram to lag behind, because (a) then it'd be much more work to update it, and (b) I

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-02-06 Thread Dean Rasheed
On Wed, 6 Feb 2019 at 23:44, Tomas Vondra wrote: > > On 2/6/19 10:59 PM, David Rowley wrote: > > On Thu, 7 Feb 2019 at 03:16, Alvaro Herrera > > wrote: > >> I wonder what should we be doing with this series -- concretely, should > >> the effort concentrate on one of the two patches, and leave

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-02-06 Thread Tomas Vondra
On 2/6/19 10:59 PM, David Rowley wrote: > On Thu, 7 Feb 2019 at 03:16, Alvaro Herrera wrote: >> I wonder what should we be doing with this series -- concretely, should >> the effort concentrate on one of the two patches, and leave the other >> for pg13, to increase the chances of the first one

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-02-06 Thread David Rowley
On Thu, 7 Feb 2019 at 03:16, Alvaro Herrera wrote: > I wonder what should we be doing with this series -- concretely, should > the effort concentrate on one of the two patches, and leave the other > for pg13, to increase the chances of the first one being in pg12? I > would favor that approach,

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-02-06 Thread Alvaro Herrera
On 2019-Feb-04, Tomas Vondra wrote: > On 2/4/19 5:53 AM, Michael Paquier wrote: > > Moved the patch to next CF for now, waiting on author as the last > > review happened not so long ago. > > Thanks. Yes, I intend to send a new patch version soon. I wonder what should we be doing with this

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-02-04 Thread Tomas Vondra
On 2/4/19 5:53 AM, Michael Paquier wrote: > On Sun, Feb 03, 2019 at 02:43:24AM -0800, Andres Freund wrote: >> Are you planning to update the patch, or should the entry be marked as >> RWF? > > Moved the patch to next CF for now, waiting on author as the last > review happened not so long ago.

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-02-03 Thread Michael Paquier
On Sun, Feb 03, 2019 at 02:43:24AM -0800, Andres Freund wrote: > Are you planning to update the patch, or should the entry be marked as > RWF? Moved the patch to next CF for now, waiting on author as the last review happened not so long ago. -- Michael signature.asc Description: PGP signature

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-02-03 Thread Andres Freund
Hi Tomas, On 2019-01-24 14:59:50 +1300, David Rowley wrote: > On Wed, 23 Jan 2019 at 12:46, David Rowley > wrote: > > (Stopped in statext_mcv_build(). Need to take a break) > > Continuing... Are you planning to update the patch, or should the entry be marked as RWF? - Andres

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-23 Thread David Rowley
On Wed, 23 Jan 2019 at 12:46, David Rowley wrote: > (Stopped in statext_mcv_build(). Need to take a break) Continuing... 27. statext_mcv_build() could declare the int j,k variables in the scope that they're required in. 28. "an array" * build array of SortItems for distinct groups and counts

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-22 Thread David Rowley
On Wed, 23 Jan 2019 at 03:43, David Rowley wrote: > I made another pass over the 0001 patch. I've not read through mcv.c > again yet. Will try to get to that soon. > > 0001-multivariate-MCV-lists-20190117.patch I started on mcv.c this morning. I'm still trying to build myself a picture of how it

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-22 Thread David Rowley
On Fri, 18 Jan 2019 at 10:03, Tomas Vondra wrote: > thanks for the review. The attached patches address most of the issues > mentioned in the past several messages, both in the MCV and histogram parts. I made another pass over the 0001 patch. I've not read through mcv.c again yet. Will try to

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-21 Thread David Rowley
Thanks for making those changes. On Fri, 18 Jan 2019 at 10:03, Tomas Vondra wrote: > A couple of items remains: > > > 15. I see you broke out the remainder of the code from > > clauselist_selectivity() into clauselist_selectivity_simple(). The > > comment looks like just a copy and paste from

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-17 Thread Tomas Vondra
Hi, thanks for the review. The attached patches address most of the issues mentioned in the past several messages, both in the MCV and histogram parts. A couple of items remains: > 15. I see you broke out the remainder of the code from > clauselist_selectivity() into

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-17 Thread David Rowley
I've started looking over 0002. Here are a few things so far: 1. I think this should be pg_statistic_ext.stxhistogram? Values of the pg_histogram can be obtained only from the pg_statistic.stxhistogram column. 2. I don't think this bms_copy is needed anymore. I think it was

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-17 Thread Dean Rasheed
On Thu, 17 Jan 2019 at 03:42, David Rowley wrote: > 39. I don't see analyze_mcv_list() being used anywhere around this comment: > > * If we can fit all the items onto the MCV list, do that. Otherwise use > * analyze_mcv_list to decide how many items to keep in the MCV list, just > * like for the

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-16 Thread Dean Rasheed
On Thu, 17 Jan 2019 at 03:42, David Rowley wrote: > 35. The evaluation order of this macro is wrong. > > #define ITEM_SIZE(ndims) \ > (ndims * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double)) > > You'd probably want ITEM_SIZE(10) to return 170, but: > > select (10 * (2 + 1) + 2 * 8); >

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-16 Thread David Rowley
On Thu, 17 Jan 2019 at 01:56, David Rowley wrote: > At this stage I'm trying to get to know the patch. I read a lot of > discussing between you and Dean ironing out how the stats should be > used to form selectivities. At the time I'd not read the patch yet, > so most of it went over my head. >

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-16 Thread David Rowley
On Thu, 17 Jan 2019 at 14:19, Tomas Vondra wrote: > > 12. Should we be referencing the source from the docs? > > > > See mcv_clauselist_selectivity > > in src/backend/statistics/mcv.c for details. > > > > hmm. I see it's not the first going by: git grep -E "\w+\.c\<" > > gt > Hmm,

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-16 Thread Tomas Vondra
On 1/16/19 7:56 AM, David Rowley wrote:> On Tue, 15 Jan 2019 at 08:21, Tomas Vondra wrote: >> Turns out you were right - the attribute_referenced piece was quite >> unnecessary. So I've removed it. I've also extended the regression tests >> to verify changing type of another column does not

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-16 Thread David Rowley
On Tue, 15 Jan 2019 at 08:21, Tomas Vondra wrote: > Turns out you were right - the attribute_referenced piece was quite > unnecessary. So I've removed it. I've also extended the regression tests > to verify changing type of another column does not reset the stats. (Trying to find my feet over

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-14 Thread Tomas Vondra
On 1/14/19 4:31 PM, Tomas Vondra wrote: > > On 1/14/19 12:20 PM, Dean Rasheed wrote: >> (Removing Adrien from the CC list, because messages to that address >> keep bouncing) >> >> On Sun, 13 Jan 2019 at 00:31, Tomas Vondra >> wrote: >>> >>> On 1/10/19 6:09 PM, Dean Rasheed wrote: In

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-14 Thread Tomas Vondra
On 1/14/19 12:20 PM, Dean Rasheed wrote: > (Removing Adrien from the CC list, because messages to that address > keep bouncing) > > On Sun, 13 Jan 2019 at 00:31, Tomas Vondra > wrote: >> >> On 1/10/19 6:09 PM, Dean Rasheed wrote: >>> >>> In the previous discussion around

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-14 Thread Dean Rasheed
(Removing Adrien from the CC list, because messages to that address keep bouncing) On Sun, 13 Jan 2019 at 00:31, Tomas Vondra wrote: > > On 1/10/19 6:09 PM, Dean Rasheed wrote: > > > > In the previous discussion around UpdateStatisticsForTypeChange(), the > > consensus appeared to be that we

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-14 Thread Dean Rasheed
On Sun, 13 Jan 2019 at 00:04, Tomas Vondra wrote: > On 1/12/19 8:49 AM, Dean Rasheed wrote: > > A possible refinement would be to say that if there are more than > > stats_target items more common than this mincount threshold, rather than > > excluding the least common ones to get the target

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-12 Thread Tomas Vondra
On 1/10/19 6:09 PM, Dean Rasheed wrote: > On Wed, 26 Dec 2018 at 22:09, Tomas Vondra > wrote: >> >> Attached is an updated version of the patch - rebased and fixing the >> warnings reported by Thomas Munro. >> > > Here are a few random review comments based on what I've read so far: > > > On

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-12 Thread Tomas Vondra
On 1/12/19 8:49 AM, Dean Rasheed wrote: > On Fri, 11 Jan 2019, 21:18 Tomas Vondra wrote: > > > On 1/10/19 4:20 PM, Dean Rasheed wrote: > > ... > > > > So perhaps what we should do for multivariate stats is simply use the > > relative

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-11 Thread Dean Rasheed
On Fri, 11 Jan 2019, 21:18 Tomas Vondra > On 1/10/19 4:20 PM, Dean Rasheed wrote: > > ... > > > > So perhaps what we should do for multivariate stats is simply use the > > relative standard error approach (i.e., reuse the patch in [2] with a > > 20% RSE cutoff). That had a lot of testing at the

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-11 Thread Tomas Vondra
On 1/10/19 4:20 PM, Dean Rasheed wrote: > ... > > So perhaps what we should do for multivariate stats is simply use the > relative standard error approach (i.e., reuse the patch in [2] with a > 20% RSE cutoff). That had a lot of testing at the time, against a wide > range of data distributions,

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-10 Thread Dean Rasheed
On Wed, 26 Dec 2018 at 22:09, Tomas Vondra wrote: > > Attached is an updated version of the patch - rebased and fixing the > warnings reported by Thomas Munro. > Here are a few random review comments based on what I've read so far: On the CREATE STATISTICS doc page, the syntax in the new

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-10 Thread Tomas Vondra
On 1/10/19 4:20 PM, Dean Rasheed wrote: > On Wed, 9 Jan 2019 at 15:40, Tomas Vondra > wrote: >> On 1/8/19 3:18 PM, Dean Rasheed wrote: >>> So actually, the estimate for a group of values will be either the MCV >>> item's frequency (if the MCV item is kept), or (roughly) the MCV >>> item's

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-10 Thread Dean Rasheed
On Wed, 9 Jan 2019 at 15:40, Tomas Vondra wrote: > On 1/8/19 3:18 PM, Dean Rasheed wrote: > > So actually, the estimate for a group of values will be either the MCV > > item's frequency (if the MCV item is kept), or (roughly) the MCV > > item's base_frequency (if the MCV item is not kept). That

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-09 Thread Tomas Vondra
On 1/8/19 3:18 PM, Dean Rasheed wrote: > On Mon, 7 Jan 2019 at 00:45, Tomas Vondra > wrote: >> >> FWIW the main unsolved issue (at least on the MCV part) is how it >> decides which items to keep in the list. >> >> As explained in [1], in the multivariate case we can't simply look at >> the

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-08 Thread Dean Rasheed
On Mon, 7 Jan 2019 at 00:45, Tomas Vondra wrote: > > FWIW the main unsolved issue (at least on the MCV part) is how it > decides which items to keep in the list. > > As explained in [1], in the multivariate case we can't simply look at > the group frequency and compare it to the average frequency

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-06 Thread Tomas Vondra
FWIW the main unsolved issue (at least on the MCV part) is how it decides which items to keep in the list. As explained in [1], in the multivariate case we can't simply look at the group frequency and compare it to the average frequency (of the non-MCV items), which is what analyze_mcv_list()

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-11-26 Thread Tomas Vondra
On 11/26/18 11:29 PM, Thomas Munro wrote: > On Mon, Sep 3, 2018 at 11:17 AM Tomas Vondra > wrote: >> Attached is an updated version of the patch series, adopting a couple of >> improvements - both for MCV lists and histograms. > > Hello Tomas, > > FYI, here are a couple of warnings from GCC (I

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-11-26 Thread Thomas Munro
On Mon, Sep 3, 2018 at 11:17 AM Tomas Vondra wrote: > Attached is an updated version of the patch series, adopting a couple of > improvements - both for MCV lists and histograms. Hello Tomas, FYI, here are a couple of warnings from GCC (I just noticed because I turned on -Werror on cfbot so

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-09-10 Thread Tomas Vondra
Hi, On 07/18/2018 09:32 AM, Konstantin Knizhnik wrote: > > > On 18.07.2018 02:58, Tomas Vondra wrote: >> On 07/18/2018 12:41 AM, Konstantin Knizhnik wrote: >>> ... >>> >>> Teodor Sigaev has proposed an alternative approach for calculating >>> selectivity of multicolumn join or compound index

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-09-09 Thread Tomas Vondra
Hi, On 09/04/2018 04:16 PM, Dean Rasheed wrote: > On 3 September 2018 at 00:17, Tomas Vondra > wrote: >> Hi, >> >> Attached is an updated version of the patch series, adopting a couple of >> improvements - both for MCV lists and histograms. >> >> >> MCV >> --- >> >> For the MCV list part, I've

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-09-04 Thread Dean Rasheed
On 3 September 2018 at 00:17, Tomas Vondra wrote: > Hi, > > Attached is an updated version of the patch series, adopting a couple of > improvements - both for MCV lists and histograms. > > > MCV > --- > > For the MCV list part, I've adopted the approach proposed by Dean, using > base selectivity

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-08-06 Thread Tomas Vondra
On 08/03/2018 04:24 PM, Dean Rasheed wrote: On 17 July 2018 at 14:03, Tomas Vondra wrote: For equalities it's going to be hard. The only thing I can think of at the moment is checking if there are any matching buckets at all, and using that to decide whether to extrapolate the MCV

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-18 Thread Konstantin Knizhnik
On 18.07.2018 02:58, Tomas Vondra wrote: On 07/18/2018 12:41 AM, Konstantin Knizhnik wrote: ... Teodor Sigaev has proposed an alternative approach for calculating selectivity of multicolumn join or compound index search. Usually DBA creates compound indexes which can be used  by optimizer

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-17 Thread Tomas Vondra
On 07/18/2018 12:41 AM, Konstantin Knizhnik wrote: > ... > > Teodor Sigaev has proposed an alternative approach for calculating > selectivity of multicolumn join or compound index search. > Usually DBA creates compound indexes which can be used  by optimizer to > build efficient query execution

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-17 Thread Konstantin Knizhnik
On 16.07.2018 23:55, Tomas Vondra wrote: On 07/16/2018 02:54 PM, Dean Rasheed wrote: On 16 July 2018 at 13:23, Tomas Vondra wrote: The top-level clauses allow us to make such deductions, with deeper clauses it's much more difficult (perhaps impossible). Because for example with (a=1 AND

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-17 Thread Tomas Vondra
On 07/17/2018 11:09 AM, Dean Rasheed wrote: On 16 July 2018 at 21:55, Tomas Vondra wrote: ... >> So, how would the proposed algorithm work? Let's start with "a=1": sel(a=1) = 0.1508 I don't see much point in applying the two "b" clauses independently (or how would it be done, as it's

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-17 Thread Dean Rasheed
On 16 July 2018 at 21:55, Tomas Vondra wrote: > > > On 07/16/2018 02:54 PM, Dean Rasheed wrote: >> On 16 July 2018 at 13:23, Tomas Vondra wrote: > The top-level clauses allow us to make such deductions, with deeper > clauses it's much more difficult (perhaps impossible). Because for

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-16 Thread Tomas Vondra
On 07/16/2018 02:54 PM, Dean Rasheed wrote: > On 16 July 2018 at 13:23, Tomas Vondra wrote: The top-level clauses allow us to make such deductions, with deeper clauses it's much more difficult (perhaps impossible). Because for example with (a=1 AND b=1) there can be just a

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-16 Thread Dean Rasheed
On 16 July 2018 at 13:23, Tomas Vondra wrote: >>> The top-level clauses allow us to make such deductions, with deeper >>> clauses it's much more difficult (perhaps impossible). Because for >>> example with (a=1 AND b=1) there can be just a single match, so if we >>> find it in MCV we're done.

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-16 Thread Tomas Vondra
On 07/15/2018 11:36 AM, Dean Rasheed wrote: On 13 July 2018 at 18:27, Tomas Vondra wrote: I'm not so sure. The issue is that a lot of the MCV deductions depends on whether we can answer questions like "Is there a single match?" or "If we got a match in MCV, do we need to look at the non-MCV

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-16 Thread Tomas Vondra
On 07/16/2018 12:16 AM, Tomas Vondra wrote: On 07/15/2018 04:43 PM, Dean Rasheed wrote: On 15 July 2018 at 14:29, Tomas Vondra wrote: It's quite unclear to me how this algorithm could reliably end up with hist_sel=0 (in cases where we already don't end up with that). I mean, if a bucket

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-15 Thread Tomas Vondra
On 07/15/2018 04:43 PM, Dean Rasheed wrote: > On 15 July 2018 at 14:29, Tomas Vondra wrote: >> It's quite unclear to me how this algorithm could reliably end up with >> hist_sel=0 (in cases where we already don't end up with that). I mean, >> if a bucket matches the conditions, then the only

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-15 Thread Dean Rasheed
On 15 July 2018 at 14:29, Tomas Vondra wrote: > It's quite unclear to me how this algorithm could reliably end up with > hist_sel=0 (in cases where we already don't end up with that). I mean, > if a bucket matches the conditions, then the only way to eliminate is by > deducing that MCV already

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-15 Thread Tomas Vondra
On 07/15/2018 11:36 AM, Dean Rasheed wrote: > ... > > What I'm considering is an algorithm where we simultaneously compute 3 things: > > simple_sel - The result we would get without multivariate stats (*) > mcv_sel - The multivariate MCV result > hist_sel - The multivariate histogram result >

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-15 Thread Dean Rasheed
On 13 July 2018 at 18:27, Tomas Vondra wrote: > I'm not so sure. The issue is that a lot of the MCV deductions depends > on whether we can answer questions like "Is there a single match?" or > "If we got a match in MCV, do we need to look at the non-MCV part?" This > is not very different from

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-13 Thread Tomas Vondra
On 07/13/2018 01:19 PM, Dean Rasheed wrote: > On 24 June 2018 at 20:45, Tomas Vondra wrote: >> Attached is a rebased version of this patch series, mostly just fixing >> the breakage caused by reworked format of initial catalog data. >> >> Aside from that, the MCV building now adopts the logic

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-07-13 Thread Dean Rasheed
On 24 June 2018 at 20:45, Tomas Vondra wrote: > Attached is a rebased version of this patch series, mostly just fixing > the breakage caused by reworked format of initial catalog data. > > Aside from that, the MCV building now adopts the logic introduced by > commit b5db1d93d2 for single-column

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2018-06-24 Thread Tomas Vondra
Hi all, Attached is a rebased version of this patch series, mostly just fixing the breakage caused by reworked format of initial catalog data. Aside from that, the MCV building now adopts the logic introduced by commit b5db1d93d2 for single-column MCV lists. The new algorithm seems pretty good

  1   2   >