Re: [HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes
On Sun, Aug 27, 2017 at 8:31 PM, Michael Malis wrote:
> (Sorry David. I initially replied only to you)
>
> Ok. I've attached a patch of a proof-of-concept. I have a few
> questions about tests.
>
> What is the typical workflow for adding tests for changes to the planner?

Add submitted patches at commitfest.postgresql.org

> Also, I ran make check and it appears one of the existing tests is
> failing. What is a typical way of discovering why the query plan for a
> specific query changed?

I don't have any magic answer on this point.

> Also, how should I go about changing the old test? Should I replace
> the old test output with the new test output, or modify the old test
> slightly to get it to produce the same case as before?

That's a judgement call, based on what you think the point of the test was.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes
Michael Malis wrote:
> Hmm... It seems the command I used for obtaining a patch, which I got
> from here https://wiki.postgresql.org/wiki/Working_with_Git, truncated
> part of the patch. I've attached the file generated from git diff
> --patience master improve-partial-index-correlation-calculation
> --no-color > improve-correlated-partial-index-cost-v2.patch to this
> email. What is the correct command for generating a context diff?

Yeah, I've had patches truncated by that too, and I've never cared enough
to see about getting it fixed. I think it's a bug in the filterdiff
utility, but I got a stupid answer from the Debian maintainer when I
reported it and didn't care to follow up any further. Eventually I gave
up on using context diffs because of this problem.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes
Hmm... It seems the command I used for obtaining a patch, which I got from
here https://wiki.postgresql.org/wiki/Working_with_Git, truncated part of
the patch. I've attached the file generated from git diff --patience
master improve-partial-index-correlation-calculation --no-color >
improve-correlated-partial-index-cost-v2.patch to this email. What is the
correct command for generating a context diff?

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 051a854..58224e6 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -163,7 +163,10 @@ static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
 static double relation_byte_size(double tuples, int width);
 static double page_size(double tuples, int width);
 static double get_parallel_divisor(Path *path);
-
+static double ordered_page_read_cost(double pages_fetched,
+					 int baserel_pages,
+					 double spc_seq_page_cost,
+					 double spc_random_page_cost);
 
 /*
  * clamp_row_est
@@ -652,9 +655,26 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 
 	if (pages_fetched > 0)
 	{
-		min_IO_cost = spc_random_page_cost;
-		if (pages_fetched > 1)
-			min_IO_cost += (pages_fetched - 1) * spc_seq_page_cost;
+		if (index->indpred == NIL)
+		{
+			min_IO_cost = spc_random_page_cost;
+			if (pages_fetched > 1)
+				min_IO_cost += (pages_fetched - 1) * spc_seq_page_cost;
+		}
+		else
+		{
+			/*
+			 * For a partial index perfectly correlated with the table
+			 * ordering, consecutive pages fetched are not guaranteed to
+			 * be adjacent in the table.  Instead use
+			 * ordered_page_read_cost to estimate the cost of reading
+			 * pages from the heap.
+			 */
+			min_IO_cost = ordered_page_read_cost(pages_fetched,
+								 baserel->pages,
+								 spc_seq_page_cost,
+								 spc_random_page_cost);
+		}
 	}
 	else
 		min_IO_cost = 0;
@@ -934,13 +954,11 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 	Cost		indexTotalCost;
 	QualCost	qpqual_cost;
 	Cost		cpu_per_tuple;
-	Cost		cost_per_page;
 	Cost		cpu_run_cost;
 	double		tuples_fetched;
 	double		pages_fetched;
 	double		spc_seq_page_cost,
 				spc_random_page_cost;
-	double		T;
 
 	/* Should only be applied to base relations */
 	Assert(IsA(baserel, RelOptInfo));
@@ -961,28 +979,16 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 										 &pages_fetched);
 
 	startup_cost += indexTotalCost;
-	T = (baserel->pages > 1) ? (double) baserel->pages : 1.0;
 
 	/* Fetch estimated page costs for tablespace containing table. */
 	get_tablespace_page_costs(baserel->reltablespace,
 							  &spc_random_page_cost,
 							  &spc_seq_page_cost);
 
-	/*
-	 * For small numbers of pages we should charge spc_random_page_cost
-	 * apiece, while if nearly all the table's pages are being read, it's more
-	 * appropriate to charge spc_seq_page_cost apiece.  The effect is
-	 * nonlinear, too.  For lack of a better idea, interpolate like this to
-	 * determine the cost per page.
-	 */
-	if (pages_fetched >= 2.0)
-		cost_per_page = spc_random_page_cost -
-			(spc_random_page_cost - spc_seq_page_cost)
-			* sqrt(pages_fetched / T);
-	else
-		cost_per_page = spc_random_page_cost;
-
-	run_cost += pages_fetched * cost_per_page;
+	run_cost += ordered_page_read_cost(pages_fetched,
+					   baserel->pages,
+					   spc_seq_page_cost,
+					   spc_random_page_cost);
 
 	/*
 	 * Estimate CPU costs per tuple.
@@ -5183,3 +5189,34 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
 
 	return pages_fetched;
 }
+
+
+/*
+ * Estimate the total cost of reading possibly nonconsecutive pages from the
+ * heap in order.
+ */
+static double
+ordered_page_read_cost(double pages_fetched, int baserel_pages,
+					   double spc_seq_page_cost, double spc_random_page_cost)
+{
+	double		cost_per_page;
+	double		T;
+
+	T = (baserel_pages > 1) ? (double) baserel_pages : 1.0;
+
+	/*
+	 * For small numbers of pages we should charge spc_random_page_cost
+	 * apiece, while if nearly all the table's pages are being read, it's more
+	 * appropriate to charge spc_seq_page_cost apiece.  The effect is
+	 * nonlinear, too.  For lack of a better idea, interpolate like this to
+	 * determine the cost per page.
+	 */
+	if (pages_fetched >= 2.0)
+		cost_per_page = spc_random_page_cost -
+			(spc_random_page_cost - spc_seq_page_cost)
+			* sqrt(pages_fetched / T);
+	else
+		cost_per_page = spc_random_page_cost;
+
+	return pages_fetched * cost_per_page;
+}
Re: [HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes
(Sorry David. I initially replied only to you)

Ok. I've attached a patch of a proof-of-concept. I have a few questions
about tests.

What is the typical workflow for adding tests for changes to the planner?
Also, I ran make check and it appears one of the existing tests is
failing. What is a typical way of discovering why the query plan for a
specific query changed? Also, how should I go about changing the old
test? Should I replace the old test output with the new test output, or
modify the old test slightly to get it to produce the same case as
before?

Thanks,
Michael

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 051a854..58224e6 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*************** static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
*** 163,169 ****
  static double relation_byte_size(double tuples, int width);
  static double page_size(double tuples, int width);
  static double get_parallel_divisor(Path *path);
! 
  
  /*
   * clamp_row_est
--- 163,172 ----
  static double relation_byte_size(double tuples, int width);
  static double page_size(double tuples, int width);
  static double get_parallel_divisor(Path *path);
! static double ordered_page_read_cost(double pages_fetched,
! 					 int baserel_pages,
! 					 double spc_seq_page_cost,
! 					 double spc_random_page_cost);
  
  /*
   * clamp_row_est
*************** cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
*** 652,660 ****
  
  	if (pages_fetched > 0)
  	{
! 		min_IO_cost = spc_random_page_cost;
! 		if (pages_fetched > 1)
! 			min_IO_cost += (pages_fetched - 1) * spc_seq_page_cost;
  	}
  	else
  		min_IO_cost = 0;
--- 655,680 ----
  
  	if (pages_fetched > 0)
  	{
! 		if (index->indpred == NIL)
! 		{
! 			min_IO_cost = spc_random_page_cost;
! 			if (pages_fetched > 1)
! 				min_IO_cost += (pages_fetched - 1) * spc_seq_page_cost;
! 		}
! 		else
! 		{
! 			/*
! 			 * For a partial index perfectly correlated with the table
! 			 * ordering, consecutive pages fetched are not guaranteed to
! 			 * be adjacent in the table.  Instead use
! 			 * ordered_page_read_cost to estimate the cost of reading
! 			 * pages from the heap.
! 			 */
! 			min_IO_cost = ordered_page_read_cost(pages_fetched,
! 								 baserel->pages,
! 								 spc_seq_page_cost,
! 								 spc_random_page_cost);
! 		}
  	}
  	else
  		min_IO_cost = 0;
*************** cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
*** 934,946 ****
  	Cost		indexTotalCost;
  	QualCost	qpqual_cost;
  	Cost		cpu_per_tuple;
- 	Cost		cost_per_page;
  	Cost		cpu_run_cost;
  	double		tuples_fetched;
  	double		pages_fetched;
  	double		spc_seq_page_cost,
  				spc_random_page_cost;
- 	double		T;
  
  	/* Should only be applied to base relations */
  	Assert(IsA(baserel, RelOptInfo));
--- 954,964 ----
Re: [HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes
On Sat, Aug 26, 2017 at 05:50:26PM -0700, Michael Malis wrote:
> Do you think this is a reasonable approach? Should I start working
> on a patch based on the solution I described or is there some other
> approach I should look into?

You'll get more traction with a proof-of-concept patch accompanying the
plan than without. Don't bother with any level of care past
proof-of-concept until you get positive feedback.

Best,
David.

--
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
[HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes
Do you think this is a reasonable approach? Should I start working on a
patch based on the solution I described or is there some other approach I
should look into?