Re: [HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes

2017-08-30 Thread Robert Haas
On Sun, Aug 27, 2017 at 8:31 PM, Michael Malis wrote:
> (Sorry David. I initially replied only to you)
>
> Ok. I've attached a proof-of-concept patch. I have a few
> questions about tests.
>
> What is the typical workflow to add tests for changes to the planner?

Add submitted patches at commitfest.postgresql.org

> Also,
> I ran make check and it appears one of the existing tests is failing.
> What is a typical way to discover why the query plan
> for a specific query changed?

I don't have any magic answer on this point.

> Also, how should I go about changing the
> old test? Should I replace the old test output with the new test
> output, or modify the old test slightly so that it exercises the same
> case as before?

That's a judgement call, based on what you think the point of the test was.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes

2017-08-28 Thread Alvaro Herrera
Michael Malis wrote:
> Hmm... It seems the command I used for generating the patch, which I
> got from https://wiki.postgresql.org/wiki/Working_with_Git, truncated
> part of it. I've attached to this email the file generated by git diff
> --patience master improve-partial-index-correlation-calculation
> --no-color > improve-correlated-partial-index-cost-v2.patch. What is
> the correct command for generating a context diff?

Yeah, I've had patches truncated by that too and I've never cared enough
to see about getting it fixed.  I think it's a bug in the filterdiff
utility, but I got a stupid answer from the Debian maintainer when I
reported it and didn't care to follow up any further.  Eventually I gave
up on using context diffs because of this problem.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes

2017-08-27 Thread Michael Malis
Hmm... It seems the command I used for generating the patch, which I
got from https://wiki.postgresql.org/wiki/Working_with_Git, truncated
part of it. I've attached to this email the file generated by git diff
--patience master improve-partial-index-correlation-calculation
--no-color > improve-correlated-partial-index-cost-v2.patch. What is
the correct command for generating a context diff?
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 051a854..58224e6 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -163,7 +163,10 @@ static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
 static double relation_byte_size(double tuples, int width);
 static double page_size(double tuples, int width);
 static double get_parallel_divisor(Path *path);
-
+static double ordered_page_read_cost(double pages_fetched,
+									 int baserel_pages,
+									 double spc_seq_page_cost,
+									 double spc_random_page_cost);
 
 /*
  * clamp_row_est
@@ -652,9 +655,26 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 
 		if (pages_fetched > 0)
 		{
-			min_IO_cost = spc_random_page_cost;
-			if (pages_fetched > 1)
-				min_IO_cost += (pages_fetched - 1) * spc_seq_page_cost;
+			if (index->indpred == NIL)
+			{
+				min_IO_cost = spc_random_page_cost;
+				if (pages_fetched > 1)
+					min_IO_cost += (pages_fetched - 1) * spc_seq_page_cost;
+			}
+			else
+			{
+				/*
+				 * For a partial index perfectly correlated with the table
+				 * ordering, consecutive pages fetched are not guaranteed to
+				 * be adjacent in the table. Instead, use
+				 * ordered_page_read_cost to estimate the cost of reading
+				 * pages from the heap.
+				 */
+				min_IO_cost = ordered_page_read_cost(pages_fetched,
+													 baserel->pages,
+													 spc_seq_page_cost,
+													 spc_random_page_cost);
+			}
 		}
 		else
 			min_IO_cost = 0;
@@ -934,13 +954,11 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 	Cost		indexTotalCost;
 	QualCost	qpqual_cost;
 	Cost		cpu_per_tuple;
-	Cost		cost_per_page;
 	Cost		cpu_run_cost;
 	double		tuples_fetched;
 	double		pages_fetched;
 	double		spc_seq_page_cost,
 				spc_random_page_cost;
-	double		T;
 
 	/* Should only be applied to base relations */
 	Assert(IsA(baserel, RelOptInfo));
@@ -961,28 +979,16 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 										 &tuples_fetched);
 
 	startup_cost += indexTotalCost;
-	T = (baserel->pages > 1) ? (double) baserel->pages : 1.0;
 
 	/* Fetch estimated page costs for tablespace containing table. */
 	get_tablespace_page_costs(baserel->reltablespace,
 							  &spc_random_page_cost,
 							  &spc_seq_page_cost);
 
-	/*
-	 * For small numbers of pages we should charge spc_random_page_cost
-	 * apiece, while if nearly all the table's pages are being read, it's more
-	 * appropriate to charge spc_seq_page_cost apiece.  The effect is
-	 * nonlinear, too. For lack of a better idea, interpolate like this to
-	 * determine the cost per page.
-	 */
-	if (pages_fetched >= 2.0)
-		cost_per_page = spc_random_page_cost -
-			(spc_random_page_cost - spc_seq_page_cost)
-			* sqrt(pages_fetched / T);
-	else
-		cost_per_page = spc_random_page_cost;
-
-	run_cost += pages_fetched * cost_per_page;
+	run_cost += ordered_page_read_cost(pages_fetched,
+									   baserel->pages,
+									   spc_seq_page_cost,
+									   spc_random_page_cost);
 
 	/*
 	 * Estimate CPU costs per tuple.
@@ -5183,3 +5189,34 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
 
 	return pages_fetched;
 }
+
+
+/*
+ * Estimate the total cost of reading possibly nonconsecutive pages from the
+ * heap in order.
+ */
+static double
+ordered_page_read_cost(double pages_fetched, int baserel_pages,
+					   double spc_seq_page_cost, double spc_random_page_cost)
+{
+	double		cost_per_page;
+	double		T;
+
+	T = (baserel_pages > 1) ? (double) baserel_pages : 1.0;
+
+	/*
+	 * For small numbers of pages we should charge spc_random_page_cost
+	 * apiece, while if nearly all the table's pages are being read, it's more
+	 * appropriate to charge spc_seq_page_cost apiece.  The effect is
+	 * nonlinear, too. For lack of a better idea, interpolate like this to
+	 * determine the cost per page.
+	 */
+	if (pages_fetched >= 2.0)
+		cost_per_page = spc_random_page_cost -
+			(spc_random_page_cost - spc_seq_page_cost)
+			* sqrt(pages_fetched / T);
+	else
+		cost_per_page = spc_random_page_cost;
+
+	return pages_fetched * cost_per_page;
+}
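
For intuition, here is a small standalone C sketch of the same math (purely
illustrative, not part of the patch): it reimplements the interpolation from
ordered_page_read_cost and compares the old and new best-case I/O estimates
for made-up numbers, assuming the default seq_page_cost = 1.0 and
random_page_cost = 4.0 and a hypothetical 10,000-page table:

#include <math.h>
#include <stdio.h>

/*
 * Mirror of the patch's interpolation: the per-page cost slides from
 * random_page_cost toward seq_page_cost as the scan covers a larger
 * fraction of the table.
 */
static double
ordered_page_read_cost(double pages_fetched, int baserel_pages,
					   double spc_seq_page_cost, double spc_random_page_cost)
{
	double		cost_per_page;
	double		T = (baserel_pages > 1) ? (double) baserel_pages : 1.0;

	if (pages_fetched >= 2.0)
		cost_per_page = spc_random_page_cost -
			(spc_random_page_cost - spc_seq_page_cost)
			* sqrt(pages_fetched / T);
	else
		cost_per_page = spc_random_page_cost;

	return pages_fetched * cost_per_page;
}

int
main(void)
{
	const double seq = 1.0;			/* seq_page_cost (default GUC) */
	const double rnd = 4.0;			/* random_page_cost (default GUC) */
	const int	pages = 10000;		/* hypothetical table size in pages */
	const double fetched = 100.0;	/* hypothetical pages fetched */

	/* Old best case: one random page, the rest assumed physically adjacent. */
	printf("old min_IO_cost: %.1f\n", rnd + (fetched - 1) * seq);

	/* New best case for a partial index: fetched pages may be scattered. */
	printf("new min_IO_cost: %.1f\n",
		   ordered_page_read_cost(fetched, pages, seq, rnd));
	return 0;
}

With 100 of 10,000 pages fetched, sqrt(100 / 10000) = 0.1, so the per-page
cost is 4.0 - 3.0 * 0.1 = 3.7 and the new estimate is 370.0 versus 103.0
before, reflecting that even a perfectly correlated partial index can leave
gaps between the heap pages it touches.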


Re: [HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes

2017-08-27 Thread Michael Malis
(Sorry David. I initially replied only to you)

Ok. I've attached a proof-of-concept patch. I have a few
questions about tests.

What is the typical workflow to add tests for changes to the planner? Also,
I ran make check and it appears one of the existing tests is failing.
What is a typical way to discover why the query plan for a specific
query changed? Also, how should I go about changing the old test?
Should I replace the old test output with the new test output, or
modify the old test slightly so that it exercises the same case as
before?

Thanks,
Michael
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 051a854..58224e6 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*************** static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
*** 163,169 ****
  static double relation_byte_size(double tuples, int width);
  static double page_size(double tuples, int width);
  static double get_parallel_divisor(Path *path);
! 
  
  /*
   * clamp_row_est
--- 163,172 ----
  static double relation_byte_size(double tuples, int width);
  static double page_size(double tuples, int width);
  static double get_parallel_divisor(Path *path);
! static double ordered_page_read_cost(double pages_fetched,
! 									 int baserel_pages,
! 									 double spc_seq_page_cost,
! 									 double spc_random_page_cost);
  
  /*
   * clamp_row_est
*************** cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
*** 652,660 ****
  
  		if (pages_fetched > 0)
  		{
! 			min_IO_cost = spc_random_page_cost;
! 			if (pages_fetched > 1)
! 				min_IO_cost += (pages_fetched - 1) * spc_seq_page_cost;
  		}
  		else
  			min_IO_cost = 0;
--- 655,680 ----
  
  		if (pages_fetched > 0)
  		{
! 			if (index->indpred == NIL)
! 			{
! 				min_IO_cost = spc_random_page_cost;
! 				if (pages_fetched > 1)
! 					min_IO_cost += (pages_fetched - 1) * spc_seq_page_cost;
! 			}
! 			else
! 			{
! 				/*
! 				 * For a partial index perfectly correlated with the table
! 				 * ordering, consecutive pages fetched are not guaranteed to
! 				 * be adjacent in the table. Instead, use
! 				 * ordered_page_read_cost to estimate the cost of reading
! 				 * pages from the heap.
! 				 */
! 				min_IO_cost = ordered_page_read_cost(pages_fetched,
! 													 baserel->pages,
! 													 spc_seq_page_cost,
! 													 spc_random_page_cost);
! 			}
  		}
  		else
  			min_IO_cost = 0;
*************** cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
*** 934,946 ****
  	Cost		indexTotalCost;
  	QualCost	qpqual_cost;
  	Cost		cpu_per_tuple;
- 	Cost		cost_per_page;
  	Cost		cpu_run_cost;
  	double		tuples_fetched;
  	double		pages_fetched;
  	double		spc_seq_page_cost,
  				spc_random_page_cost;
- 	double		T;
  
  	/* Should only be applied to base relations */
  	Assert(IsA(baserel, RelOptInfo));
--- 954,964 ----


Re: [HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes

2017-08-27 Thread David Fetter
On Sat, Aug 26, 2017 at 05:50:26PM -0700, Michael Malis wrote:
> Do you think this is a reasonable approach? Should I start working
> on a patch based on the solution I described, or is there some other
> approach I should look into?

You'll get more traction with a proof-of-concept patch accompanying
the plan than without.  Don't bother with any level of care past
proof-of-concept until you get positive feedback.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


[HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes

2017-08-26 Thread Michael Malis
Do you think this is a reasonable approach? Should I start working on
a patch based on the solution I described, or is there some other
approach I should look into?

