Re: [PERFORM] Query planner wants to use seq scan

2015-10-29 Thread Alex Ignatov



On 27.10.2015 23:56, Bertrand Paquet wrote:

So,

Tonight, the index on the three fields is used; maybe my vacuum
yesterday updated the stats.


Thanks for your help.

Regards,

Bertrand




2015-10-27 18:33 GMT+01:00 Bertrand Paquet :


Hi tom,

I did the test yesterday with an index on the three fields, and
with a partial index on organization and status with a WHERE handled
IS NULL condition. I saw no change in the query plan.
Maybe I forgot to run VACUUM ANALYZE afterwards. I will retry tonight.

I use a btree index. Is that the right solution, even with the IN
clause?

Regards,

Bertrand

On Tuesday, October 27, 2015, Tom Lane wrote:

Bertrand Paquet  writes:
> We have a slow query. After analyzing it, the planner's decision
> seems debatable: the query is faster when seqscan is disabled. See
> below the two query plans, and an extract from pg_stats.

> Any idea about what to change to help the planner ?

Neither one of those plans is very good: you're just hoping that the
Filter condition will let a tuple through sooner rather than later.

If you care about the performance of this type of query, I'd consider
creating an index on (organization_id, status, handled_by) so that all
the conditions can be checked in the index.

regards, tom lane
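In SQL terms, the suggestion amounts to something like this (a sketch only; the table in the thread is never named, so `tickets` here is a stand-in):

```sql
-- Composite btree index covering all three filter columns, so the
-- planner can check every condition in the index. A btree handles
-- equality, IS NULL, and IN conditions on these columns.
-- "tickets" is a hypothetical name; the real table isn't shown.
CREATE INDEX idx_org_status_handled
    ON tickets (organization_id, status, handled_by);

-- Refresh planner statistics so the new index is considered.
VACUUM ANALYZE tickets;
```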



Hello Bertrand once again!
What's your status? Did the plan change after deploying the three-field
index?


--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [PERFORM] Query planner wants to use seq scan

2015-10-29 Thread Bertrand Paquet
Yes, the three-field index AND vacuum solved the issue.

Regards,

Bertrand

2015-10-29 13:27 GMT+01:00 Alex Ignatov :

>
>
> On 27.10.2015 23:56, Bertrand Paquet wrote:
>
> So,
>
> Tonight, the index on the three fields is used; maybe my vacuum yesterday
> updated the stats.
>
> Thanks for your help.
>
> Regards,
>
> Bertrand
>
>
>
>
> 2015-10-27 18:33 GMT+01:00 Bertrand Paquet :
>
>> Hi tom,
>>
>> I did the test yesterday with an index on the three fields, and with a
>> partial index on organization and status with a WHERE handled IS NULL
>> condition. I saw no change in the query plan.
>> Maybe I forgot to run VACUUM ANALYZE afterwards. I will retry tonight.
>>
>> I use a btree index. Is that the right solution, even with the IN clause ?
>>
>> Regards,
>>
>> Bertrand
>>
>> On Tuesday, October 27, 2015, Tom Lane < t...@sss.pgh.pa.us>
>> wrote:
>>
>>> Bertrand Paquet  writes:
>>> > We have a slow query. After analyzing it, the planner's decision
>>> seems
>>> > debatable: the query is faster when seqscan is disabled. See below the
>>> two
>>> > query plans, and an extract from pg_stats.
>>>
>>> > Any idea about what to change to help the planner ?
>>>
>>> Neither one of those plans is very good: you're just hoping that the
>>> Filter condition will let a tuple through sooner rather than later.
>>>
>>> If you care about the performance of this type of query, I'd consider
>>> creating an index on (organization_id, status, handled_by) so that all
>>> the conditions can be checked in the index.
>>>
>>> regards, tom lane
>>>
>>
> Hello Bertrand once again!
> What's your status? Did the plan change after deploying the three-field
> index?
>
> --
> Alex Ignatov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>


Re: [PERFORM] Scalability to more than 64 cores With PG 9.4 and RHEL 7.1 Kernel 3.10

2015-10-29 Thread Alex Ignatov


On 28.10.2015 8:04, Javier Muro wrote:
Does PostgreSQL 9.4 have official support on a system with more than 64
physical cores? ( 72 Cores , 4 CPUs Intel(R) Xeon(R) CPU E7-8890 )




The work Robert Haas did to fix the CPU locking a while back showed
significant improvements up to 64 cores, but that's as far as I know.


Thanks in Advance.

Hello Javier!
Our tests show that PG 9.4 scales well up to 60 Intel cores: i.e., pgbench
-S with the DB on tmpfs gave us 700 000 tps. Beyond 60 cores, s_lock
dominates CPU usage. 9.5 scales way better.
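For reference, a read-only test of that shape can be reproduced roughly like this (a command sketch; it assumes a running server with its data directory on tmpfs, and the scale factor here is an illustrative guess, not the one used in our tests):

```
# Initialize a pgbench database at scale factor 1000 (~15 GB of data).
pgbench -i -s 1000 bench

# SELECT-only run: 60 clients, 60 worker threads, 120 seconds.
# Watch the reported tps, and the share of s_lock in CPU profiles.
pgbench -S -c 60 -j 60 -T 120 bench
```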


--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Query optimizer plans with very small selectivity estimates

2015-10-29 Thread Matthew Bellew
This is related to a post in the general bugs forum, but I found this
forum, and it seems more appropriate. This is my second attempt to post;
I believe my first attempt last week did not go through, so apologies if
I'm duplicating.

http://comments.gmane.org/gmane.comp.db.postgresql.bugs/39011

I have had several users encounter performance problems, which all
seem to come down to this: multiplying selectivity estimates can
cause tuple estimates to grow very small very quickly. Once the estimator
gets down to 1 row, the planner may choose plans that are very good ONLY
WHEN there is exactly 1 row. Unfortunately, these may be the worst plans
(maybe even O(N^large)) if the estimate is even slightly off (even just
returning 2 or 3 rows versus 1).

Using the patch below, I discovered that clamping relation tuple estimates
to a number as small as 2 seemed to avoid all the catastrophic query
plans.

In the scenarios I'm seeing, I have several examples of queries
that take >1m to run that should run in <1s. The estimate of 1 row
(versus thousands actual) leads the planner to tee up several nested
loop joins, which causes thousands of table scans.

I have been working on a more complete patch which tracks uniqueness
along with selectivity, so that the optimizer can benefit from knowing
when a relation must have 1 (or fewer) tuples, while clamping all other
relations to 2 rather than 1.

typedef struct
{
    double selectivity;
    bool   unique;
} Selectivity;

I am interested in hearing discussion about this problem, and whether
the community is open to a patch if I continue pursuing the development.

Matt



FIRST ARTIFACT

plan with expensive (80s join) and join estimate of 1
note the first Nested Loop join and 81s join
(I gave up trying to post the full explain, because of the 80 char limit)

"Sort  (cost=7000.04..7000.04 rows=1 width=49)
(actual time=81739.426..81740.023 rows=5091 loops=1)"
"  Sort Key: c.ten DESC"
"  Sort Method: quicksort  Memory: 948kB"
"  CTE cte"
"->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=4)
(actual time=0.001..0.001 rows=5 loops=1)"
"  ->  Nested Loop  (cost=1.36..6999.97 rows=1 width=49)
(actual time=0.059..81725.475 rows=5091 loops=1)"
"Planning time: 1.912 ms"
"Execution time: 81740.328 ms"


SECOND ARTIFACT

force the join row estimate to be a minimum of 2
query completes very quickly

"Sort  (cost=7000.06..7000.06 rows=2 width=49)
(actual time=84.610..85.192 rows=5142 loops=1)"
"  Sort Key: c.ten DESC"
"  Sort Method: quicksort  Memory: 956kB"
"  CTE cte"
"->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=4)
(actual time=0.002..0.003 rows=5 loops=1)"
"  ->  Hash Join  (cost=2518.99..6999.98 rows=2 width=49)
(actual time=17.629..82.886 rows=5142 loops=1)"

"Planning time: 2.982 ms"
"Execution time: 85.514 ms"


THIRD ARTIFACT

patch I used to make experimenting easier w/o recompiling


diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 1b61fd9..444703c 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -68,6 +68,12 @@
  *-
  */

+
+
+/* These parameters are set by GUC */
+int		join_row_estimate_clamp = 1;
+
+
 #include "postgres.h"

 #ifdef _MSC_VER
@@ -175,6 +181,17 @@ clamp_row_est(double nrows)
 }


+double
+clamp_join_row_est(double nrows)
+{
+	nrows = clamp_row_est(nrows);
+	if (nrows >= (double) join_row_estimate_clamp)
+		return nrows;
+	return (double) join_row_estimate_clamp;
+}
+
+
+
 /*
  * cost_seqscan
  *  Determines and returns the cost of scanning a relation sequentially.
@@ -3886,7 +3903,7 @@ calc_joinrel_size_estimate(PlannerInfo *root,
  break;
  }

- return clamp_row_est(nrows);
+ return clamp_join_row_est(nrows);
 }

 /*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 71090f2..fabb8ac 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2664,6 +2664,16 @@ static struct config_int ConfigureNamesInt[] =
  NULL, NULL, NULL
  },

+	{
+		{"join_row_estimate_clamp", PGC_USERSET, QUERY_TUNING_OTHER,
+			gettext_noop("Set the minimum estimated size of a join result."),
+			NULL
+		},
+		&join_row_estimate_clamp,
+		1, 1, 1,
+		NULL, NULL, NULL
+	},
+
  /* End-of-list marker */
  {
  {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 25a7303..0161c4b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -67,8 +67,10 @@ extern bool enable_material;
 extern bool enable_mergejoin;
 extern bool enable_hashjoin;
 extern int constraint_exclusion;
+extern int join_row_estimate_clamp;

 extern double clamp_row_est(double nrows);
+extern double clamp_join_row_est(double nrows);
 extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
  double index_pages, PlannerInfo *root);
 extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo
*baserel,


Re: [PERFORM] Query optimizer plans with very small selectivity estimates

2015-10-29 Thread Tom Lane
Matthew Bellew  writes:
> I have had several users encounter performance problems, which all
> seem to come down to this problem: multiplying selectivity estimates can
> cause tuple estimates to grow very small very quickly, once the estimator
> gets to 1 row, the planner may choose plans that are very good ONLY WHEN
> there is exactly 1 row (maybe even O(N^large)).  Unfortunately, these may
> be the worst plans if the estimate is even slightly off (even just
> returning 2 or 3 rows versus 1).

Yeah, this is a well-known problem.  There has been prior discussion along
the same lines as you mention (only believe 1-row estimates when it's
provably true that there's at most one row), but it hasn't looked like an
easy change.  See the pgsql-hackers archives for previous threads.

regards, tom lane




Re: [PERFORM] Query optimizer plans with very small selectivity estimates

2015-10-29 Thread Josh Berkus
On 10/29/2015 11:24 AM, Tom Lane wrote:
> Matthew Bellew  writes:
>> I have had several users encounter performance problems, which all
>> seem to come down to this problem: multiplying selectivity estimates can
>> cause tuple estimates to grow very small very quickly, once the estimator
>> gets to 1 row, the planner may choose plans that are very good ONLY WHEN
>> there is exactly 1 row (maybe even O(N^large)).  Unfortunately, these may
>> be the worst plans if the estimate is even slightly off (even just
>> returning 2 or 3 rows versus 1).
> 
> Yeah, this is a well-known problem.  There has been prior discussion along
> the same lines as you mention (only believe 1-row estimates when it's
> provably true that there's at most one row), but it hasn't looked like an
> easy change.  See the pgsql-hackers archives for previous threads.

Also see Tomas's correlated stats patch submitted for 9.6.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

