Re: [PERFORM] Slow query with 3 table joins

2017-04-26 Thread Matthew Bellew
This looks like the same optimizer problem that occasionally plagues our
customers.  Whenever the estimated row count of a join is 1 but the actual
count is higher, the optimizer may choose very poor plans.  I made some
attempts to fix this.  The very simple fix is to never estimate 1 row for a
join result.  Even using 2 works remarkably well as a defense against this
problem.


https://github.com/labkey-matthewb/postgres/commit/b1fd99f4deffbbf3db2172ccaba51a34f18d1b1a
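The defensive clamp is small enough to sketch outside the PostgreSQL sources. This is a Python transliteration of the idea in the commit above, not the actual C code; the clamp value 2 means a join is never estimated to return fewer than 2 rows, so the planner can no longer bet everything on the exactly-one-row case:

```python
def clamp_row_est(nrows: float) -> float:
    # PostgreSQL rounds row estimates and forces them to be at least 1.
    return max(1.0, round(nrows))

def clamp_join_row_est(nrows: float, join_row_estimate_clamp: int = 2) -> float:
    # Same as clamp_row_est, but join results are never estimated
    # below the configured minimum (2 by default in this sketch).
    nrows = clamp_row_est(nrows)
    return max(nrows, float(join_row_estimate_clamp))

print(clamp_join_row_est(0.013))   # 2.0 -- would have been 1 without the clamp
print(clamp_join_row_est(5091.0))  # 5091.0 -- large estimates pass through
```

Setting the clamp back to 1 reproduces stock PostgreSQL behavior, which is why exposing it as a GUC makes experimentation easy.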

I also made a much more correct but complicated patch to track both
uniqueness and selectivity through the optimizer, but I didn't quite push
that over the finish line (I made a mistake in the hash join code, and got
distracted by my day job before finishing it).

  https://github.com/labkey-matthewb/postgres/commits/struct_selectivity

The second patch is certainly the better approach, but it needs someone to
pick up the mission.

Matt

On Wed, Apr 26, 2017 at 8:00 AM, Gerardo Herzig  wrote:

> Some other approaches you could try:
>
> 1) What about a hashed index? You could create
> CREATE INDEX ON FIELD (unit_id, hashtext(field_name))
>
> and change your query accordingly:
>
> "where hashtext(FIELD.FIELD_NAME)=hashtext('SHEETS_PRESENT') "
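The point of indexing `hashtext(field_name)` is that the btree then stores and compares fixed-size integers instead of variable-length strings, and both sides of the predicate hash to the same value whenever the strings are equal. A rough illustration in Python, using `zlib.crc32` as a stand-in for PostgreSQL's internal `hashtext` (table contents are hypothetical):

```python
import zlib

def hashtext(s: str) -> int:
    # Stand-in for PostgreSQL's hashtext(); any stable hash
    # works for the illustration.
    return zlib.crc32(s.encode())

# The index stores (unit_id, hashtext(field_name)); the rewritten
# query compares hashes rather than the strings themselves.
rows = [(1, "SHEETS_PRESENT"), (1, "PAGE_COUNT"), (2, "SHEETS_PRESENT")]
target = hashtext("SHEETS_PRESENT")
matches = [(u, f) for (u, f) in rows if hashtext(f) == target]
print(matches)  # both SHEETS_PRESENT rows match
```

Note that hash collisions are possible in principle, so a belt-and-suspenders version of the query would also keep the original string equality predicate alongside the hash comparison.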
>
> 2) Partitioning (not native yet, but it can be simulated through
> inheritance), as described in
> https://www.postgresql.org/docs/current/static/ddl-partitioning.html
> This could work well if FIELD.FIELD_NAME has a limited number of
> distinct values.
>
> Gerardo
>
> - Original Message -
> > From: "Alessandro Ferrucci" 
> > To: pgsql-performance@postgresql.org
> > Sent: Wednesday, April 26, 2017 0:19:37
> > Subject: Re: [PERFORM] Slow query with 3 table joins
> >
> >
> >
> > After about 40 minutes the slow query finally finished, and the result
> > of the EXPLAIN plan can be found here:
> >
> >
> > https://explain.depesz.com/s/BX22
> >
> >
> > Thanks,
> > Alessandro Ferrucci
> >
> >
> > On Tue, Apr 25, 2017 at 11:10 PM, Alessandro Ferrucci <
> > alessandroferru...@gmail.com > wrote:
> >
> >
> >
> >
> > Hello - I am migrating a current system to PostgreSQL and I am having
> > an issue with a relatively straightforward query being extremely
> > slow.
> >
> >
> > The following are the definitions of the tables:
> >
> >
> > CREATE TABLE popt_2017.unit
> > (
> > id serial NOT NULL,
> > unit_id text,
> > batch_id text,
> > create_date timestamp without time zone DEFAULT now(),
> > update_date timestamp without time zone,
> > CONSTRAINT unit_pkey PRIMARY KEY (id)
> > )
> > WITH (
> > OIDS=FALSE
> > );
> >
> >
> > CREATE TABLE popt_2017.field
> > (
> > id serial NOT NULL,
> > unit_id integer,
> > subunit_data_id integer,
> > field_name character varying(50),
> > page_id character varying(20),
> > page_type character varying(20),
> > batch_id character varying(20),
> > file_name character varying(20),
> > data_concept integer,
> > "GROUP" integer,
> > omr_group integer,
> > pres integer,
> > reg_data text,
> > ocr_conf text,
> > ocr_dict text,
> > ocr_phon text,
> > create_date timestamp without time zone DEFAULT now(),
> > update_date timestamp without time zone,
> > CONSTRAINT field_pkey PRIMARY KEY (id),
> > CONSTRAINT field_subunit_data_id_fkey FOREIGN KEY (subunit_data_id)
> > REFERENCES popt_2017.subunit (id) MATCH SIMPLE
> > ON UPDATE NO ACTION ON DELETE NO ACTION,
> > CONSTRAINT field_unit_id_fk FOREIGN KEY (unit_id)
> > REFERENCES popt_2017.unit (id) MATCH FULL
> > ON UPDATE NO ACTION ON DELETE NO ACTION,
> > CONSTRAINT field_unit_id_fkey FOREIGN KEY (unit_id)
> > REFERENCES popt_2017.unit (id) MATCH SIMPLE
> > ON UPDATE NO ACTION ON DELETE NO ACTION
> > )
> > WITH (
> > OIDS=FALSE
> > );
> >
> >
> > CREATE TABLE popt_2017.answer
> > (
> > id serial NOT NULL,
> > field_id integer,
> > ans_status integer,
> > ans text,
> > luggage text,
> > arec text,
> > kfi_partition integer,
> > final boolean,
> > length integer,
> > create_date timestamp without time zone DEFAULT now(),
> > update_date timestamp without time zone,
> > CONSTRAINT answer_pkey PRIMARY KEY (id),
> > CONSTRAINT answer_field_id_fk FOREIGN KEY (field_id)
> > REFERENCES popt_2017.field (id) MATCH FULL
> > ON UPDATE NO ACTION ON DELETE NO ACTION,
> > CONSTRAINT answer_field_id_fkey FOREIGN KEY (field_id)
> > REFERENCES popt_2017.field (id) MATCH SIMPLE
> > ON UPDATE NO ACTION ON DELETE NO ACTION
> > )
> > WITH (
> > OIDS=FALSE
> > );
> >
> >
> > Below are the index definitions for those tables:
> >
> >
> > UNIT:
> > CREATE UNIQUE INDEX unit_pkey ON unit USING btree (id);
> > CREATE INDEX unit_unit_id_idx ON unit USING btree (unit_id);
> >
> >
> > FIELD:
> > CREATE UNIQUE INDEX field_pkey ON field USING btree (id)
> > CREATE INDEX field_unit_id_idx ON field USING btree (unit_id)
> > CREATE INDEX field_subunit_id_idx ON field USING btree
> > (subunit_data_id)
> > CREATE INDEX field_field_name_idx ON field USING btree (field_name)
> >
> >
> > ANSWER:
> > CREATE UNIQUE INDEX answer_pkey ON answer 

[PERFORM] Query optimizer plans with very small selectivity estimates

2015-10-29 Thread Matthew Bellew
This is related to a post in the general bugs forum, but I found this forum
and it seems more appropriate.  This is my second attempt to post; I believe
the first attempt last week did not go through, so apologies if I'm
duplicating.

http://comments.gmane.org/gmane.comp.db.postgresql.bugs/39011

I have had several users encounter performance problems, which all
seem to come down to this issue: multiplying selectivity estimates can
cause tuple estimates to shrink very quickly.  Once the estimator
gets down to 1 row, the planner may choose plans that are very good ONLY
WHEN there is exactly 1 row, and potentially terrible otherwise (maybe
even O(N^large)).  Unfortunately, these may be the worst plans if the
estimate is even slightly off (even just returning 2 or 3 rows versus 1).
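The shrinking effect is easy to reproduce with toy numbers (the table size and per-clause selectivities below are hypothetical, not taken from the plans later in this post):

```python
# Each join/filter clause multiplies in another selectivity estimate,
# and the planner treats the clauses as independent.  With a few clauses
# the product drops below one row, and the estimate is clamped to exactly 1.

rows = 100_000.0
selectivities = [0.001, 0.002, 0.05]  # three supposedly independent clauses

estimate = rows
for sel in selectivities:
    estimate *= sel
print(estimate)   # ~0.01 -- far below one row

clamped = max(1.0, round(estimate))
print(clamped)    # 1.0 -- the dangerous "exactly one row" estimate
```

Once several clauses are correlated in reality, the true row count can be thousands while the arithmetic above still lands on 1.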

Using the patch below, I discovered that clamping relation tuple estimates
to a number as small as 2 seemed to avoid all the catastrophic query
plans.

In the scenarios I'm seeing, I have several examples of queries
that take >1m to run but should run in <1s.  The estimate of 1 row
(versus thousands actual) leads the planner to tee up several nested
loop joins, which cause thousands of table scans.
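Why a 1-row misestimate is so costly: a nested loop rescans the inner relation once per outer row, so a plan priced for a single outer row explodes when thousands arrive, while a hash join pays its build cost once. A toy cost model (all numbers hypothetical, not PostgreSQL's actual cost formulas):

```python
def nested_loop_cost(outer_rows: float, inner_scan_cost: float) -> float:
    # One inner scan per outer row -- cheap only if outer_rows is truly tiny.
    return outer_rows * inner_scan_cost

def hash_join_cost(outer_rows: float, build_cost: float,
                   probe_cost: float = 0.01) -> float:
    # Build the hash table once, then do a cheap probe per outer row.
    return build_cost + outer_rows * probe_cost

inner_scan_cost = 500.0   # e.g. a sequential scan of the inner table
build_cost = 2500.0

# Priced at the estimate (1 row), the nested loop looks like the winner...
print(nested_loop_cost(1, inner_scan_cost), hash_join_cost(1, build_cost))
# ...but at the actual row count (5091) it is roughly 1000x more expensive.
print(nested_loop_cost(5091, inner_scan_cost), hash_join_cost(5091, build_cost))
```

This is exactly the asymmetry in the two artifacts below: the same query runs in 81s with the nested loop plan and 85ms with the hash join plan.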

I have been working on a more complete patch which tracks uniqueness along
with selectivity, so that the optimizer can benefit from knowing when a
relation must have 1 (or fewer) tuples, while clamping all other relations
to 2 rather than 1.

typedef struct
{
	double	selectivity;
	bool	unique;
} Selectivity;
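A sketch of how such a (selectivity, unique) pair might combine through a join, letting the planner distinguish "provably at most one row" from "estimated to be about one row". The combination rules here are hypothetical illustrations, not the actual patch:

```python
from dataclasses import dataclass

@dataclass
class Selectivity:
    selectivity: float
    unique: bool  # True only when a key constraint proves <= 1 matching row

def join_rows(outer_rows: float, inner_rows: float, s: Selectivity) -> float:
    nrows = outer_rows * inner_rows * s.selectivity
    if s.unique:
        # A provably-unique match may legitimately estimate down to 1 row.
        return max(1.0, round(nrows))
    # Otherwise never trust "exactly 1": clamp to 2 as a defense.
    return max(2.0, round(nrows))

print(join_rows(1000, 1000, Selectivity(1e-6, unique=True)))   # 1.0
print(join_rows(1000, 1000, Selectivity(1e-6, unique=False)))  # 2.0
```

The point of the flag is that plans which are only safe for exactly one row stay available where uniqueness is provable, without being chosen on the strength of a mere estimate.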

I am interested in hearing discussion about this problem, and in whether
the community is open to a patch if I continue pursuing the development.

Matt



FIRST ARTIFACT

plan with an expensive (80s) join and a join estimate of 1
note the first Nested Loop join and the 81s actual time
(I gave up trying to post the full explain because of the 80 char limit)

"Sort  (cost=7000.04..7000.04 rows=1 width=49)
(actual time=81739.426..81740.023 rows=5091 loops=1)"
"  Sort Key: c.ten DESC"
"  Sort Method: quicksort  Memory: 948kB"
"  CTE cte"
"->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=4)
(actual time=0.001..0.001 rows=5 loops=1)"
"  ->  Nested Loop  (cost=1.36..6999.97 rows=1 width=49)
(actual time=0.059..81725.475 rows=5091 loops=1)"
"Planning time: 1.912 ms"
"Execution time: 81740.328 ms"


SECOND ARTIFACT

force the join row estimate to be a minimum of 2
query completes very quickly

"Sort  (cost=7000.06..7000.06 rows=2 width=49)
(actual time=84.610..85.192 rows=5142 loops=1)"
"  Sort Key: c.ten DESC"
"  Sort Method: quicksort  Memory: 956kB"
"  CTE cte"
"->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=4)
(actual time=0.002..0.003 rows=5 loops=1)"
"  ->  Hash Join  (cost=2518.99..6999.98 rows=2 width=49)
(actual time=17.629..82.886 rows=5142 loops=1)"

"Planning time: 2.982 ms"
"Execution time: 85.514 ms"


THIRD ARTIFACT

patch I used to make experimenting easier w/o recompiling


diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 1b61fd9..444703c 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -68,6 +68,12 @@
  *-
  */

+
+
+/* These parameters are set by GUC */
+int join_row_estimate_clamp=1;
+
+
 #include "postgres.h"

 #ifdef _MSC_VER
@@ -175,6 +181,17 @@ clamp_row_est(double nrows)
 }


+double
+clamp_join_row_est(double nrows)
+{
+	nrows = clamp_row_est(nrows);
+	if (nrows >= (double) join_row_estimate_clamp)
+		return nrows;
+	return (double) join_row_estimate_clamp;
+}
+
+
+
 /*
  * cost_seqscan
  *  Determines and returns the cost of scanning a relation sequentially.
@@ -3886,7 +3903,7 @@ calc_joinrel_size_estimate(PlannerInfo *root,
  break;
  }

- return clamp_row_est(nrows);
+ return clamp_join_row_est(nrows);
 }

 /*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 71090f2..fabb8ac 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2664,6 +2664,16 @@ static struct config_int ConfigureNamesInt[] =
  NULL, NULL, NULL
  },

+	{
+		{"join_row_estimate_clamp", PGC_USERSET, QUERY_TUNING_OTHER,
+			gettext_noop("Set the minimum estimated size of a join result."),
+			NULL
+		},
+		&join_row_estimate_clamp,
+		1, 1, 1,
+		NULL, NULL, NULL
+	},
+
  /* End-of-list marker */
  {
  {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 25a7303..0161c4b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -67,8 +67,10 @@ extern bool enable_material;
 extern bool enable_mergejoin;
 extern bool enable_hashjoin;
 extern int constraint_exclusion;
+extern int join_row_estimate_clamp;

 extern double clamp_row_est(double nrows);
+extern double clamp_join_row_est(double nrows);
 extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
  double index_pages, PlannerInfo *root);
 extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo
*baserel,