Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-11 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> Say if cost of best plan >= N then recheck query for strangeness. If
> anything found, re-plan query.

Whatever makes you think that would be useful?

The usual result of undetected duplicate WHERE clauses is an
*underestimate* of runtime, not an overestimate (because it thinks
too few tuples will be selected).

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-11 Thread Simon Riggs
On Wed, 2008-02-06 at 11:00 -0500, Tom Lane wrote:
> Theo Kramer <[EMAIL PROTECTED]> writes:
> > On Wed, 2008-02-06 at 11:53 +, Simon Riggs wrote:
> >> Since the SQL is not your fault and difficult to control, it is an
> >> argument in favour of an optional planner mode that would perform
> >> additional checks for redundant clauses of various kinds. The default
> >> for that would be "off" since most people don't suffer from this
> >> problem. BO isn't the only SQL generating-client out there, so I think
> >> this is a fairly wide problem.

> We used to have code that removed duplicate WHERE clauses (check the
> revision history for prepqual.c).  It was taken out because it consumed
> excessive amounts of planning time without accomplishing a darn thing
> for most queries.  There is no chance that it will be put back in as the
> only behavior, or even the default behavior, but I can see the reasoning
> for offering an option as Simon suggests.

I was wondering if we might do that automatically? It seems easy to
imagine a switch, but I wonder if we'd be able to set it correctly in
enough situations to make it worthwhile.

Say if cost of best plan >= N then recheck query for strangeness. If
anything found, re-plan query.

That way we only pay the cost of checking for longer queries and we only
actually re-plan for queries that will benefit.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Tom Lane
Theo Kramer <[EMAIL PROTECTED]> writes:
> On Wed, 2008-02-06 at 11:53 +, Simon Riggs wrote:
>> Since the SQL is not your fault and difficult to control, it is an
>> argument in favour of an optional planner mode that would perform
>> additional checks for redundant clauses of various kinds. The default
>> for that would be "off" since most people don't suffer from this
>> problem. BO isn't the only SQL generating-client out there, so I think
>> this is a fairly wide problem.

> I would have to disagree. I spend a lot of time writing code that
> generates SQL from a business app and feel strongly that any
> optimisation is my responsibility.

Disagree with what?  If that's your feeling then you'd leave the setting
"off", and no harm done.

We used to have code that removed duplicate WHERE clauses (check the
revision history for prepqual.c).  It was taken out because it consumed
excessive amounts of planning time without accomplishing a darn thing
for most queries.  There is no chance that it will be put back in as the
only behavior, or even the default behavior, but I can see the reasoning
for offering an option as Simon suggests.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Roberts, Jon
> >
> >>> Since the SQL is not your fault and difficult to control, it is an
> >>> argument in favour of an optional planner mode that would perform
> >>> additional checks for redundant clauses of various kinds. The
> > default
> >>> for that would be "off" since most people don't suffer from this
> >>> problem. BO isn't the only SQL generating-client out there, so I
> > think
> >>> this is a fairly wide problem.
> >>
> >> I would have to disagree. I spend a lot of time writing code that
> >> generates SQL from a business app and feel strongly that any
> >> optimisation is my responsibility.
> >>
> >
> > The point to a BI tool like BO is to abstract the data collection
> > and do
> > it dynamically.  The SQL is built at run time because the tool is
> > designed to give the end user as much flexibility as the data
> > structure
> > allows to query the data however they want.
> >
> > It isn't feasible, possible, or recommended to rewrite all of the
> > possible generated SQL that could be designed at runtime by the
tool.
> 
> No, but it is feasible to expect the tool to generate well-formed
> queries without redundant clauses.  There are plenty that do.
> 


Agreed.


Jon

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Erik Jones


On Feb 6, 2008, at 7:35 AM, Roberts, Jon wrote:




Since the SQL is not your fault and difficult to control, it is an
argument in favour of an optional planner mode that would perform
additional checks for redundant clauses of various kinds. The

default

for that would be "off" since most people don't suffer from this
problem. BO isn't the only SQL generating-client out there, so I

think

this is a fairly wide problem.


I would have to disagree. I spend a lot of time writing code that
generates SQL from a business app and feel strongly that any
optimisation is my responsibility.



The point to a BI tool like BO is to abstract the data collection  
and do

it dynamically.  The SQL is built at run time because the tool is
designed to give the end user as much flexibility as the data  
structure

allows to query the data however they want.

It isn't feasible, possible, or recommended to rewrite all of the
possible generated SQL that could be designed at runtime by the tool.


No, but it is feasible to expect the tool to generate well-formed  
queries without redundant clauses.  There are plenty that do.


Erik Jones

DBA | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Roberts, Jon

> > Since the SQL is not your fault and difficult to control, it is an
> > argument in favour of an optional planner mode that would perform
> > additional checks for redundant clauses of various kinds. The
default
> > for that would be "off" since most people don't suffer from this
> > problem. BO isn't the only SQL generating-client out there, so I
think
> > this is a fairly wide problem.
> 
> I would have to disagree. I spend a lot of time writing code that
> generates SQL from a business app and feel strongly that any
> optimisation is my responsibility.
> 

The point to a BI tool like BO is to abstract the data collection and do
it dynamically.  The SQL is built at run time because the tool is
designed to give the end user as much flexibility as the data structure
allows to query the data however they want.

It isn't feasible, possible, or recommended to rewrite all of the
possible generated SQL that could be designed at runtime by the tool.  



Jon

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Theo Kramer
On Wed, 2008-02-06 at 11:53 +, Simon Riggs wrote:
> On Wed, 2008-02-06 at 09:42 +0100, SURANTYN Jean François wrote:
> 
> > That issue is very annoying because with generated SQL queries (from
> > Business Objects for example) on big tables, it is possible that some
> > queries have several times the same "where" condition ("where n=1 and
> > n=1" for example), and as the optimizer is under-estimating the number
> > of returned rows, some bad execution plans can be chosen (nested loops
> > instead of hash joins for example)
> 
> I can see the annoyance there.
> 
> There's a balance in the planner between time spent to optimize the
> query and time spent to correct mistakes. If we looked continually for
> mistakes then planning time would increase for everybody that didn't
> suffer from this problem.
> 
> Since the SQL is not your fault and difficult to control, it is an
> argument in favour of an optional planner mode that would perform
> additional checks for redundant clauses of various kinds. The default
> for that would be "off" since most people don't suffer from this
> problem. BO isn't the only SQL generating-client out there, so I think
> this is a fairly wide problem.

I would have to disagree. I spend a lot of time writing code that
generates SQL from a business app and feel strongly that any
optimisation is my responsibility.

Having to re-configure PG to switch on a planner mode, as suggested
above, to address badly generated SQL is not a good idea.

This with experience on having to talk business application developers
through re-configuring a database.

-- 
Regards
Theo


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Simon Riggs
On Wed, 2008-02-06 at 09:42 +0100, SURANTYN Jean François wrote:

> That issue is very annoying because with generated SQL queries (from
> Business Objects for example) on big tables, it is possible that some
> queries have several times the same "where" condition ("where n=1 and
> n=1" for example), and as the optimizer is under-estimating the number
> of returned rows, some bad execution plans can be chosen (nested loops
> instead of hash joins for example)

I can see the annoyance there.

There's a balance in the planner between time spent to optimize the
query and time spent to correct mistakes. If we looked continually for
mistakes then planning time would increase for everybody that didn't
suffer from this problem.

Since the SQL is not your fault and difficult to control, it is an
argument in favour of an optional planner mode that would perform
additional checks for redundant clauses of various kinds. The default
for that would be "off" since most people don't suffer from this
problem. BO isn't the only SQL generating-client out there, so I think
this is a fairly wide problem.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Richard Huxton
SURANTYN Jean François wrote:
> Many thanks for your quick reply
> 
> In fact, that issue comes from a recent migration from Oracle to
> Postgresql, and even if some queries were not optimized by the past
> (example: where n=1 and n=1), Oracle was able to rewrite them and to
> "hide" the bad queries". But now that we have migrated to Postgresql,
> we have discovered that some queries were indeed badly wroten I will
> tell to the developpers to try to optimize their queries for them to
> work efficiently on Postgresql

If nothing else it will help when / if you decide to use prepared
queries - there's no way to optimise "n=$1 or n=$2" at planning time.

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread SURANTYN Jean François
Many thanks for your quick reply 

In fact, that issue comes from a recent migration from Oracle to Postgresql, 
and even if some queries were not optimized by the past (example: where n=1 and 
n=1), Oracle was able to rewrite them and to "hide" the bad queries". But now 
that we have migrated to Postgresql, we have discovered that some queries were 
indeed badly wroten
I will tell to the developpers to try to optimize their queries for them to 
work efficiently on Postgresql

Thanks again for your help

Regards

Jean-Francois SURANTYN


-Message d'origine-
De : Richard Huxton [mailto:[EMAIL PROTECTED] 
Envoyé : mercredi 6 février 2008 10:47
À : SURANTYN Jean François
Cc : pgsql-performance@postgresql.org
Objet : Re: [PERFORM] Optimizer : query rewrite and execution plan ?

SURANTYN Jean François wrote:
> my_db=# explain select * from test where n = 1;

> my_db=# explain select * from test where n = 1 and n = 1;

> In the first SELECT query (with "where n=1"), the estimated number of 
> returned rows is correct (10), whereas in the second SELECT query 
> (with "where n=1 and n=1"), the estimated number of returned rows is
> 5 (instead of 10 !) So the optimizer has under-estimated the number of 
> rows returned

That's because it's a badly composed query. The planner is guessing how much 
overlap there would be between the two clauses. It's not exploring the option 
that they are the same clause repeated.

> That issue is very annoying because with generated SQL queries (from 
> Business Objects for example) on big tables, it is possible that some 
> queries have several times the same "where"
> condition ("where n=1 and n=1" for example), and as the optimizer is 
> under-estimating the number of returned rows, some bad execution plans 
> can be chosen (nested loops instead of hash joins for example)

Sounds like your query-generator needs a bit of an improvement, from my end.

> Is the estimated number of returned rows directly linked to the 
> decision of the optimizer to chose Hash Joins or Nested Loops in join 
> queries ?

Yes, well the cost determines a plan and obviously number of rows affects the 
cost.

> Is there a way for the Postgresql optimizer to be able to simplify and 
> rewrite the SQL statements before running them ?

It does, just not this one. It spots things like a=b and b=c implies a=c (for 
joins etc).

> Are
> there some parameters that could change the execution plans ?

Not really in this case.

The root of your problem is that you have a query with an irrelevant clause 
(AND n=1) and you'd like the planner to notice that it's irrelevant and remove 
it from the query. There are two problems with this:

1. It's only going to be possible in simple cases. It's unlikely the planner 
would ever spot "n=2 AND n=(10/5)"
2. Even in the simple case you're going to waste time checking *every
query* to see if clauses could be eliminated.

Is there any way to improve your query generator?

--
  Richard Huxton
  Archonet Ltd

**
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.

Supermarchés MATCH, Société Par Actions Simplifiée au capital de 10 420 100 €, 
immatriculée au RCS de LILLE sous le Numéro B 785 480 351
Siège : 250, rue du Général de Gaulle - BP 201 - 59 561 LA MADELEINE Cedex
**


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Richard Huxton
SURANTYN Jean François wrote:
> my_db=# explain select * from test where n = 1;

> my_db=# explain select * from test where n = 1 and n = 1;

> In the first SELECT query (with "where n=1"), the estimated number of
> returned rows is correct (10), whereas in the second SELECT query
> (with "where n=1 and n=1"), the estimated number of returned rows is
> 5 (instead of 10 !) So the optimizer has under-estimated the number
> of rows returned

That's because it's a badly composed query. The planner is guessing how
much overlap there would be between the two clauses. It's not exploring
the option that they are the same clause repeated.

> That issue is very annoying because with generated
> SQL queries (from Business Objects for example) on big tables, it is
> possible that some queries have several times the same "where"
> condition ("where n=1 and n=1" for example), and as the optimizer is
> under-estimating the number of returned rows, some bad execution
> plans can be chosen (nested loops instead of hash joins for example)

Sounds like your query-generator needs a bit of an improvement, from my end.

> Is the estimated number of returned rows directly linked to the
> decision of the optimizer to chose Hash Joins or Nested Loops in join
> queries ? 

Yes, well the cost determines a plan and obviously number of rows
affects the cost.

> Is there a way for the Postgresql optimizer to be able to
> simplify and rewrite the SQL statements before running them ? 

It does, just not this one. It spots things like a=b and b=c implies a=c
(for joins etc).

> Are
> there some parameters that could change the execution plans ?

Not really in this case.

The root of your problem is that you have a query with an irrelevant
clause (AND n=1) and you'd like the planner to notice that it's
irrelevant and remove it from the query. There are two problems with this:

1. It's only going to be possible in simple cases. It's unlikely the
planner would ever spot "n=2 AND n=(10/5)"
2. Even in the simple case you're going to waste time checking *every
query* to see if clauses could be eliminated.

Is there any way to improve your query generator?

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread SURANTYN Jean François
Hi
 
I have discovered an issue on my Postgresql database recently installed : it 
seems that the optimizer can not, when possible, simplify and rewrite a simple 
query before running it. Here is a simple and reproducible example :
 
my_db=# create table test (n numeric);
CREATE
my_db=# insert into test values (1); --> run 10 times
INSERT
my_db=# insert into test values (0); --> run 10 times
INSERT
my_db=# select count(*) from test;
count
---
20
(1 row)
my_db=# vacuum full analyze test;
VACUUM
my_db=# explain select * from test where n = 1;
QUERY PLAN
--
Seq Scan on test (cost=0.00..1.25 rows=10 width=9)
Filter: (n = 1::numeric)
(2 rows)
 
my_db=# explain select * from test where n = 1 and n = 1;
QUERY PLAN
-
Seq Scan on test (cost=0.00..1.30 rows=5 width=9)
Filter: ((n = 1::numeric) AND (n = 1::numeric))
(2 rows)
 
In the first SELECT query (with "where n=1"), the estimated number of returned 
rows is correct (10), whereas in the second SELECT query (with "where n=1 and 
n=1"), the estimated number of returned rows is 5 (instead of 10 !)
So the optimizer has under-estimated the number of rows returned
That issue is very annoying because with generated SQL queries (from Business 
Objects for example) on big tables, it is possible that some queries have 
several times the same "where" condition ("where n=1 and n=1" for example), and 
as the optimizer is under-estimating the number of returned rows, some bad 
execution plans can be chosen (nested loops instead of hash joins for example)
 
Is the estimated number of returned rows directly linked to the decision of the 
optimizer to chose Hash Joins or Nested Loops in join queries ?
Is there a way for the Postgresql optimizer to be able to simplify and rewrite 
the SQL statements before running them ? Are there some parameters that could 
change the execution plans ?
 
Thanks by advance for your help
 
Jean-Francois SURANTYN
 

**
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.

Supermarchés MATCH, Société Par Actions Simplifiée au capital de 10 420 100 €, 
immatriculée au RCS de LILLE sous le Numéro B 785 480 351
Siège : 250, rue du Général de Gaulle - BP 201 - 59 561 LA MADELEINE Cedex
**