Re: [HACKERS] fool-toleranced optimizer

2005-03-23 Thread Bruce Momjian

Added to TODO:

* Add GUC to issue notice about queries that use unjoined tables


---

Kevin Brown wrote:
 Greg Stark wrote:
  
  Kevin Brown [EMAIL PROTECTED] writes:
  
   Hence, it makes sense to go ahead and run the query, but issue a
   warning at the very beginning, e.g. WARNING: query JOINs tables list
   of tables without otherwise referencing or making use of those
   tables.  This may cause excessively poor performance of the query.
  
  Well the problem with a warning is what if it *is* intentional? It's
  not ok to fill my logs up with warnings for every time the query is
  executed. That just forces me to turn off warnings.
 
 WARNING is probably the wrong level (I wasn't thinking in terms of PG
 logging, though I probably should have been).  What about NOTICE?
 Basically, you want something that will alert the interactive user
 that what they're doing is likely to be stupid, but at the same time
 won't be a burden on the system or the DBA...
 
  It would be ok to have an option to block cartesian joins entirely. I might
  even choose to run with that enabled normally. I can always disable it for
  queries I know need cartesion joins.
 
 Which wouldn't work all that well for people who are trying to write
 their software in a reasonably portable fashion, unfortunately.
 However, the number of people who care would now be much smaller.
 
  For that matter, I wonder whether it's time to consider an option to
  disable implicit (ie, pre-ansi join syntax) joins entirely. It seems
  like lots of shops are likely imposing coding standards that require
  ansi join syntax anyways. In environments like that you would expect
  a CROSS JOIN b not just select * from a,b anyways.
  
  Shops like that might appreciate the ability to enforce a blanket
  coding standard on that point and get protection from accidental
  cartesian joins as a side benefit.
 
 That could be handy, but of course it should default to off, which
 with respect to cross joins would unfortunately wind up benefitting
 only those people who already are potentially aware of the issue and
 care about it (or, at least, those people who have DBAs that care
 about it).
 
 
 
 -- 
 Kevin Brown [EMAIL PROTECTED]
 
 ---(end of broadcast)---
 TIP 8: explain analyze is your friend
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] fool-toleranced optimizer

2005-03-11 Thread Richard Huxton
Greg Stark wrote:
Kevin Brown [EMAIL PROTECTED] writes:

Hence, it makes sense to go ahead and run the query, but issue a
warning at the very beginning, e.g. WARNING: query JOINs tables list
of tables without otherwise referencing or making use of those
tables.  This may cause excessively poor performance of the query.

Well the problem with a warning is what if it *is* intentional? It's not ok to
fill my logs up with warnings for every time the query is executed. That just
forces me to turn off warnings.
It would be ok to have an option to block cartesian joins entirely. I might
even choose to run with that enabled normally. I can always disable it for
queries I know need cartesion joins.
I'm not sure the cartesian join is the problem - it's the explosion in 
number of rows. Which suggests you want something analogous to 
statement_timeout. Perhaps something like:
  statement_max_select_rows = 0  # 0=disabled
  statement_max_update_rows = 0  # applies to insert/delete too

That has the bonus of letting you set statement_max_update_rows=1 in an 
interactive session and catching WHERE clause typos.

On the down-side, it means 2 more GUC variables and I'm not sure how 
practical/efficient it is to detect a resultset growing beyond that size.
--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: [HACKERS] fool-toleranced optimizer

2005-03-10 Thread Kevin Brown
Greg Stark wrote:
 
 Kevin Brown [EMAIL PROTECTED] writes:
 
  Hence, it makes sense to go ahead and run the query, but issue a
  warning at the very beginning, e.g. WARNING: query JOINs tables list
  of tables without otherwise referencing or making use of those
  tables.  This may cause excessively poor performance of the query.
 
 Well the problem with a warning is what if it *is* intentional? It's
 not ok to fill my logs up with warnings for every time the query is
 executed. That just forces me to turn off warnings.

WARNING is probably the wrong level (I wasn't thinking in terms of PG
logging, though I probably should have been).  What about NOTICE?
Basically, you want something that will alert the interactive user
that what they're doing is likely to be stupid, but at the same time
won't be a burden on the system or the DBA...

 It would be ok to have an option to block cartesian joins entirely. I might
 even choose to run with that enabled normally. I can always disable it for
 queries I know need cartesion joins.

Which wouldn't work all that well for people who are trying to write
their software in a reasonably portable fashion, unfortunately.
However, the number of people who care would now be much smaller.

 For that matter, I wonder whether it's time to consider an option to
 disable implicit (ie, pre-ansi join syntax) joins entirely. It seems
 like lots of shops are likely imposing coding standards that require
 ansi join syntax anyways. In environments like that you would expect
 a CROSS JOIN b not just select * from a,b anyways.
 
 Shops like that might appreciate the ability to enforce a blanket
 coding standard on that point and get protection from accidental
 cartesian joins as a side benefit.

That could be handy, but of course it should default to off, which
with respect to cross joins would unfortunately wind up benefitting
only those people who already are potentially aware of the issue and
care about it (or, at least, those people who have DBAs that care
about it).



-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] fool-toleranced optimizer

2005-03-09 Thread Simon Riggs
On Wed, 2005-03-09 at 11:02 +1100, Neil Conway wrote:
 Oleg Bartunov wrote:
  I just noticed a little optimizer problem - in second query there is
  unused 'tycho t2' table alias which gets backend buried.
 
 It's not an unused table alias, it is specifying the cartesian product 
 of `tycho' with itself. I don't see how this is an optimizer problem: 
 it's a perfectly legitimate query, albeit one that is unlikely to 
 execute very quickly.

Turn this thought around a bit and the request makes sense.

Oleg is saying that the optimizer doesn't protect against foolish SQL
requests. His query is an example of a foolishly written query.

It is reasonably common SQL mistake to inadvertently request a cartesian
product join, when that was not actually desired. This is mostly
prevalent in Data Warehouse situations where people are attempting to
request complex result sets.

It seems a reasonable that there might be a GUC such as 
enable_cartesian = on (by default)

If an admin felt that this was a problem, they could enable it for their
novice users only, or perhaps across the whole system.

If enable_cartesian = off, then queries with cartesian product joins
would be made to fail. Which should be easy to detect in early stages of
optimization.

So, Oleg, for me, the request makes sense, though somebody would need to
code it...

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] fool-toleranced optimizer

2005-03-09 Thread Neil Conway
Simon Riggs wrote:
Oleg is saying that the optimizer doesn't protect against foolish SQL
requests. His query is an example of a foolishly written query.
IMHO calling this a foolishly written query is completely arbitrary. I 
can imagine plenty of applications for which a cartesian join makes 
sense. In this case the user didn't write the query they meant to write 
-- but it is surely hopeless to prevent that in the general case :)

It seems a reasonable that there might be a GUC such as 
enable_cartesian = on (by default)
I think the bar for adding a new GUC ought to be significantly higher 
than that.

In any case, when this problem does occur, it is obvious to the user 
that something is wrong, and no harm is done. Given a complex SQL query, 
it might take a bit of examination to determine which join clause is 
missing -- but the proper way to fix that is better query visualization 
tools (perhaps similar RH's Visual Explain, for example). This would 
solve the general problem: the user didn't write the query they 
intended to write, rather than a very narrow subset (the user forgot a 
join clause and accidentally computed a cartesian product).

-Neil
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] fool-toleranced optimizer

2005-03-09 Thread Simon Riggs
Oleg, this idea doesn't seem destine for greatness, so it might be worth
adding that you can avoid the general case problem of incorrectly-
specified-but-long-running query by using statement_timeout...

On Wed, 2005-03-09 at 22:38 +1100, Neil Conway wrote:
 Simon Riggs wrote:
  Oleg is saying that the optimizer doesn't protect against foolish SQL
  requests. His query is an example of a foolishly written query.
 
 IMHO calling this a foolishly written query is completely arbitrary. 

Well, in this case foolish is defined by the person that wrote the
query, as an expression of regret.

 I 
 can imagine plenty of applications for which a cartesian join makes 
 sense. 

Yes, which is why I discussed using a GUC, set only by those people who
want to be protected *from themselves*. It's a safety harness that you
could choose to put on if you wished.

 In this case the user didn't write the query they meant to write 
 -- but it is surely hopeless to prevent that in the general case :)
 
  It seems a reasonable that there might be a GUC such as 
  enable_cartesian = on (by default)
 
 I think the bar for adding a new GUC ought to be significantly higher 
 than that.

Well, the point is moot until somebody writes the rest of the code
anyhow. So, add it to the ideas shelf...

 In any case, when this problem does occur, it is obvious to the user 
 that something is wrong, and no harm is done. Given a complex SQL query, 
 it might take a bit of examination to determine which join clause is 
 missing -- but the proper way to fix that is better query visualization 
 tools (perhaps similar RH's Visual Explain, for example). This would 
 solve the general problem: the user didn't write the query they 
 intended to write, rather than a very narrow subset (the user forgot a 
 join clause and accidentally computed a cartesian product).

This issue only occurs when using SQL as the user interface language,
which is common when using a database in iterative or exploratory mode
e.g. Data Warehousing. If you are using more advanced BI tools then they
seldom get the SQL wrong.

This is not useful in a situation where people are writing SQL for a
more static application.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] fool-toleranced optimizer

2005-03-09 Thread Oleg Bartunov
On Wed, 9 Mar 2005, Simon Riggs wrote:
Oleg, this idea doesn't seem destine for greatness, so it might be worth
adding that you can avoid the general case problem of incorrectly-
specified-but-long-running query by using statement_timeout...
I have no problem with that ! I just wanted to take a note of such
could be mistaken errors.

On Wed, 2005-03-09 at 22:38 +1100, Neil Conway wrote:
Simon Riggs wrote:
Oleg is saying that the optimizer doesn't protect against foolish SQL
requests. His query is an example of a foolishly written query.
IMHO calling this a foolishly written query is completely arbitrary.
Well, in this case foolish is defined by the person that wrote the
query, as an expression of regret.
I
can imagine plenty of applications for which a cartesian join makes
sense.
Yes, which is why I discussed using a GUC, set only by those people who
want to be protected *from themselves*. It's a safety harness that you
could choose to put on if you wished.
In this case the user didn't write the query they meant to write
-- but it is surely hopeless to prevent that in the general case :)
It seems a reasonable that there might be a GUC such as
enable_cartesian = on (by default)
I think the bar for adding a new GUC ought to be significantly higher
than that.
Well, the point is moot until somebody writes the rest of the code
anyhow. So, add it to the ideas shelf...
In any case, when this problem does occur, it is obvious to the user
that something is wrong, and no harm is done. Given a complex SQL query,
it might take a bit of examination to determine which join clause is
missing -- but the proper way to fix that is better query visualization
tools (perhaps similar RH's Visual Explain, for example). This would
solve the general problem: the user didn't write the query they
intended to write, rather than a very narrow subset (the user forgot a
join clause and accidentally computed a cartesian product).
This issue only occurs when using SQL as the user interface language,
which is common when using a database in iterative or exploratory mode
e.g. Data Warehousing. If you are using more advanced BI tools then they
seldom get the SQL wrong.
This is not useful in a situation where people are writing SQL for a
more static application.
Best Regards, Simon Riggs
Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] fool-toleranced optimizer

2005-03-09 Thread Greg Stark

Neil Conway [EMAIL PROTECTED] writes:

 In any case, when this problem does occur, it is obvious to the user that
 something is wrong, and no harm is done. 

I don't see why you say that. The whole complaint here is that it's *not*
obvious something is wrong and there *is* damage until it's realized.

If I run a query like this on a busy database backing a web site it could
easily kill the web site.

Or if I start this query and expect it to take an hour then after 2-3 hours
when I finally get suspicious I've just wasted 2-3 hours...

Or if I add it to the list of nightly jobs I could lose all the other jobs
that night that are preempted by this heavy query running for too long.

 Given a complex SQL query, it might take a bit of examination to determine
 which join clause is missing -- but the proper way to fix that is better
 query visualization tools (perhaps similar RH's Visual Explain, for
 example). This would solve the general problem: the user didn't write the
 query they intended to write, rather than a very narrow subset (the user
 forgot a join clause and accidentally computed a cartesian product).

I'm unconvinced any tool can make humans infallible. 

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] fool-toleranced optimizer

2005-03-09 Thread Josh Berkus
Simon, Neil, all:

 IMHO calling this a foolishly written query is completely arbitrary. I
 can imagine plenty of applications for which a cartesian join makes
 sense. In this case the user didn't write the query they meant to write
 -- but it is surely hopeless to prevent that in the general case :)

Hey, this reminds me, it's about time for us to set ADD_MISSING_FROM=FALSE as 
the default, for 8.1, yes?   When we added the option in 7.4, it was with the 
expectation of changing the default.

The reason this is relevant is Missing FROM Clause is a frequent cause of 
cartesian joins, because a user mixed up their table aliases.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] fool-toleranced optimizer

2005-03-09 Thread Kevin Brown
Neil Conway wrote:
 Simon Riggs wrote:
 Oleg is saying that the optimizer doesn't protect against foolish SQL
 requests. His query is an example of a foolishly written query.
 
 IMHO calling this a foolishly written query is completely arbitrary. I 
 can imagine plenty of applications for which a cartesian join makes 
 sense. In this case the user didn't write the query they meant to write 
 -- but it is surely hopeless to prevent that in the general case :)

Sure, but this case, at least, is (hopefully) easily detectable (as
such things go), has a high cost when it occurs, and is *usually* not
what the user intended.

Hence, it makes sense to go ahead and run the query, but issue a
warning at the very beginning, e.g. WARNING: query JOINs tables list
of tables without otherwise referencing or making use of those
tables.  This may cause excessively poor performance of the query.


That said, the real question is whether or not it's worth putting in
the effort to detect this condition and issue the warning.  I'd say
probably not, but if this is a big enough itch for someone then why
should we discourage them from coding up a fix?



-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] fool-toleranced optimizer

2005-03-09 Thread Greg Stark

Kevin Brown [EMAIL PROTECTED] writes:

 Hence, it makes sense to go ahead and run the query, but issue a
 warning at the very beginning, e.g. WARNING: query JOINs tables list
 of tables without otherwise referencing or making use of those
 tables.  This may cause excessively poor performance of the query.

Well the problem with a warning is what if it *is* intentional? It's not ok to
fill my logs up with warnings for every time the query is executed. That just
forces me to turn off warnings.

It would be ok to have an option to block cartesian joins entirely. I might
even choose to run with that enabled normally. I can always disable it for
queries I know need cartesion joins.

But outputing a warning and then continuing on to destroy performance just
gets the worst of both worlds.

For that matter, I wonder whether it's time to consider an option to disable
implicit (ie, pre-ansi join syntax) joins entirely. It seems like lots of
shops are likely imposing coding standards that require ansi join syntax
anyways. In environments like that you would expect a CROSS JOIN b not just
select * from a,b anyways. 

Shops like that might appreciate the ability to enforce a blanket coding
standard on that point and get protection from accidental cartesian joins as a
side benefit.

-- 
greg


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[HACKERS] fool-toleranced optimizer

2005-03-08 Thread Oleg Bartunov
Hi there,
I just noticed a little optimizer problem - in second query there is
unused 'tycho t2' table alias which gets backend buried. This is 
artificial query, I just tried to check if optimizier could recognize
this.

tycho=# explain analyze select t.pm_ra,t.pm_dec from tycho t where t.pm_ra  20.2 and t.pm_ra 18;
 QUERY PLAN 
-
 Index Scan using pm_ra_idx on tycho t  (cost=0.00..9821.83 rows=2613 width=8) (actual time=0.061..12.518 rows=1466 loops=1)
   Index Cond: ((pm_ra  20.2::double precision) AND (pm_ra  18::double precision))
 Total runtime: 14.726 ms
(3 rows)

tycho=# explain analyze select t.pm_ra,t.pm_dec from tycho t, tycho t2 where t.pm_ra 
 20.2 and t.pm_ra 18;
...
It's doing Nested Loop, probably, so I didn' wait until it completed

Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] fool-toleranced optimizer

2005-03-08 Thread Neil Conway
Oleg Bartunov wrote:
I just noticed a little optimizer problem - in second query there is
unused 'tycho t2' table alias which gets backend buried.
It's not an unused table alias, it is specifying the cartesian product 
of `tycho' with itself. I don't see how this is an optimizer problem: 
it's a perfectly legitimate query, albeit one that is unlikely to 
execute very quickly.

-Neil
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster