Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Bruno Wolff III
On Fri, Apr 08, 2005 at 20:50:09 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 
 Comments?  Anyone see anything I missed?

It should be possible to make this work for bool_and and bool_or as those
are equivalent to min and max for the boolean type.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Bruno Wolff III
On Fri, Apr 08, 2005 at 20:50:09 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 
   SELECT (SELECT x FROM tab WHERE ... ORDER BY x LIMIT 1),
  (SELECT y FROM tab WHERE ... ORDER BY y DESC LIMIT 1);
 
 Comments?  Anyone see anything I missed?

Are NULLs a problem? In the second case above, wouldn't you get NULL
instead of the value returned by max if there were some NULL and some
not NULL values?

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Bruno Wolff III
On Fri, Apr 08, 2005 at 20:50:09 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 
 Comments?  Anyone see anything I missed?

Thinking about the case for NULLs some more, I am wondering if you are
going to treat aggregates with strict state functions different than
those that don't? It seems for ones with strict state functions you need
to not include NULL values when doing using ORDER BY. For aggregates
that aren't strict it may be possible that it is desired that NULL
be returned if there is a NULL value in one of the rows.

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Mark Kirkwood
Looks great! I had been slowly thinking along similar lines via the 
equivalence:

SELECT min(x) FROM tab WHERE...
SELECT min(x) FROM (SELECT x FROM tab WHERE ... ORDER BY x LIMIT 1) AS t
However, it looks like your approach is more flexible than this :-)
best wishes
Mark
Tom Lane wrote:
I realized today that this may not be as hard as I thought.
Specifically, I'm imagining that we could convert
SELECT min(x), max(y) FROM tab WHERE ...
into sub-selects in a one-row outer query:
SELECT (SELECT x FROM tab WHERE ... ORDER BY x LIMIT 1),
   (SELECT y FROM tab WHERE ... ORDER BY y DESC LIMIT 1);
Breaking it down like that means we can consider each aggregate
independently, which definitely simplifies matters.  
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes:
   Tom Lane [EMAIL PROTECTED] wrote:
 SELECT (SELECT x FROM tab WHERE ... ORDER BY x LIMIT 1),
 (SELECT y FROM tab WHERE ... ORDER BY y DESC LIMIT 1);

 Are NULLs a problem? In the second case above, wouldn't you get NULL
 instead of the value returned by max if there were some NULL and some
 not NULL values?

Hmm ... we might have to hack the LIMIT step to skip over NULLs.
Doesn't seem like an insurmountable issue though.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes:
 Thinking about the case for NULLs some more, I am wondering if you are
 going to treat aggregates with strict state functions different than
 those that don't?

We only intend this to support MAX and MIN.  If you're inventing an
aggregate that doesn't play exactly by those rules, you don't get to
take advantage of the optimization.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes:
 It should be possible to make this work for bool_and and bool_or as those
 are equivalent to min and max for the boolean type.

This would just be a matter of marking them properly in the catalogs.

However, are they really equivalent in the corner cases?  In particular,
I think boolean AND across zero input rows is probably supposed to
return TRUE, not NULL.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Bruno Wolff III
On Fri, Apr 08, 2005 at 23:40:28 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 Bruno Wolff III [EMAIL PROTECTED] writes:
  It should be possible to make this work for bool_and and bool_or as those
  are equivalent to min and max for the boolean type.
 
 This would just be a matter of marking them properly in the catalogs.
 
 However, are they really equivalent in the corner cases?  In particular,
 I think boolean AND across zero input rows is probably supposed to
 return TRUE, not NULL.

I am not sure what the spec says, but according to how the seem to work,
the answer appears to be that they are equivalent.

area= select bool_and(true) where false;
 bool_and
--

(1 row)

area= select bool_or(true) where false;
 bool_or
-

(1 row)

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Neil Conway
Tom Lane wrote:
Specifically, I'm imagining that we could convert
SELECT min(x), max(y) FROM tab WHERE ...
into sub-selects in a one-row outer query:
SELECT (SELECT x FROM tab WHERE ... ORDER BY x LIMIT 1),
   (SELECT y FROM tab WHERE ... ORDER BY y DESC LIMIT 1);
Does this transformation work for a query of the form:
SELECT min(x), max(y) FROM tab WHERE random()  0.5;
(which isn't a very useful query, but I'm sure you can imagine a more 
realistic example involving volatile functions in the WHERE clause.)

-Neil
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 Does this transformation work for a query of the form:
  SELECT min(x), max(y) FROM tab WHERE random()  0.5;

I've been going back and forth on that.  We wouldn't lose a lot in the
real world if we simply abandoned the optimization attempt whenever we
find any volatile functions in WHERE.  OTOH you could also argue that
we have never guaranteed that volatile functions in WHERE would be
evaluated at every table row --- consider something like
SELECT ... WHERE x  42 AND random()  0.5;
All that this optimization might do is to further cut the fraction of
table rows at which the volatile function actually gets checked.  So
I'm not seeing that it would break any code that worked reliably before.

Still, if it makes you feel at all uncomfortable, we can just refuse
the optimization in such cases.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Neil Conway
Tom Lane wrote:
All that this optimization might do is to further cut the fraction of
table rows at which the volatile function actually gets checked.  So
I'm not seeing that it would break any code that worked reliably before.
Hmm; what about
SELECT min(x), min(x) FROM tab WHERE random()  0.5;
Applying the optimization would mean the two min(x) expressions would 
likely be different, which seems rather weird.

Still, if it makes you feel at all uncomfortable, we can just refuse
the optimization in such cases.
I'd say that's probably safest.
-Neil
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 Hmm; what about
  SELECT min(x), min(x) FROM tab WHERE random()  0.5;
 Applying the optimization would mean the two min(x) expressions would 
 likely be different, which seems rather weird.

Actually not: my expectation is that identical aggregate calls will get
folded into one subplan.  This is what happens now (when you call min(x)
twice it's computed just once) and the subplan mechanism already has the
infrastructure needed to let us keep doing it.  I feel we need to be
able to do this in order to justify the assumption that evaluating each
aggregate separately is OK from the efficiency standpoint.

 Still, if it makes you feel at all uncomfortable, we can just refuse
 the optimization in such cases.

 I'd say that's probably safest.

I don't have a problem with that, but I haven't quite convinced myself
that we need to expend the cycles to check for it, either ...

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Bruno Wolff III
On Sat, Apr 09, 2005 at 00:57:11 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 
 I don't have a problem with that, but I haven't quite convinced myself
 that we need to expend the cycles to check for it, either ...

You could have two different aggregates and end up with values
that could happen if the same set of rows was used to evaluate both.
I would expect that the sequential plan would be better for a volatile
where clause since you are going to execute it for every row anyway.
So disabling the optimization in that case isn't normally going to
slow things down.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] wrote:
 I don't have a problem with that, but I haven't quite convinced myself
 that we need to expend the cycles to check for it, either ...

 I would expect that the sequential plan would be better for a volatile
 where clause since you are going to execute it for every row anyway.

Well, no, wait a minute.  We have never promised that we would
physically evaluate every volatile function at every table row.
What we promise is that we do not assume-without-proof that the
function's value will be the same at every table row.  I don't see
where this optimization breaks that promise.

Obviously, we do make such an assumption for WHERE clauses that actually
get taken into the indexscan condition.  But we already check volatility
before considering a clause as a possible indexscan condition.  The
question here is whether we have to reject the optimization if there are
additional WHERE clauses, not directly related to the proposed
indexscan, that contain volatile functions.  I'm not seeing the argument
that says we must do that.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org