Re: [HACKERS] prepared statements don't log arguments?

2005-04-08 Thread Simon Riggs
On Fri, 2005-04-08 at 00:51 +0200, Palle Girgensohn wrote:
 --On Thursday, April 07, 2005 23.31.52 +0100 Simon Riggs 
 [EMAIL PROTECTED] wrote:
 
  On Wed, 2005-04-06 at 15:01 +0200, Palle Girgensohn wrote:
  I really need to know the *real* arguments...
 
  Why do you need to log the arguments as well?
 
 Debugging purposes. If I realize there are queries hogging the server, I'd 
 like to get them from a log so I can tune the system, maybe add an index, or 
 find the query in the source code and rephrase it. It is *very* helpful to 
 have a proper set of arguments for a slow query, since another set of 
 arguments will probably give a very speedy result. I need to find the hogs, 
 basically.

OK, that's what I hoped you'd say. With a prepared query all of the
statements execute the same plan, so you don't need to know the exact
parameters. Before v3 the whole query was logged because the statements
were not prepared and each query might have been different. That is no
longer the case.

ISTM that for analysis purposes it is helpful to know that a particular
query is being repeated. Also, if you log the actual parameters, the log
gets uselessly large very quickly.

Anyway, I have a patch that I will be able to submit shortly in this
area. No doubt it will require further discussion.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] prepared statements don't log arguments?

2005-04-08 Thread Oliver Jowett
Simon Riggs wrote:

 OK, that's what I hoped you'd say. With a prepared query all of the
 statements execute the same plan, so you don't need to know the exact
 parameters.

This isn't true in 8.0 if you are using the unnamed statement (as the
JDBC driver does in some cases): the plan chosen depends on the
parameter values given in the first Bind.

-O



Re: [HACKERS] prepared statements don't log arguments?

2005-04-08 Thread Tom Lane
Oliver Jowett [EMAIL PROTECTED] writes:
 Simon Riggs wrote:
  OK, that's what I hoped you'd say. With a prepared query all of the
 statements execute the same plan, so you don't need to know the exact
 parameters.

 This isn't true in 8.0 if you are using the unnamed statement (as the
 JDBC driver does in some cases): the plan chosen depends on the
 parameter values given in the first Bind.

Also, what plan got chosen isn't the only question that a DBA might
want the log to answer.  "Where did my data get screwed up?" is at least
as likely an application.

I'm a bit worried about the costs of converting binary-format parameters
into text form ...

regards, tom lane



Re: [HACKERS] Call for objections: merge Resdom with TargetEntry

2005-04-08 Thread Bernd Helmle
--On Thursday, April 07, 2005 20:48:12 -0400 Alvaro Herrera 
[EMAIL PROTECTED] wrote:

One piece of wisdom I've managed to grasp is that when Tom asks for
objections or comments, you better speak very quickly because he codes
way too fast (that, or he posts when the patch is almost ready.)
Hehe, I noticed that after hitting send... it's not that bad.
I didn't manage to follow the lists the last 2 days, and was awaiting 
objections from Jaime Casanova first.

(I guess if you are following development closely you should be
subscribed to pgsql-committers.)
I am already, thanks :)
--
 Bernd


[HACKERS] Tab-completion feature ?

2005-04-08 Thread Sergey E. Koposov
Hi All, 

I observed a somewhat strange behaviour of tab-completion in PostgreSQL
8.0.1.

I have the following command 

leda=# ALTER TABLE any_table RENAME TO 

After the TO there is one space, and the cursor is after that space.
I press tab and I get

leda=# ALTER TABLE any_table RENAME TO TO 

I understand that this is a somewhat stupid and not very useful example,
but still, this is probably not the expected behaviour.

With Best Regards, 
Sergey

PS: In PostgreSQL 7.4.6 there is no such problem.


Sergey E. Koposov
Sternberg Astronomical Institute, Moscow University (Russia)
Max-Planck Institute for Astronomy (Germany) 
Internet: [EMAIL PROTECTED], http://lnfm1.sai.msu.su/~math/





[HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Tom Lane
We've been talking about this long enough ... let's try to actually do it ...

In the last go-round, the thread starting here,
http://archives.postgresql.org/pgsql-hackers/2004-11/msg00371.php
we spent a lot of time agonizing over GROUP BY and whether the
optimization is usable for anything besides MAX/MIN.  However, no one
presented any concrete reasons for thinking we could use it for any
other aggregates.  And using it with GROUP BY would require some
fundamental restructuring of the way we do grouping --- right now we
have to look at every row anyway to compute the grouping sets, so
there's no possibility of gain from optimizing the aggregates.

But the real issue is that no one seems willing to tackle the complete
problem.  So, let's forget those things and see if there is something
within reach if we just focus on MAX/MIN without grouping.

I realized today that this may not be as hard as I thought.
Specifically, I'm imagining that we could convert

SELECT min(x), max(y) FROM tab WHERE ...

into sub-selects in a one-row outer query:

SELECT (SELECT x FROM tab WHERE ... ORDER BY x LIMIT 1),
   (SELECT y FROM tab WHERE ... ORDER BY y DESC LIMIT 1);

Breaking it down like that means we can consider each aggregate
independently, which definitely simplifies matters.  This still allows
us to handle complex combinations like

SELECT min(x), max(y) - min(y), max(z) FROM tab WHERE ...

as well as aggregates in the HAVING clause --- the restriction is just
that each aggregate in the query has to be an optimizable MIN or MAX.
(If any are not, we're going to end up scanning the table anyway, and I
see no point in doing separate subqueries for some of the aggregates in
that case.)  We replace each aggregate call with a sub-SELECT, drop the
outer FROM and WHERE clauses, and there we are.
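
The equivalence behind this rewrite is easy to check outside PostgreSQL.
Here is a sketch using Python's sqlite3 module; the table, column names,
and data are all invented for illustration, and NULLs are ignored for now:

```python
import sqlite3

# In-memory table standing in for "tab"; values are arbitrary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO tab VALUES (?, ?)",
                 [(5, 10), (3, 40), (8, 20)])

# The aggregate form ...
agg = conn.execute(
    "SELECT min(x), max(y) FROM tab WHERE x > 0").fetchone()

# ... and the proposed rewrite into one-row sub-selects.
rewritten = conn.execute(
    "SELECT (SELECT x FROM tab WHERE x > 0 ORDER BY x LIMIT 1),"
    "       (SELECT y FROM tab WHERE x > 0 ORDER BY y DESC LIMIT 1)"
).fetchone()

print(agg, rewritten)  # both are (3, 40)
```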

So I propose the following restrictions:

1. We will consider only aggregate queries with no GROUP BY clause.

2. The query can only reference one table.  (There doesn't seem to be
any way to handle join conditions reasonably.  We could possibly handle
Cartesian-product cases, but I don't see the value.)

When we have such a query, we'll scan the targetlist (and HAVING clause
if any) to examine each aggregate function.  We can only optimize if every
aggregate is convertible into an index scan, which requires the following
conditions:

1. The aggregate has an associated sort operator according to pg_aggregate
(see below).

2. The aggregate's argument matches a column of some btree index of the
table, and the sort operator matches either the < or > member of the
column's opclass.

3. If the column is not the first one of its index, then we must have an
indexable equality constraint (x = something) in WHERE for each earlier
column.  We can also make use of an inequality on the target column
itself, if it is of the opposite direction from the aggregate operator
(for example, if the aggregate operator matches the opclass's < member, we
can use x > something or x >= something).  Such an inequality will form a
starting boundary for the index scan.

This means that we can for example optimize cases like

SELECT MIN(x) FROM tab WHERE x > 42

which turns into a forward indexscan starting at the x > 42 boundary, or

SELECT MIN(x) FROM tab WHERE y = 42

which needs an index on (y,x) and starts at the first y = 42 entry.
(Note: this doesn't work with a constraint like y >= 42 ...)
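
As a sketch only (the function name and data structures here are invented,
not the planner's actual representation), the per-index test implied by
conditions 2 and 3 looks roughly like:

```python
def can_use_index(agg_col, agg_op, index_cols, opclass_ops, equality_cols):
    """agg_col: the aggregate's argument column.
    agg_op: the aggregate's sort operator, '<' for MIN or '>' for MAX
            (condition 1, assumed to come from pg_aggregate).
    index_cols: columns of one candidate btree index, in order.
    opclass_ops: operators the index's opclasses can order by.
    equality_cols: columns with an indexable col = something in WHERE.
    """
    # Condition 2: the argument must be an index column, and the sort
    # operator must match a member of the column's opclass.
    if agg_col not in index_cols or agg_op not in opclass_ops:
        return False
    # Condition 3: every index column before the target needs an
    # indexable equality constraint in WHERE.
    earlier = index_cols[:index_cols.index(agg_col)]
    return all(col in equality_cols for col in earlier)

# MIN(x) with an index on (y, x) and WHERE y = 42: usable.
assert can_use_index('x', '<', ['y', 'x'], {'<', '>'}, {'y'})
# Same index but no constraint on y: not usable.
assert not can_use_index('x', '<', ['y', 'x'], {'<', '>'}, set())
```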

The fact that an indexscan conversion is possible doesn't necessarily
make it a good idea.  For example consider the identical query

SELECT MIN(x) FROM tab WHERE y = 42

when we have separate indexes on x and y.  If the first row with y = 42
is far into the x index, the optimized version could be far slower than
our normal implementation (which would probably do an indexscan on y,
instead).  In general, any constraints that don't fit into our indexscan
plan have to be looked at with suspicion.  So AFAICS we have to plan the
query both ways and estimate which is cheaper.  But we are only going to
do this for single-table queries, so the extra planner overhead won't be
large.

Catalog representation:

We must have a way to identify an aggregate as being related to an
index's sort order.  I am envisioning adding a column to pg_aggregate
that is either 0 (for a non-max/min aggregate) or the OID of an operator
that represents the sort order associated with the aggregate.
The rule is that these two queries must give the same answer:

SELECT agg(col) FROM tab;

SELECT col FROM tab ORDER BY col USING operator LIMIT 1;

that is, the aggregate value is equivalent to the first value in the sort
order induced by the operator.  For example, we associate min(int) with
int4lt and max(int) with int4gt.  (We also assume the aggregate must return
NULL for no input.)
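
The invariant can be stated in a few lines of Python, with plain lists and
comparison operators standing in for tables and opclass members (this is
just the rule itself, not catalog code):

```python
# The invariant: the aggregate's value equals the first value in the
# sort order induced by its associated operator.
values = [7, 2, 9, 4]

# min is associated with the "<" (ascending sort) operator.
assert min(values) == sorted(values)[0]

# max is associated with the ">" (reverse-sort) operator.
assert max(values) == sorted(values, reverse=True)[0]
```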

The reason for associating with operators, and not directly to opclasses,
is that this will successfully handle both regular and reverse-sort
opclasses.

Comments?  Anyone see anything I missed?

regards, tom lane


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Bruno Wolff III
On Fri, Apr 08, 2005 at 20:50:09 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 
 Comments?  Anyone see anything I missed?

It should be possible to make this work for bool_and and bool_or as those
are equivalent to min and max for the boolean type.



Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Bruno Wolff III
On Fri, Apr 08, 2005 at 20:50:09 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 
   SELECT (SELECT x FROM tab WHERE ... ORDER BY x LIMIT 1),
  (SELECT y FROM tab WHERE ... ORDER BY y DESC LIMIT 1);
 
 Comments?  Anyone see anything I missed?

Are NULLs a problem? In the second case above, wouldn't you get NULL
instead of the value returned by max if there were some NULL and some
not NULL values?



Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Bruno Wolff III
On Fri, Apr 08, 2005 at 20:50:09 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 
 Comments?  Anyone see anything I missed?

Thinking about the case for NULLs some more, I am wondering if you are
going to treat aggregates with strict state functions differently from
those that aren't strict? It seems that for ones with strict state
functions you need to exclude NULL values when using ORDER BY. For
aggregates that aren't strict, it may be that returning NULL when one
of the rows contains a NULL is the desired behavior.



Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Mark Kirkwood
Looks great! I had been slowly thinking along similar lines via the 
equivalence:

SELECT min(x) FROM tab WHERE...
SELECT min(x) FROM (SELECT x FROM tab WHERE ... ORDER BY x LIMIT 1) AS t

However, it looks like your approach is more flexible than this :-)

best wishes

Mark
Tom Lane wrote:
I realized today that this may not be as hard as I thought.
Specifically, I'm imagining that we could convert
SELECT min(x), max(y) FROM tab WHERE ...
into sub-selects in a one-row outer query:
SELECT (SELECT x FROM tab WHERE ... ORDER BY x LIMIT 1),
   (SELECT y FROM tab WHERE ... ORDER BY y DESC LIMIT 1);
Breaking it down like that means we can consider each aggregate
independently, which definitely simplifies matters.  


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes:
   Tom Lane [EMAIL PROTECTED] wrote:
 SELECT (SELECT x FROM tab WHERE ... ORDER BY x LIMIT 1),
 (SELECT y FROM tab WHERE ... ORDER BY y DESC LIMIT 1);

 Are NULLs a problem? In the second case above, wouldn't you get NULL
 instead of the value returned by max if there were some NULL and some
 not NULL values?

Hmm ... we might have to hack the LIMIT step to skip over NULLs.
Doesn't seem like an insurmountable issue though.
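
Bruno's failure mode, and one way to skip over NULLs, can be reproduced
with Python's sqlite3 (note this is SQLite, not PostgreSQL: SQLite sorts
NULLs first in ascending order, so here the MIN/ascending case is the one
that breaks rather than the MAX/descending one; the table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (x INTEGER)")
conn.executemany("INSERT INTO tab VALUES (?)", [(3,), (None,), (7,)])

# The aggregate skips NULLs ...
assert conn.execute("SELECT min(x) FROM tab").fetchone()[0] == 3

# ... but the naive rewrite does not: the NULL row sorts first.
first = conn.execute("SELECT x FROM tab ORDER BY x LIMIT 1").fetchone()[0]
assert first is None

# Filtering NULLs out explicitly restores the equivalence.
fixed = conn.execute(
    "SELECT x FROM tab WHERE x IS NOT NULL ORDER BY x LIMIT 1"
).fetchone()[0]
assert fixed == 3
```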

regards, tom lane



Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes:
 Thinking about the case for NULLs some more, I am wondering if you are
 going to treat aggregates with strict state functions differently from
 those that aren't strict?

We only intend this to support MAX and MIN.  If you're inventing an
aggregate that doesn't play exactly by those rules, you don't get to
take advantage of the optimization.

regards, tom lane



Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes:
 It should be possible to make this work for bool_and and bool_or as those
 are equivalent to min and max for the boolean type.

This would just be a matter of marking them properly in the catalogs.

However, are they really equivalent in the corner cases?  In particular,
I think boolean AND across zero input rows is probably supposed to
return TRUE, not NULL.
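
The empty-input convention Tom is alluding to is the identity-element one,
which Python's built-ins follow (unlike SQL aggregates, which return NULL
for zero input rows):

```python
# AND over an empty set conventionally returns its identity, TRUE;
# OR returns its identity, FALSE.
assert all([]) is True
assert any([]) is False
```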

regards, tom lane



Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Bruno Wolff III
On Fri, Apr 08, 2005 at 23:40:28 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 Bruno Wolff III [EMAIL PROTECTED] writes:
  It should be possible to make this work for bool_and and bool_or as those
  are equivalent to min and max for the boolean type.
 
 This would just be a matter of marking them properly in the catalogs.
 
 However, are they really equivalent in the corner cases?  In particular,
 I think boolean AND across zero input rows is probably supposed to
 return TRUE, not NULL.

I am not sure what the spec says, but judging from how they seem to work,
the answer appears to be that they are equivalent.

area=> select bool_and(true) where false;
 bool_and
----------

(1 row)

area=> select bool_or(true) where false;
 bool_or
---------

(1 row)



Re: [HACKERS] [PATCHES] DELETE ... USING

2005-04-08 Thread Bruce Momjian
Neil Conway wrote:
 Bruce Momjian wrote:
  I just checked current CVS and see exactly what you describe:
  
  test=> SELECT pg_class.* LIMIT 0;
  ERROR:  missing FROM-clause entry for table "pg_class"
  
  test=> SET add_missing_from=true;
  SET
  test=> SELECT pg_class.* LIMIT 0;
  NOTICE:  adding missing FROM-clause entry for table "pg_class"
  
   Is this what we want?  I don't think so.  I thought we wanted to
   maintain the backward-compatible syntax of no FROM clause.
 
 We do? Why?
 
 It is just as noncompliant with the SQL spec as other variants of this 
 behavior. add_missing_from would *always* have rejected those queries, 
 so ISTM we have been discouraging this case for as long as 
 add_missing_from has existed. If we want to allow this syntax by 
 default, we will need to effectively redefine the meaning of 
 add_missing_from -- which is fine, I just didn't think anyone wanted that.

Oh, so by setting add_missing_from to false, this query starts to fail.

I don't know how many people use that syntax.  I use it sometimes as a
hack in psql to avoid typing FROM, but that's hardly a reason to support
it.

If everyone else is OK with having it fail, that is fine with me, but I
wanted to make sure folks saw this was happening.  I basically saw no
discussion that we were disabling that syntax.  [CC moved to hackers.]

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] [PATCHES] DELETE ... USING

2005-04-08 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  test=> SELECT pg_class.* LIMIT 0;
  NOTICE:  adding missing FROM-clause entry for table "pg_class"
 
  Is this what we want?  I don't think so.  I thought we wanted to
  maintain the backward-compatible syntax of no FROM clause.
 
 Well, the discussion earlier in the week concluded that
 add_missing_from=true should emit a notice in every case where
 add_missing_from=false would fail.  Do you want to argue against
 that conclusion?

I didn't realize that SELECT pg_class.* was now going to fail because
add_missing_from is false.  I didn't link those two together in my head,
probably because the warning is not emitted if there is no FROM clause.

Anyway, I am fine either way but wanted to publicise it at least.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Neil Conway
Tom Lane wrote:
Specifically, I'm imagining that we could convert
SELECT min(x), max(y) FROM tab WHERE ...
into sub-selects in a one-row outer query:
SELECT (SELECT x FROM tab WHERE ... ORDER BY x LIMIT 1),
   (SELECT y FROM tab WHERE ... ORDER BY y DESC LIMIT 1);
Does this transformation work for a query of the form:
SELECT min(x), max(y) FROM tab WHERE random() < 0.5;
(which isn't a very useful query, but I'm sure you can imagine a more 
realistic example involving volatile functions in the WHERE clause.)

-Neil


Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 Does this transformation work for a query of the form:
  SELECT min(x), max(y) FROM tab WHERE random() < 0.5;

I've been going back and forth on that.  We wouldn't lose a lot in the
real world if we simply abandoned the optimization attempt whenever we
find any volatile functions in WHERE.  OTOH you could also argue that
we have never guaranteed that volatile functions in WHERE would be
evaluated at every table row --- consider something like
SELECT ... WHERE x > 42 AND random() < 0.5;
All that this optimization might do is to further cut the fraction of
table rows at which the volatile function actually gets checked.  So
I'm not seeing that it would break any code that worked reliably before.

Still, if it makes you feel at all uncomfortable, we can just refuse
the optimization in such cases.

regards, tom lane



Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Neil Conway
Tom Lane wrote:
All that this optimization might do is to further cut the fraction of
table rows at which the volatile function actually gets checked.  So
I'm not seeing that it would break any code that worked reliably before.
Hmm; what about
 SELECT min(x), min(x) FROM tab WHERE random() < 0.5;
Applying the optimization would mean the two min(x) expressions would 
likely be different, which seems rather weird.

Still, if it makes you feel at all uncomfortable, we can just refuse
the optimization in such cases.
I'd say that's probably safest.
-Neil


Re: [HACKERS] [PATCHES] DELETE ... USING

2005-04-08 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 Tom Lane wrote:
 Well, the discussion earlier in the week concluded that
 add_missing_from=true should emit a notice in every case where
 add_missing_from=false would fail.  Do you want to argue against
 that conclusion?

 I didn't realize that SELECT pg_class.* was now going to fail because
 add_missing_from is false.

It always has, though.  Neil hasn't changed the behavior when
add_missing_from is false ... he's only made add_missing_from=true
emit a notice in *every* case where add_missing_from=false would
fail.

regards, tom lane



Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 Hmm; what about
  SELECT min(x), min(x) FROM tab WHERE random() < 0.5;
 Applying the optimization would mean the two min(x) expressions would 
 likely be different, which seems rather weird.

Actually not: my expectation is that identical aggregate calls will get
folded into one subplan.  This is what happens now (when you call min(x)
twice it's computed just once) and the subplan mechanism already has the
infrastructure needed to let us keep doing it.  I feel we need to be
able to do this in order to justify the assumption that evaluating each
aggregate separately is OK from the efficiency standpoint.

 Still, if it makes you feel at all uncomfortable, we can just refuse
 the optimization in such cases.

 I'd say that's probably safest.

I don't have a problem with that, but I haven't quite convinced myself
that we need to expend the cycles to check for it, either ...

regards, tom lane



Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Bruno Wolff III
On Sat, Apr 09, 2005 at 00:57:11 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 
 I don't have a problem with that, but I haven't quite convinced myself
 that we need to expend the cycles to check for it, either ...

You could have two different aggregates and end up with a combination of
values that couldn't happen if the same set of rows was used to evaluate
both.  I would expect that the sequential plan would be better for a
volatile where clause anyway, since you are going to execute it for every
row.  So disabling the optimization in that case isn't normally going to
slow things down.



Re: [HACKERS] Optimizing maximum/minimum queries (yet again)

2005-04-08 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] wrote:
 I don't have a problem with that, but I haven't quite convinced myself
 that we need to expend the cycles to check for it, either ...

 I would expect that the sequential plan would be better for a volatile
 where clause since you are going to execute it for every row anyway.

Well, no, wait a minute.  We have never promised that we would
physically evaluate every volatile function at every table row.
What we promise is that we do not assume-without-proof that the
function's value will be the same at every table row.  I don't see
where this optimization breaks that promise.

Obviously, we do make such an assumption for WHERE clauses that actually
get taken into the indexscan condition.  But we already check volatility
before considering a clause as a possible indexscan condition.  The
question here is whether we have to reject the optimization if there are
additional WHERE clauses, not directly related to the proposed
indexscan, that contain volatile functions.  I'm not seeing the argument
that says we must do that.

regards, tom lane
