[SQL] Ordering with GROUPs

2002-08-18 Thread Julian Scarfe

I'm struggling to find an appropriate efficient query for an aggregate-type
problem and I'd appreciate suggestions.

I have messages associated with a set of locations (zero or more messages
per location).  A cut down test is as follows:

CREATE TABLE location (ident char(4), node point);
INSERT INTO location values ('', '(1,1)');
INSERT INTO location values ('', '(1,2)');
INSERT INTO location values ('', '(2,1)');
INSERT INTO location values ('', '(2,2)');

CREATE TABLE message (ident char(4), message_text text);
INSERT INTO message values ('', 'foo');
INSERT INTO message values ('', 'bar');
INSERT INTO message values ('', 'baz');
INSERT INTO message values ('', 'abel');
INSERT INTO message values ('', 'baker');
INSERT INTO message values ('', 'charlie');

so each message is associated with a location as follows -- here's the
natural join

SELECT location.ident, node, message_text from location, message
WHERE location.ident = message.ident;
 ident | node  | message_text
---+---+--
   | (1,1) | foo
   | (1,1) | bar
   | (1,1) | baz
   | (1,2) | abel
   | (1,2) | baker
   | (2,1) | charlie
(6 rows)

What I want is to know how many messages are available for each location.
It's easy if I just want the ident:

SELECT location.ident, count(*) from location, message
WHERE location.ident = message.ident
GROUP BY location.ident;
 ident | count
---+---
   | 3
   | 2
   | 1
(3 rows)

But I'd like to return the "node" in the same query.  If I try:

SELECT location.ident, node, count(*) from location, message
WHERE location.ident = message.ident
GROUP BY location.ident;
ERROR:  Attribute location.node must be GROUPed or used in an aggregate
function

it obviously fails.  If node were an integer I could just use an aggregate
like max() or similar, but it's not, and there's no suitable aggregate for
point.  I can create a trivial one as a work around, but I hope I don't have
to.

But if I do it properly, it requires an ordering operator :

SELECT location.ident, node, count(*) from location, message
WHERE location.ident = message.ident
GROUP BY location.ident, location.node;
ERROR:  Unable to identify an ordering operator '<' for type 'point'
Use an explicit ordering operator or modify the query

So how do I specify the operator for GROUP BY?  If I compare with ORDER BY
(>> operator is "is right of" for type point):

SELECT location.ident, node from location, message
WHERE location.ident = message.ident
ORDER BY location.node USING >>, location.ident;
 ident | node
---+---
   | (2,1)
   | (1,1)
   | (1,1)
   | (1,1)
   | (1,2)
   | (1,2)
(6 rows)

...and use the same construction, I get a parser error...

SELECT location.ident, node from location, message
WHERE location.ident = message.ident
GROUP BY location.node USING >>, location.ident;
ERROR:  parser: parse error at or near ","

I can't find anything in the doc.

One alternative is to use a nested query:

SELECT location.ident, node, (
   select count(*) from message
   WHERE location.ident = message.ident
) as count
FROM location;
 ident | node  | count
---+---+---
   | (1,1) | 3
   | (1,2) | 2
   | (2,1) | 1
   | (2,2) | 0
(4 rows)

For the test that works fine, but for my real life situation, the nested
query seems to be very inefficient, taking vastly longer than the first
query illustrated above.  Since the information required is clearly
contained in the result of that query, it seems unfortunate to say the least
that I can't work the GROUP BY mechanism on it to give me what I want.

Any tips please?

Thanks

Julian Scarfe




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [SQL] Ordering with GROUPs

2002-08-18 Thread Bruno Wolff III

On Sun, Aug 18, 2002 at 12:49:18 +0100,
  Julian Scarfe <[EMAIL PROTECTED]> wrote:
> I'm struggling to find an appropriate efficient query for an aggregate-type
> problem and I'd appreciate suggestions.
> 
> I have messages associated with a set of locations (zero or more messages
> per location).  A cut down test is as follows:
> 
> CREATE TABLE location (ident char(4), node point);
> INSERT INTO location values ('', '(1,1)');
> INSERT INTO location values ('', '(1,2)');
> INSERT INTO location values ('', '(2,1)');
> INSERT INTO location values ('', '(2,2)');
> 
> CREATE TABLE message (ident char(4), message_text text);
> INSERT INTO message values ('', 'foo');
> INSERT INTO message values ('', 'bar');
> INSERT INTO message values ('', 'baz');
> INSERT INTO message values ('', 'abel');
> INSERT INTO message values ('', 'baker');
> INSERT INTO message values ('', 'charlie');
> 
> I can't find anything in the doc.
> 
> One alternative is to use a nested query:
> 

Instead of a nested query, how about doing a join to a group'd version
of message? While you didn't use a unique constraint on the location
table it seems likely that there should only be one location for each
ident. So you can do the join after counting the number of messages.
While this is similar to your example, it might run faster, especially
in you don't have the message table indexed by ident, so that you can
do one sort instead of a bunch of sequential scans.
For example:
area=> select location.ident, location.node, howmany from location
area-> natural join (select ident, count(*) as howmany from message
area(> group by ident) as messgroup;
 ident | node  | howmany
---+---+-
   | (1,1) |   3
   | (1,2) |   2
   | (2,1) |   1
(3 rows)

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [SQL] Unexplained SQL behavior

2002-08-18 Thread Tom Lane

"JOE" <[EMAIL PROTECTED]> writes:
> I am debugging a performance problem with a view.  I have narrowed down the=
>  problem to when I adeed columns to my view.

Okay, I see the problem: it is in fact a bug, and one that's been around
for awhile.  (Curious no one's noticed before.)  When you wrote

> select distinct ... ,
> null::timestamp without time zone,
> null::timestamp without time zone, ...

the parser felt it could get away with creating only one sort column for
these two entries.  This is logically a valid optimization, but it
confused later stages of the system into thinking you'd written a
DISTINCT ON clause rather than plain DISTINCT.  And that suppresses an
important optimization, namely pushing down the outer query's WHERE
clause into the subselect.  (You'd also find that psql's \d would
display the view definition rather oddly.)

This is a bug and I will fix it for 7.3, but in the meantime the answer
is "don't do that".  I imagine this particular view definition is just a
placeholder until you get around to filling in non-null values for those
columns?  The problem will go away as soon as these two view columns
aren't obviously equal.  If you really need a view that works just like
this, you can work around the bug by making the null columns trivially
different, perhaps

> select distinct ... ,
> null::timestamp with time zone::timestamp without time zone,
> null::timestamp without time zone, ...

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [SQL] Ordering with GROUPs

2002-08-18 Thread Tom Lane

"Julian Scarfe" <[EMAIL PROTECTED]> writes:
> SELECT location.ident, node, count(*) from location, message
> WHERE location.ident = message.ident
> GROUP BY location.ident;
> ERROR:  Attribute location.node must be GROUPed or used in an aggregate
> function

> it obviously fails.  If node were an integer I could just use an aggregate
> like max() or similar, but it's not, and there's no suitable aggregate for
> point.  I can create a trivial one as a work around, but I hope I don't have
> to.

ISTM the problem here is the lack of any ordering operator for POINT,
which defeats GROUP BY, *plus* the lack of any aggregate you might use
for an aggregate-based solution.  This is not really a language failing
but a problem with an impoverished datatype.  So, if you don't like
Bruno's subselect-based workaround, the dummy aggregate seems the way
to go.

SQL99 contains a whole bunch of verbiage whose intent seems to be that
if you GROUP BY a unique or primary-key column, you can reference the
other columns of that table without aggregation (essentially, the
system treats them as implicitly GROUP BY'd).  Sooner or later we'll
probably get around to implementing that, and that would solve your
problem as long as you declare location.ident properly.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [SQL] Need Help for select

2002-08-18 Thread Andre Schubert

On Thu, 15 Aug 2002 11:17:15 +0900
"Masaru Sugawara" <[EMAIL PROTECTED]> wrote:

> On Wed, 14 Aug 2002 16:04:21 +0200
> Andre Schubert <[EMAIL PROTECTED]> wrote:
> 
> 
> > I want to compare if a tuple in c exist in b for each c.d_id and b.a_id.
> > In c exists 3 tuples: (1,2), (3,4), (5)
> > and want to find these tuples in b.
> 
> 
> Probably I would think I have reached the correct query. Table b and c,
> however, must have unique indices like the following in order to get the
> result by using it, because it pays no attention to the duplicate keys.
> If there are no primary keys, it will become more complicated for eliminating
> duplicate keys. 
> 
> 
> create table b (
> a_id int,
> c_id int,
> constraint p_key_b primary key(a_id, c_id)
> );
> create table c (
> b_id int,
> d_id int,
> constraint p_key_c primary key(b_id, d_id)
> );
> 
> 
> SELECT a.name, d.name
> FROM (SELECT t2.a_id, t2.d_id
> FROM (SELECT b.a_id, t1.d_id, t1.n
>   FROM (SELECT c.b_id, c.d_id, t0.n
>   FROM c, (SELECT d_id, COUNT(*) AS n
>  FROM c GROUP BY d_id) AS t0
>  WHERE c.d_id = t0.d_id
>) AS t1
>LEFT OUTER JOIN b ON (t1.b_id = b.c_id)
>WHERE b.a_id IS NOT NULL
>   ) AS t2
> GROUP BY t2.a_id, t2.d_id, t2.n
>HAVING COUNT(*) = t2.n
>  ) AS t3,
>  a,
>  d
> WHERE a.id = t3.a_id
>   AND d.id = t3.d_id
> 

After days of studying this query and hours of testing i would say this query works 
for me very well.
Thank you very very much.

> 
> 
> Regards,
> Masaru Sugawara
> 
> 
> 
> ---(end of broadcast)---
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to [EMAIL PROTECTED] so that your
> message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org