Re: [GENERAL] SQL question re aggregates & joins

2010-01-28 Thread Tom Lane
Scott Ribe  writes:
> Given that t2.id is the primary key, grouping by any other column of t2 is
> really redundant. I know *what* SQL won't allow me to do, I'm interested in
> knowing if there's some reason *why* other than historical...

SQL92 says so.  More recent versions of the SQL spec describe allowing
omission of grouping columns when one of them can be proven unique,
but it's complicated and we haven't got round to doing it.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SQL question re aggregates & joins

2010-01-28 Thread Dann Corbit
> -Original Message-
> From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
> ow...@postgresql.org] On Behalf Of Scott Ribe
> Sent: Thursday, January 28, 2010 2:10 PM
> To: Thom Brown
> Cc: pgsql-general
> Subject: Re: [GENERAL] SQL question re aggregates & joins
> 
> > You can't include an aggregate in the select if you don't group by
> > non-aggregates, so it should be:
> >
> > select max(t1."When"), t1."Pt_Id", t2."DateOfBirth"
> > from "PtStaffAccess" t1, "Person" t2
> > where t1."Pt_Id" = t2.id
> > group by t1."Pt_Id", t2."DateOfBirth";
> 
> I was aware that I could alternatively group by all the columns, but
> that
> actually just highlights the redundancy even more--consider your second
> example:
> 
> select max(t1."When"), t2.id , t2."DateOfBirth"
> from "PtStaffAccess" t1, "Person" t2
> where t1."Pt_Id" = t2.id
> group by t2.id, t2."DateOfBirth";
> 
> Given that t2.id is the primary key, grouping by any other column of t2
> is
> really redundant. I know *what* SQL won't allow me to do, I'm
> interested in
> knowing if there's some reason *why* other than historical...

Having a group by on a primary key in a single table is of no value.  After 
all, the group by accomplishes nothing at all.

Having a group by on the primary key of one table in a join does not guarantee 
uniqueness of the join result.

Consider the simplified schema
Table Customers:
Custname varchar(80)
Custaddr varchar(80)
Custid int primary key
Data:
{'Joe', 'Paris France', 1}
{'Fred', 'Seattle USA', 2}

Table Orders:
OrderID int
Custid int
OrderItem int
OrderQty int
OrderPrice Numeric 12.2
Primary key: OrderID, CustID, OrderItem
Foreign key: CustiID references Customers(CustID)
Data:
{1, 1, 17, 2, 12.92},
{1, 1, 12, 5, 14.75},
{2, 2, 9, 1, 27.45},
{2, 2, 2, 8, 11.23},
{2, 2, 1, 1, 123.45}

SELECT c.Custid, c.CustName, c.Custaddr, o.OrderID, o.OrderItem, sum(o.OrderQty 
* o.OrderPrice) as dollars
From Customers c, Orders o 
WHERE c.Custid = o.Custid
GROUP BY c.Custid

Now, c.Custid is the primary key for Customers and also a foreign key for 
Orders and yet the query is utter nonsense.
Had all of the primary key columns for the second table been included, then the 
group by is superfluous.  Having one or more of the primary key columns left 
out from the child table but added to the group by list changes the meaning and 
value of the result.

I have always thought this way for aggregate queries:
1. Add an aggregate function to each column that you are aggregating in some 
way.
2. Add a 'group by' for every column that is not being aggregated.
Anything else means that the query has not been thought through carefully.

IMO-YMMV


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SQL question re aggregates & joins

2010-01-28 Thread Scott Ribe
> You can't include an aggregate in the select if you don't group by
> non-aggregates, so it should be:
> 
> select max(t1."When"), t1."Pt_Id", t2."DateOfBirth"
> from "PtStaffAccess" t1, "Person" t2
> where t1."Pt_Id" = t2.id
> group by t1."Pt_Id", t2."DateOfBirth";

I was aware that I could alternatively group by all the columns, but that
actually just highlights the redundancy even more--consider your second
example:

select max(t1."When"), t2.id , t2."DateOfBirth"
from "PtStaffAccess" t1, "Person" t2
where t1."Pt_Id" = t2.id
group by t2.id, t2."DateOfBirth";

Given that t2.id is the primary key, grouping by any other column of t2 is
really redundant. I know *what* SQL won't allow me to do, I'm interested in
knowing if there's some reason *why* other than historical...

-- 
Scott Ribe
scott_r...@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SQL question re aggregates & joins

2010-01-28 Thread Thom Brown
On 28 January 2010 21:32, Scott Ribe  wrote:

> OK, this does not work:
>
> select max(t1."When"), t1."Pt_Id", t2."DateOfBirth"
> from "PtStaffAccess" t1, "Person" t2
> where t1."Pt_Id" = t2.id
> group by t1."Pt_Id";
>
> But this does:
>
> select max(t1."When"), t1."Pt_Id", min(t2."DateOfBirth")
> from "PtStaffAccess" t1, "Person" t2
> where t1."Pt_Id" = t2.id
> group by t1."Pt_Id";
>
> Now the error message was clear, and I think PG is following the standard
> here. But I have a question just for my own education.
>
> It seems to me, given that "Person".id is declared as the primary key, it
> should possible to deduce that there is no way that the 1st query could
> ever
> have multiple values of "DateOfBirth" to choose from when building a result
> row. Am I missing something? Or am I right, that this is something that SQL
> could do but simply doesn't, for whatever reason, historical, complexity...
>
> In fact, what's even more surprising to me, is that if I change the
> grouping
> to the other side of the join, it still doesn't work:
>
> select max(t1."When"), t2.id, t2."DateOfBirth"
> from "PtStaffAccess" t1, "Person" t2
> where t1."Pt_Id" = t2.id
> group by t2.id;
>
> Come on, I'm grouping on the primary key and it thinks that there might be
> multiple values for the other columns?
>
>
You can't include an aggregate in the select if you don't group by
non-aggregates, so it should be:

select max(t1."When"), t1."Pt_Id", t2."DateOfBirth"
from "PtStaffAccess" t1, "Person" t2
where t1."Pt_Id" = t2.id
group by t1."Pt_Id", t2."DateOfBirth";

and likewise

select max(t1."When"), t2.id, t2."DateOfBirth"
from "PtStaffAccess" t1, "Person" t2
where t1."Pt_Id" = t2.id
group by t2.id, t2."DateOfBirth";

PostgreSQL might already know that there is only 1 date of birth per ID, but
it still doesn't make sense to select them both and group by one of them
when an aggregate is present (even though MySQL will ignore that and just
fill out what it thinks you missed internally)

Regards

Thom