On Wed, 16 Aug 2006, Peter Nixonn wrote:
I am getting a result for an JOIN that I think is wrong. Maybe its my
understanding that is wrong here however, so please be gentle :-)
The phones table contains a list of phone numbers and an associated
customer ID. The radacct table contains a list of all calls made (RADIUS
Accounting records).
I am doing the following:
SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 GROUP BY
phones.CALLINGSTATIONID;
This query as expected returns 1386 rows (for customer ID 1) which includes
a number of rows which have a NULL sum as they have not ever connected.
Now, what I want to do is to return the same result set (of 1386 phones),
but only for a particular time period. I therefore do the following:
SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and radacct.ACCTSTARTTIME
BETWEEN '2006-05-17 15:16:42' AND '2006-08-16 15:16:42' GROUP BY
phones.CALLINGSTATIONID;
This returns 1280 rows, none of which are have a NULL value for sum. This
surprised me at first as I thought the WHERE clause should apply before the
OUTER JOIN but apparently not.
No, in fact it explicitly happens after the join (the order of evaluation
in the theoretical model is basically evaluate the from clause as a table
then apply where on that table to make a new table and so on). You can
either use a subselect in from list or put the condition into the ON to
make its evaluation earlier.
I then tried the following:
SELECT phones.CALLINGSTATIONID, sum(radacct.acctinputoctets) FROM phones
LEFT OUTER JOIN radacct ON (phones.CALLINGSTATIONID =
radacct.CALLINGSTATIONID) WHERE phones.CUSTID = 1 and
(radacct.ACCTSTARTTIME BETWEEN '2006-05-17 15:16:42' AND '2006-08-16
15:16:42' OR radacct.ACCTSTARTTIME ISNULL) GROUP BY
phones.CALLINGSTATIONID;
This query returns 1368 rows, which includes some NULL values for sum,
however still short of the 1386 rows I am looking for. Close, but no cigar!
Imagine you had
phones (callingstationid = 1, custid = 1)
phones (callingstationid = 2, custid = 1)
phones (callingstationid = 3, custid = 1)
radacct (callingstationid = 1, acctstarttime 2006-05-10 00:00:00)
radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00)
Now, I believe the outer join should above give you
(callingstationid = 1, custid = 1, acctstarttime 2006-05-10 00:00:00)
(callingstationid = 2, custid = 1, acctstarttime 2006-08-10 00:00:00)
(callingstationid = 3, custid = 1, acctstarttime NULL)
Then apply the where clause
(callingstationid = 2, custid =1, acctstarttime 2006-08-10 00:00:00)
(callingstationid = 3, custid =1, acctstarttime NULL)
---
With the subselect in from you'd have
phones (callingstationid = 1, custid = 1)
phones (callingstationid = 2, custid = 1)
phones (callingstationid = 3, custid = 1)
radacct (callingstationid = 1, acctstarttime 2006-05-10 00:00:00)
radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00)
The first radacct row doesn't pass, so it's not in the subselect output
which should then look like:
subsel_radacct (callingstationid = 2, acctstarttime 2006-08-10 00:00:00)
Then, I believe the outer join should give you
(callingstationid = 1, custid = 1, acctstarttime NULL)
(callingstationid = 2, custid = 1, acctstarttime 2006-08-10 00:00:00)
(callingstationid = 3, custid = 1, acctstarttime NULL)
And then you apply the where clause again and all the rows go through.
---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly