I have a view to generate a list of instructors and a count of their
> future classes.
> 
> "instructors" is a link table between "class" and "person".
> 
> CREATE VIEW future_instructor_counts
>     AS
>         SELECT  person.id AS person_id,
>                 first_name,
>                 last_name,
>                 count(instructors.class) AS class_count
> 
>           FROM  class, instructors, person
> 
>          WHERE  class.id    = instructors.class AND
>                 person.id   = instructors.person
>                 AND class_time > now()
> 
>       GROUP BY  person_id, first_name, last_name;

The trick is to do the data aggregation separately, then JOIN in whatever other 
fields you want.

Something like this:

CREATE VIEW future_instructor_counts
    AS
        SELECT  * FROM 

        (SELECT 
         person.id AS person_id,
                first_name,
                last_name) personinfo

        INNER JOIN 

        (SELECT class.id FROM class
        WHERE class_time > now() ) classes

        INNER JOIN

        (SELECT 
         id, count(class) AS class_count 
        FROM instructors GROUP BY id) classcount

        ON personinfo.person_id = instructors.id
        AND classes.id = instructors.id
            
In many cases when using aggregate functions you get just the fields you need 
from the agg function (typically an id plus the aggregate result) and JOIN with 
other tables (or even the same table) to get other info such as first_name, 
last_name, etc.

Otherwise, if you GROUP BY additional fields so you can get them in the output, 
you may be making the db do additional work.

> 1) With an aggregate function in the query, is there any way to remove
> the "AND class_time > now()" so that timestamp can be passed in the
> select?  That is, I'd like to be able to do this?
> 
>     select * from instructor_counts where class_time > now();
> 
> But class_time is not part of the VIEW so that's not valid.

No problem, just make it a part of the view. See the classes section below.

CREATE VIEW future_instructor_counts
    AS
        SELECT  * FROM 

        (SELECT 
         person.id AS person_id,
                first_name,
                last_name) personinfo

        INNER JOIN 

        -- Add class_time field!
        (SELECT class.id, class_time FROM class
        WHERE class_time > now() ) classes

        INNER JOIN

        (SELECT 
         id, count(class) AS class_count 
        FROM instructors GROUP BY id) classcount

        ON personinfo.person_id = instructors.id
        AND classes.id = instructors.id

[Disclaimer: I've not tested this code at all. It could help if you sent table 
definitions and maybe even dummy
data via insert commands.]

>  And if it was included then I don't have an aggregate function any more - no
> more grouping.

If you do the agg function separately like this that isn't an issue. You join 
tables to get whatever fields you'd like to have in your output.
 
> 2) I think I'm missing something obvious.  I know that I need to
> specify all my non-aggregate columns in the "GROUP BY", but I don't
> under stand why.  Really, the results are just grouped only by
> person.id so why the need to specify the other columns.
> 
> And if you don't specify all the columns then Postgresql reports:
> 
>   ERROR:  column "person.id" must appear in the GROUP BY 
>             clause or be used in an aggregate function
> 
> Is there a reason Postgresql doesn't just add the column
> automatically?  It does in other cases (like a missing table in a
> join).

As I mention above, if you GROUP BY additional fields just to get them in the 
output, you may be making the db do additional work.

I seem to remember that in a later SQL standard (ie, after SQL-99 but I could 
be wrong) I believe it allows you to specify additional fields in SELECT that 
are not in the GROUP BY clause. But PG isn't there yet. 

-Roger

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Bill Moseley
Sent: Friday, September 16, 2005 11:30 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Setting WHERE on a VIEW with aggregate function.


I have a view to generate a list of instructors and a count of their
future classes.

"instructors" is a link table between "class" and "person".

CREATE VIEW future_instructor_counts
    AS
        SELECT  person.id AS person_id,
                first_name,
                last_name,
                count(instructors.class) AS class_count

          FROM  class, instructors, person

         WHERE  class.id    = instructors.class AND
                person.id   = instructors.person
                AND class_time > now()

      GROUP BY  person_id, first_name, last_name;


I have two very basic SQL questions:

1) With an aggregate function in the query, is there any way to remove
the "AND class_time > now()" so that timestamp can be passed in the
select?  That is, I'd like to be able to do this?

    select * from instructor_counts where class_time > now();

But class_time is not part of the VIEW so that's not valid.  And if it
was included then I don't have an aggregate function any more - no
more grouping.


2) I think I'm missing something obvious.  I know that I need to
specify all my non-aggregate columns in the "GROUP BY", but I don't
under stand why.  Really, the results are just grouped only by
person.id so why the need to specify the other columns.

And if you don't specify all the columns then Postgresql reports:

  ERROR:  column "person.id" must appear in the GROUP BY 
            clause or be used in an aggregate function

Is there a reason Postgresql doesn't just add the column
automatically?  It does in other cases (like a missing table in a
join).

Thanks


-- 
Bill Moseley
[EMAIL PROTECTED]


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Reply via email to