RE: group by issue...??

SGreen Mon, 13 Sep 2004 07:28:27 -0700

I believe that everyone is using the same words to mean different things.

Bruce, when you say "grouped around" what result would you like to see? 
Make up some sample data, or use some of your real data, and demonstrate 
what you expected from the database when you say "grouped around". When we 
understand what you want, we can help you get there.

In the SQL sense of the phrase "group by", the engine treats all rows with 
common values as a "set" of values and can apply any of the aggregating 
functions (and/or other functions if you phrase them correctly) to the 
"set" of values. Here's a simplified example (I left out a lot of usual 
things like indexes and primary keys but I think you will see what I 
mean):

CREATE TABLE student (
        ID int auto_increment,
        Name varchar(20)
)

CREATE TABLE subject (
        ID int auto_increment,
        Name varchar(20),
        Section int,
        Instructor varchar(20)
)

CREATE TABLE scores (
        ID int auto_increment,
        student_ID int,
        subject_ID int,
        score float,
        testdate datetime,
        scoredate datetime
)

Assume that we have 600 students and 40 classes (sample data omitted for 
brevity). In the scores table you can find entries like (1234, 15, 25, 
96.7, '2004-09-15 09:00:00', '2004-09-17 12:00:00') which translates to:

In score record #1234, student 15 received a grade in subject 25 on an 
exam taken at 9AM on the 15th day of September but the instructor didn't 
enter the grade until 2 days and 3 hours later.

Let's say the instructor wanted to average all of his student's grades for 
that class:

SELECT student_ID, AVG(score) as avgscore
from scores
GROUP BY student_ID
WHERE subject_ID = 25

We start with data that looks like:
+----------+-----+
|student_ID|Score|
+----------+-----+
|       15 | 100 | -+
|       15 |  90 |  |- one student's scores
|       15 |  80 | -+
|       16 |  81 |
|       16 |  85 |
|       16 |  83 |
|      350 |  86 | -+
|      350 |  60 |  | - another student's scores
|      350 |  70 | _+
...
+----------+-----+

And because we are "grouping" on student ID we get the behavior of 

+----------+----------------+
|student_ID|AVG(score)      |
+----------+----------------+
|       15 | AVG(100,90,80) |
|       16 | AVG(81,85,83)  |
|      350 | AVG(86,60,70)  |
...
+----------+----------------+

which returns the results as:

+----------+--------+
|student_ID|avgscore|
+----------+--------+
|       15 |     90 |
|       16 |     83 |
|      350 |     72 |
...
+----------+--------+

The GROUP BY tells the engine which scores get AVG()-ed together. For each 
unique combination of values that exists in the columns specified in the 
GROUP BY clause of the query, the engine applies the aggregate function(s) 
specified in the SELECT clause.  If you neglect to specify an aggregate 
function for a column NOT included in your GROUP BY clause, the engine can 
pick any (random) value from the list of values that meet that condition. 
That is why nearly every other SQL engine does not allow this behavior as 
that permits the same query to return different results for multiple 
executions against the same base data (the query becomes 
"non-deterministic").

SORTING and GROUPING are separate functions. Sorting is merely putting 
things into a specified order. Grouping is the formation of virtual "sets" 
of data based on certain combinations of data values so that functions 
like MIN(), MAX(), AVG(), STD(), and the others will have have specific 
data values to work with.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

"bruce" <[EMAIL PROTECTED]> wrote on 09/10/2004 07:58:44 PM:

> paul....
> 
> i do a
> 
> select ID from hTBL group by type
> 
> and i still only get a single row for each type, where i would expect to 
get
> the ~5000 rows, grouped around the 3 different types.
> 
> what's going on????
> 
> thanks...
> 
> -bruce
> 
> 
> -----Original Message-----
> From: Paul DuBois [mailto:[EMAIL PROTECTED]
> Sent: Friday, September 10, 2004 4:36 PM
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: group by issue...??
> 
> 
> At 16:27 -0700 9/10/04, bruce wrote:
> >hi...
> >
> >if i do this...
> >
> >select
> >h1.itemID as hitem,
> >h1.process as process,
> >h1.status as status,
> >h1.tblType as tbl,
> >h1.date as date
> >from historyTBL as h1
> >   where (h1.tblType = '3' or h1.tblType = '4');
> >
> >i get a results table with ~5000 rows...
> >
> >if i add the 'group by' sql,
> >
> >select
> >h1.itemID as hitem,
> >h1.process as process,
> >h1.status as status,
> >h1.tblType as tbl,
> >h1.date as date
> >from historyTBL as h1
> >   where (h1.tblType = '3' or h1.tblType = '4')
> >group by tblType; <<<<<<<<<<<<<<<<
> >
> >i get a single line for each tblType...
> >
> >i was under the impression that i should get the same number of rows
> >returned, but that they would be grouped together.....
> >
> >can someone provide an explanation as to how this should work, and 
what's
> >going on????
> 
> If you use GROUP BY with selecting the value of any aggregate functions,
> you achieve the same effect as SELECT DISTINCT.
> 
> --
> Paul DuBois, MySQL Documentation Team
> Madison, Wisconsin, USA
> MySQL AB, www.mysql.com
> 
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
> http://lists.mysql.com/[EMAIL PROTECTED]
> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
>

RE: group by issue...??

Reply via email to