RE: Solved - median (was Re: mean/median/mode)

2003-12-12 Thread emierzwa
Using your exact dataset listed at the bottom, I tried using derived
tables to combine your implementation under Alpha 4.1.1 as a single
statement. I expected a message saying it was to complex or out of some
resource but the server goes to 100 percent cpu and I have to kill it. I
even got it to crash once. Funny thing is after I kill it my
client(mysqlcc) gets the correct result setgo figure. Thanks for the
post, I really like your histogram example.
Using mysqld-nt Alpha 4.1.1 on WinXP, Dell 2.6ghz 600mg ram.

Thanks,
Ed

SELECT s.name, AVG(s.val) AS median
FROM (SELECT x.name, x.val
  FROM data x, data y
  WHERE x.name=y.name
  GROUP BY x.name, x.val
  HAVING SUM(y.val = x.val) = COUNT(*)/2
 AND SUM(y.val = x.val) = COUNT(*)/2) AS s
GROUP BY s.name

-Original Message-
On Friday, December 5, Robert Citek wrote:
 
 On Wednesday, December 3, 2003, at 06:27  PM, Robert Citek wrote:
 
 How can I calculate the mean/median/mode from a set of data using
SQL?
 
 
 After a bit of googling, I found this link:
   http://mysql.progen.com.tr/doc/en/Group_by_functions.html
 and a few answers in the comments towards the bottom.
 
 Below I've included a sample table and the solution I used to
calculate 
 the median.
 
 Regards,
 - Robert
 
 -
 
 DROP TABLE IF EXISTS data;
 
 CREATE TABLE data (
   name char(1) default NULL,
   val int(5) default NULL
 ) TYPE=MyISAM;
 
 INSERT INTO data VALUES
   ('a',2), ('a',2), ('a',2), ('a',2), ('a',20),
   ('b',4), ('b',4), ('b',4), ('b',4), ('b',40);
 
 CREATE TEMPORARY TABLE medians SELECT x.name, x.val
  FROM data x, data y
  WHERE x.name=y.name
  GROUP BY x.name, x.val
  HAVING ((COUNT(*)-(SUM(SIGN(1-SIGN(y.val-x.val) =
floor((COUNT(*) 
 +1)/2)) and
   ((COUNT(*)-(SUM(SIGN(1+SIGN(y.val-x.val) = floor((COUNT(*)
+1)/2));
 
 SELECT name, AVG(val) AS median FROM medians group by name;
 
  Output
 
 +--++
 | name | median |
 +--++
 | a| 2. |
 | b| 4. |
 +--++

Robert,

I don't believe this is correct.  I think it only appears correct due to

the particular nature of your sample data.  Try it with different data 
to see what I mean.  For example,

DROP TABLE IF EXISTS data;

CREATE TABLE data (name char(1) default NULL, val int default NULL);

INSERT INTO data
VALUES ('a',1),  ('a',2),  ('a',2),  ('a',2),  ('a',3),  ('a',6),
('a',7),
('a',11), ('a',11), ('a',12), ('a',13), ('a',13), ('a',20);
INSERT INTO data VALUES ('b',2), ('b',3), ('b',4), ('b',5);

DROP TABLE IF EXISTS medians;

CREATE TEMPORARY TABLE medians SELECT x.name, x.val
FROM data x, data y
WHERE x.name=y.name
GROUP BY x.name, x.val
HAVING
((COUNT(*)-(SUM(SIGN(1-SIGN(y.val-x.val) = floor((COUNT(*) +1)/2))
AND
((COUNT(*)-(SUM(SIGN(1+SIGN(y.val-x.val) = floor((COUNT(*) +1)/2));

SELECT name, AVG(val) AS median FROM medians group by name;

 Output
+--++
| name | median |
+--++
| a| 6.5000 |
| b| 3.5000 |
+--++

As there are an odd number of values with name = 'a', we should choose 
the middle value, 7, as the median, but we got 6.5.  (I'm assuming we 
agree that the definition of median is the middle value for N odd and 
the average of the 2 middle values for N even.)

I tried to see if I could tweak your query to get the right result, but,

between the complexity of your HAVING clause and the wrinkle of getting 
separate answers for each value in the name column, I just couldn't wrap

my head around it.  So, I went looking and found a supposed solution at 
http://www.oreilly.com/catalog/transqlcook/chapter/ch08.html.

It is overly complex, purports to get an answer in a single select, and 
is simply wrong in the case where there are an even number of values. 
(The author also gives a definition of statistical median, as opposed to

financial median, which doesn't match my Intro Stats book.)  But I 
understood what it was trying to do and came up with the following:

 To get the median of the values in a column:

DROP TABLE IF EXISTS medians;

CREATE TEMPORARY TABLE medians
SELECT x.val medians
FROM data x, data y
GROUP BY x.val
HAVING SUM(y.val = x.val) = COUNT(*)/2
AND SUM(y.val = x.val) = COUNT(*)/2;

SELECT AVG(medians) AS median FROM medians;

 Output using my sample data given above
++
| median |
++
| 5. |
++


 To get the median of the values in a column for each value in
 another column:

DROP TABLE IF EXISTS medians;

CREATE TEMPORARY TABLE medians
SELECT x.name, x.val medians
FROM data x, data y
WHERE x.name=y.name
GROUP BY x.name, x.val
HAVING SUM(y.val = x.val) = COUNT(*)/2
AND SUM(y.val = x.val) = COUNT(*)/2;

SELECT name, AVG(medians) AS median FROM medians GROUP BY name;

 Output using my sample data given above
+--++
| name | median |
+--++
| a| 7. |
| b| 3.5000 |
+--++

I've tested these with several different data sets, so I'm fairly 

Re: Solved - median (was Re: mean/median/mode)

2003-12-11 Thread Michael Stassen
On Friday, December 5, Robert Citek wrote:
On Wednesday, December 3, 2003, at 06:27  PM, Robert Citek wrote:

How can I calculate the mean/median/mode from a set of data using SQL?


After a bit of googling, I found this link:
  http://mysql.progen.com.tr/doc/en/Group_by_functions.html
and a few answers in the comments towards the bottom.
Below I've included a sample table and the solution I used to calculate 
the median.

Regards,
- Robert
-

DROP TABLE IF EXISTS data;

CREATE TABLE data (
  name char(1) default NULL,
  val int(5) default NULL
) TYPE=MyISAM;
INSERT INTO data VALUES
  ('a',2), ('a',2), ('a',2), ('a',2), ('a',20),
  ('b',4), ('b',4), ('b',4), ('b',4), ('b',40);
CREATE TEMPORARY TABLE medians SELECT x.name, x.val
 FROM data x, data y
 WHERE x.name=y.name
 GROUP BY x.name, x.val
 HAVING ((COUNT(*)-(SUM(SIGN(1-SIGN(y.val-x.val) = floor((COUNT(*) 
+1)/2)) and
  ((COUNT(*)-(SUM(SIGN(1+SIGN(y.val-x.val) = floor((COUNT(*) +1)/2));

SELECT name, AVG(val) AS median FROM medians group by name;

 Output

+--++
| name | median |
+--++
| a| 2. |
| b| 4. |
+--++
Robert,

I don't believe this is correct.  I think it only appears correct due to 
the particular nature of your sample data.  Try it with different data 
to see what I mean.  For example,

DROP TABLE IF EXISTS data;

CREATE TABLE data (name char(1) default NULL, val int default NULL);

INSERT INTO data
VALUES ('a',1),  ('a',2),  ('a',2),  ('a',2),  ('a',3),  ('a',6),
   ('a',7),
   ('a',11), ('a',11), ('a',12), ('a',13), ('a',13), ('a',20);
INSERT INTO data VALUES ('b',2), ('b',3), ('b',4), ('b',5);
DROP TABLE IF EXISTS medians;

CREATE TEMPORARY TABLE medians SELECT x.name, x.val
FROM data x, data y
WHERE x.name=y.name
GROUP BY x.name, x.val
HAVING
((COUNT(*)-(SUM(SIGN(1-SIGN(y.val-x.val) = floor((COUNT(*) +1)/2)) AND
((COUNT(*)-(SUM(SIGN(1+SIGN(y.val-x.val) = floor((COUNT(*) +1)/2));
SELECT name, AVG(val) AS median FROM medians group by name;

 Output
+--++
| name | median |
+--++
| a| 6.5000 |
| b| 3.5000 |
+--++
As there are an odd number of values with name = 'a', we should choose 
the middle value, 7, as the median, but we got 6.5.  (I'm assuming we 
agree that the definition of median is the middle value for N odd and 
the average of the 2 middle values for N even.)

I tried to see if I could tweak your query to get the right result, but, 
between the complexity of your HAVING clause and the wrinkle of getting 
separate answers for each value in the name column, I just couldn't wrap 
my head around it.  So, I went looking and found a supposed solution at 
http://www.oreilly.com/catalog/transqlcook/chapter/ch08.html.

It is overly complex, purports to get an answer in a single select, and 
is simply wrong in the case where there are an even number of values. 
(The author also gives a definition of statistical median, as opposed to 
financial median, which doesn't match my Intro Stats book.)  But I 
understood what it was trying to do and came up with the following:

 To get the median of the values in a column:

DROP TABLE IF EXISTS medians;

CREATE TEMPORARY TABLE medians
SELECT x.val medians
FROM data x, data y
GROUP BY x.val
HAVING SUM(y.val = x.val) = COUNT(*)/2
AND SUM(y.val = x.val) = COUNT(*)/2;
SELECT AVG(medians) AS median FROM medians;

 Output using my sample data given above
++
| median |
++
| 5. |
++
 To get the median of the values in a column for each value in
 another column:
DROP TABLE IF EXISTS medians;

CREATE TEMPORARY TABLE medians
SELECT x.name, x.val medians
FROM data x, data y
WHERE x.name=y.name
GROUP BY x.name, x.val
HAVING SUM(y.val = x.val) = COUNT(*)/2
AND SUM(y.val = x.val) = COUNT(*)/2;
SELECT name, AVG(medians) AS median FROM medians GROUP BY name;

 Output using my sample data given above
+--++
| name | median |
+--++
| a| 7. |
| b| 3.5000 |
+--++
I've tested these with several different data sets, so I'm fairly 
confident they are correct.

Michael

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


Solved - median (was Re: mean/median/mode)

2003-12-04 Thread Robert Citek
On Wednesday, December 3, 2003, at 06:27  PM, Robert Citek wrote:
How can I calculate the mean/median/mode from a set of data using SQL?
After a bit of googling, I found this link:
  http://mysql.progen.com.tr/doc/en/Group_by_functions.html
and a few answers in the comments towards the bottom.
Below I've included a sample table and the solution I used to calculate 
the median.

Regards,
- Robert
-

DROP TABLE IF EXISTS data;

CREATE TABLE data (
  name char(1) default NULL,
  val int(5) default NULL
) TYPE=MyISAM;
INSERT INTO data VALUES
  ('a',2), ('a',2), ('a',2), ('a',2), ('a',20),
  ('b',4), ('b',4), ('b',4), ('b',4), ('b',40);
CREATE TEMPORARY TABLE medians SELECT x.name, x.val
 FROM data x, data y
 WHERE x.name=y.name
 GROUP BY x.name, x.val
 HAVING ((COUNT(*)-(SUM(SIGN(1-SIGN(y.val-x.val) = floor((COUNT(*) 
+1)/2)) and
  ((COUNT(*)-(SUM(SIGN(1+SIGN(y.val-x.val) = floor((COUNT(*) 
+1)/2));

SELECT name, AVG(val) AS median FROM medians group by name;

 Output

+--++
| name | median |
+--++
| a| 2. |
| b| 4. |
+--++
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]