[EMAIL PROTECTED] wrote:

news <[EMAIL PROTECTED]> wrote on 11/30/2004 07:58:18 AM:


Michael J. Pawlowsky wrote:

Because you will be downloading 500,000 rows... And I don't really think that was the point. Who cares what the example is. Personally I
was quite impressed with great explanation.

I second that it was a great explanation! Please don't get me wrong! And furthermore I will only download one row two times: select count -> one row select ... limit count/2, 1 -> one row

Who's to say that his SP can't use your method and work just that much faster?


The point was that stored procedures can "automate" complex processes and that the end user of the database doesn't necessarily need to know how the process works (or how to code that process in SQL) if they know how to call the procedure and interpret its results. Stored procedures provides a method for a DBA to efficiently provide DB users with results they would normally need to code by hand to achieve. How many database users do you know that understand how to correctly compute a median value or generate a cross tab query? If you, the DBA, write stored procedures or UDFs to perform these and other "complex" tasks (relatively speaking) then you have simplified the end user's data access in some significant ways. There are much more complex things you can do with SPs than just computing medians but it made an EXCELLENT example.

As a mathematician, I'd like to point out that medians aren't quite that simple. "select ... limit count/2, 1" will not work at least half the time. There are two possibilities:


* count is odd - The median is the value in the middle, but count/2 is a decimal, so you have something like LIMIT 13.5, 1. Mysql (4.1.7, anyway) handles this by ignoring the decimal and gives the correct answer, but this is problematic. The manual <http://dev.mysql.com/doc/mysql/en/SELECT.html> clearly states "LIMIT takes one or two numeric arguments, which must be integer constants." Hence we are relying on an undocumented feature which could easily disappear.

* count is even - In this case, there is no middle value! The median is the average of the 2 values on either side of the middle. count/2 is a positive integer, however, so "limit count/2, 1" will retrieve a row, but it is *not* the median.

Hence, network traffic is not an issue, but there is still work to be done. You have to get the count, check if it is even or odd, then proceed accordingly. In the even case, you have to retrieve two rows, then average them. You can do all this in code on the client end, or do it on the server in a stored procedure, making the client's life easier (and improving his/her chances of getting it right).

For completeness, here's a method to get the median in SQL:

#### To get the median of the values in a column (val):

CREATE TEMPORARY TABLE medians
SELECT x.val medians
FROM data x, data y
GROUP BY x.val
HAVING SUM(y.val <= x.val) >= COUNT(*)/2
AND SUM(y.val >= x.val) >= COUNT(*)/2;

SELECT AVG(medians) AS median FROM medians;

DROP TABLE medians;

#### To get the groupwise median of the values in a column (val) for each
#### value in another column (name):

CREATE TEMPORARY TABLE medians
SELECT x.name, x.val medians
FROM data x, data y
WHERE x.name=y.name
GROUP BY x.name, x.val
HAVING SUM(y.val <= x.val) >= COUNT(*)/2
AND SUM(y.val >= x.val) >= COUNT(*)/2;

SELECT name, AVG(medians) AS median FROM medians GROUP BY name;

DROP TABLE medians;

Michael


-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to