[EMAIL PROTECTED] wrote:
Guys - I think this fusing of GROUP BY and ORDER BY is a bad thing - even if it works. GROUP should group and if you want an order - add an ORDER clause explicitly. Yes, the GROUP BY does a sort and you are tempted to leverage it, but it could lead to problems.
It may not be visible in single-threaded MySQL, but in Oracle for example, which allows you to run queries in parllel - it becomes clear. In parallel query - each query server does part of the work and one of them (the "uber" query server) glues the partial results together.
For example - Let's say you want to count students by class code. You run a 2-way query - SELECT CLASS_CODE, COUNT(*) FROM TABLE GROUP BY CLASS_CODE.
You can get this result:
CLASS1 111 CLASS2 222 CLASS3 333 CLASS4 444
(the first 2 rows came from query server 1 and the last 2 came from query server 2).
However, you can also get this result:
CLASS3 333 CLASS4 444 CLASS1 111 CLASS2 222
(the first 2 rows came from query server 2 and the last 2 came from query server 1).
Both satisfy the GROUP BY requirement. Within each sub-result it's sorted but the glued total is not. Oracle requires an explicit ORDER BY to make sure the entire result is sorted, which makes sense to me - gives you the option to save on cycles if you just need the grouping and not the ordering.
MySQL, which has big plans for the future - will no doubt introduce parallel query one day - it's a bread and butter requirement for data warehousing, and then the issue will come up. A little extra effort up front - and you won't get killed when the database engine tightens its standards.
IMHO.
To do a GROUP BY, you effectively have to sort the rows into their groups, so it makes sense to present the results sorted. And given mysql's emphasis on speed, I'd be surprised if there were any significant cost to displaying the results sorted.
But what of parallel processing? If each subserver returns its results sorted, it wouldn't be difficult or particularly costly for the uberserver to sort the complete set. In fact, since the uberserver assigns the groups to the subservers in the first place, it must have already sorted the groups, so it could simply glue them back together in order. In other words, I don't believe there are any cycles to be saved. The only advantage I can see to sending each subserver's results unsorted is that the uberserver can send them as they arrive from the subservers. This may add to the *apparent* speed, as you start seeing results sooner, but should take just about as long to get the whole result set.
I'm speculating, of course, as I don't have access to a parallel query Oracle setup. If you do, however, you could check this out. Taking your example, how does the speed of
SELECT class_code, COUNT(*) FROM table GROUP BY class_code
compare to
SELECT class_code, COUNT(*) FROM table GROUP BY class_code ORDER BY class_code
with a parallel query setup?
Michael
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]