Re: 'group by' does a free 'sort by'?

Michael Stassen Sat, 29 May 2004 14:50:11 -0700


[EMAIL PROTECTED] wrote:

Guys - I think this fusing of GROUP BY and ORDER BY is a bad thing - even if
it works. GROUP should group and if you want an order - add an ORDER clause
explicitly. Yes, the GROUP BY does a sort and you are tempted to leverage it,
but it could lead to problems.

It may not be visible in single-threaded MySQL, but in Oracle for example,
which allows you to run queries in parllel - it becomes clear. In parallel
query - each query server does part of the work and one of them (the "uber"
query server) glues the partial results together.

For example - Let's say you want to count students by class code.
You run a 2-way query - SELECT CLASS_CODE, COUNT(*) FROM TABLE GROUP BY
CLASS_CODE.

You can get this result:

CLASS1 111
CLASS2 222
CLASS3 333
CLASS4 444

(the first 2 rows came from query server 1 and the last 2 came from query
server 2).

However, you can also get this result:

CLASS3 333
CLASS4 444
CLASS1 111
CLASS2 222

(the first 2 rows came from query server 2 and the last 2 came from query
server 1).

Both satisfy the GROUP BY requirement. Within each sub-result it's sorted
but the glued total is not. Oracle requires an explicit ORDER BY to make
sure the entire result is sorted, which makes sense to me - gives you the
option to save on cycles if you just need the grouping and not the ordering.

MySQL, which has big plans for the future - will no doubt introduce parallel
query one day - it's a bread and butter requirement for data warehousing,
and then the issue will come up. A little extra effort up front - and you
won't get killed when the database engine tightens its standards.

IMHO.

To do a GROUP BY, you effectively have to sort the rows into their groups, so it makes sense to present the results sorted. And given mysql's emphasis on speed, I'd be surprised if there were any significant cost to displaying the results sorted.

But what of parallel processing? If each subserver returns its results sorted, it wouldn't be difficult or particularly costly for the uberserver to sort the complete set. In fact, since the uberserver assigns the groups to the subservers in the first place, it must have already sorted the groups, so it could simply glue them back together in order. In other words, I don't believe there are any cycles to be saved. The only advantage I can see to sending each subserver's results unsorted is that the uberserver can send them as they arrive from the subservers. This may add to the *apparent* speed, as you start seeing results sooner, but should take just about as long to get the whole result set.

I'm speculating, of course, as I don't have access to a parallel query Oracle setup. If you do, however, you could check this out. Taking your example, how does the speed of

  SELECT class_code, COUNT(*) FROM table
  GROUP BY class_code

compare to

  SELECT class_code, COUNT(*) FROM table
  GROUP BY class_code ORDER BY class_code

with a parallel query setup?

Michael


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: 'group by' does a free 'sort by'?

Reply via email to