Re: 'group by' does a free 'sort by'?

Udikarni Tue, 01 Jun 2004 16:03:57 -0700

Michael,

You are right - the cycle savings are minimal - usually because once the GROUPing is 
done - there aren't very many rows to be returned and sorting a few hundred of even a 
few thousand of them is pretty instantaneous.


It's really mostly philosophical - when you only specify GROUP BY - it means you want 
it grouped and the database developers have the liberty of returning the data in any 
which order. If the order is important - specify ORDER BY.

There's no right or wrong on philosophy. I was just trying to highlight the danger of 
relying on default functionality and not being explicit. Years down the road - the 
database changes its perspective and old code which still runs starts producing 
unexpected results.

Another example:

Oracle, for the longest time did not allow an ORDER BY at table create:

CREATE TABLE X AS SELECT * FROM Y ORDER BY Z;

There was no technical reason why they couldn't sort. Their rationale was that the 
physical order of the data in X is immaterial and should be controlled at the time of 
the SELECT from X - so it's inappropriate to request it at create time. They 
eventually bowed to  public pressure and allowed it, but it just serves as another 
example of SQL philosophy taking priority over convenience.
 


In a message dated 5/29/2004 5:50:11 PM Eastern Daylight Time, [EMAIL PROTECTED] 
writes:

> 
> 
> [EMAIL PROTECTED] wrote:
> 
> > Guys - I think this fusing of GROUP BY and ORDER BY is a bad thing - even if
> > it works. GROUP should group and if you want an order - add an ORDER clause
> > explicitly. Yes, the GROUP BY does a sort and you are tempted to leverage it,
> > but it could lead to problems.
> > 
> > It may not be visible in single-threaded MySQL, but in Oracle for example,
> > which allows you to run queries in parllel - it becomes clear. In parallel
> > query - each query server does part of the work and one of them (the "uber"
> > query server) glues the partial results together.
> > 
> > For example - Let's say you want to count students by class code.
> > You run a 2-way query - SELECT CLASS_CODE, COUNT(*) FROM TABLE GROUP BY
> > CLASS_CODE.
> > 
> > You can get this result:
> > 
> > CLASS1 111
> > CLASS2 222
> > CLASS3 333
> > CLASS4 444
> > 
> > (the first 2 rows came from query server 1 and the last 2 came from query
> > server 2).
> > 
> > However, you can also get this result:
> > 
> > CLASS3 333
> > CLASS4 444
> > CLASS1 111
> > CLASS2 222
> > 
> > (the first 2 rows came from query server 2 and the last 2 came from query
> > server 1).
> > 
> > Both satisfy the GROUP BY requirement. Within each sub-result it's sorted
> > but the glued total is not. Oracle requires an explicit ORDER BY to make
> > sure the entire result is sorted, which makes sense to me - gives you the
> > option to save on cycles if you just need the grouping and not the ordering.
> > 
> > MySQL, which has big plans for the future - will no doubt introduce parallel
> > query one day - it's a bread and butter requirement for data warehousing,
> > and then the issue will come up. A little extra effort up front - and you
> > won't get killed when the database engine tightens its standards.
> > 
> > IMHO.
> 
> To do a GROUP BY, you effectively have to sort the rows into their groups, 
> so it makes sense to present the results sorted.  And given mysql's emphasis 
> on speed, I'd be surprised if there were any significant cost to displaying 
> the results sorted.
> 
> But what of parallel processing?  If each subserver returns its results 
> sorted, it wouldn't be difficult or particularly costly for the uberserver 
> to sort the complete set.  In fact, since the uberserver assigns the groups 
> to the subservers in the first place, it must have already sorted the 
> groups, so it could simply glue them back together in order.  In other 
> words, I don't believe there are any cycles to be saved.  The only advantage 
> I can see to sending each subserver's results unsorted is that the 
> uberserver can send them as they arrive from the subservers.  This may add 
> to the *apparent* speed, as you start seeing results sooner, but should take 
> just about as long to get the whole result set.
> 
> I'm speculating, of course, as I don't have access to a parallel query 
> Oracle setup.  If you do, however, you could check this out.  Taking your 
> example, how does the speed of
> 
>    SELECT class_code, COUNT(*) FROM table
>    GROUP BY class_code
> 
> compare to
> 
>    SELECT class_code, COUNT(*) FROM table
>    GROUP BY class_code ORDER BY class_code
> 
> with a parallel query setup?
> 
> Michael
> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:   
>  http://lists.mysql.com/[EMAIL PROTECTED]
> 
> 
> 
>


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: 'group by' does a free 'sort by'?

Reply via email to