It's O(N) -- with usable indexes, a single-table GROUP BY like this scales roughly linearly in the number of rows.

DVP
----
Dathan Vance Pattishall     http://www.friendster.com

 

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, June 21, 2005 9:39 AM
> To: Dan Bolser
> Cc: mysql@lists.mysql.com
> Subject: Re: Query Complexity (big 'O')
> 
> Dan Bolser <[EMAIL PROTECTED]> wrote on 06/21/2005 09:51:06 AM:
> 
> > Hello,
> 
> > I am interested in the theoretical time / space complexity of SQL
> queries
> > on indexed / non-indexed data.
> 
> > I think I read somewhere that a JOIN on an indexed column 
> is something 
> > like O[mn*log(mn)] (m rows joined to n).
> 
> > I assume without an index it is just O[m*n]
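The intuition behind those two bounds can be sketched outside SQL. The toy join below (Python, not from the thread; the row layout and key name are invented for illustration) contrasts a nested-loop join, which is O(m*n), with a hash-indexed join, which is closer to O(m + n); a B-tree index would land in between, at roughly O(m log n).

```python
# Hypothetical sketch contrasting the two join strategies discussed above.
# A nested-loop join without an index touches every (m, n) pair: O(m*n).
# With an index (here a dict, i.e. a hash lookup), each of the m probe rows
# costs roughly O(1), so the whole join is closer to O(m + n).

def nested_loop_join(left, right, key):
    # O(len(left) * len(right)): compare every pair of rows
    out = []
    for l in left:
        for r in right:
            if l[key] == r[key]:
                out.append((l, r))
    return out

def indexed_join(left, right, key):
    # O(len(left) + len(right)): build a hash index once, then probe it
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    out = []
    for l in left:
        for r in index.get(l[key], []):
            out.append((l, r))
    return out

left = [{"id": i} for i in range(4)]
right = [{"id": i % 2} for i in range(4)]
assert nested_loop_join(left, right, "id") == indexed_join(left, right, "id")
```

Both functions return the same pairs; only the amount of work done to find them differs.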
> 
> > Specifically I want to know the complexity of a query that does a 
> > 'cross tabulation'
> 
> > SELECT
> > X,
> > SUM(if(Y=1,Z,0)) AS s1,
> > SUM(if(Y=2,Z,0)) AS s2,
> > SUM(if(Y=3,Z,0)) AS s3,
> > ...
> > FROM
> > T1
> > GROUP BY
> > X;
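For anyone who wants to experiment with this query shape, here is a self-contained version run against an in-memory SQLite database (a stand-in chosen for convenience, not part of the thread; CASE WHEN replaces MySQL's IF(), and the sample rows are invented):

```python
# Minimal cross-tabulation in the same shape as the query above,
# using sqlite3 so it runs anywhere Python does.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (X TEXT, Y INTEGER, Z INTEGER)")
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?, ?)",
    [("a", 1, 10), ("a", 2, 20), ("b", 1, 5), ("b", 3, 7)],
)

rows = conn.execute("""
    SELECT X,
           SUM(CASE WHEN Y = 1 THEN Z ELSE 0 END) AS s1,
           SUM(CASE WHEN Y = 2 THEN Z ELSE 0 END) AS s2,
           SUM(CASE WHEN Y = 3 THEN Z ELSE 0 END) AS s3
    FROM T1
    GROUP BY X
    ORDER BY X
""").fetchall()
# rows == [('a', 10, 20, 0), ('b', 5, 0, 7)]
```

Each distinct Y value becomes one summed column, which is exactly the axis along which the question below asks how cost grows.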
> > 
> > Assuming both X and Y are indexed, how does the complexity 
> grow with 
> > increasing 's' (more if clauses). More basic, what is the 
> complexity 
> > of the group by statement?
> 
> > Can anyone point me to a good online guide to complexity of SQL?
> 
> > Thanks very much for any suggestions :)
> 
> > Dan.
> 
> I believe you will see a nearly linear growth as you add 
> terms to your query (but only to a point). My reasoning:
> 
> I think I can accurately say the total time spent on any 
> query would be the sum of the times spent on each phase of 
> the query analysis
> 
> Tp  - time required to parse the query
> Tj  - time to process any table JOINs (initial column identifications, too)
> Tw  - time to process the WHERE clause against your source tables
> Tgb - time to process the GROUP BY clause
> Th  - time to process the HAVING clause
> Tob - time to perform the final sort
> =================================================================
> Tquery - total time for the query (sum of all terms above)
> 
> Each one of those phases has a BIG O expression associated 
> with it and each phase has at least one opportunity for query 
> optimization. Some are optimized by table design (field 
> definitions + indexes), others are optimized through 
> efficient SQL (proper JOIN definitions, multistage queries, etc.)
> 
> In your case you are doing at least four things when you add 
> another term to your query:
> 1) you add another term that needs parsing
> 2) you add another IF() evaluation that must occur for each 
> row's evaluation during the Tw phase.
> 3) you increase how much data must be transferred from the 
> output of the Tw phase, and each phase after that, by one column.
> 4) you add another SUM() evaluation to the Tgb phase of the query.
> 
> So, incrementally you have increased the time required to 
> perform the query by adding another evaluated column but to 
> estimate how much depends on both your hardware and your 
> settings. If, for example, the next column you added forced 
> the engine to evaluate the Tj portion of this query using 
> paged memory, you have just changed the slope of your 
> equation based on meeting a physical limitation and your 
> "time estimate formula" would now have a different set of 
> coefficients. The only way to honestly know how many 
> summarized columns it takes to reach that point is to 
> benchmark the query on your equipment and with your settings 
> under your normal production load. 
> 
> I know I really didn't answer your theory question (not 
> exactly) but I believe that evaluating the "BIG O" of a SQL 
> query is a bit more complex than you presented in your post. 
> I encourage you to keep asking questions and I am sure that 
> you will either tap the list dry or find out what you want to know.
> 
> Shawn Green
> Database Administrator
> Unimin Corporation - Spruce Pine
> 
> 

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
