It's O(N).
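As a rough illustration of why the cross-tab GROUP BY quoted below behaves linearly in the row count, here is a minimal Python sketch of the aggregation as a single pass over the rows. This is a hypothetical stand-in for the server's aggregation loop, not MySQL's actual implementation; all names are made up:

```python
# Sketch (assumption): a GROUP BY cross-tab is one pass over the rows, so it
# scales as O(N) in the row count, with a constant factor per SUM(IF(...)) term.
from collections import defaultdict

def cross_tab(rows, y_values):
    """rows: iterable of (X, Y, Z) tuples; y_values: the Y values to pivot on."""
    # One accumulator per (X group, Y value), all starting at 0.
    sums = defaultdict(lambda: dict.fromkeys(y_values, 0))
    for x, y, z in rows:          # single pass: O(N) in the number of rows
        if y in sums[x]:
            sums[x][y] += z       # the SUM(IF(Y = y, Z, 0)) accumulator
    return sums

rows = [("a", 1, 10), ("a", 2, 5), ("b", 1, 7), ("a", 1, 3)]
print(dict(cross_tab(rows, [1, 2, 3])))
```

Each extra `s` column adds one more accumulator per group, so the pass stays linear in N while the per-row constant grows with `s`.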
DVP
----
Dathan Vance Pattishall
http://www.friendster.com

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, June 21, 2005 9:39 AM
> To: Dan Bolser
> Cc: mysql@lists.mysql.com
> Subject: Re: Query Complexity (big 'O')
>
> Dan Bolser <[EMAIL PROTECTED]> wrote on 06/21/2005 09:51:06 AM:
>
> > Hello,
> >
> > I am interested in the theoretical time / space complexity of SQL
> > queries on indexed / non-indexed data.
> >
> > I think I read somewhere that a JOIN on an indexed column is something
> > like O[mn*log(mn)] (m rows joined to n).
> >
> > I assume without an index it is just O[m*n].
> >
> > Specifically, I want to know the complexity of a query that does a
> > 'cross tabulation':
> >
> > SELECT
> >   X,
> >   SUM(IF(Y=1,Z,0)) AS s1,
> >   SUM(IF(Y=2,Z,0)) AS s2,
> >   SUM(IF(Y=3,Z,0)) AS s3,
> >   ...
> > FROM
> >   T1
> > GROUP BY
> >   X;
> >
> > Assuming both X and Y are indexed, how does the complexity grow with
> > increasing 's' (more IF clauses)? More basically, what is the
> > complexity of the GROUP BY statement?
> >
> > Can anyone point me to a good online guide to the complexity of SQL?
> >
> > Thanks very much for any suggestions :)
> >
> > Dan.
>
> I believe you will see nearly linear growth as you add terms to your
> query (but only to a point). My reasoning:
>
> I think I can accurately say that the total time spent on any query is
> the sum of the times spent on each phase of the query analysis:
>
> Tp  - time required to parse the query
> Tj  - time to process any table JOINs (initial column identifications, too)
> Tw  - time to process the WHERE clause against your source tables
> Tgb - time to process the GROUP BY clause
> Th  - time to process the HAVING clause
> Tob - time to perform the final sort
> =================================================================
> Tquery - total time for the query (sum of all terms above)
>
> Each one of those phases has a big-O expression associated with it, and
> each phase has at least one opportunity for query optimization. Some are
> optimized by table design (field definitions + indexes); others are
> optimized through efficient SQL (proper JOIN definitions, multistage
> queries, etc.).
>
> In your case, you are doing at least four things when you add another
> term to your query:
> 1) you add another term that needs parsing;
> 2) you add another IF() evaluation that must occur for each row's
>    evaluation during the Tw phase;
> 3) you increase how much data must be transferred from the output of
>    the Tw phase, and each phase after that, by one column;
> 4) you add another SUM() evaluation to the Tgb phase of the query.
>
> So, incrementally, you have increased the time required to perform the
> query by adding another evaluated column, but estimating by how much
> depends on both your hardware and your settings. If, for example, the
> next column you added forced the engine to evaluate the Tj portion of
> this query using paged memory, you have just changed the slope of your
> equation by hitting a physical limitation, and your "time estimate
> formula" would now have a different set of coefficients. The only way
> to honestly know how many summarized columns it takes to reach that
> point is to benchmark the query on your equipment, with your settings,
> under your normal production load.
>
> I know I really didn't answer your theory question (not exactly), but I
> believe that evaluating the big O of a SQL query is a bit more complex
> than you presented in your post.
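The benchmarking advice above can be sketched in miniature. The following is a hypothetical Python harness using SQLite as a stand-in for MySQL (so the absolute numbers are not comparable, only the growth trend as terms are added); `CASE WHEN` is the portable spelling of MySQL's `IF()`, and the table and column names mirror the quoted query:

```python
# Assumption-laden sketch: time the cross-tab query as the number of
# SUM(CASE WHEN ...) terms grows, on synthetic data.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (X INTEGER, Y INTEGER, Z INTEGER)")
conn.execute("CREATE INDEX idx_x ON T1 (X)")
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?, ?)",
    [(i % 100, i % 10, i) for i in range(50_000)],  # 100 groups, Y in 0..9
)

def timed_cross_tab(num_terms):
    # Build SUM(CASE WHEN Y = k THEN Z ELSE 0 END) for k = 1 .. num_terms.
    sums = ", ".join(
        f"SUM(CASE WHEN Y = {k} THEN Z ELSE 0 END) AS s{k}"
        for k in range(1, num_terms + 1)
    )
    start = time.perf_counter()
    conn.execute(f"SELECT X, {sums} FROM T1 GROUP BY X").fetchall()
    return time.perf_counter() - start

for s in (1, 2, 4, 8):
    print(f"{s} terms: {timed_cross_tab(s):.4f}s")
```

On a real production system, you would of course run this against MySQL itself, with your real data volumes and under normal load, exactly as suggested above.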
> I encourage you to keep asking questions, and I am sure that you will
> either tap the list dry or find out what you want to know.
>
> Shawn Green
> Database Administrator
> Unimin Corporation - Spruce Pine

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]