Dan Bolser wrote:
> Hello, 
> 
> I am interested in the theoretical time / space complexity of SQL queries
> on indexed / non-indexed data. 
> 
> I think I read somewhere that a JOIN on an indexed column is something
> like O[mn*log(mn)] (m rows joined to n).
> 
> I assume without an index it is just O[m*n]
> 
> Specifically I want to know the complexity of a query that does a
> 'cross tabulation'
> 
> SELECT
>   X, 
>   SUM(if(Y=1,Z,0)) AS s1,
>   SUM(if(Y=2,Z,0)) AS s2,
>   SUM(if(Y=3,Z,0)) AS s3,
>   ...
> FROM 
>   T1
> GROUP BY 
>   X;
> 
> Assuming both X and Y are indexed, how does the complexity grow with
> increasing 's' (more if clauses). More basic, what is the complexity of
> the group by statement?
> 
> Can anyone point me to a good online guide to complexity of SQL?
> 
> Thanks very much for any suggestions :)
> 
> Dan.
> 

It's a bit more complex than that, I'm not an expert of mathematics, so
here I'll try to explain things as I know them, hope to give you all the
elements needed to calculate the space complexity again.

First the previous query don't use indexes at all, you can see this from
the output of :
EXPLAIN SELECT ... GROUP BY X;

To take advantage from indexes the query could be written as:

SELECT X, 1 AS Y, SUM(Z) AS s1 FROM  T1 WHERE Y=1 GROUP BY X
UNION
SELECT X, 2 AS Y, SUM(Z) AS s1 FROM  T1 WHERE Y=2 GROUP BY X
UNION
SELECT X, 3 AS Y, SUM(Z) AS s1 FROM  T1 WHERE Y=2 GROUP BY X

this way whatever the complexity is it will end with the summa of all
query, in this case 3 * complexity

Now, how to build indexes:
The first place to look is the WHERE clause, it's the first used to cut
unwanted data.
The index will contain Y at his inside, to be more exact the index
*must* have Y as first member to be used.

Then examine the GROUP BY clause, to group the database must order for
the content of the groups. In this case we want to index for X.
There is a problem here, generally databases can't use two index for a
single table (not totally true, take it as is for now).
As a result of this we *must* create an index that contain the ordered
couple Y,X .

The analisys can finish here, but there is still space for another
optimization, this one must be evaluated every time knowing the shape of
the table, the amount of data contained etc.
We can see that the only other element of the query is Z .
Indexes are kept separated from data on the disk, so if all the data
needed is contained into the index we can avoid a second disk read for
the data.
having an index on (Y,X,Z) in this order permit to access only the
indexes and not the table data.

play with EXPLAIN to learn more on how indexes are used, it's very
informative.

HTH
Francesco Riosa



-- 
 ....................................................................
. These pages are best viewed by coming to my house and looking at   .
. my monitor. [S. Lucas Bergman (on his website)]                    .
 ....................................................................

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to