Re: Question on Composite Index

SGreen Thu, 07 Apr 2005 07:50:48 -0700

"ManojW" <[EMAIL PROTECTED]> wrote on 04/06/2005 10:09:31 PM:


> Dear All,
>     Just to get a better understanding of how indices work in MySQL - If 
I
> have a Innodb table with a composite primary key 
(fld1,fld2,fld3,fld4,fld5),
> then my understanding is that MySQL optimizes just the leftmost primary 
key
> (fld1 in this case).
> 
> Hence a query like select * from tbl1 where fld2 > 900 would result in a
> full table scan even though it's part of the composite key but select  *
> from tbl1 where fld1 > 900 would be extremely quicker since it would 
search
> based on Index pages.
> 
> Is my understanding correct? If so, how can we get around this issue ? 
In
> real-life databases you will always run in cases where you end up making 
a
> composite key on table. One possible solution would be to create 
non-unique,
> non-primary index on each of fld2,fld3,fld4,fld5 but then the inserts 
would
> be horribly slow  hence was wondering if I am totally missing a very 
clean
> solution to the whole issue.
> 
> Your kind help would be greatly appreciated!
> 
> Regards
> 
> Manoj
> 
> 

I think you have the basics down. I can show you something similar to what 
happens in an index when it is built. Maybe this will explain why you can 
only use an index to resolve the left most columns of a multi-column 
index.

Imagine you have a table, Example1, with columns A, B, C, D,  and E. For 
the purposes of this demonstration, the table will consist of data 
representing every possible combination of only 5 different values for 
each column (column A will only contain the values a1, a2, a3, a4, and a5. 
The same goes for each of the other columns). This means that a small 
section of the table could look like (this represents data rows 1470-1480)

Example1
+--------------+
| A| B| C| D| E|
+--------------+
|      ...     |
|a3|b2|c4|d5|e1|
|a3|b2|c4|d5|e2|
|a3|b2|c4|d5|e3|
|a3|b2|c4|d5|e4|
|a3|b2|c4|d5|e5|
|a3|b2|c5|d1|e1|
|a3|b2|c5|d1|e2|
|a3|b2|c5|d1|e3|
|a3|b2|c5|d1|e4|
|a3|b2|c5|d1|e5|
|a3|b2|c5|d2|e1|
|      ...     |
+--------------+
Now let's create an index on the columns A, B, and C. Each index row will 
contain the values of those columns for each row plus an offset into the 
datafile of where to locate that row (so that you can retrieve values from 
columns D or E). If each row's offset is an "o" value (o1 is where data 
row 1 starts, o2 is where data row 2 starts, etc...) then the index file 
looks something like this

KEY(A,B,C)
+----------+-------------+
|Key values|Offset values|
+----------+-------------+
|         ...            |
|  a3-b2-c4|        o1470|
|  a3-b2-c4|        o1471|
|  a3-b2-c4|        o1472|
|  a3-b2-c4|        o1473|
|  a3-b2-c4|        o1474|
|  a3-b2-c5|        o1475|
|  a3-b2-c5|        o1476|
|  a3-b2-c5|        o1477|
|  a3-b2-c5|        o1478|
|  a3-b2-c5|        o1479|
|  a3-b2-c5|        o1480|
|         ...            |
+------------------------+

If you declare an index on 3 columns, all 3 columns are "hashed" together 
to form the equivalent of a single value (this is not the ONLY way to hash 
values together but it works as an illustration for the purposes of 
answering your question). The index files are sorted according to their 
hashed values. That means that it is very easy to find where the a3 values 
are (they are all together) or the a3-b6 values (as they are also all 
together) but to find just the b5 values in the index, you end up 
searching the whole thing because there could be b5 values associated with 
ANY of the a* values.

So to answer your questions, in order for you to be able to use an index 
to locate a row where a column has a particular value (or range of values) 
you must be able to easily navigate to those values within the index. To 
do that in a reasonable amount of time you have to be able to resolve the 
values for every column that comes before it in the index as well. If the 
column you are trying to match is the leftmost (first listed) column in an 
index, you only need to match on the leftmost portion of the index's key 
(hash). If the column you want to match is the second or later column in 
the hash, this index can't help you make things work any faster. Yes, it 
may make INSERTs much slower if you put each column at the beginning of 
their own index but you save time during SELECT queries because you will 
probably be able to use an index to find the data you seek.

Another advantage to indexes is that under the right circumstances, all of 
the data you need to answer a query may be contained within the index 
itself. If this is the case, the database engine doesn't even try to read 
the actual data file, it just gets the data straight from the index file 
and finishes a few steps sooner. Fewer trips to the disk = much faster 
response = happy database users. Sure you are trading INSERT speed for 
SELECT speed (and potentially lots of free space on your disks) but you 
can't get it both ways. Tables with fewer indexes on them are less likely 
to use an index for any random query so they will generally respond more 
slowly. The upside is, they have a smaller footprint on the disk. If you 
can give up some INSERT speed and some free space on your hard drives, 
then the extra indexes are usually worth it for the performance boost they 
provide.

One way to keep indexes small is to only index the columns you need. How 
many queries actually specify all 16 columns of a table in their WHERE 
clause? Probably not too many. The engine can use the partial results of 
say a 3 column match from an index (it narrows down the search area to a 
"short list" of rows) and perform what is called a "range scan" on just 
those rows to evaluate the other values in your WHERE clause.

There is a great section in the manual on optimization:
http://dev.mysql.com/doc/mysql/en/mysql-optimization.html

And here is what it says about optimizing WHERE evaluation:
http://dev.mysql.com/doc/mysql/en/where-optimizations.html

And this is about how to avoid table scans:
http://dev.mysql.com/doc/mysql/en/how-to-avoid-table-scan.html

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Re: Question on Composite Index

Reply via email to