Re: Question on Composite Index

ManojW Wed, 13 Apr 2005 01:24:24 -0700

Dear Shawn,
    First off, apologies for the delay in reply to this email.


    Secondly, thanks a lot for a very illuminating dicussion on composite keys 
and the way MySQL handles them. Reading through the whole discussion, I have a 
minor question is popping up in my heads...it is as follows:

    If I have a table with composite key fld1,fld2,fld3,fld4. My normal way of 
handling the situation is to create a unique primary key on 
(fld1,fld2,fld3,fld4) and then create "single" non-primary indices on each of 
the remaining fields (so essentially three indices - fld2-idx, fld3-idx and 
fld4-idx). 

    Based on your experience, Is it more effective (in terms of speed of query 
and cost of insert) to create a composite primary index like 
(fld1-fld2-fld3-fld4), (fld2-fld1-fld-3-fld4) , (fld3-fld1-fld2-fld4) and 
(fld4-fld1-fld2-fld3) thereby bringing all fields (fld1 to fld 4) on the 
leftmost side.....OR Is it better to create one composite primary key index 
(fld1-fld2-fld3-fld4) and three single non-primary indices on fld2,fld3 and 
fld4 respectively ?  Any particular preference one way or another? 

   Thanks!

Cheers

Manoj

  ----- Original Message ----- 
  From: [EMAIL PROTECTED] 
  To: ManojW 
  Cc: MySQL List 
  Sent: Thursday, April 07, 2005 11:44 PM
  Subject: Re: Question on Composite Index




  "ManojW" <[EMAIL PROTECTED]> wrote on 04/06/2005 10:09:31 PM:

  > Dear All,
  >     Just to get a better understanding of how indices work in MySQL - If I
  > have a Innodb table with a composite primary key (fld1,fld2,fld3,fld4,fld5),
  > then my understanding is that MySQL optimizes just the leftmost primary key
  > (fld1 in this case).
  > 
  > Hence a query like select * from tbl1 where fld2 > 900 would result in a
  > full table scan even though it's part of the composite key but select  *
  > from tbl1 where fld1 > 900 would be extremely quicker since it would search
  > based on Index pages.
  > 
  > Is my understanding correct? If so, how can we get around this issue ? In
  > real-life databases you will always run in cases where you end up making a
  > composite key on table. One possible solution would be to create non-unique,
  > non-primary index on each of fld2,fld3,fld4,fld5 but then the inserts would
  > be horribly slow  hence was wondering if I am totally missing a very clean
  > solution to the whole issue.
  > 
  > Your kind help would be greatly appreciated!
  > 
  > Regards
  > 
  > Manoj
  > 
  > 

  I think you have the basics down. I can show you something similar to what 
happens in an index when it is built. Maybe this will explain why you can only 
use an index to resolve the left most columns of a multi-column index. 

  Imagine you have a table, Example1, with columns A, B, C, D,  and E. For the 
purposes of this demonstration, the table will consist of data representing 
every possible combination of only 5 different values for each column (column A 
will only contain the values a1, a2, a3, a4, and a5. The same goes for each of 
the other columns). This means that a small section of the table could look 
like (this represents data rows 1470-1480) 

  Example1 
  +--------------+ 
  | A| B| C| D| E| 
  +--------------+ 
  |      ...     | 
  |a3|b2|c4|d5|e1| 
  |a3|b2|c4|d5|e2| 
  |a3|b2|c4|d5|e3| 
  |a3|b2|c4|d5|e4| 
  |a3|b2|c4|d5|e5| 
  |a3|b2|c5|d1|e1| 
  |a3|b2|c5|d1|e2| 
  |a3|b2|c5|d1|e3| 
  |a3|b2|c5|d1|e4| 
  |a3|b2|c5|d1|e5| 
  |a3|b2|c5|d2|e1| 
  |      ...     | 
  +--------------+ 
  Now let's create an index on the columns A, B, and C. Each index row will 
contain the values of those columns for each row plus an offset into the 
datafile of where to locate that row (so that you can retrieve values from 
columns D or E). If each row's offset is an "o" value (o1 is where data row 1 
starts, o2 is where data row 2 starts, etc...) then the index file looks 
something like this 

  KEY(A,B,C) 
  +----------+-------------+ 
  |Key values|Offset values| 
  +----------+-------------+ 
  |         ...            | 
  |  a3-b2-c4|        o1470| 
  |  a3-b2-c4|        o1471| 
  |  a3-b2-c4|        o1472| 
  |  a3-b2-c4|        o1473| 
  |  a3-b2-c4|        o1474| 
  |  a3-b2-c5|        o1475| 
  |  a3-b2-c5|        o1476| 
  |  a3-b2-c5|        o1477| 
  |  a3-b2-c5|        o1478| 
  |  a3-b2-c5|        o1479| 
  |  a3-b2-c5|        o1480| 
  |         ...            | 
  +------------------------+ 

  If you declare an index on 3 columns, all 3 columns are "hashed" together to 
form the equivalent of a single value (this is not the ONLY way to hash values 
together but it works as an illustration for the purposes of answering your 
question). The index files are sorted according to their hashed values. That 
means that it is very easy to find where the a3 values are (they are all 
together) or the a3-b6 values (as they are also all together) but to find just 
the b5 values in the index, you end up searching the whole thing because there 
could be b5 values associated with ANY of the a* values. 

  So to answer your questions, in order for you to be able to use an index to 
locate a row where a column has a particular value (or range of values) you 
must be able to easily navigate to those values within the index. To do that in 
a reasonable amount of time you have to be able to resolve the values for every 
column that comes before it in the index as well. If the column you are trying 
to match is the leftmost (first listed) column in an index, you only need to 
match on the leftmost portion of the index's key (hash). If the column you want 
to match is the second or later column in the hash, this index can't help you 
make things work any faster. Yes, it may make INSERTs much slower if you put 
each column at the beginning of their own index but you save time during SELECT 
queries because you will probably be able to use an index to find the data you 
seek. 

  Another advantage to indexes is that under the right circumstances, all of 
the data you need to answer a query may be contained within the index itself. 
If this is the case, the database engine doesn't even try to read the actual 
data file, it just gets the data straight from the index file and finishes a 
few steps sooner. Fewer trips to the disk = much faster response = happy 
database users. Sure you are trading INSERT speed for SELECT speed (and 
potentially lots of free space on your disks) but you can't get it both ways. 
Tables with fewer indexes on them are less likely to use an index for any 
random query so they will generally respond more slowly. The upside is, they 
have a smaller footprint on the disk. If you can give up some INSERT speed and 
some free space on your hard drives, then the extra indexes are usually worth 
it for the performance boost they provide. 

  One way to keep indexes small is to only index the columns you need. How many 
queries actually specify all 16 columns of a table in their WHERE clause? 
Probably not too many. The engine can use the partial results of say a 3 column 
match from an index (it narrows down the search area to a "short list" of rows) 
and perform what is called a "range scan" on just those rows to evaluate the 
other values in your WHERE clause. 

  There is a great section in the manual on optimization: 
  http://dev.mysql.com/doc/mysql/en/mysql-optimization.html 

  And here is what it says about optimizing WHERE evaluation: 
  http://dev.mysql.com/doc/mysql/en/where-optimizations.html 

  And this is about how to avoid table scans: 
  http://dev.mysql.com/doc/mysql/en/how-to-avoid-table-scan.html 

  Shawn Green
  Database Administrator
  Unimin Corporation - Spruce Pine

Re: Question on Composite Index

Reply via email to