Re: Clustered Index - Where is data inserted ? Not Answered

Rhino Sat, 19 Jun 2004 21:12:37 -0700

----- Original Message ----- 
From: "Paul Chu" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, June 19, 2004 10:55 PM
Subject: RE: Clustered Index - Where is data inserted ? Not Answered



> Hi Rhino,
>
> Thanks for that explanation of how and why to use a clustered index ! :)
>
Again, I really haven't looked into how MySQL/InnoDB handles clustered
indexes. I told you how they work in DB2, which I know much better than
MySQL/InnoDB. Before using anything I say, I strongly advise you to check
the MySQL/InnoDB documentation to see how *they* handle clustered indexes,
it might not be the same.

> I am just a complete beginner with MySql. I have extensive experience
> with Sql Server, some Oracle and some DB2.
>
> It seems that if I want to create a very large master table with 300
> million rows using MYISAM
> I could use an autoincrement as the PK and create indexes on the columns
> I'm searching by:  e.g. order date
>
Remember, clustering indexes have nothing to do with your PK or other
indexes. Again, assuming MySQL/InnoDB works the same as DB2, a clustering
index has all the same attributes as any other index PLUS one other
attribute: it controls the placement of new rows in a table.

Which columns should comprise your primary key and what other non-clustering
indexes you should create still need to get decided. The primary key is
probably the most important decision you need to make: remember, it could be
single or multi-column but it must be unique and there can't be nulls in any
part of the primary key. (Again, those are the rules for DB2; things may
differ in MySQL for all I know.) The choice of primary key normally results
from normalization; typically, you put your data into Third (or Fourth)
Normal Form and then denormalize on a case by case basis if it seems
appropriate.

I've already explained the key thinking behind the selection of the
clustering index. There's one other thing to remember about clustering
indexes: you can only have one of them on a given table! That should be
obvious if you think about it for a second: since the clustering index
influences the physical placement of data and you can only store the data in
one order, naturally there can only be one clustering index.

Other non-clustering indexes are often chosen on the basis of normalization
or simply familiarity with the data. For instance, most people would
intuitively put an index on Social Security Number because they know it is
unique and it could easily be something that we know about an employee so
that we can look up his records.

Other non-clustering indexes will be less obvious. They may or may not be
unique. They may be single-column or multi-column. Their main job though
will be to help you find things faster. In other words, they are a
performance technique. In the case of DB2 (and hopefully MySQL/InnoDB), any
query will return the right answer, even if you don't have any indexes.
However, having an appropriate index should get you the answer
(considerably) more quickly, assuming the optimizer actually uses the index.
(That is not always a sure thing in DB2 - usually for good reasons - but I
*think* that MySQL lets you specify that you want to use a specific index in
a query.)

 Since I have no idea of how the MySQL optimizer works, I won't make any
further remarks about it at this time for fear of putting my foot really
deep into my mouth ;-)

> Accordingly, new records are inserted at the end of the table and this
> would make MASS inserts ( millions of rows ) very quick.
>
> I hope this assessment is correct ?
>
It *should* work the way you say, again assuming that clustering works the
same in MySQL as in DB2. But, as always with databases, you should
definitely test your design with a test database first and then monitor
carefully as you scale up the amount of data to be sure that you don't get
any rude surprises.

And be sure to get someone like Heikki to verify that what I've said is
applicable to MySQL/InnoDB. I don't want to sent you astray with
inappropriate extrapolations from DB2!

Rhino




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Clustered Index - Where is data inserted ? Not Answered

Reply via email to