Hi Rhino,

Thanks for that explanation of how and why to use a clustered index ! :)

I am just a complete beginner with MySql. I have extensive experience
with Sql Server, some Oracle and some DB2.

It seems that if I want to create a very large master table with 300
million rows using MYISAM
I could use an autoincrement as the PK and create indexes on the columns
I'm searching by:  e.g. order date 

Accordingly, new records are inserted at the end of the table and this
would make MASS inserts ( millions of rows ) very quick.

I hope this assessment is correct ?

Regards, Paul
 
 

-----Original Message-----
From: Rhino [mailto:[EMAIL PROTECTED] 
Sent: Saturday, June 19, 2004 2:19 PM
To: Heikki Tuuri; [EMAIL PROTECTED]
Subject: Re: Clustered Index - Where is data inserted ? Not Answered

Excuse me for jumping in but isn't a primary key likely to be a poor
choice
for a clustering index? Wouldn't it be better to choose a non-unique key
of
some kind?

I've used clustering indexes for years with DB2 and, assuming InnoDB
uses
the term in the same way, a clustering index has only one effect over
and
above the effect of any other index: it influences the placement of new
rows
when they are inserted in such a way that the database manager *tries*
to
put a new row on the same physical page as another row with the same key
value. This is potentially advantageous because it improves performance
in
the case where a user asks for the rows in the clustering sequence,
assuming
that a large majority of rows *were* actually stored in clustering
sequence.

To illustrate with an example, consider an ORDERS table which contains
orders placed with a manufacturing company. The table is likely to have
a
primary key and, more often than not, the primary key will be
ORDER_NUMBER.
The table is also likely to have a column like ORDER_DATE, which will
normally be non-unique. If I cluster the table on ORDER_NUMBER, the
clustering index is only going to be useful if I write a query that
displays
all my orders in order number sequence. Although that is certainly
something
we do often enough when messing around with test tables, just to be sure
the
data is inserted correctly, that doesn't seem particularly likely in the
real world for real reports. A report we're much more likely to do in
the
real world, with real production data, is to list the orders for a given
day
or week. In that case, that query would work very well if the table had
the
ORDER_DATE as the clustering index, assuming that most of the rows are
actually in clustering sequence.

Again, in the DB2 world, we have to make sure that we leave sufficient
freespace so that new rows *can* go on the same page as rows with the
same
or adjacent keys and we have to reorganize our tables occasionally to
get
all the rows back into clustering order and re-establish the necessary
margin of freespace.

Does InnoDB clustering work the same way? I'm afraid I haven't had any
need
for clustering indexes in my project so far so I haven't had to look for
the
information.

Rhino

----- Original Message ----- 
From: "Heikki Tuuri" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, June 19, 2004 6:12 AM
Subject: Re: Clustered Index - Where is data inserted ? Not Answered


> Paul,
>
> ----- Original Message ----- 
> From: ""Paul Chu"" <[EMAIL PROTECTED]>
> Newsgroups: mailing.database.myodbc
> Sent: Saturday, June 19, 2004 5:07 AM
> Subject: RE: Clustered Index - Where is data inserted ? Not Answered
>
>
> > Appreciate any help at all
> >
> > Thanks, Paul
> >
> >
> > -----Original Message-----
> > From: Paul Chu [mailto:[EMAIL PROTECTED]
> > Sent: Friday, June 18, 2004 10:16 AM
> > To: [EMAIL PROTECTED]
> > Subject: Clustered Index - Where is data inserted ?
> >
> > Hi,
> >
> > Can someone explain how the data is stored in a table using a
clustered
> > index.
> >
> > Does this mean that the data is inserted in the .myd file in sorted
> > index order ?
>
> MyISAM does not have clustered indexes. InnoDB has a clustered index
on
> every table, and normally it is on the PRIMARY KEY.
>
> > If so, how is space made to insert the new records ?
> > Does this make inserting records slow because data is being inserted
in
> > physical sort order in the table ?
>
> For large tables, people usually insert new rows at the high end of
the
> PRIMARY KEY, since the key is usually a generated id. Splitting B-tree
pages
> at the index end is very efficient. In this case having a clustered
index
> poses no overhead.
>
> But if you insert records randomly on the PRIMARY KEY, then B-tree
page
> splits waste disk space, and that will reduce performance somewhat.
>
> > Thanks, Paul
>
> Best regards,
>
> Heikki Tuuri
> Innobase Oy
> Foreign keys, transactions, and row level locking for MySQL
> InnoDB Hot Backup - a hot backup tool for InnoDB which also backs up
MyISAM
> tables
> http://www.innodb.com/order.php
>
> Order MySQL technical support from https://order.mysql.com/
>
>
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]
>
>


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to