Re: how optimize and put in order the row?

DL Neil Thu, 08 Nov 2001 16:05:17 -0800

> I mean a database is like a file.
> a file in which there are written the rows in sequentially way.
> for example:
>
> NUMBER
> 2
> 65
...
> 45
> 5
>
> If I tell to mysql to store these rows in order like:
>
> NUMBER
> 1
> 4
...
> 55
> 99
>
> NOTE: not ORDER BY but store in the database (or in file) in this way:
> ordered!
>
> It will be more speed for Mysql if I'll ask a query like that:
>
> SELECT... FROM .... WHERE 50>x>10;
>
> I'm not talking about index but how to say to mysql
> the way of storing the data ( orderly ).
>
> How can I do that?



I feel a tutorial coming on. Please let me know if my English is not good for you...


Usually a database.table consists of two parts. The data part, and the index part. 
There can be more than one
index part. There can be a data part with no index part.

Let's think of the data part as a bookshelf. The shelf is the home of a number of 
books. Think of each book as a
data row or record. If you remove one book from the shelf, and later want to add a new 
book, it might be able to
fit in the space left. If it is too wide, it will have to be placed at the end of the 
shelf. If we only buy thin
books, and we remove a large number of fat books, pretty soon the books along the 
shelf will be in no particular
sequence - there will be lots of empty spaces in between and the books left on the 
shelf will be in an almost
random order. A bit of a mess!

Your first question: what you would like to do, is to organise the physical 
arrangement of data in the data part
of the database. As you say, it would make it easier to find data in the 
database.table. The bookshelf is the
same - if we rearrange the books on the shelf so that they are in some sequence (or 
'order') it becomes easier
to find the one on SQL database theory.

Here's the bad news: we (users and programmers) have no ability to sequence the 
records/rows of data in the data
part of a database. Why not? Because a database is not a bookshelf that you and I can 
view directly and use to
select books/data. We go to the database management system (DBMS), in this case MySQL, 
and ask it to provide us
with data - in other words we go to a librarian and say "please find me the book on 
SQL database theory". The
librarian is happy to do this for us, but how the books are kept/arranged is none of 
our business (your database
is not a public library!).

Now let's talk about the index part. A database table can have no index part, one 
index part, or many indexes.
As you know the object of an index is to make a 'lookup' of data faster. An index is 
made up of one or more
columns from the data part - if more than one column is used, then the different 
fields are concatenated to make
one larger single field as the index. The other half of an index is a series of 
'pointers'. We can't see them.
These pointers belong to the librarian (the DBMS). A pointer connects a key value in 
the index part to the
corresponding data row in the data part. Think of the (old) catalog cards that index 
books in a library - or at
least they used to before computers came along and spoiled a good story...

Because indexes are used to retrieve data, they ARE kept in sequence - and this is 
possible because they take up
less space (than a whole data record). If a data row is removed from the data part, 
the corresponding index
entry must be removed too - and all the other index entries in the index part are 
'moved up' to make a new
sequence. Similarly if a new data row is added, whilst it can go in at the 'end of the 
shelf' or in any
available/spare space in the data part, its corresponding index must be inserted into 
the sequence of index
entries and the rest of the index part of the database.table is 'moved up' to make 
enough free space so that
this can happen. Fortunately all of that is the responsibility of the librarian 
(DBMS), and I for one am quite
glad that we don't have to worry about it.

Now let's get really adventurous. Let's have two index parts for the one data part of 
our database.table! Let's
have one index that is sequenced according to the book's title, and a second which is 
sequenced according to its
author's name. Now when a book is removed from the shelf/the data part, the 
librarian/DBMS must remove not just
the data row, but TWO index entries - one from the list of authors, and one from the 
list of book titles.
Similarly when a new book is added to the shelf/data part, the librarian must make a 
new entry in each of the
two indexes in the index part.

Now we can answer the question you thought up a few paragraphs ago: why are we not 
able/allowed to sequence the
rows in the data part? Because if your database.table has more than one index, which 
of the two or more
sequences will you choose to use to determine that order?

To add to that, imagine the effort that would be required when a new book is added or 
an old one removed, if the
entire database.table had to be reorganised to make free-space and to close up the 
spaces left. It's a quick
task with short indexes, but a much longer/greater effort for columns and columns with 
many rows full of data!

So that's why we use indexes and the DBMS maintains those (hidden) pointers and that's 
why the manual tells you
that an index will help speed up SELECT queries but will slow down UPDATEs, INSERTs, 
and DELETEs.

Now the gloves really come off: If working out that 'trade-off' wasn't enough to get 
you worrying, let's add yet
more to consider. If you think about it, the data values for each column that is also 
an index are effectively
stored twice - once in the data part and secondly in the index part; and with those 
invisible 'pointers' keeping
them 'connected'. That means that every index defined, increases the amount of disk 
(and buffer) space required
to store and run the database.table!

Now to your second question: if the data part of the database.table could be 
sequenced, whenever a SELECT is
performed on that 'key' field, it would be faster because searching could start at one 
value (greater than), and
finish at another (less than) - and indeed that would be true, thus that is exactly 
why we have indexes and what
they are for! Because the data part is not sequenced (and if it were, could only be 
sequenced according to one
index) every SELECT query will need to look at EVERY row in the table - because it 
can't say "I've seen enough"
or "there'll be no more (hits) after this one". Depending upon table size, this could 
take a while!

Yes but - and there's always a "but" isn't there? What about a primary index? This is 
a very good "but", and
it's where I run out of knowledge about the specifics of how MySQL is implemented - 
compared to how other
RDBMSes might work! Despite what some people believe, the data part is not stored in 
the sequence of the primary
index. A primary index is an index, just like any other index. However, some early 
DBMSes (remember those "good
old days"?) decided to have a rule that the primary index had to be the first column 
or series of columns in the
table. That meant that they could put all the data from those columns entirely in the 
index part, and NOT
reproduce it in the data part - saving storage space but relying entirely on those 
invisible 'pointers' to keep
the two 'halves' of each row linked. Indeed, if you study database normalisation, 
second normal form talks of a
"key part" and a "data part" to each row. The primary key is what the mathematicians 
are talking about when they
say "key part"!

Now MySQL says you don't need to have a primary key for a table. The manual also says 
that you can have one or
more indexes, even without (first) having to have a primary key - in which case the 
first-defined index will be
treated as the primary key. I don't know if this means that MySQL will then divide the 
data of each row, as
described, or not. There are some much more technical people on this list who can fill 
in that blank for us...

However the converse also appears to be true. That there is no difference between a 
primary key and an
'ordinary' index. Therefore, there are no efficiency gains for choosing one particular 
index to be the primary
key, over any other index. (again I'll defer to other people's applied expertise)

Finally, back to your overall consideration: If your table has few SELECT operations 
and many more INSERTs,
DELETEs, and/or UPDATEs, then you should consider leaving things as they are and 
running with no indexes.
However if your data usage involves many SELECTs and few of the other operations, 
indexes might give you a
considerable speed advantage, but at an increased cost of storage space!

Hope it helps!
=dn



---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Re: how optimize and put in order the row?

Reply via email to