* Sebastian Bergmann
>   I am currently trying to optimize the indexes / keys for the MySQL
>   schema of phpOpenTracker.

I don't know phpOpenTracker, but I have a few comments.

>   I'm wondering if multiple keys, like
>
>     CREATE TABLE pot_accesslog (
>       accesslog_id int(10) unsigned NOT NULL,
>       client_id    int(10) unsigned NOT NULL,
>       document_id  int(11)          NOT NULL,
>       timestamp    int(10) unsigned NOT NULL,
>
>       PRIMARY KEY (accesslog_id, client_id, document_id, timestamp)
>     );

This primary key still allows multiple records with the same accesslog_id,
multiple records with the same accesslog_id/client_id combination, and
multiple records with the same accesslog_id/client_id/document_id
combination. It does not, however, allow two records with the same
accesslog_id/client_id/document_id combination and the exact same
timestamp. Is this what you want?
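
To illustrate (just a sketch with made-up values, using the table
definition above):

  -- These two rows differ only in timestamp, so both are accepted:
  INSERT INTO pot_accesslog VALUES (1, 1, 1, 1010000000);
  INSERT INTO pot_accesslog VALUES (1, 1, 1, 1010000001);

  -- This one repeats all four key columns and is rejected with a
  -- duplicate key error:
  INSERT INTO pot_accesslog VALUES (1, 1, 1, 1010000001);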

>   would be better than single keys for each column, like
>
>     CREATE TABLE pot_accesslog (
>       accesslog_id int(10) unsigned NOT NULL,
>       client_id    int(10) unsigned NOT NULL,
>       document_id  int(11)          NOT NULL,
>       timestamp    int(10) unsigned NOT NULL,
>
>       KEY accesslog_id (accesslog_id),
>       KEY client_id    (client_id),
>       KEY document_id  (document_id),
>       KEY timestamp    (timestamp)
>     );

This statement does not define a primary key, so any combination of values
is allowed, including complete duplicates. The restriction on two records
with the same accesslog_id/client_id/document_id combination and the same
timestamp does not apply here.

It is normally best to define a primary key on a table, and it is very often
used to restrict which combinations of columns must be unique. However, if
the behaviour with no restrictions on duplicates is what you want, you may
be better off without a primary key in this case.

Also, the second create statement defines indexes on all fields, making
almost any query reasonably fast. The first statement only defines one
index: the one on accesslog_id/client_id/document_id/timestamp. Any query
_not_ providing an accesslog_id cannot use that index, which is probably a
bad thing. (I have not studied your queries; maybe you always use the
accesslog_id?)

You can almost always improve the performance of queries by adding an index.
The trick is to add the right index... :) If, for instance, you often run a
"SELECT * FROM pot_accesslog WHERE client_id=$id AND timestamp > NOW() -
INTERVAL 24 HOUR", you could add an index on the involved fields: KEY
client_time (client_id,timestamp).
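
For example (only a sketch; I am assuming here that timestamp holds unix
epoch seconds, so "24 hours ago" is written with UNIX_TIMESTAMP(), and 42
is a made-up client_id):

  ALTER TABLE pot_accesslog
    ADD KEY client_time (client_id, timestamp);

  -- EXPLAIN should now show the client_time key being used:
  EXPLAIN SELECT * FROM pot_accesslog
  WHERE client_id = 42
    AND timestamp > UNIX_TIMESTAMP() - 24*60*60;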

You should not create too many indexes, though. Inserts, updates and deletes
become slower with every additional index. You need to find indexes that can
be shared by different queries: the client_time (client_id,timestamp) index
can also be used by a query that provides a client_id but no timestamp,
because client_id is the leftmost part of the index. A query that only
provides a timestamp cannot use it.
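
You can check this with EXPLAIN (again a sketch with made-up values):

  -- Can use client_time, because client_id is the leading column:
  EXPLAIN SELECT * FROM pot_accesslog WHERE client_id = 42;

  -- Cannot use client_time, because timestamp is not the leading
  -- column of that index; it would need an index of its own:
  EXPLAIN SELECT * FROM pot_accesslog WHERE timestamp > 1010000000;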

>   Have a look at
>
>     http://www.sebastian-bergmann.de/single_keys.htm

CREATE TABLE pot_add_data (
  accesslog_id int(10)      unsigned NOT NULL,
  data_field   varchar(32)           NOT NULL,
  data_value   varchar(255)          NOT NULL,

  PRIMARY KEY    (accesslog_id),
  KEY data_field (data_field),
  KEY data_value (data_value)
);

Very often it is better to define this last index like this:

  KEY data_value (data_value(20))

This will save a lot of resources, and will most likely improve the
performance of your server! This is, however, not always true: if most
data_value entries are equal for the first 20 characters, this index is
useless.

It is almost never a good thing to index the full 255 bytes of the varchar.
The trick here is to find the right prefix length to use; 20 is of course
not always the best choice.
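
One way to pick a reasonable prefix length is to compare how many distinct
values a prefix leaves with the number of distinct full values (a sketch;
try a few different lengths):

  SELECT COUNT(DISTINCT data_value)           AS full_values,
         COUNT(DISTINCT LEFT(data_value, 20)) AS prefix_20,
         COUNT(DISTINCT LEFT(data_value, 10)) AS prefix_10
  FROM pot_add_data;

If prefix_20 is close to full_values, the shorter index loses almost no
selectivity.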

The point is that every character of an index must be read into the memory
of the server. If your tables are big, a couple of characters per row can
add up to megabytes of RAM... When the predefined amount (key_buffer_size)
is used up, old keys are overwritten by new ones as they are read from disc.
When keys are in RAM, the server performs much faster than when it has to
read them from disc. This means that smaller keys improve the overall
performance of the server (unless the server has a lot of memory and
key_buffer_size is set higher than the combined size of all used indexes of
all used tables in all used databases on the server...)

You can check the key_buffer_size of your server with the statement:

  show variables like "key_buffer_size";

I think the default (on 3.23.x) is 8388600, but I'm not sure about this; it
seems pretty low... The documentation suggests using up to half of the total
memory. <URL: http://www.mysql.com/doc/S/H/SHOW_VARIABLES.html >
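
You can also get an idea of how well the key buffer is doing from the key
cache counters. A sketch (if your version does not accept the LIKE pattern
here, a plain SHOW STATUS lists the same counters):

  -- Compare Key_reads (index blocks read from disc) with
  -- Key_read_requests (index blocks requested); a high ratio of
  -- Key_reads to Key_read_requests means the key buffer is too small.
  SHOW STATUS LIKE "Key_read%";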

>   for a version of the schema that uses single keys, together with
>   the output of EXPLAIN(query) for some of phpOpenTracker most commonly
>   used queries.
>
>   Have a look at
>
>     http://www.sebastian-bergmann.de/multi_keys.htm
>
>   for a version of the schema that uses multiple keys, together with
>   the output of EXPLAIN(query) for some of phpOpenTracker most commonly
>   used queries.
>
>   It'd be really great if some kind soul could have a look at this and
>   tell me what indexing strategy is suited best for this problem.

I hope the above suggestions were useful :)

--
Roger


