RE: How to index this properly (to make it go faster) ?

Nemholt, Jesper Frank Sat, 24 Mar 2001 17:01:43 -0800
> -----Original Message-----
> From: Benjamin Pflugmann [mailto:[EMAIL PROTECTED]]
> Sent: sabado, 24 de marzo de 2001 23:31
> To: Nemholt, Jesper Frank
> Cc: '[EMAIL PROTECTED]'
> Subject: Re: How to index this properly (to make it go faster) ?
> 
> First, you should run an analyize on your indexes (e.g. with
> (my-)isamchk). This will give MySQL a value for "Cardinality" and
> enable the optimizer to do its work better.

Allready doing that. It was missing from the first mail since I'd just
created the indexes.

Here's from the real one :

mysql> show index from proc;
+-------+------------+-------------+--------------+-------------+-----------
+-------------+----------+--------+---------+
| Table | Non_unique | Key_name    | Seq_in_index | Column_name | Collation
| Cardinality | Sub_part | Packed | Comment |
+-------+------------+-------------+--------------+-------------+-----------
+-------------+----------+--------+---------+
| proc  |          1 | timecode    |            1 | timecode    | A
|        4220 |     NULL | NULL   |         |
| proc  |          1 | system_id   |            1 | system_id   | A
|           1 |     NULL | NULL   |         |
| proc  |          1 | processname |            1 | processname | A
|          88 |     NULL | NULL   |         |
+-------+------------+-------------+--------------+-------------+-----------
+-------------+----------+--------+---------+

> A good index would probably be 
> (system_id,username,processname,timecode).

The users hardly use 'username' in their where clause. That's why it's left
out (the index started to get rather big also).

> Maybe even 
> (system_id,username,processname,timecode,usertime,systemtime),
> which should enable "using index" only.

Why that, usertime & systemtime are data values, they are never used in any
where clause. As far as I can see it's complete waste to put index on those.
 
> You may want to use partial keys in order to reduce space 
> consumption, e.g.
> Maybe even (system_id,username(5),processname(5),timecode),

This is a no go since alot of processes are identical on the first 5
characters.
The username could be used, but I decided not to, since Oracle (which the
solution also has to be compatible with) is smart enough to only store the
characters in use. I actually think MySQL does the same.

> Depending on the Cardinality of the different columns you may want to
> change the order to (username,processname,system_id,timecode) 
> or alike.

Aha, I'll have to test more with the individual order of combined indexes to
see what I can get from it.

[clip]
> No index is chosen, because you obviouly SELECTed a major part of the
> table. This probably won't happen, if you got these 100 million rows
> are talking about below (because you won't select about 20.000.000
> rows, will you?).

Well, that's the problem. I will ;-(
The database contain performance info for a collection of very large unix
servers. Most of them have more than 1000 concurrent processes, more than
100 disk devices and so on. For 34 machines the amount of data in a day is
more than 100 MB, when sampled every 10 minute.

I currently keep 3 months of history.

The database is used to study performance on the machines, like memory use,
disk use and so on.
The interface is either Microsoft Excel or a PHP/GD based web interface I've
made.

Most selects go only a few days back, since it's mostly a matter of finding
out what caused the machines to be slow, out of memory or whatever the night
before, but sometimes we need more than a month to find out about processes
leaking memory slowly, development of memory usage, disk I/O development and
so on.

Performance has been pretty good until now except on the disk table (due to
> 100 unique disk devices on most machines).
Performance for process went all bad when I recently changed from collecting
only top 20 processes to collecting all (those 500 - 2000 processes).
I could ofcourse go back to top20, but the benefit of having all processes
makes me want to avoid that, and the stuff running in Oracle is not hurt due
to the partitioning.

> Reasonable indexes would be the same as above, which should reduce the
> number of rows to read far enough to prevent a full table scan.
> 
> [...]
> > Now, the problem is that I have about 30 system_id's in the 
> database. Each
> > system insert about 500 rows into the proc table every 10 minute.
> > ...so after a month or so I have alot of rows (> 100 million), and a
> > performance problem.
> > In Oracle I have fixed this by partitioning on system_id 
> and/or timecode,
> > but this is not possible to do transparently in MySQL,
> 
> You could probably use the MERGE table type.

Aha, I'll take a look on that in the manual.

[clip]
> > I have tried to combine the indexes, but then MySQL fails using them
> > properly, 
> 
> Would you please elaborate on this? This is IMHO the main point of
> your problem. You need these indexes.

Problem is that MySQL seems to do a full table scan as soon as I start to
combine the indexes, even after running optimize, analyze & sort after
index.
This is my experience after looking at explain decisions and the time it
takes to execute.
Every explain ends up saying "where used" and the showing the full number of
rows. This is in 3.22.32. My test setup is the latest, it might be different
there.


l8r/Jspr

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
RE: How to index this properly (to make it go faster) ?

Reply via email to