Re: What to index?

2006-08-22 Thread Chris

Chris Knipe wrote:

Hi,

I got 4 relatively big (for me at least) queries.  At the moment, the 
data in the tables are merely test data, but once the system goes into 
production, I'm expecting millions of records in most of the tables.  
I'm trying very hard thus to optimise my queries and tables to ensure I 
get a reasonably good throughput once these tables start filling up, as 
altering the tables for indexes once populated is obviously something I 
want to try and avoid.


It's the best way unfortunately because then you can see exactly how the 
database will react with the amount of data you're using.




Is the above optimised?  What can be done here to improve things


None of that is using indexes.

The "possible keys" column means "I have these indexes and I may use them".

The "keys" index shows which one it's actually using (and in your case 
they are all null which means none are chosen).


First rule:

any joins should be indexed, so index these columns (some may be primary 
keys and they are already indexed):


`UserChecks`.UserID
`User`.EntryID
`User`.GroupID
`Group`.EntryID
`GroupTimes`.GroupID
`Group`.EntryID
`Group`.RealmID
`Realm`.EntryID
`Realm`.ClientID
`Client`.EntryID


Are you expecting lots of people to be active or inactive?

If you're expecting a lot of people to be inactive, you could index the 
columns in this part:


   ->  WHERE `User`.isActive='y' AND
   ->`Group`.isActive='y' AND
   ->`Realm`.isActive='y' AND
   ->`Client`.isActive='y' AND

but if a lot (more than 30%) are going to be active, then it's not worth 
it (mysql will ignore the index because it returns too many results).



I'm not sure whether the rest of your query can be optimized because of 
the function calls:


   ->`GroupTimes`.DOW=DAYOFWEEK(DATE_ADD('1970-01-01', INTERVAL 
UNIX_TIMESTAMP() SECOND)) AND
   ->`GroupTimes`.StartTime < 
DATE_FORMAT(DATE_ADD('1970-01-01', INTERVAL UNIX_TIMESTAMP() SECOND), 
'%H:%i:%S') AND
   ->`GroupTimes`.StopTime > DATE_FORMAT(DATE_ADD('1970-01-01', 
INTERVAL UNIX_TIMESTAMP() SECOND), '%H:%i:%S') AND


What language are you using? PHP?

You could work out part of this in php:

$dayofweek = date('w'); (see http://www.php.net/date)
$starttime = date('H:i:s');

etc.

I'm sure other languages have similar ways of doing this.

and that would give mysql a constant to work with (which should be 
indexable) instead of having to work it out for each row it finds.


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



What to index?

2006-08-22 Thread Chris Knipe

Hi,

I got 4 relatively big (for me at least) queries.  At the moment, the data 
in the tables are merely test data, but once the system goes into 
production, I'm expecting millions of records in most of the tables.  I'm 
trying very hard thus to optimise my queries and tables to ensure I get a 
reasonably good throughput once these tables start filling up, as altering 
the tables for indexes once populated is obviously something I want to try 
and avoid.


Having looked at one of the four queries (they're very similar), EXPLAIN 
jeilds the following:

mysql> EXPLAIN SELECT `UserChecks`.EntryID,
   ->`UserChecks`.Attribute,
   ->`UserChecks`.op,
   ->`UserChecks`.Value
   ->   FROM `UserChecks`
   ->   LEFT JOIN `User` ON `UserChecks`.UserID=`User`.EntryID
   ->   LEFT JOIN `Group` ON `User`.GroupID=`Group`.EntryID
   ->   LEFT JOIN `GroupTimes` ON `GroupTimes`.GroupID=`Group`.EntryID
   ->   LEFT JOIN `Realm` ON `Group`.RealmID=`Realm`.EntryID
   ->   LEFT JOIN `Client` ON `Realm`.ClientID=`Client`.EntryID
   ->  WHERE `User`.isActive='y' AND
   ->`Group`.isActive='y' AND
   ->`Realm`.isActive='y' AND
   ->`Client`.isActive='y' AND
   ->`GroupTimes`.DOW=DAYOFWEEK(DATE_ADD('1970-01-01', INTERVAL 
UNIX_TIMESTAMP() SECOND)) AND
   ->`GroupTimes`.StartTime < DATE_FORMAT(DATE_ADD('1970-01-01', 
INTERVAL UNIX_TIMESTAMP() SECOND), '%H:%i:%S') AND
   ->`GroupTimes`.StopTime > DATE_FORMAT(DATE_ADD('1970-01-01', 
INTERVAL UNIX_TIMESTAMP() SECOND), '%H:%i:%S') AND
   ->`Realm`.Realmname=SUBSTRING(TRIM(LOWER('[EMAIL PROTECTED]')), 
(LOCATE('@', TRIM(LOWER('[EMAIL PROTECTED]'))) +1)) AND
   ->`User`.Username=LEFT(TRIM(LOWER('[EMAIL PROTECTED]')), 
(LOCATE('@', TRIM(LOWER('[EMAIL PROTECTED]'))) -1))

   ->  GROUP BY `UserChecks`.Attribute
   ->  ORDER BY `UserChecks`.Attribute;
++-+++---+--+-+--+--+-+
| id | select_type | table  | type   | possible_keys | key  | key_len | 
ref  | rows | Extra   |

++-+++---+--+-+--+--+-+
|  1 | SIMPLE  | User   | system | PRIMARY   | NULL | NULL| 
NULL |1 | Using temporary; Using filesort |
|  1 | SIMPLE  | Group  | system | PRIMARY   | NULL | NULL| 
NULL |1 | |
|  1 | SIMPLE  | Realm  | system | PRIMARY   | NULL | NULL| 
NULL |1 | |
|  1 | SIMPLE  | Client | system | PRIMARY   | NULL | NULL| 
NULL |1 | |
|  1 | SIMPLE  | UserChecks | ALL| NULL  | NULL | NULL| 
NULL |3 | Using where |
|  1 | SIMPLE  | GroupTimes | ALL| NULL  | NULL | NULL| 
NULL |5 | Using where |

++-+++---+--+-+--+--+-+
6 rows in set (0.02 sec)

There are two items in my WHERE clause that does not use indexes.  One on 
the UserCheck table, and another on the GroupTimes table.


On the GroupTimes, I have DOW as a ENUM([1..7]) - I can more than likely add 
a index, but with a cardinality of 7, I doubt it will be worth it. 
StartTime and StopTime is a TIME field, I can add indexes on those.  I'm 
thus pretty sorted on the GroupTimes table (I think).


I have NO idea where in the WHERE clause I am referencing the UserChecks 
tables (except for the GROUP BY and ORDER BY).  I've added an index on the 
Attribute column (VARCHAR(32))  but the EXPLAIN still shows that it is not 
using the index


Is the above optimised?  What can be done here to improve things


Thanks allot,
Chris.


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]