That is excellent. I was not expecting such a good response.

So with the info you now provide I am right in saying that on each table MySQL will only use 1 index, the one that the table analyzer chooses to be the most suited. If I was to have a separate index, 1 on cat and 1 on date it couldnt use both together, I would have to have a combined one of (cat, date). Please tell me I am understanding this correctly.

So for my query I would set up indexes as follows:

SELECT p.* FROM p_cat c, p_ad p WHERE p.cat = c.id AND c.lft
BETWEEN 4 AND 5 ORDER BY p.date DESC LIMIT 0,30;

c = PRIMARY (c.id) INDEX (c.lft)
p = PRIMARY (p.id) INDEX (p.cat, p.date)

That performs well only if I remove the ORDER BY DESC part. Now I have optimized my table with your explanation of indexes as soon as I add ORDER BY DESC it goes straight back to using filesort, using temporary with no index chosen for the p table.

Any ideas to optimize the ORDER BY DESC is warmly welcome...

And thanks again for your detailed response.

Cheers

Steve....



sean c peters wrote:

(note: all terminology is made up, and not necessarily standard, but the concepts should be fine)

It may help to think about indexes as if you were accessing physical records, such as in a library. For instance, if you want to find all the Kurt Vonnegut, Jr books, and there is an index by author, just by looking at the index, you would get a list of all the books.

Now imagine trying to find only the Kurt Vonnegut, Jr books published between 1970 and 1982 for instance. Just by looking at the author index, you will not be able to do this. The author index would again tell you all the Kurt Vonnegut, Jr books, and then you'd need to look at each of those books information to determine the publication year.

So lets say that in addition to the Author index, we have an index that references books by their publication year. By looking at the published year index, you could get all the books published between 1970 and 1982, but you would not be able to tell who the author is, except by looking at the proper info for all of the books the index told you were published between 1970 and 1982. That probably isnt very helpful.

If you look at the author index and get all the vonnegut books, and look at the published year index and get all books published between 1970 and 1982, you could take the common members of both those sets (set intersection) and that would be the books you want. But again, this a a time consuming process.

But, you can create an index on multiple columns, for instance (author, published_year). This index looks something like this (i made up the dates, i dont know when each was published)

...
Von Neuman, 1942        -> Qunatum Theory
Vonnegut,1965   -> Slaughterhouse 5
Vonnegut,1979   -> Cats Cradle
Vonnegut,1999   -> God Bless You, Dr Kevorkian
Vonnegut,2000   -> Timequake
Vonden, 1922            -> Something
...

So with this index, you can look up by author, and then directly by year, instead of having to actually examine each record. Indexes like like are really the concatenation of the columns involved, in the order specified.

Everything said above is for doing a query on only 1 table, but most of the concepts transfer to more complicated queries.

When querying multiple tables, more stuff is going on.
I'll try to explain in terms of your sample query.



SELECT p.* FROM p_cat c, p_ad p WHERE p.cat = c.id AND c.lft
BETWEEN 4 AND 5 ORDER BY p.date DESC LIMIT 0,30;




Generally the query optimizer will choose one of the WHERE conditions, (or a few if there is an appropriate multi-column index) to create an initial match set.

I would guess that the query optimizer chooses to use the index c.lft, if it exists. By using this index, we'd get all the rows in table 'c' where c.lft is BETWEEN 4 AND 5, as specified. (the initial results set)

Eventually all the WHERE (and HAVING) conditions will be satisfied, but the rest will require examining the actual rows that the initial result set determined as possible matches. At some point, in a multi table query, the rows in the tables must be joined. To do this, the join conditions must be satisfied between the two tables. The rows in the initial result set (one table), will try to join with rows in a second table as specified.

For your query, the only thing - to get all the proper rows (not necessarily in your order) is to join the rows from 'c' in the initial result set with rows from 'p'. They are joined via p.cat = c.id
From the c rows, all the c.id's are available. If there is an index on p.cat,
the proper rows cans be looked up through the index, if not, the table will need to be scanned for each c.id to join with p.cat

If there were other where conditions, such as 'p.blah = 7', each joined row would need to be examined to determine if the value of p.blah is appropriate or not.

So, to summarize, to get a reduced match set, an index may be used. For each table join, an index may be used.

To apply other where clauses, etc, each row must be examined.

Hope this helps, its long and off the cuff, and if its wrong, please come shoot me.

regards,
sean peters
[EMAIL PROTECTED]


On Friday 16 April 2004 13:51, Steven Ducat wrote:


I have 2 tables and 1 query. The problem is when I implement ORDER BY
p.date DESC it hits the wall.

I understand that MySQL is not the best at ORDER BY DESC so I am after
some tips on possible workarounds to avoid using ORDER BY DESC.
The site will list classifieds ads so I need to display them from newest
to oldest using a timestamp.

INDEXES
Also as I have been playing with indexes for so long now still trying to
understand them. I still have mixed signals to the way they work. When
MySQL performs a query can it only use 1 index at a time or can it use
several individual indexes. Advice I have been given is to place a
seperate index on each column.
If I was to perform the following query:
EXPLAIN SELECT p.* FROM p_cat c, p_ad p WHERE p.cat = c.id AND c.lft
BETWEEN 4 AND 5 ORDER BY p.date DESC LIMIT 0,30;
then I would expect it to use the cat_date index on table p_ad as it can
only use 1 index but if I am to believe others I should place a seperate
index each on p_ad.cat and p_ad.date and it could use both in the same
query. I look forward to some facts on this issue as I cant seem to
catch on.

Sorry to go on a bit but this is doing my head in.

I look forward to your help..

Cheers

Steven.....





2 Tables

CREATE TABLE `p_ad` (
`id` int(11) NOT NULL auto_increment,
`cat` mediumint(9) NOT NULL default '0',
`title` varchar(50) default NULL,
`description` text,
`location` varchar(50) default NULL,
`pcode` varchar(8) default NULL,
`pcode_id` smallint(4) default NULL,
`ph` varchar(50) default NULL,
`email` varchar(50) default NULL,
`user_id` int(11) NOT NULL default '0',
`date` timestamp(14) NOT NULL,
`price` decimal(10,2) default NULL,
`email_priv` tinyint(1) default '0',
PRIMARY KEY  (`id`),
KEY `cat_pc_date` (`cat`,`pcode_id`,`date`),
KEY `c_p_d` (`cat`,`pcode`,`date`),
KEY `user` (`user_id`),
KEY `cat_date` (`cat`,`date`)
) TYPE=MyISAM;

CREATE TABLE `p_cat` (
`id` mediumint(9) NOT NULL auto_increment,
`name` varchar(50) NOT NULL default '',
`parent` mediumint(11) default '0',
`lft` mediumint(11) NOT NULL default '0',
`rgt` mediumint(11) NOT NULL default '0',
PRIMARY KEY  (`id`),
KEY `LFT` (`lft`),
KEY `PARENT` (`parent`)
) TYPE=MyISAM;

Query as follows:

EXPLAIN SELECT p.* FROM p_cat c, p_ad p WHERE p.cat = c.id AND c.lft
BETWEEN 4 AND 5 ORDER BY p.date DESC LIMIT 0,30;
+-------+-------+----------------------------+------+---------+------+-----
--+---------------------------------+

| table | type | possible_keys | key | key_len | ref |

rows  | Extra                           |
+-------+-------+----------------------------+------+---------+------+-----
--+---------------------------------+

| p | ALL | cat_pc_date,c_p_d,cat_date | NULL | NULL | NULL |

60002 | Using temporary; Using filesort |

| c | range | PRIMARY,LFT | LFT |

3 | NULL | 1 | Using where |
+-------+-------+----------------------------+------+---------+------+-----
--+---------------------------------+


Reply via email to