Hi!
Andrey is right, if I understood correctly. MySQL should notice that
the index with two columns (color, randomkey) is the best because we
have an equality condition on the first column color (= "black")
and an inequality condition on the second column randomkey (> 300000).
Then MySQL can start the search with a key value
("black", 300000)
from the index and step forward in the index until the color value
changes from "black" to something else.
To estimate the number of rows in the result set, MySQL should
estimate how many index records fall in the range
("black", 300000) ... ("black", MAXINT)
Thus it is an issue of optimizing also inequality conditions into
the search parameters when that is possible. It does not happen very often
that inequalities are useful but in this example it would be.
If we only would have indexes on single columns color and randomkey,
then there is not too much help from optimization. IBM's DB2 in these
cases performs a 'hybrid join' which means that it does first
a search on the first one-column index, then sorts the row pointers,
then repeats the same for the second one-column index, and then
compares which row pointers are in both sets.
Regards,
Heikki
>Hi!
>>>>> "Andrey" == Andrey Gubarev <[EMAIL PROTECTED]> writes:
>Andrey> Here is a very easily reproduceable problem with mysql indices:
>Andrey> Suppose we have two columns, color and randomkey. color is
enum("black",
>Andrey> "red", "blue"), black being 95% of all color values, and randomkey
is an
>Andrey> integer between 1 and 1000000. We also have two indices, one on
randomkey,
>Andrey> and one on (color, randomkey).
>Andrey> The query
>Andrey> select * from my_table where color = "black" and randomkey > 300000
>Andrey> limit 10
>Andrey> looks simple enough. One would expect mysql to use either the randomkey
>Andrey> index (because the vast majority of rows are black anyways), or the
color-
>Andrey> randomkey index.Andrey> In fact, the query above takes minutes.
>Andrey> Mysql uses only the first part of the color-randomkey index (i.e.
color)
>Andrey> and examines 300000 rows before it finds randomkey > 300000
(assuming the
>Andrey> table has one million rows). Why? Wouldn't it make more sense to
use all
>Andrey> available indices when LIMIT is used?
>
>An index is bascily just a sorted list of keys and pointers to rows.
>You can't search on two of these at the same time.
>MySQL works the following way to solve the above query:
>- Check how many rows satisifes the condition: colur="black"
>- Check how many rows satisifes the condition: randomkey > 0
>- Use the key that is smaller.If you compare this to a telephone book:
>Assuming you have two telephone books, one where all entries are
>ordered by first name and a second where all names are ordered withlast name.
>How would you use the fact that you have two books when you are
>searching after 'John Smith' and there is one million Johns and one
>million Smiths? Note that under 'John' the last name comes in random
>order and under Smith all the first names comes in random order.
>Regards,
>Monty
---------------------------------------------------------------------
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)
To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php