OR isn't optimised

Taras D Wed, 07 Jun 2006 22:26:36 -0700

Hi everyone,

I have the following schema:


create table l
(
 aId int unsigned not null auto_increment primary key,
 sId smallint unsigned not null,
 dId smallint unsigned,
 index(sId,dId)
);

create table d
(
 sId smallint unsigned not null,
 dId smallint unsigned not null,
 primary key(sId, dId)
);

The d table has 10 000 entries, the l table has 100 entries.

The following query, which consist of an OR. Each part of the OR
specifies exactly one row in the d table.

explain select straight_join * from l,d where (l.sId = d.dId and d.sId
= 1) or (l.sId = d.dId and d.sId = 2);

+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len
| ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
|  1 | SIMPLE      | l     | index | sId           | sId     | 5
| NULL |  100 | Using index              |
|  1 | SIMPLE      | d     | range | PRIMARY       | PRIMARY | 2
| NULL |  200 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+

So why is only half of the index being used to reduce the d table? I
would think if it used both indices, at most two matching rows would
be found, reducing the  number of rows that have to be inspected from
20 000 to 100. In fact, UNIONing the two queries shows just this:

select straight_join * from l,d where (l.sId = d.dId and d.sId = 1)
union select straight_join * from l,d where (l.sId = d.dId and d.sId =
2);
+----+--------------+------------+--------+---------------+---------+---------+----------------------------------------+------+-------------+
| id | select_type  | table      | type   | possible_keys | key     |
key_len | ref                                    | rows | Extra
|
+----+--------------+------------+--------+---------------+---------+---------+----------------------------------------+------+-------------+
|  1 | PRIMARY      | l          | index  | sId           | sId     |
5       | NULL                                   |  100 | Using index
|
|  1 | PRIMARY      | d          | eq_ref | PRIMARY       | PRIMARY |
4       | const,radius_searching_test_case.l.sId |    1 | Using index
|
|  2 | UNION        | l          | index  | sId           | sId     |
5       | NULL                                   |  100 | Using index
|
|  2 | UNION        | d          | eq_ref | PRIMARY       | PRIMARY |
4       | const,radius_searching_test_case.l.sId |    1 | Using index
|
|    | UNION RESULT | <union1,2> | ALL    | NULL          | NULL    |
NULL    | NULL                                   | NULL |
|
+----+--------------+------------+--------+---------------+---------+---------+----------------------------------------+------+-------------+

So why isn't the optimiser picking up that it can use both parts of
the keys and significantly reduce the number of rows it must inspect?
I have tried inserting 'force index(primary) after 'from l, d', but
that doesn't help. A work around *could* be to just union the results,
but in the real case the 'l' table can have upto 100 000 entries, and
there can be upto 16 independant ORs, which means that I would be
looking at 100 000 * 16 = 1.6 million rows (using UNIONs), instead of
100 000 rows (if I could get this OR stuff to work). (PHP code to
insert data into the tables is appended)

Does anyone know what's happening?

Thanks,
Taras

php code
---------------------------------------------------------------------------------------------------------------------
<?php

mysql_pconnect('localhost','root',xxxxxxxxxxx);
mysql_select_db('radius_searching_test_case');
$INSERT_AT_A_TIME = 1000;

// do d table first
$done = 0;
$query = 'INSERT INTO d VALUES';
for($i = 1; $i <= 100; $i++)
{
 for($j = 1; $j <= 100; $j++)
 {
   $query .= "($i,$j)";
   $done++;
   if($done >= $INSERT_AT_A_TIME)
   {
     mysql_query($query) or die(mysql_error().'::::'.$query);
     echo "Done another 1000\n\r";
     $query = 'INSERT INTO d VALUES';
     $done = 0;
   }
   else
   {
     $query .= ', ';
   }
 }
}

// now do the l table
for($j = 1; $j <= 100; $j++)
{
 $s = mt_rand(1,100);
 $d = mt_rand(1,100);
 $query = "INSERT INTO l VALUES(NULL,$s,$d)";
 mysql_query($query);
}

echo 'FINISHED';

?>

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

OR isn't optimised

Reply via email to