Janusz Krzysztofik wrote:
> ...
> I am trying to optimize MySQL (3.23.49 from Debian stable) setup for
> ASPseek application. I decided to try InnoDB in order to be able
> to update tables while performing time consuming selects.
> After converting all tables to InnoDB I noticed a big difference
> in processing speed of one of the SELECT queries performed by the application.
> 
> Query:
> select url_id from urlword where deleted=0 and status=200 and origin=1
> 
> MyISAM:
> Query OK, 14274315 rows affected (4 min 54.88 sec)
> 
> InnoDB:
> Query OK, 14274315 rows affected (1 day 8 hours 46 min 46.70 sec)

Mechain Marc wrote:
> ...
> Have you done an EXPLAIN on your query ?
> 
> May be an index on (origin,status,deleted) could help.

Janusz Krzysztofik wrote:
> ...
> mysql> explain select url_id from urlword where deleted=0 and status=200 and 
> origin=1;
> 
> MyISAM (fast):
> +---------+------+---------------+------+---------+------+----------+------------+
> | table   | type | possible_keys | key  | key_len | ref  | rows     | Extra      |
> +---------+------+---------------+------+---------+------+----------+------------+
> | urlword | ALL  | crc           | NULL |    NULL | NULL | 46648925 | where used |
> +---------+------+---------------+------+---------+------+----------+------------+
> 
> InnoDB (very slow):
> +---------+------+---------------+------+---------+-------+----------+------------+
> | table   | type | possible_keys | key  | key_len | ref   | rows     | Extra      |
> +---------+------+---------------+------+---------+-------+----------+------------+
> | urlword | ref  | crc           | crc  |       5 | const | 16951116 | where used |
> +---------+------+---------------+------+---------+-------+----------+------------+
> 
> So I retried InnoDB with "ignore index (crc)" and got:
> Query OK, 14274315 rows affected (5 min 43.23 sec)
> 
> Next I found that this issue is not related to InnoDB. On my second server
> with almost the same data in MyISAM tables the query also uses the index
> and lasts forever.
> 
> Now the question is: how should I set up (and maintain?) my MySQL server
> to prevent it from using indexes inefficiently?

Mechain Marc wrote:
> ...
> Quite a good question, no idea.
> 
> But if the query
> select url_id from urlword where deleted=?? and status=??? and origin=?;
> is a query that you will use very often, I still think that creating an index on 
> (origin,status,deleted)
> is the good answer.

gerald_clark wrote:
> ...
> Try adding an index on (status,deleted)

Martijn Tonies wrote:
> ...
> > > I wonder: how many possible different values would such an index
> > > return?
> >
> > mysql> select distinct status, deleted from urlword;
> > ...
> > 13 rows in set (6 min 55.94 sec)
> >
> > mysql> select distinct status, deleted, origin from urlword;
> > ...
> > 23 rows in set (7 min 9.90 sec)
> >
> > > If this is a (very) low value, won't the index make things
> > > slower (if it's being used) compared to a full table scan?
> >
> > I guess these values are very low for a table with 46 milion records,
> > so I understand it is better not to use such indicies, right?
> 
> Well, that's what I have understood from many db engines :-)
> ...
> The idea is, that fetching rows in index order takes make time then fetching
> rows in storage order.

Let's try Marc's idea first:

   mysql> create index x1 on urlword (deleted,status,origin);
   Query OK, 46648925 rows affected (3 hours 56 min 36.17 sec)

Oops, Marc said "(origin,status,deleted)", not
"(deleted,status,origin)", but we'll see.

   mysql> explain select url_id from urlword where deleted=0 and
status=200 and origin=1;
  
+---------+------+---------------+------+---------+-------+----------+------------+
   | table   | type | possible_keys | key  | key_len | ref   | rows    
| Extra      |
  
+---------+------+---------------+------+---------+-------+----------+------------+
   | urlword | ref  | crc,x1        | crc  |       5 | const | 16934432
| where used |
  
+---------+------+---------------+------+---------+-------+----------+------------+
   1 row in set (0.27 sec)             ^
                                       |
Why crc again, not x1? ----------------+

   mysql> show index from urlword;
  
+---------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+---------+
   | Table   | Non_unique | Key_name        | Seq_in_index |
Column_name     | Collation | Cardinality | Sub_part | Packed | Comment
|
  
+---------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+---------+
   | urlword |          0 | PRIMARY         |            1 |
url_id          | A         |    46411809 |     NULL | NULL   |        
|
   | urlword |          0 | url             |            1 |
url             | A         |    46411809 |     NULL | NULL   |        
|
   | urlword |          1 | next_index_time |            1 |
next_index_time | A         |    15470603 |     NULL | NULL   |        
|
   | urlword |          1 | hops            |            1 |
hops            | A         |          17 |     NULL | NULL   |        
|
   | urlword |          1 | hops            |            2 |
next_index_time | A         |    46411809 |     NULL | NULL   |        
|
   | urlword |          1 | crc             |            1 |
origin          | A         |          17 |     NULL | NULL   |        
|
   | urlword |          1 | crc             |            2 |
crc             | A         |    46411809 |     NULL | NULL   |        
|
   | urlword |          1 | x1              |            1 |
deleted         | A         |          17 |     NULL | NULL   |        
|
   | urlword |          1 | x1              |            2 |
status          | A         |          17 |     NULL | NULL   |        
|
   | urlword |          1 | x1              |            3 |
origin          | A         |          17 |     NULL | NULL   |        
|
  
+---------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+---------+
   10 rows in set (2.16
sec)                                                                    
^
                                                                                       
         
|
Because of high cardinality of key crc on column crc? Is the optimizer
behaviour correct? -------+
Let's check what happens if I try to force use the new index x1:

   mysql> explain select url_id from urlword use index (x1) where
deleted=0 and status=200 and origin=1;
  
+---------+------+---------------+------+---------+-------------------+---------+-------------------------+
   | table   | type | possible_keys | key  | key_len | ref              
| rows    | Extra                   |
  
+---------+------+---------------+------+---------+-------------------+---------+-------------------------+
   | urlword | ref  | crc,x1        | x1   |      10 | const,const,const
| 9400212 | where used; Using index |
  
+---------+------+---------------+------+---------+-------------------+---------+-------------------------+
   1 row in set (0.35
sec)                                                    ^
                                                                             
|
Hm, the estimated row count is lower than using crc index.
-------------------+
So again, why does the optimizer insist on using the crc index despite
its poor selectivity?

Nevertheless, let's check if using the new index is efficient or not.

   mysql> select url_id from urlword use index (x1) where deleted=0 and
status=200 and origin=1 into outfile '/tmp/select.out';
   Query OK, 14274315 rows affected (1 min 32.08 sec)
                                     ---------------  not 32 hours, nor
even 5 minutes.

So, Marc and Gerald were right, creating additional index for use in
this query was the good answer.
And thanks to Martijn I had to learn why Marc's advice was better.
Now I know: the query does not have to refer the table, only the index.
Thanks for your time, guys.

Adding this index could be my option, even at the cost of some slowdown
of inserts/updates,
but only if I wouldn't have to patch and recompile every new version of
ASPseek package
or persuade the authors to use non-standard SQL extension (use index) in
their sources.

So, back to the optimizer problem.
Reverting the columns order in the additional index changed nothing.
I assumed creating the index proposed by Gerald would either not help
the optimizer.
So I changed the columns order in the crc index from (origin,crc) to
(crc,origin).
Now I get:

   mysql> show index from urlword;
  
+---------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+---------+
   | Table   | Non_unique | Key_name        | Seq_in_index |
Column_name     | Collation | Cardinality | Sub_part | Packed | Comment
|
  
+---------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+---------+
   | urlword |          0 | PRIMARY         |            1 |
url_id          | A         |    44864749 |     NULL | NULL   |        
|
   | urlword |          0 | url             |            1 |
url             | A         |    44864749 |     NULL | NULL   |        
|
   | urlword |          1 | next_index_time |            1 |
next_index_time | A         |     8972949 |     NULL | NULL   |        
|
   | urlword |          1 | hops            |            1 |
hops            | A         |          18 |     NULL | NULL   |        
|
   | urlword |          1 | hops            |            2 |
next_index_time | A         |    14954916 |     NULL | NULL   |        
|
   | urlword |          1 | crc             |            1 |
crc             | A         |    44864749 |     NULL | NULL   |        
|
   | urlword |          1 | crc             |            2 |
origin          | A         |    44864749 |     NULL | NULL   |        
|
  
+---------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+---------+
   7 rows in set (1.04 sec)

   mysql> explain select url_id from urlword where deleted=0 and
status=200 and origin=1;
  
+---------+------+---------------+------+---------+------+----------+------------+
   | table   | type | possible_keys | key  | key_len | ref  | rows     |
Extra      |
  
+---------+------+---------------+------+---------+------+----------+------------+
   | urlword | ALL  | NULL          | NULL |    NULL | NULL | 43628280 |
where used |
  
+---------+------+---------------+------+---------+------+----------+------------+
   1 row in set (0.04 sec)

   mysql> select url_id from urlword where deleted=0 and status=200 and
origin=1 into outfile '/tmp/select.out';
   Query OK, 14274315 rows affected (5 min 5.61 sec)

I am quite satisfied with this result, and now I find things rather
obvious.
However, regarding the optimizer behaviour, I would not call it a
solution, but a workaround.

Now I am going to ask ASPseek developers if reverting the columns order
in the crc index
could introduce any problems.

Sorry for the length of this post.

Regards,
Janusz

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to