Hello.

I am playing around with MySQL 3.23.33 (on a Linux Redhat 6.2 system)
and wondered about the following:

shell> time mysql yasg -e "SELECT eid,parent FROM entry_busy1 WHERE gid=413456 AND 
deleted != 'true' AND offsite = 'no' ORDER BY gid, offsite, deleted, parent, eid" > 
/dev/null
0.130u 0.120s 0:00.56 44.6%     0+0k 0+0io 154pf+0w

shell> > time mysql yasg -e "SELECT eid,parent FROM entry_busy1 WHERE gid=413456 AND 
deleted != 'true' AND offsite = 'no' ORDER BY parent, eid" > /dev/null
0.130u 0.120s 0:01.28 19.5%     0+0k 0+0io 154pf+0w

You see? The second version takes about 0.7 seconds longer although
the same index is used (see below). I redirect the output, because you
don't really want to see the 31029 selected rows. ;-)

All the query data fits into cache and the above are the results after
running the query several times, so no disks accesses are involved
here. I verfied this with vmstat.

The reason for the difference seems to be, that the optimizer chooses
to do a filesort in the second case. Everything else looks the same to
me (after tweaking the query in a way, that only the problem was
left... my original query looks different):

mysql> EXPLAIN SELECT eid,parent FROM entry_busy1 WHERE gid=413456 AND deleted != 
'true' AND offsite = 'no' ORDER BY gid, offsite, deleted, parent, eid \G
*************************** 1. row ***************************
        table: entry_busy1
         type: ref
possible_keys: PRIMARY,list
          key: list
      key_len: 5
          ref: const,const
         rows: 23208
        Extra: where used; Using index
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT eid,parent FROM entry_busy1 WHERE gid=413456 AND deleted != 
'true' AND offsite = 'no' ORDER BY parent, eid \G
*************************** 1. row ***************************
        table: entry_busy1
         type: ref
possible_keys: PRIMARY,list
          key: list
      key_len: 5
          ref: const,const
         rows: 23208
        Extra: where used; Using index; Using filesort
1 row in set (0.00 sec)

The used (unique) index list is (somewhat packed to fit into 80 chars): 
+----------+-----+-------------+-------------+----------+--------+
| Key_name | Seq | Column_name | Cardinality | Sub_part | Packed |
+----------+-----+-------------+-------------+----------+--------+
| list     |   1 | gid         |         243 |     NULL | NULL   |
| list     |   2 | offsite     |         424 |     NULL | NULL   |
| list     |   3 | deleted     |         650 |     NULL | NULL   |
| list     |   4 | parent      |      448638 |     NULL | NULL   |
| list     |   5 | eid         |      448638 |     NULL | NULL   |
+----------+-----+-------------+-------------+----------+--------+

Now, the interesting part is that the rows are already in sorted order
(even if I leave away the ORDER BY clause) due to index usage and the
fact, that the chosen rows have only one value for each, gid, deleted
and offsite rows. (to be sure, I verfied this by redirecting the
output and using diff).

In other words: The filesort has to do nothing (except assuring that
the order is already correct). So why does that take 0.7 seconds for
only about 23000 rows? Considering the fact that about half the time
is spend by the client receiving the result, this means that more than
75% of the query time is spend in sorting the result?!

I am well aware of the fact, that there are sorting algorithms which
don't benefit from an already sorted data set. But it makes neither
sense that MySQL needs 0.7 seconds to sort about 30.000 integer pairs,
nor that MySQL would use such a sorting algorithm (because often rows
are read in partially pre-sorted order). Even if MySQL would use such
an algorithm due to its sorting speed or memory performance, it IMHO
shouldn't - as I said - need so long.


Well, for now, I will just rewrite my queries to specify the full
ORDER BY clause. But this is no reasonable solution, because the
constraints will change (and therefore deleted won't give back only
one value in the future).

Some other tests showed that the time filesort needs depends on the
number of rows selected, i.e. it is appropriately faster for only 1000
rows.

I am thankful for any comment.


Some additional notes (I wrote this mail already 10 days before, but
it didn't make it to the list due to technical problems):

I meanwhile got my hands on MySQL 3.23.38 and verified that the
behaviour is the same. The time needed differs only marginally.

Moving sorting from MySQL to my application (i.e. using the faster of
the two queries above, copying the data and doing a sort using C++ STL
sort() algorithm on the copy), increased the overall speed by 0.4
seconds. Note that this was not optimized for speed yet (except the
STL sort, probably :-).

Bye,

        Benjamin.





---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to