Re: How to optimize a slow query?

Jia Chen Sun, 06 Sep 2009 11:51:50 -0700

Hi Mike,

Thanks for your detailed answer. Now, I understand what you mean. And,yes, I agree with you that keeping all data in one table works betterfor a bunch of 1:1 relationship tables. Actually, this is what I wastrying to do with that query.

Since you mention "They all had a 1:1 relationship and occasionally someof the tables did not have a corresponding row." and "From then on I'vemerged all 8 tables into one and if any of the subordinate table dataisn't available for a row, its columns are set to NULL", I do want toask you about how you set the columns to NULL for rows in subordinatetable data unavailable because I have similar situation.

If I want to combine two tables into one, I think that a full outer joincan achieve what you did. However, MySQL does not have full outer join.So, I am using


create table rmpdata1
(select ri.*, mv.MV, coalesce(ri.code,mv.code) as ccode,
          coalesce(ri.ndate,mv.ndate) as cndate
from
       RItime as ri left outer join MVtime as mv
on
       (ri.code=mv.code and ri.ndate=mv.ndate))
union
(select ri.*, mv.MV, coalesce(ri.code,mv.code) as ccode,
          coalesce(ri.ndate,mv.ndate) as cndate
from
       RItime as ri right outer join MVtime as mv
on
       (ri.code=mv.code and ri.ndate=mv.ndate));

This query takes more than twice as much time as the query in my firste-mail. Do you have a better way? Thanks.


Best,
Jia

mos wrote:

Jia,
Yes, it is a 1:1 relationship between table RItime and MVtime.However, I don't get your suggestion, "I'd recommend joining the twotables into 1 table so you don't have to join them in the firstplace." Could you elaborate that?
Sure but first I have to relate it to my own experience. I had 8tables of around 25 million rows each. They all had a 1:1 relationshipand occasionally some of the tables did not have a corresponding row.I felt it was better from a design point of view to have 8 differenttables and do the joins on the tables that I needed for each of myparticular queries. I'd be joining anywhere from 2 to 5 or 6 or evenall 8 tables at a time, using a where clause to select 15k rows at atime. This is the way to do it from a normalized point of view. All ofthe information is in its respective table and only assemble thetables for each particular query.
Well, this was sloooowww! A heck of a lot of work was done to join thetables together on a 2 column key (like yours). I also had to runmaintenance on the tables to see which tables where corrupted or weremissing rows that should have been there. The tables also repeatedcolumns from the other tables like date and product_id that is used tohelp identify each row. Well to make a long story short, it was fartoo much effort to juggle the relationships between all of these tables.
Then a colleague made the monumental announcement by saying "I'venever found the need to use more than 1 table when there was a 1:1relationship. There is a tremendous speed cost involved in piecing thedata back together. I put all of the data into 1 table". So the lightwent on for me. From then on I've merged all 8 tables into one and ifany of the subordinate table data isn't available for a row, itscolumns are set to NULL, which is the values they would have hadanyway after a left join.
I am perfectly happy know with one wide table with over 100 columns.Everything is in its place and maintenance is a dream. Queries arealso quite fast because all of the information is under one table andnot 8. I don't have to worry about optimizing the indexes for thetable joins because there aren't any joins between these tablesbecause it is all in 1 row.
So you really have to ask yourself, why spend 10 minutes each timeyour query is run? Instead you eliminate the query altogether bykeeping the data of the 2 tables into 1 table in the first place.
Mike



At 09:45 AM 9/6/2009, Jia Chen wrote:
Thanks for your reply, Mike.
Yes, 13419851 rows were added to rmpdata1. However, 10 minutes seemto be too long. I run the same join by using SQL procedure in astatistical software called SAS on a similar machine. It only takes 1minute and 3 seconds.
Yes, it is a 1:1 relationship between table RItime and MVtime.However, I don't get your suggestion, "I'd recommend joining the twotables into 1 table so you don't have to join them in the firstplace." Could you elaborate that?
I was using B-tree index. Switching to HASH does help. Now, it takes4 min 50.17 sec to run the query.
I also turn on profiling by using
mysql> set profiling = 1;
Query OK, 0 rows affected (0.01 sec)

After the query finishes, I get
mysql> show profile;
+----------------------+------------+
| Status               | Duration   |
+----------------------+------------+
| starting             |   0.000123 |
| checking permissions |   0.000010 |
| Opening tables       |   0.000044 |
| System lock          |   0.000007 |
| Table lock           |   0.000011 |
| init                 |   0.000083 |
| creating table       |   0.003428 |
| After create         |   0.000124 |
| System lock          |   0.000004 |
| Table lock           |   0.000051 |
| optimizing           |   0.000007 |
| statistics           |   0.000033 |
| preparing            |   0.000020 |
| executing            |   0.000004 |
| Sending data         | 290.153530 |
| end                  |   0.000008 |
| end                  |   0.000004 |
| query end            |   0.000003 |
| freeing items        |   0.000010 |
| closing tables       |   0.000025 |
| logging slow query   |   0.000001 |
| logging slow query   |   0.013429 |
| cleaning up          |   0.000004 |
+----------------------+------------+
23 rows in set (0.02 sec)
MySQL spends most of its time sending data. According tohttp://dev.mysql.com/doc/refman/5.0/en/general-thread-states.html,sending data means that "the thread is processing rows for a |SELECT|<http://dev.mysql.com/doc/refman/5.0/en/select.html> statement andalso is sending data to the client." Is there more room to optimizethis query? Thanks again.
Best,
Jia

mos wrote:
How many rows were added to rmpdata1 table? If it is 13.4 millionrows then it is going to take several minutes to join this many rowsfrom the 2 tables.Is there a 1:1 relationship between the two tables or a 1:Many? Ifthere is a 1:1 then I'd recommend joining the two tables into 1table so you don't have to join them in the first place.The only other thing I can suggest is to change the type of index onthe tables being joined to see if that makes a speed difference. Forexample, if you are using BTREE then switch to HASH or vice versa.See http://dev.mysql.com/doc/refman/5.0/en/create-index.html formore info.
Mike

At 10:05 AM 9/5/2009, Jia Chen wrote:
Hi there,
One simple query took more than 10 minutes. Here is how relevantrows in the slow query log looks like:
# Time: 090905 10:49:57
# u...@host: root[root] @ localhost []
# Query_time: 649  Lock_time: 0  Rows_sent: 0  Rows_examined: 26758561
use world;
create table rmpdata1 select ri.*,
mv.MV, coalesce(ri.code,mv.code) as ccode,
coalesce(ri.ndate,mv.ndate) as cndate
from RItime as ri left outer join
MVtime as mv
on (ri.code=mv.code and
ri.ndate=mv.ndate);

When I "explain" only the select clause, I get
------------+----------+-------+
| id | select_type | table | type | possible_keys | key |key_len | ref | rows | Extra |+----+-------------+-------+--------+---------------+---------+---------+------------------------------+----------+-------+
|  1 | SIMPLE      | ri    | ALL    | NULL          | NULL    | NULL
| NULL                         | 13419851 |       |
|  1 | SIMPLE      | mv    | eq_ref | PRIMARY       | PRIMARY | 11
| world.ri.code,world.ri.ndate |        1 |       |
+----+-------------+-------+--------+---------------+---------+---------+------------------------------+----------+-------+
2 rows in set (0.00 sec)
I use "show table status from world;" to get information about twotables, RItime and MVtime, in the join clause:
          Name: RItime
        Engine: MyISAM
       Version: 10
    Row_format: Dynamic
          Rows: 13419851
Avg_row_length: 31
   Data_length: 427721848
Max_data_length: 281474976710655
  Index_length: 347497472
     Data_free: 0
Auto_increment: NULL
   Create_time: 2009-09-03 10:17:57
   Update_time: 2009-09-03 12:04:02
    Check_time: NULL
     Collation: latin1_swedish_ci
      Checksum: NULL
Create_options:
       Comment:
*************************** 2. row ***************************
          Name: MVtime
        Engine: MyISAM
       Version: 10
    Row_format: Dynamic
          Rows: 13562373
Avg_row_length: 31
   Data_length: 430220056
Max_data_length: 281474976710655
  Index_length: 350996480
     Data_free: 0
Auto_increment: NULL
   Create_time: 2009-09-03 13:31:33
   Update_time: 2009-09-03 13:43:51
    Check_time: NULL
     Collation: latin1_swedish_ci
      Checksum: NULL
Create_options:
       Comment:

I also describe these two tables:
mysql> desc RItime;
+-------+------------+------+-----+------------+-------+
| Field | Type       | Null | Key | Default    | Extra |
+-------+------------+------+-----+------------+-------+
| code  | varchar(6) | NO   | PRI |            |       |
| ndate | date       | NO   | PRI | 0000-00-00 |       |
| ri    | double     | YES  |     | NULL       |       |
| time  | date       | YES  |     | NULL       |       |
| bdate | date       | YES  |     | NULL       |       |
+-------+------------+------+-----+------------+-------+
5 rows in set (0.00 sec)

mysql> desc MVtime;
+-------+------------+------+-----+------------+-------+
| Field | Type       | Null | Key | Default    | Extra |
+-------+------------+------+-----+------------+-------+
| code  | varchar(6) | NO   | PRI |            |       |
| ndate | date       | NO   | PRI | 0000-00-00 |       |
| MV    | double     | YES  |     | NULL       |       |
| time  | date       | YES  |     | NULL       |       |
| bdate | date       | YES  |     | NULL       |       |
+-------+------------+------+-----+------------+-------+
5 rows in set (0.00 sec)

Could you give me some hint on how to improve the speed of this query?
Thanks.

Best,
Jia




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[email protected]



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[email protected]

Re: How to optimize a slow query?

Reply via email to